How to migrate from drupal to wordpress (and clean up all the nasty leftovers)

So, after three years of Drupal, I finally got rid of it. By the last version of Drupal (it was version 5.1) there was just too much stuff in there I didn’t need and lots of features missing that make up a first class blog management system or BMS (*omg* I so created a new word!!! – oh wait, someone came up with that already, whatever). I wanted to change to Wordpress. Let me show you how I did it…

The database tweakery

Anyways, the migration process to Wordpress is kinda rocky. First I noticed that MySQL was running on latin1 as the character set and latin1_swedish_ci as the collation, which led to broken characters when I re-Imported the dump. Everything should run on UTF8 as a default. So you might want to check if your MySQL installation is okay. Just go

mysql> show variables like "character%";
+————————–+—————————-+
| Variable_name | Value |
+————————–+—————————-+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_results | latin1 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+————————–+—————————-+
7 rows in set (0.00 sec)

Server, System and Database run on UTF8 now. You need not be worried about the rest of these latin1 charsets, because they are session dependent anyways and it’s the programmer of the php script who needs to set these variables on the db connection. If it says something else in there, chances are your database, apache and php need some tweaking for full UTF8 support.  Note that I use Ubuntu Linux, your config files may be located elsewhere. 

  1. Make a backup dump of the MySQL database
    mysqldump -u username -p databasename >>drupal.sqldump
  2. Add this to the mysqld section of /etc/mysql/my.cnf

    character-set-server=utf8
    default-collation=utf8_unicode_ci

  3. Add to /etc/php4/php.ini

    default_charset = “utf-8″

  4. Add to /etc/apache2/apache2.conf

    AddCharset UTF-8 .utf8
    AddDefaultCharset UTF-8

  5. Restart your webserver and database

    /etc/init.d/mysql restart
    /etc/init.d/apache2 restart

  6. Create a new Database

    CREATE DATABASE wordpress DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

  7. Import the old drupal tables into our new Database

    mysql -u username -p databasename < drupal.sqldump

  8. Download and Install install Wordpress and choose the wp_ prefix for all wordpress tables.
  9. Delete Wordpress defaults

    use wordpress;
    delete from wp_categories;
    delete from wp_posts;    
    delete from wp_post2cat;
    delete from wp_comments

  10. Import Drupal Categories into Wordpress

    INSERT INTO
        wp_categories (cat_ID, cat_name, category_nicename, category_description, category_parent)
    SELECT term_data.tid, name, name, description, parent
    FROM term_data, term_hierarchy
    WHERE term_data.tid=term_hierarchy.tid;

  11. Import Drupal Posts into Wordpress (please note that I used the ‘blog’ node only!)

    INSERT INTO
        wp_posts (id, post_date, post_content, post_title,
        post_excerpt, post_name, post_modified)
    SELECT DISTINCT
        n.nid, FROM_UNIXTIME(created), body, n.title, teaser,
        REPLACE(REPLACE(REPLACE(REPLACE(LOWER(n.title),’ ‘, ‘_’),’.', ‘_’),’,', ‘_’),’+', ‘_’),
        FROM_UNIXTIME(changed)
    FROM node n, node_revisions r
    WHERE n.vid = r.vid
        AND type=’blog’

  12. Import category assignment

    INSERT INTO wp_post2cat (post_id,category_id) SELECT nid,tid FROM term_node ;

  13. Import comments from drupal.

    INSERT INTO
        wp_comments
        (comment_post_ID, comment_date, comment_content, comment_parent)
    SELECT
        nid, FROM_UNIXTIME(timestamp), comment, thread
    FROM comments;

  14. Since the Articles now have no author assigned, I set myself as the default author, you can safely skip this step if you want.

    UPDATE wp_posts set post_author=1;

Now go to your website and check it out. At least the frontpage should work just fine ;)

The URL tweakery

By now all your Articles should be accessible. However since there is no standard for permalinks, chances are that your articles URLs need work too. It’s actually worse than you imagine, because we need to fix not one not two but probably three kinds of url formats. This is where it gets kinda insane:

  • If you used drupal custom links, you probably had urls like http://sitename.tld/articlename.html, but if you are using the wordpress permalink feature your articles have moved to a format of http://sitename.tld/year/month/day/article_titles_with_underscores/ this is sure to be a problem for all the links to your site – not to mention the search engine spiders. Your Pagerank etc can actually decrease from this kind of thing.
  • Also, there is an alternative format of each drupal article that gets exposed however hard you may try. Additionally to the format above there is always a synonymous url that goes something like: http://sitename.tld/node/<article_node_id>
  • Finally there are drupals taxonomy (big headed word for category) urls that go like: http://sitename.tld/taxonomy/page/or/2

Now I now there is no easy way to break it to you, but there could be even more trouble ahead. If you use non standard latin characters, like umlauts or accents in the titles of your articles – wordpress won’t generate valid urls for you by default. Your articles just won’t be accessible from your website.You will probably have to open them in the editor and save them again. Please turn off the visual editor for that, since that thing is known to chew your HTML pretty badly. You can turn it off in your wordpress profile on the upper right hand side in your browser.

Now y ou need to take a look at the url_alias table where all drupal permalink aliases were stored. Just issue a quick

SELECT src, dst FROM url_alias order by src;

And you have a list of all node ids and their mappings. Now, you can whip out your favourite text editor and create INSERT statements for each article (or code a small script that does this for you if you have LOADS of aliases). From each pair of url and source you can create a mod_rewrite rule for apache, which might require you to search for the article to find out its real url. You want to create rules like this one:

RewriteRule     ^/kerry_edwards_big_in_germany.html http://%{SERVER_NAME}/2004/10/30/kerry__edwards_big_in_germany/
RewriteRule     ^/node/168 http://%{SERVER_NAME}/2004/10/30/kerry__edwards_big_in_germany/

It may be necessary for you to check, whether you have articles on your site that have NO url_alias whatsoever, because otherwise they would just be forgotten. Restart Apache after you have inserted these mod_rewrite rules into your .htaccess or context file. Now is the time to monitor your server. Are all your URLs intact? There is a nice WP Plugin called 404 Notifier written by Alex King, that tells you whenever a document is not found via Email or RSS. Keep in mind that Spambots may keep attacking your sites comments, thus causing bogus 404 alerts.

After all that, remember to go out, take a deep breath, yell as loud as you can, go back inside and write flaming emails with petitions for a common Permalink standard in all future Blog software ;)

Thanks go out to Dave Dash of Spindrop, from whose tutorial I nicked, and slightly modified the SQL Statements.

6 Responses to “How to migrate from drupal to wordpress (and clean up all the nasty leftovers)”


  1. 1 Dipesh

    Hi,

    Thanks for writing such a nice article. I would like to know few more things about tomcat deploying web services. How do we deploy embedded tomcat in a java application. I have found the following article

    http://www.onjava.com/pub/a/onjava/2002/04/03/tomcat.html

    But in this article i don’t know how can i specify my wsdd file. If you know about it and can inform here or by sending me an email it would be of great help.

    What I am guessing that it would be same as normal configuration shown above with a servlet serving the request and then reading init parameters.

    Thanks!

    Dipesh

  2. 2 rj

    Nice article. Just a few questions. When you say –

    “And you have a list of all node ids and their mappings. Now, you can whip out your favourite text editor and create INSERT statements for each article (or code a small script that does this for you if you have LOADS of aliases)”

    What specifically do you insert and where? I have 1500+ posts / url alias in my table currently. Do you have a script that you used?

    I have successfully migrated to WPMU with the last step being this url alias part. :( Any hints would be greatly appreciated. Thx!

  3. 3 RJ

    In the previous comment I misspelled my email address…

  1. 1 links for 2007-04-11 at Wired Gecko
  2. 2 erock.org — fully moooooooved… | erock.org | nothing. absolutely nothing.
  3. 3 links for 2008-04-24 | a crank’s progress

Leave a Reply






Bad Behavior has blocked 1248 access attempts in the last 7 days.

FireStats iconPowered by FireStats