Never been to TextSnippets before?

Snippets is a public source code repository. Easily build up your personal collection of code snippets, categorize them with tags / keywords, and share them with the world (or not, you can keep them private!)

Convert a db to UTF8 after upgrading to MySQL 4.1 (See related posts)

If you've ever used a UTF8 application on a pre-4.1 MySQL server, or never cared about encodings on a 4.1 setup even, you may have a non-UTF8 database containing UTF8 data. While this doesn't bother most applications (e.g. PHP weblogs), it's not clean and you can't sort properly with any non-Western characters. This procedure will fix it:

mysqldump --user=username --password=password --default-character-set=latin1 --skip-set-charset dbname > dump.sql
chgrep latin1 utf8 dump.sql
mysql --user=username --password=password --execute="DROP DATABASE dbname; CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;"
mysql --user=username --password=password --default-character-set=utf8 dbname < dump.sql


The chgrep part is important, because the table definitions in your dump will likely have "latin1" preserved. If you don't have chgrep, you may use any search-and-replace capable editor, but remember that it must open and save UTF8 properly. Edit: Instead of 'chgrep', you can use 'sed' e.g.:

sed -i "" 's/latin1/utf8/g' dump.sql


Alternatively you may attach "--skip-create-options" to the mysqldump command, but that could omit some needed options (e.g. PACK_KEYS=1 etc.).

You may change the utf8_general_ci collation to whatever you need, e.g. utf8_czech_ci for my purposes.

Edit: Fixed the typo as per the comment below.

Comments on this post

vanweerd posts on Feb 20, 2006 at 14:10
Note the typo --default_character-set in first line should b --default-character-set (as it is in line 4)
xy77 posts on Jul 18, 2006 at 11:49
It worked for me to dump the db using the command
mysqldump --user=username --password=password --default-character-set=latin1 --skip-set-charset dbname > dump.sql

emptying the db:
mysql --user=username --password=password --execute="DROP DATABASE dbname; CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;"

and reimporting the undmodified dump.sql using
mysql --user=username --password=password --default-character-set=utf8 dbname < dump.sql


The sed command didn't work for me, I get an error:
sed: couldn`t read s/latin1/utf8/g : No such file or directory

- David
lokus posts on Jul 20, 2006 at 07:19
I believe the instruction and method here is more straight forward than the dump and import.
http://dev.mysql.com/doc/refman/4.1/en/charset-conversion.html
sirn posts on Aug 08, 2006 at 19:56
for those who can't use the provided sed command, use this instead:

sed -r 's/latin1/utf8/g' dump.sql > dump_utf.sql

You need to create an account or log in to post comments to this site.


Related Posts