Never been to CodeSnippets before?

Snippets is a public source code repository. Easily build up your personal collection of code snippets, categorize them with tags / keywords, and share them with the world (or not, you can keep them private!)

Convert a db to UTF8 after upgrading to MySQL 4.1 (See related posts)

If you've ever used a UTF8 application on a pre-4.1 MySQL server, or never cared about encodings on a 4.1 setup even, you may have a non-UTF8 database containing UTF8 data. While this doesn't bother most applications (e.g. PHP weblogs), it's not clean and you can't sort properly with any non-Western characters. This procedure will fix it:

mysqldump --user=username --password=password --default-character-set=latin1 --skip-set-charset dbname > dump.sql
chgrep latin1 utf8 dump.sql
mysql --user=username --password=password --execute="DROP DATABASE dbname; CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;"
mysql --user=username --password=password --default-character-set=utf8 dbname < dump.sql


The chgrep part is important, because the table definitions in your dump will likely have "latin1" preserved. If you don't have chgrep, you may use any search-and-replace capable editor, but remember that it must open and save UTF8 properly. Edit: Instead of 'chgrep', you can use 'sed' e.g.:

sed -i "" 's/latin1/utf8/g' dump.sql


Alternatively you may attach "--skip-create-options" to the mysqldump command, but that could omit some needed options (e.g. PACK_KEYS=1 etc.).

You may change the utf8_general_ci collation to whatever you need, e.g. utf8_czech_ci for my purposes.

Edit: Fixed the typo as per the comment below.

Comments on this post

vanweerd posts on Feb 20, 2006 at 14:10
Note the typo --default_character-set in first line should b --default-character-set (as it is in line 4)
xy77 posts on Jul 18, 2006 at 11:49
It worked for me to dump the db using the command
mysqldump --user=username --password=password --default-character-set=latin1 --skip-set-charset dbname > dump.sql

emptying the db:
mysql --user=username --password=password --execute="DROP DATABASE dbname; CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;"

and reimporting the undmodified dump.sql using
mysql --user=username --password=password --default-character-set=utf8 dbname < dump.sql


The sed command didn't work for me, I get an error:
sed: couldn`t read s/latin1/utf8/g : No such file or directory

- David
lokus posts on Jul 20, 2006 at 07:19
I believe the instruction and method here is more straight forward than the dump and import.
http://dev.mysql.com/doc/refman/4.1/en/charset-conversion.html
sirn posts on Aug 08, 2006 at 19:56
for those who can't use the provided sed command, use this instead:

sed -r 's/latin1/utf8/g' dump.sql > dump_utf.sql
UltraKnorrie posts on Aug 05, 2008 at 11:02
A Quick add-on to this useful post.

I was using PhPMyAdmin, sincee I got no direct access to the server (mysqldump was not possible).
Here's my solution: on the old server running MySQL 3.xx use the export function to grab your data & structure (do both separately).
Now, as meentioned earlier, the charset of early MySQL DBs was by default latin1_swedish_ci.
So, on your new server running MySQL 4.xx or 5.xx (and also a newer version of PhPMyAdmin), select "latin1_swedisch_ci" as connection collation in the main screen of PhPMyAdmin.
Next, create you DB and select utf8_unicode_ci as collation.
Next, import your structure using charset = latin1
Next, import your data using charset = latin1

A browse on your tables should show that all accents are correctly stored in your database now.

Best,

You need to create an account or log in to post comments to this site.