Working with MySQL, encoding, and upgrade issues

The following series of articles was prompted by my hosting provider (1and1) sending me a notice that PHP version 4 support will be discontinued in March 2013. They prompted everyone to try toggling the setting for all website scripts to use PHP version 5, and "see if something breaks" (I am paraphrasing).

I have been putting off upgrading a particular MySQL version 4 database and a particular CMS version for a while, and I had a week of vacation. So I said what the heck, and toggled the option for all of website scripts to use PHP version 5 (Global PHP version).

Of course, doing this on a hosting environment with many live websites was a dumb idea. Immediately, I have found out that a website which used a MySQL version 4 database started displaying strange characters.

 

From my previous exploits, I knew that this was an encoding issue. Therefore, I have decided to research ways of both upgrading a MySQL database and converting its encoding to a proper format.

What followed was two weeks of (almost continuous stretch of) research and tweaking. As usual, most of other articles on the Internet were of little help, and they as a rule dealt with simpler cases.

 

THE TASK:

Upgrade and convert a MySQL version 4 latin1 encoding (and / or collation) database into a MySQL version 5 UTF8 encoded and collated database.

 

THE PROBLEM:

1) MySQL, for not so smart reasons defaulted to latin1 encoding for version 4 databases. My database was created a long time ago, for a very old database.

2) MySQL goes to extreme lengths to cover up problems with encoding mismatch. You cannot trust that a table with a specification of charset=utf8 collation=utf8_general_ci actually contains only UTF-8 formatted data. On the contrary, the original database likely contained UTF-8 encoded data in Latin1 collated (means: specified to be...) tables.

3) There are a large number of tools involved in processing data. For example, in order for there to not be any encoding or conversion steps, the browser, the database, the database server, PHP, the Content Management System, and any other tools involved must all be speaking (and set for) UTF-8.

 

Sub-pages:

 

I have prepared a Website Management Commands Master Reference which can aid you in this process.



˅˅˅ Additional valuable information is available at one of the links below: ˅˅˅

 

Did you like the article? Let Google Search know by clicking this button: . Please link to content that you find useful on this website on your own website, forum or blog! You can also comment on this page below, or to ask a question or suggest a topic for me to research. There is a user-editable Wiki available on my website, as well as a Forum that you can contribute to. Site Map.

Page last modified 31-Dec-12 21:53:31 EST
Comments on this page: