Working with MySQL, encoding, and upgrade issues
The following series of articles was prompted by my hosting provider (1and1) sending me a notice that PHP version 4 support will be discontinued in March 2013. They prompted everyone to try toggling the setting for all website scripts to use PHP version 5, and "see if something breaks" (I am paraphrasing).
I have been putting off upgrading a particular MySQL version 4 database and a particular CMS version for a while, and I had a week of vacation. So I said what the heck, and toggled the option for all of website scripts to use PHP version 5 (Global PHP version).
Of course, doing this on a hosting environment with many live websites was a dumb idea. Immediately, I have found out that a website which used a MySQL version 4 database started displaying strange characters.
From my previous exploits, I knew that this was an encoding issue. Therefore, I have decided to research ways of both upgrading a MySQL database and converting its encoding to a proper format.
What followed was two weeks of (almost continuous stretch of) research and tweaking. As usual, most of other articles on the Internet were of little help, and they as a rule dealt with simpler cases.
THE TASK:
Upgrade and convert a MySQL version 4 latin1 encoding (and / or collation) database into a MySQL version 5 UTF8 encoded and collated database.
THE PROBLEM:
1) MySQL, for not so smart reasons defaulted to latin1 encoding for version 4 databases. My database was created a long time ago, for a very old database.
2) MySQL goes to extreme lengths to cover up problems with encoding mismatch. You cannot trust that a table with a specification of charset=utf8 collation=utf8_general_ci actually contains only UTF-8 formatted data. On the contrary, the original database likely contained UTF-8 encoded data in Latin1 collated (means: specified to be...) tables.
3) There are a large number of tools involved in processing data. For example, in order for there to not be any encoding or conversion steps, the browser, the database, the database server, PHP, the Content Management System, and any other tools involved must all be speaking (and set for) UTF-8.
Sub-pages:
- Connection error with PHP 5 and MySQL 4
- Identifying the problem - who this article is for, and isn't for
- Preliminary Reading
- How to check what your database encoding is set to
- What will not work
- Procedure Outline
- Can't connect to local MySQL server through socket
- Create a proper database dump
- Properly viewing UTF-8
- How to convert the database encoding
- Non-latin1 characters in a MySQL version 4 database
- MySQL query Replace Into fails
- INSERT INTO ... ON DUPLICATE KEY UPDATE instead of REPLACE INTO
- Fixing double-encoding by converting into Blob
- Importing the dump file into a MySQL database
- MySQL Error Specified key was too long
- What else is there to do
- Sources and further reading
I have prepared a Website Management Commands Master Reference which can aid you in this process.
˅˅˅ Additional valuable information is available at one of the links below: ˅˅˅
Did you like the article? Let Google Search know by clicking this button:
Page last modified 31-Dec-12 21:53:31 EST
Previous page: Correcting website mistakes en masse
Next page: Connection error with PHP 5 and MySQL 4