Tagencoding

Hash to encode iso-8859-1 (western) to html entities

ENTITIES = {
 'À' => 'À',
 'Á' => 'Á',
 'Â' => 'Â', Continue reading

Database migration from cp1251 -> utf8

It was time to migrate http://www.cenite.com database from cp1251 -> utf8, I have prepared one nice script which will show you step by step the commands that you should run in order to migrate your database.

Copy the content to cp1251toUTF8.sh or use the commands manually

The script is making echo instead of running the commands because this will give you a chance to fix an error if occurs.

#!/bin/bash
echo "Set the params in the script and you will get the commands that you must run in order to get your database converted"

SOURCE_DATABASE=source_database_name
TARGET_DATABASE=new_database_name

USER=router
HOST=mysql.spider.bg

# ----------------------------------- No need to touch bellow
echo -e  "\n\n# Lets export the source database"
echo "mysqldump -u $USER -p -h $HOST --default-character-set=cp1251 --max_allowed_packet=64M $SOURCE_DATABASE > db1.sql"

# recode latin1..utf8 or what ever...
echo -e "\n\n# Lets convert it"
echo "recode -v -f windows-1251..UTF-8 < db1.sql > db2.sql"

echo -e "\n\n# Lets replace some sql creation statements. maybe you will want to do it manually"
echo perl -pi -e "s/cp1251/utf8/g" db2.sql
echo perl -pi -e "s/utf8_bulgarian_ci/utf8_general_ci/g" db2.sql

echo -e "\n\n# Lets create the target database"
echo mysql -u $USER -p -h $HOST mysql -e \"drop database $TARGET_DATABASE\; CREATE DATABASE $TARGET_DATABASE CHARACTER SET utf8 COLLATE utf8_general_ci\";

echo -e "\n\n# Lets do the initial import"
echo "mysql -u $USER -p -h $HOST $TARGET_DATABASE --max_allowed_packet=64M --default-character-set=utf8 < db2.sql"

As a bonus here is a command with which you can convert an all your html pages to utf8 also

find . -name '*.html' -exec recode -v -f windows-1251..UTF-8 \{\} +

This would recursively find all htmls in the current directory.

How to detect character sets

http://linux.die.net/man/1/enca

Migrating Latin1 -> UTF8

We have the mistake to enter all the data in the database (utf8) without setting the right connection encoding (set names utf8). In this case our content is stored as latin1 characters in the utf8 database.

Here is the magic that fixes the encoding found by my bright colleague bl8cki

alter table articles convert to character set latin1;
alter table articles change content content blob;
alter table articles change title title blob;
alter table articles change author author blob;
alter table articles change content content text character set utf8;
alter table articles change title title text character set utf8;
alter table articles change author author text character set utf8;

here is an example

mysql> CREATE TABLE `articles` (`id` int(11) NOT NULL auto_increment,   `title` mediumtext, PRIMARY KEY  (`id`) ) ENGINE=MyISAM AUTO_INCREMENT=10313630 DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> set names latin1;
Query OK, 0 rows affected (0.00 sec)

mysql>  insert into articles (title) values('закъсал');
Query OK, 1 row affected (0.00 sec)

mysql> alter table articles convert to character set latin1;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> alter table articles change title title blob;
Query OK, 1 row affected (0.01 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> alter table articles change title title text character set utf8;
Query OK, 1 row affected (0.00 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from articles;
+----------+----------------+
| id       | title          |
+----------+----------------+
| 10313630 | закъсал        |
+----------+----------------+
1 row in set (0.00 sec)

As the example shows it works with cyrilic (pasted in utf8) !

© 2025 Gudasoft

Theme by Anders NorénUp ↑