Goodbye MySQL
I was optimistic that I could make http://www.cenite.com, a price monitoring website to use the fulltext search of mysql. Unfortuntly I have found so many drawbacks that I have to leave this idea. The main source for information for me was:
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
http://devzone.zend.com/node/view/id/1304#Heading14
At first the speed was wonderfull. I was searching in 300k> records apx. 350mb. But then I have to surrender. I cant configure mysql to work as I want. I know that If I spend two days to become expert on C/C++ with Unicode I will success but this is not the case. I want working solution.
The resons that make me not to use mysql for searching:
There is no way to change the default operator by default it is OR. You must parse the user query and rewrite it.
I want automaticaly truncation on all my terms.
There is no way to tell MySQL what are characters, and what are not…..sorry, there are two ways:
1. Touching the sources,
2. Configuring in xmls
No documentation on both. Maybe there is …somewhere.
If you use the default configuration then you will wonder how you get or not get the required results.
Here is a summary of the comands that I use to tweak my mysql server:
SHOW VARIABLES LIKE ‘ft%’
SET @global.ft_min_word_len=2;
SET @local.ft_min_word_len=2;
But it is better to have those options in my.conf
[mysqld]
ft_min_word_len=3
ft_stopword_file=”C:\MySQL\stop.txt”
[myisamchk]
ft_min_word_len=3
ft_stopword_file=”C:\MySQL\stop.txt”
To check what characterset is using your mysql:
SHOW VARIABLES LIKE ‘character_sets_dir’;
/usr/share/mysql/charsets | E:MySQL Server 5.1sharecharsets
SHOW VARIABLES LIKE ‘characte%’
After that you need to rebuild your index with one of those commands
slow: REPAIR TABLE products QUICK;
slow: myisamchk –recover –ft_min_word_len=3 tbl_name.MYI
fastest: DROP INDEX …; CREATE INDEX….;
Alternatives
http://www.sphinxsearch.com/
http://endeca.com/
http://lucene.apache.org/solr/