I am building a high-volume web application, part of which is a MySQL database of discussion posts that will need to grow to 20M+ rows, easily.

I had initially been planning to use MyISAM for those tables (for its built-in fulltext search capabilities), but the thought of the entire table being locked by a single write operation makes me shudder. Row-level locks make a lot more sense (not to mention InnoDB's other speed advantages when dealing with huge tables). So, for that reason, I'm pretty determined to use InnoDB.

The problem is... InnoDB doesn't have built-in fulltext search capabilities.

Should I go with a third-party search system? Like Lucene(c++) / Sphinx? Do any of you database ninjas have suggestions/guidance? LinkedIn's zoie (based on Lucene) looks like the best option at the moment... having been built around realtime capabilities (which is pretty critical for my application). I'm a bit hesitant to commit yet without some insight...

(FYI: this will be on EC2 with high-memory rigs, using PHP for all of the frontend)

I can attest to MyISAM fulltext being a bad option - even leaving aside the various problems with MyISAM tables in general, I've seen the fulltext stuff go off the rails, start corrupting itself, and crash MySQL regularly.

A dedicated search engine is definitely the most flexible option here - store your post data in MySQL/InnoDB, and then export the text to your search engine. You can set up a periodic full index build/publish fairly easily, and add real-time index updates if you feel the need and want to spend the time.
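The export step can be as simple as streaming rows into the search engine's ingestion format. As a minimal sketch (assuming Sphinx's xmlpipe2 data source; the (id, body) rows stand in for a real MySQL cursor, and the column name is hypothetical):

```python
# Build the xmlpipe2 XML stream that Sphinx's "indexer" reads on stdin.
from xml.sax.saxutils import escape

def xmlpipe2(rows):
    parts = ['<?xml version="1.0" encoding="utf-8"?>',
             '<sphinx:docset>',
             '<sphinx:schema><sphinx:field name="body"/></sphinx:schema>']
    for doc_id, body in rows:
        # Each document carries the unique integer id Sphinx requires;
        # the text is escaped so markup in posts can't break the stream.
        parts.append('<sphinx:document id="%d"><body>%s</body></sphinx:document>'
                     % (doc_id, escape(body)))
    parts.append('</sphinx:docset>')
    return '\n'.join(parts)
```

Point `indexer` at a script like this and the periodic full rebuild is just a cron job.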

Lucene and Sphinx are good options, as is Xapian, which is nice and lightweight. If you go the Lucene route, don't assume that CLucene will be better, even if you'd prefer not to wrestle with Java, although I'm not really qualified to discuss the pros and cons of either.

You should spend an hour or so going through an installation and test-drive of both Sphinx and Lucene. See whether either meets your needs with respect to data updates.

One thing that disappointed me about Sphinx is that it doesn't support incremental inserts very well. That is, it's very expensive to reindex after an insert - so expensive that their suggested solution is to split your data into older, stable rows and newer, volatile rows. So every search your app does would have to search twice: once on the bigger index for old rows and once on the smaller index for recent rows. If that doesn't integrate with your usage patterns, then Sphinx is not a good solution (at least not in its current implementation).
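For reference, the split described above is Sphinx's documented "main + delta" scheme. A hedged sphinx.conf sketch (the posts table, body column, and sph_counter helper table are assumed names, not anything from your schema):

```
source main
{
    # Remember the highest indexed id before each full rebuild.
    sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(id) FROM posts
    sql_query     = SELECT id, body FROM posts \
                    WHERE id <= (SELECT max_id FROM sph_counter WHERE counter_id = 1)
}

source delta : main
{
    sql_query_pre =
    # Only the rows added since the last full rebuild.
    sql_query     = SELECT id, body FROM posts \
                    WHERE id > (SELECT max_id FROM sph_counter WHERE counter_id = 1)
}
```

The delta index stays small and cheap to rebuild frequently, while the main index is rebuilt on a slower schedule - at the cost of querying both.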

Let me point out another possible solution you could consider: Google Custom Search. If you can apply some SEO to your web application, then delegate the indexing and search function to Google, and embed a Google search textfield into your site. It may be the most economical and scalable way to make your site searchable.

Perhaps you shouldn't dismiss MySQL's FT so quickly. Craigslist used to use it.

"MySQL's speed and Full Text Search has enabled craigslist to serve their users ... craigslist uses MySQL to serve approximately 50 million searches per month at a rate of up to 60 searches per second."


As noted below, Craigslist seems to have switched to Sphinx some time in early 2009.

Sphinx, as you point out, is quite nice for this stuff. All the work is in the configuration file. Make sure that whatever table holds the strings has some unique integer id key, and you should be fine.

try this

ROUND((LENGTH(text) - LENGTH(REPLACE(text, 'searchtext', ''))) / LENGTH('searchtext')) != 0
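That expression counts how many times the search string occurs in a row (the length shrinks by the needle's length for every occurrence removed), and the `!= 0` turns it into a match filter. The same arithmetic in a minimal Python sketch, for checking it outside MySQL:

```python
def count_occurrences(text, needle):
    # Mirrors (LENGTH(text) - LENGTH(REPLACE(text, needle, ''))) / LENGTH(needle):
    # removing every occurrence shortens the string by len(needle) per hit.
    return (len(text) - len(text.replace(needle, ""))) // len(needle)

# A row passes the != 0 filter exactly when the count is positive.
print(count_occurrences("sphinx beats myisam sphinx", "sphinx"))  # 2
print(count_occurrences("no match here", "sphinx"))               # 0
```

Note this is brute-force substring matching, not fulltext search - fine as a fallback, but it forces a full table scan on 20M rows.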