I have approximately 2TB of text that I want to turn into a searchable database. I'll usually be searching to find out whether 2-4 word phrases appear in it (for example, I would search to find out whether the phrase "they are four words" or "three consecutive words" appears anywhere in the text).
These searches may happen very frequently, so it is crucial that I set up the database for as little processing as possible. I'd also like to minimize the overhead wherever possible in order to reduce the number of database servers I'll need.
Does anybody have suggestions about how I should set up this database?
For example, I thought of doing a linked list organized as id, word1, word2 (with all three being keys). For the phrase "they are four words", I'd first search for "they are", then search for "are four", check whether any match for "they are" has an id 1 less than a match for "are four", and then do the same thing for "four words". However, I think there has to be a more efficient way of doing this.
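To make the idea concrete, here is a minimal in-memory sketch of that word-pair lookup in Python. The names `build_pair_index` and `phrase_exists` are made up for illustration, and a plain dictionary stands in for the database; it only shows the matching logic and is not a design that would hold 2TB.

```python
from collections import defaultdict

def build_pair_index(words):
    """Map each adjacent word pair to the set of positions where it starts."""
    index = defaultdict(set)
    for pos in range(len(words) - 1):
        index[(words[pos], words[pos + 1])].add(pos)
    return index

def phrase_exists(index, phrase):
    """Return True if the phrase occurs as consecutive words."""
    terms = phrase.lower().split()
    if len(terms) < 2:
        raise ValueError("phrase must contain at least two words")
    # Candidate start positions come from the first pair.
    candidates = index.get((terms[0], terms[1]), set())
    # Each later pair must start exactly `offset` positions after the first.
    for offset in range(1, len(terms) - 1):
        following = index.get((terms[offset], terms[offset + 1]), set())
        candidates = {p for p in candidates if p + offset in following}
        if not candidates:
            return False
    return bool(candidates)

# Hypothetical sample text, used only to exercise the functions.
text = "we know they are four words and not three consecutive words here"
index = build_pair_index(text.lower().split())
print(phrase_exists(index, "they are four words"))
print(phrase_exists(index, "three consecutive words"))
print(phrase_exists(index, "they are three words"))
```

Here the position of a pair plays the role of the id column: a pair that starts `offset` positions later has an id that is `offset` higher.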
EDIT: The only thing I'll be using this database for is these 2-4 word exact-match searches, and it is intended for internal use. All I need this database to do is tell me whether a 2-4 word phrase exists somewhere in all of my files of data, and nothing more.
Does anybody have suggestions as to how I should set up this database?
Personally, I'd first rule out MySQL's full-text search, and every free, full-text search engine. There is a list of open source search engines on Wikipedia. I'd also rule out using Google Custom Search. Heck, I'd even consider a commercial product before I'd try rolling my own.
At the very least, studying their code might give you ideas about index structure.
If you're thinking about building a linked list in SQL, you might want to build a small test before getting too far into it. I don't think it will be practical, but I could be wrong.
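A small test of the id, word1, word2 idea could look something like the sketch below. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in (the real target might be MySQL or something else), and the table name `pairs`, the index name, and the helper functions are all assumptions made for illustration.

```python
import sqlite3

def load_pairs(conn, words):
    """Store every adjacent word pair; id is the position of the pair."""
    conn.execute(
        "CREATE TABLE pairs (id INTEGER PRIMARY KEY, word1 TEXT, word2 TEXT)"
    )
    conn.execute("CREATE INDEX pairs_words ON pairs (word1, word2, id)")
    rows = [(i, words[i], words[i + 1]) for i in range(len(words) - 1)]
    conn.executemany(
        "INSERT INTO pairs (id, word1, word2) VALUES (?, ?, ?)", rows
    )

def phrase_in_table(conn, phrase):
    """Self-join the pair table so that consecutive pairs have consecutive ids."""
    terms = phrase.lower().split()
    if not 2 <= len(terms) <= 4:
        raise ValueError("phrase must contain 2 to 4 words")
    sql = "SELECT 1 FROM pairs AS p0"
    params = []
    for k in range(1, len(terms) - 1):
        sql += (
            f" JOIN pairs AS p{k} ON p{k}.id = p0.id + {k}"
            f" AND p{k}.word1 = ? AND p{k}.word2 = ?"
        )
        params += [terms[k], terms[k + 1]]
    # Placeholders in the WHERE clause come last in the SQL text.
    sql += " WHERE p0.word1 = ? AND p0.word2 = ? LIMIT 1"
    params += [terms[0], terms[1]]
    return conn.execute(sql, params).fetchone() is not None

# Hypothetical sample text, used only to exercise the queries.
text = "we know they are four words and not three consecutive words here"
conn = sqlite3.connect(":memory:")
load_pairs(conn, text.lower().split())
print(phrase_in_table(conn, "they are four words"))
print(phrase_in_table(conn, "they are three words"))
```

Loading a realistic sample of the data and timing these queries would show fairly quickly whether the table size and join cost are workable.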
It takes a lot of work to do full-text search really well. (Consider proximity searches: find "you will find" within 3 words of "many different ways to fail".) Reinventing this wheel is probably not the best use of your time.
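To give a feel for what a proximity search has to decide, here is a rough Python sketch. It assumes "within 3 words" means that at most 3 words separate the two phrases, in either order, and that overlapping matches also count; real search engines may define it differently. The function names are invented for this example, and the linear scan is only meant to show the semantics, not an indexing strategy.

```python
def phrase_starts(words, phrase):
    """Return every position where the phrase begins in the word list."""
    terms = phrase.lower().split()
    n = len(terms)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == terms]

def within(words, first, second, max_gap=3):
    """True if at most max_gap words separate the two phrases (either order)."""
    len_a = len(first.split())
    len_b = len(second.split())
    for a in phrase_starts(words, first):
        for b in phrase_starts(words, second):
            # Number of words between the end of one phrase and the start
            # of the other; negative when the two matches overlap.
            gap = max(b - (a + len_a), a - (b + len_b))
            if gap <= max_gap:
                return True
    return False

# Hypothetical sample sentence, used only to exercise the functions.
words = "you will find that there are many different ways to fail".split()
print(within(words, "you will find", "many different ways to fail", max_gap=3))
print(within(words, "you will find", "many different ways to fail", max_gap=2))
```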