I am presently searching for a method to search a large database (500MB - 10GB or even more on 10 tables) with many different different fields(nvarchars and bigints). Most of the fields, that needs to be looked aren't within the same table.
A good example: Searching for '5124 Peter' should return all products, that ...
- come with an ID with 5124 inside it,
- have 'Peter' within the title or description
- have item type id with 5124 inside it
- produced with a user named 'peter' or perhaps a user whose id has 5124 inside it
- produced with a user with '5124' or 'peter' in the home address.
How must i perform the search? I just read the full-text search of MS-Sql is much more performant than the usual query using the LIKE keyword and i believe the syntax is much more obvious, but it cant explore bigint(id) values and that i see clearly has performance issues with indexing and for that reason slows lower card inserts towards the DB. During my project you will see more placing than reading through, so this may be an issue.
Thanks ahead of time, Marks
I do not think you are getting the performance you'll need from MS SQL you are gonna need to construct very complex queries to pay for all of the data/tables that you are likely to be searching, and you've got the additional encumbrance of writing data towards the database simultaneously when you are querying it.
I recommend you appear at either Apache Solr (http://lucene.apache.org/solr/) or Lucene (http://lucene.apache.org). Solr is made on the top of Lucene, both may be used to create an inverted file index, essentially such as the index at the back of book (term 1 seems in documents 1, 3, 7, etc.) Solr is really a search-engine-in-a-box, and it has several systems that enables you to tell it where and how to index data. Lucene is much more lower-level, and enables you to setup your indexing and looking out architecture with increased versatility.
The positive thing about Solr is the fact that it's available like a web service, therefore if you are unfamiliar with Java, you'll find a Solr client within the language of your liking, and write indexing and looking out code in whatever language you prefer. Here is a connect to a listing of client libraries for Solr, including some in C# http://wiki.apache.org/solr/IntegratingSolr This is where I'd start.
Full-text search is certainly more performant than like expression. You skill is produce a full-text index on the view rather than a table, and also, since it is simply the index that will get looked that may save table joins later which could quicken things a little. The vista would also permit you to convert the bigint posts to varchar which could then get indexed, say by concatenating all of the posts that should be looked together as you varchar column. To get this done you have to produce a view with SCHEMABINDING and choose a minumum of one column that's unique and make up a clustered unique index onto it.
For the results on full-text on place performance, I've not observed a lot of an effect on bulk place myself however i see from stackoverflow question 3301470, someone point out that performance was slow on sql 2005 but that in sql 2008 that's now fixed. The reason being it now updates the index following the bulk place rather than after every person row place (I am running 2008). If you're running 2005 then to enhance you are able to disable change monitoring only for the majority place and by hand call update index after.