If I need to look up a large string in the DB, is it faster to search on the string itself, or would I gain by hashing the string, storing the hash in the DB as well, and then searching on that?

If so, which hash algorithm should I use? (Security isn't a concern; I'm after performance.)

In case it matters: I'm using C# and MSSQL 2005.

In general: probably not, assuming the column is indexed. Database servers are designed to do these kinds of lookups quickly and efficiently. Some databases (e.g. Oracle) offer options for building hash-based indexes.

Ultimately, though, this can only be settled by performance testing with data and usage patterns that are representative of your needs.

Though I have never tried it, it sounds like this would work in principle. You might get false positives (two different strings with the same hash), but the chance is probably quite slim.
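To put a rough number on "quite slim": the birthday approximation says that hashing n distinct strings into a b-bit value gives about n(n-1)/2 / 2^b expected colliding pairs, assuming the hash behaves like a uniform random function. The C# sketch below just evaluates that formula; the ten-million row count and the function name are illustrative assumptions, with 128 bits standing for MD5's output size and 32 bits for an int-sized checksum.

```csharp
using System;

// Birthday approximation: if a b-bit hash behaves like a uniform random function,
// hashing n distinct strings yields about C(n, 2) / 2^b colliding pairs.
static double ExpectedCollidingPairs(double n, int bits) =>
    n * (n - 1) / 2.0 / Math.Pow(2, bits);

double rows = 10_000_000; // illustrative row count, not a measurement
Console.WriteLine($"32-bit key:  about {ExpectedCollidingPairs(rows, 32):F0} colliding pairs");
Console.WriteLine($"128-bit key: about {ExpectedCollidingPairs(rows, 128):E2} colliding pairs");
```

At ten million rows a 32-bit key should already give thousands of colliding pairs, while for a 128-bit key a collision is vanishingly unlikely; either way, a hash match is only a candidate until the real string is compared.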

I'd opt for a fast algorithm such as MD5, since you don't want to spend longer hashing the string than it would have taken to just search for it.
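A minimal C# sketch of the client-side hashing, assuming the column is nvarchar: SQL Server stores nvarchar as UTF-16 little-endian, so hashing the Encoding.Unicode bytes should give the same 16 bytes as HASHBYTES('MD5', col) on that column (for a varchar column you would need the column's code page instead). The helper name Md5OfNvarchar is hypothetical. Also note that in SQL Server 2005, HASHBYTES accepts at most 8000 bytes of input, which matters for genuinely large strings.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: MD5 over the string's UTF-16 little-endian bytes
// (Encoding.Unicode), the same byte layout SQL Server uses for nvarchar.
// Assumption: this should equal HASHBYTES('MD5', col) when col is nvarchar.
static byte[] Md5OfNvarchar(string s)
{
    using var md5 = MD5.Create();
    return md5.ComputeHash(Encoding.Unicode.GetBytes(s));
}

byte[] key = Md5OfNvarchar("some large string value");
string hex = BitConverter.ToString(key).Replace("-", "");
Console.WriteLine($"{key.Length} bytes: {hex}");
```

The resulting byte array is what you would pass as a varbinary(16) parameter when searching on a stored hash column.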

The last thing I'd say is that you can only find out whether it's better by trying it and measuring the performance.

I'd be amazed if this offered a huge improvement, and I would suggest not rolling your own performance optimisations for a DB search.

If you use a database index, there's scope for a DBA to tune performance using tried and trusted techniques. Hard-coding your own index optimisation may prevent this, and could stop you benefiting from any indexing improvements in later versions of the DB.

First: MEASURE it. That's the only way to tell for certain.
Second: if you don't have a problem with the speed of the string search, keep it simple and don't use a hash.

However, to answer the actual question (and just because it's an interesting thought): it depends on how similar the strings are. Remember that the DB engine doesn't have to compare all of the characters in a string, only enough to find a difference. If you're searching through ten million strings that all start with the same 300 characters, then the hash will probably be faster. If you're searching for the only string that begins with an x, then the string comparison might be faster. I believe, though, that SQL Server will still have to read the entire string from disk, even if it then uses only the first byte (or first couple of bytes for multi-byte characters), so the total string length will still have an effect.
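The short-circuiting point can be modelled with a plain ordinal comparison that counts how many characters it examines before deciding. This is a simplified sketch with a hypothetical function name, not how SQL Server's collation-aware comparison works internally, but it shows why a long shared prefix costs more than an early difference.

```csharp
using System;

// Compare two strings character by character, stopping at the first difference,
// and report how many characters had to be examined.
static (bool Equal, int Examined) CompareCounting(string a, string b)
{
    int n = Math.Min(a.Length, b.Length);
    for (int i = 0; i < n; i++)
    {
        if (a[i] != b[i])
            return (false, i + 1); // stopped at the first differing character
    }
    // Shared prefix up to the shorter length: equal only if lengths also match.
    return (a.Length == b.Length, n);
}

string prefix = new string('a', 300); // 300 identical leading characters
var shared = CompareCounting(prefix + "x", prefix + "y");
var distinct = CompareCounting("xylophone", "apple");
Console.WriteLine($"shared 300-char prefix: {shared.Examined} characters examined");
Console.WriteLine($"differ at the start:    {distinct.Examined} characters examined");
```

A fixed-size hash comparison costs the same whatever the prefix, which is why it only wins when strings tend to share long prefixes.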

If you use the hash comparison, you should make the hash an indexed computed column. It won't be faster if you're working out the hashes for the strings every time you run a query!
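The cost difference can be sketched in memory by counting hash computations. This is an analogy under stated assumptions, not SQL Server code: the 1000-row table and all names are hypothetical, the re-hashing scan stands in for computing the hash inside the WHERE clause, and the Dictionary stands in for an index on a persisted computed column. Both lookups confirm a hash match against the full string.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

int hashCalls = 0; // how many strings have been hashed so far

string HashKey(string s)
{
    hashCalls++;
    using var md5 = MD5.Create();
    return Convert.ToBase64String(md5.ComputeHash(Encoding.Unicode.GetBytes(s)));
}

// Hypothetical table of 1000 long, distinct strings.
var rows = new List<string>();
for (int i = 0; i < 1000; i++) rows.Add("row-" + i + new string('z', 200));

// No stored hash: each query re-hashes the rows it scans.
bool FindByRehashing(string target)
{
    string t = HashKey(target);
    foreach (var r in rows)
        if (HashKey(r) == t && r == target) return true; // confirm with the full string
    return false;
}

// Stored hash: each row is hashed once on load (like a persisted computed column),
// and the dictionary plays the role of the index on that column.
var index = new Dictionary<string, List<string>>();
foreach (var r in rows)
{
    string k = HashKey(r);
    if (!index.TryGetValue(k, out var bucket)) index[k] = bucket = new List<string>();
    bucket.Add(r);
}

bool FindByIndex(string target) =>
    index.TryGetValue(HashKey(target), out var bucket) && bucket.Contains(target); // confirm with the full string

hashCalls = 0;
bool foundByScan = FindByRehashing(rows[999]);
int scanCalls = hashCalls;

hashCalls = 0;
bool foundByIndex = FindByIndex(rows[999]);
int indexCalls = hashCalls;

Console.WriteLine($"re-hashing scan:   found={foundByScan}, strings hashed={scanCalls}");
Console.WriteLine($"stored hash index: found={foundByIndex}, strings hashed={indexCalls}");
```

The scan hashes the search string plus every row it passes, while the indexed version hashes only the search string; the per-row hashing was paid once at load time.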

You might consider using SQL Server's CHECKSUM function. It returns an int, which is even faster to compare and faster to calculate. But you'll have to verify the results of the query by actually testing the string values, because a 32-bit checksum is much more likely to return duplicate values. You will need to do the checksum or hash check in one query, then have an outer query that compares the strings. You will also want to check the query execution plan (QEP) to make sure the optimiser is processing the query in the order you intended. It could choose to do the string comparisons first and the checksum or hash checks second.
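The two-stage pattern (a cheap integer filter in the inner query, an exact string comparison in the outer one) can be modelled in memory with LINQ. The checksum below is a deliberately weak, hypothetical stand-in (a sum of character codes), not SQL Server's CHECKSUM algorithm; it is chosen because "ab" and "ba" collide, which makes the false positive visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Deliberately weak, hypothetical 32-bit checksum: the sum of UTF-16 code units.
// NOT SQL Server's CHECKSUM; chosen so that collisions are easy to see.
static int WeakChecksum(string s)
{
    int sum = 0;
    foreach (char c in s) sum = unchecked(sum + c);
    return sum;
}

// Hypothetical table storing each value alongside its precomputed checksum.
var table = new List<(int Checksum, string Value)>();
foreach (var v in new[] { "ab", "ba", "abc", "xyz" })
    table.Add((WeakChecksum(v), v));

string target = "ab";
int key = WeakChecksum(target);

// Stage 1 (inner query): cheap integer comparison narrows the candidates.
var candidates = table.Where(row => row.Checksum == key).ToList();

// Stage 2 (outer query): full string comparison removes the false positives.
var matches = candidates.Where(row => row.Value == target).Select(row => row.Value).ToList();

Console.WriteLine($"candidates after checksum: {candidates.Count}, true matches: {matches.Count}");
```

Here "ba" survives the checksum filter but is rejected by the string comparison, which is exactly the verification step the answer insists on.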

As someone else has said, this is only worthwhile if you're doing an exact match. A hash can't help if you're trying to do any kind of range or partial match.
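One way to see why: an ordered index (such as SQL Server's B-tree indexes) keeps keys sorted, so all keys sharing a prefix form one contiguous run, whereas a hash scatters them. The C# sketch below uses a sorted List with BinarySearch as a stand-in for the ordered index and a HashSet as a stand-in for a hash index; the data and the PrefixMatches name are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Ordered-index stand-in: keys sharing a prefix are contiguous in sorted order,
// so a binary search finds the start of the run.
// Assumes sortedKeys is sorted with StringComparer.Ordinal and has no duplicates.
static List<string> PrefixMatches(List<string> sortedKeys, string prefix)
{
    int i = sortedKeys.BinarySearch(prefix, StringComparer.Ordinal);
    if (i < 0) i = ~i; // index of the first key >= prefix
    var result = new List<string>();
    while (i < sortedKeys.Count && sortedKeys[i].StartsWith(prefix, StringComparison.Ordinal))
        result.Add(sortedKeys[i++]);
    return result;
}

var keys = new List<string> { "banana", "apple", "apricot", "applesauce" };
keys.Sort(StringComparer.Ordinal);
Console.WriteLine(string.Join(", ", PrefixMatches(keys, "app")));

// Hash-index stand-in: it can only answer "is this exact string present?".
// The hash of "app" tells you nothing about where "apple" or "applesauce" landed.
var hashed = new HashSet<string>(keys);
Console.WriteLine(hashed.Contains("app"));
```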

If you use a fixed-length field together with an index, it will most likely be faster.

If your strings are short (generally under 100 characters), searching on the strings will be faster.

If the strings are large, a hash search may well be faster, and most likely will be.
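Part of the reason a fixed-length hash helps with large strings is key size. SQL Server 2005 limits an index key to 900 bytes and nvarchar uses 2 bytes per character, so an nvarchar value longer than 450 characters cannot be stored as an index key, while an MD5 hash stored as binary(16) is always 16 bytes. The small C# sketch below just does that arithmetic; the lengths tested and the helper name are arbitrary examples.

```csharp
using System;

const int MaxIndexKeyBytes = 900; // SQL Server 2005 index key size limit
const int Md5KeyBytes = 16;       // an MD5 hash stored as binary(16)

// nvarchar stores each UTF-16 code unit in 2 bytes.
static int NvarcharKeyBytes(string s) => s.Length * 2;

foreach (int length in new[] { 50, 400, 5000 })
{
    int bytes = NvarcharKeyBytes(new string('x', length));
    bool withinLimit = bytes <= MaxIndexKeyBytes;
    Console.WriteLine($"{length} chars: {bytes}-byte string key (within 900-byte limit: {withinLimit}) vs {Md5KeyBytes}-byte hash key");
}
```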

HashBytes(MD4) seems to be the fastest for DML.

Are you doing an equality match or a containment match? For an equality match, you should let the DB handle it (but give it a non-clustered index) and simply test with WHERE table.Foo = @foo. For a containment match, you should probably look at a full-text index.