# SQL Server: how do you find the most common substring within a field?

I have a table with 1,000,000+ records and I need to find the most common substring that is at least 5 characters long.

Suppose I have the following records:

```
KDHFOUDHGOENWFIJ 1114H4363SDFHDHGFDG
GSDLGJSLJSKJDFSG 1114H20SDGDSSFHGSLD
SLSJDHLJKSSDJFKD 1114HJSDHFJKSDKFSGG
```

I need to write a SQL statement that selects 1114H as the most common substring. How do I do that?

Notes:

• The substring doesn't have to be in the same position in each record.
• The substrings should be of length 5.
• The maximum length of each record is 50 characters.

There is no requirement to find the longest substring, and any substring longer than 5 characters contains a 5-character substring that occurs at least as often. Therefore we only need to check substrings of length 5.

In the sample data there are three strings of at least 5 characters that occur three times: `_1114H` (6 characters), `_1114` and `1114H` (`_` shows the position of the space).

This solution uses master..spt_values in place of a numbers table.

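```sql
-- Sample data from the question.
declare @T table
(
  ID int identity,
  Data varchar(50)
)

insert into @T values
('KDHFOUDHGOENWFIJ 1114H4363SDFHDHGFDG'),
('GSDLGJSLJSKJDFSG 1114H20SDGDSSFHGSLD'),
('SLSJDHLJKSSDJFKD 1114HJSDHFJKSDKFSGG')

-- For every row, generate each start position of a 5-character substring
-- (1 .. len - 4), then count in how many distinct rows each substring appears.
select top 1 substring(T.Data, N.Number, 5) as Word
from @T as T
  cross apply (select N.Number
               from master..spt_values as N
               where N.type = 'P' and  -- type 'P' rows hold the integers 0..2047
                     N.number between 1 and len(T.Data)-4) as N
group by substring(T.Data, N.Number, 5)
order by count(distinct id) desc
```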

Result:

```
Word
-----
 1114
```

Note that the value returned is ` 1114`, including the leading space.
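The query above returns only one of the tied substrings, and which one is not guaranteed. It also depends on master..spt_values, which is an undocumented system table. As a minimal sketch (not the original answer's method), the variant below builds its own small numbers list and uses `top (1) with ties` to return every substring that shares the highest count. The CTE names `Digits` and `Numbers` and the column alias `RowsContaining` are made up for this illustration.

```sql
-- Sample data from the question.
declare @T table
(
  ID int identity,
  Data varchar(50)
);

insert into @T (Data) values
('KDHFOUDHGOENWFIJ 1114H4363SDFHDHGFDG'),
('GSDLGJSLJSKJDFSG 1114H20SDGDSSFHGSLD'),
('SLSJDHLJKSSDJFKD 1114HJSDHFJKSDKFSGG');

-- Hypothetical inline numbers list (1..100), enough for varchar(50) data.
with Digits as
(
  select d
  from (values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as v(d)
),
Numbers as
(
  select tens.d * 10 + ones.d + 1 as Number
  from Digits as tens
    cross join Digits as ones
)
-- One row per (record, start position), grouped by the 5-character substring.
-- "with ties" keeps every substring that shares the top row count.
select top (1) with ties
       substring(T.Data, N.Number, 5) as Word,
       count(distinct T.ID) as RowsContaining
from @T as T
  inner join Numbers as N
    on N.Number <= len(T.Data) - 4
group by substring(T.Data, N.Number, 5)
order by count(distinct T.ID) desc;
```

For the sample data this should return two rows, ` 1114` and `1114H`, each found in 3 records; the order of the two rows is not defined. On a table with 1,000,000+ rows, a persisted numbers table is likely a better choice than the inline CTE, but that has not been measured here.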

This doesn't answer your question entirely, but here is an article from a book about advanced search techniques that mentions a user-defined function "LCS" (longest common substring) which may be useful: