# Implementation of Levenshtein distance for mysql/fuzzy search?

I must have the ability to search a table the following for cruz as get exactly what it within 1 variance.

Data:

```
O'Brien

Smithe

Dolan

Smuth

Wong

Smoth

Gunther

Smiht

```

I've investigated using Levenshtein distance does anybody understand how to implement this by using it?

Performs this help? MySQL Levenshtein distance query

EDIT: That old link Levenshtein Distance like a MySQL saved function (Google Cache) is damaged, because of Robert for pointing this in the comment.

To be able to effectively search using levenshtein distance, you'll need a competent, specialized index, like a bk-tree. Regrettably, no database system I understand of, including MySQL, implements bk-tree indexes. This really is further complicated if you are searching for full-text search, rather than only a single term per row. Off-hands, I can not think about in whatever way you could do full-text indexing in a fashion that enables for searching according to levenshtein distance.

An implementation for that damerau-levenshtein distance are available here: Damerau-Levenshtein formula: Levenshtein with transpositions The advance over pure Levenshtein distance would be that the changing of figures is recognized as. I discovered it within the comments of schnaader's link, thanks!

I'm establishing searching according to Levenshtein or Damerau-Levenshtein (most likely the second) for multiple searches over an indexed text, with different paper by by Gonzalo Navarro and Ricardo Baeza-yates: link text

After creating a suffix array (see wikipedia), if you are looking at a string with for the most part k mismatches towards the search string, break the search string into k+1 pieces a minumum of one of individuals should be intact. Discover the substrings by binary search within the suffix array, then apply the length function towards the patch around each matched up piece.

The hyperlinks here were a large help. Because of all.

this can be used function

```
CREATE FUNCTION `levenshtein`( s1 text, s2 text) RETURNS int(11)

DETERMINISTIC

BEGIN

DECLARE s1_len, s2_len, i, j, c, c_temp, cost INT

DECLARE s1_char CHAR

DECLARE cv0, cv1 text

SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c =

IF s1 = s2 THEN

RETURN

ELSEIF s1_len =  THEN

RETURN s2_len

ELSEIF s2_len =  THEN

RETURN s1_len

ELSE

WHILE j  c_temp THEN SET c = c_temp Finish IF

SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1

IF c > c_temp THEN

SET c = c_temp

Finish IF

SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1

Finish WHILE

SET cv1 = cv0, i = i + 1

Finish WHILE

Finish IF

RETURN c

Finish

```

as well as for setting it up as XX% make use of this function

```
CREATE FUNCTION `levenshtein_ratio`( s1 text, s2 text ) RETURNS int(11)

DETERMINISTIC

BEGIN

DECLARE s1_len, s2_len, max_len INT

SET s1_len = LENGTH(s1), s2_len = LENGTH(s2)

IF s1_len > s2_len THEN

SET max_len = s1_len

ELSE

SET max_len = s2_len

Finish IF

RETURN ROUND((1 - LEVENSHTEIN(s1, s2) / max_len) * 100)

Finish

```

An implementation in MySQL, including k-bounded levenshtein distance formula with linear some time and constant space.

http://world wide web.jmcejuela.com/2011/11/k-levenshtein-linear-time/