I have a large number (about 40 million) of VARCHAR records in a MySQL table. Each string is between 5 and 80 characters long. I'm trying to group similar text together and have considered one possible approach:

Take a row, calculate a similarity measure (such as edit distance) against every other row, and decide (I don't know how to decide, though) whether the two belong in the same group. For example, I have the following records:

The quick brown fox
The qick brwn fox
This is another sentence
Ths is another sntence

I would like to convert this into a form where I assign a group ID and then obtain the best match for each group (in this case, 'The quick brown fox' and 'This is another sentence'). That is, assign group ID 1 to both the 'The quick brown fox' and 'The qick brwn fox' records, and group ID 2 to the other pair.
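
To make the idea concrete, here is a minimal Python sketch of the brute-force grouping described above, run on the four example rows. The function names, the 0.25 threshold, and the choice of the first row seen as each group's representative are all assumptions made for illustration. Comparing each row against group representatives still involves a very large number of comparisons at 40 million rows, so this shows the logic only, not a scalable solution.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def assign_groups(rows, max_ratio=0.25):
    """Greedy grouping: a row joins the first group whose representative
    is within max_ratio * (longer length) edits, else starts a new group.
    The threshold and the 'first row seen' representative are assumptions."""
    representatives = []
    result = []
    for row in rows:
        for group_id, rep in enumerate(representatives, start=1):
            limit = max_ratio * max(len(row), len(rep))
            if edit_distance(row, rep) <= limit:
                break
        else:
            # No existing group is close enough, so this row starts one.
            representatives.append(row)
            group_id, rep = len(representatives), row
        result.append((group_id, rep, row))
    return result


rows = [
    "The quick brown fox",
    "The qick brwn fox",
    "This is another sentence",
    "Ths is another sntence",
]
for group_id, rep, row in assign_groups(rows):
    print(group_id, repr(rep), repr(row))
```

Each output tuple carries the group ID, the representative chosen for that group, and the original row. Picking the first row seen as the representative only works here because the correctly spelled rows come first; a real run would need some other rule, such as the most frequent spelling in the group.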

Is there a better way to approach this kind of problem? Perhaps by using indexing schemes or other database features? Also, just to confirm: I'm not looking for rows that contain similar text, but rather rows that are similar to each other. Perhaps the best rationale I can give is that some rows differ only because of typos, and I want to correct them.

EDIT 2: I'm open to approaches that don't use MySQL, as long as their performance is reasonably comparable to a database's.

So, based on my research and the answer given below, this won't be very easy and I may need to look into fuzzy matching. Are there any good techniques for this, considering that my data is currently stored in a database?

EDIT 1: Attempt using MySQL's FULLTEXT

mysql> create table fulltextsim(id INT PRIMARY KEY AUTO_INCREMENT, text TEXT, FULLTEXT(text));
Query OK, 0 rows affected (0.44 sec)

mysql> insert into fulltextsim(text) VALUES("The quick brown fox");
Query OK, 1 row affected (0.02 sec)

mysql> insert into fulltextsim(text) VALUES("The qick brwn fox");
Query OK, 1 row affected (0.00 sec)

mysql> insert into fulltextsim(text) VALUES("This is another sentence");
Query OK, 1 row affected (0.00 sec)

mysql> insert into fulltextsim(text) VALUES("Ths is anther sntence");
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM fulltextsim WHERE MATCH(text) AGAINST ('The qick brwn');
+----+-------------------+
| id | text              |
+----+-------------------+
|  2 | The qick brwn fox |
+----+-------------------+
1 row in set (0.02 sec)

mysql> SELECT * FROM fulltextsim WHERE MATCH(text) AGAINST ('The qick fox');
+----+-------------------+
| id | text              |
+----+-------------------+
|  2 | The qick brwn fox |
+----+-------------------+
1 row in set (0.00 sec)

I wanted the 'The quick brown fox' row to be returned as well.

Have you looked at MySQL's FULLTEXT functionality?

UPDATE -- MySQL's FULLTEXT does not appear to support fuzzy searching, which is what you're looking for here. See http://stackoverflow.com/questions/3047641/mysql-full-text-search-boolean-mode-partial-match

MySQL does offer the SOUNDEX() function, which will match words that sound similar to what was entered, but this doesn't work with phrases.

So, I think you might be out of luck.
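
That said, if doing the work outside MySQL is acceptable, the per-word idea behind SOUNDEX() can be stretched to phrases by encoding each word separately and joining the codes into one key. The sketch below is in Python and assumes the standard four-character American Soundex; MySQL's own SOUNDEX() may return different codes, and the helper names are made up for illustration. Rows with the same key would land in the same group, which could be done with a single pass and an indexed key column rather than all-pairs comparison.

```python
# Standard American Soundex digit classes (an assumption: MySQL's
# SOUNDEX() is not guaranteed to produce the same codes).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Four-character Soundex code for a single word."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return "".join(out)[:4].ljust(4, "0")


def phrase_key(phrase: str) -> str:
    """Hypothetical helper: join the per-word Soundex codes of a phrase."""
    return " ".join(soundex(w) for w in phrase.split())


for row in ("The quick brown fox", "The qick brwn fox",
            "This is another sentence", "Ths is another sntence"):
    print(phrase_key(row), "<-", row)
```

This only catches typos that leave the Soundex code of each word unchanged (dropped vowels, for instance) and that keep the word count the same, so it is a coarse first pass at best.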