I have a large number of songs, and each song has its own unique song ID. For each song ID I have some attributes, such as song title, artist name, album title, year, etc.

Now, I have implemented a mechanism to compute the similarity ratio between two songs. It gives me a score of up to 100.

I need to show similar songs to users, and this cannot be computed at run time. So I have to precompute the similarity values between every pair of songs.

So, basically, I would create a database table with three columns:

song1, song2, similarity

This gives me n*n records, where n is the number of songs.

Whenever I want to fetch similar songs, I have to run this query:

SELECT song2 WHERE song1 = x AND similarity > 80 ORDER BY similarity DESC;

Please suggest a good way to store this similarity information.


What you're suggesting works; however, you can reduce the number of rows (roughly by half) by storing each pair only once, and then modifying your query to match the song ID in either song1 or song2.

Something like:

SELECT if(song1=?,song2,song1) as similar WHERE (song1 = ? or song2 =?) AND similarity > 80 ORDER BY similarity DESC;
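As a rough, runnable sketch of the store-each-pair-once idea, the following uses Python's built-in sqlite3 module with an in-memory database. The table name `song_similarity`, the index name, and the helper functions `add_pair` and `similar_songs` are assumptions made for illustration; they do not come from the question. A portable `CASE` expression stands in for the `if(...)` function used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE song_similarity (      -- hypothetical table name
        song1      INTEGER NOT NULL,
        song2      INTEGER NOT NULL,
        similarity INTEGER NOT NULL,
        PRIMARY KEY (song1, song2),
        CHECK (song1 < song2)           -- each unordered pair is stored once
    )
    """
)
# The primary key covers lookups by song1; this index covers lookups by song2.
conn.execute("CREATE INDEX idx_similarity_song2 ON song_similarity (song2)")


def add_pair(a, b, similarity):
    """Store a pair with the smaller ID in song1 so (a, b) and (b, a) coincide."""
    low, high = sorted((a, b))
    conn.execute(
        "INSERT INTO song_similarity VALUES (?, ?, ?)", (low, high, similarity)
    )


def similar_songs(song_id, threshold=80):
    """Return IDs of songs whose similarity to song_id exceeds the threshold."""
    rows = conn.execute(
        """
        SELECT CASE WHEN song1 = ? THEN song2 ELSE song1 END AS similar
        FROM song_similarity
        WHERE (song1 = ? OR song2 = ?) AND similarity > ?
        ORDER BY similarity DESC
        """,
        (song_id, song_id, song_id, threshold),
    ).fetchall()
    return [row[0] for row in rows]


add_pair(1, 2, 95)
add_pair(3, 1, 85)   # stored as (1, 3)
add_pair(2, 3, 60)   # below the threshold, so never returned
print(similar_songs(1))  # [2, 3]
```

Normalising the pair order on insert is what makes the single-row-per-pair scheme safe; without it, the same pair could still end up stored in both directions.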

It seems this approach needs a lot of computation to store and access the similarity information. For instance, if you have already processed 2000 songs, you have to run the similarity calculation 2000 times for the next new song. This may cause scalability problems, and this data design could make the database slow within a short period of time.

I suggest that you find some patterns and tag each song. For instance, you can evaluate the songs against patterns such as "blues", "rock", or "90's" and give them tags. If you want to find songs similar to a given song, you can simply query by all the tags that song has, e.g. "Modern", "Slow" and "techno".
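A minimal sketch of the tagging idea, assuming the tags have already been assigned somehow. The sample data, the inverted index `tag_index`, and the function `similar_by_tags` are hypothetical names for illustration. Songs are ranked by how many tags they share with the given song, which is only one of several plausible ways to rank.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: song ID -> set of tags.
song_tags = {
    1: {"blues", "slow"},
    2: {"blues", "slow", "90's"},
    3: {"techno", "modern"},
    4: {"slow", "techno"},
}

# Inverted index: tag -> set of song IDs carrying that tag.
tag_index = defaultdict(set)
for song_id, tags in song_tags.items():
    for tag in tags:
        tag_index[tag].add(song_id)


def similar_by_tags(song_id, min_shared=1):
    """Return other songs sharing at least min_shared tags, most shared first."""
    shared = Counter()
    for tag in song_tags[song_id]:
        for other in tag_index[tag]:
            if other != song_id:
                shared[other] += 1
    # Sort by shared-tag count (descending), then by ID for a stable order.
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return [other for other, count in ranked if count >= min_shared]


print(similar_by_tags(1))  # [2, 4]
```

With this layout, adding a new song only touches the entries for its own tags, rather than requiring a comparison against every existing song.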

I think you would be better off evaluating similarity to a "prototypical" song or category. Devise a fingerprint mechanism that includes metadata about the song along with whatever audio mechanism you use to evaluate similarity. Place each song into one (or more) categories and score the song within each category: how closely does it match the prototype for that category, using the fingerprint? Note that you could have hundreds or thousands of categories, i.e., they are not the usual categories you have in mind when you think about music.

Once you have this done, you can maintain indexes by category, and when finding similar songs you devise a weight based on the category and the similarity measures within the category, say by giving greater weight to the category where the song is closest to the prototype. Multiply the weight by the square of the difference between the candidate song and the current song with respect to the prototype for that category. Sum the weighted values for, say, the top three categories, with lower totals being more similar.
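The weighting scheme above leaves several details open, so the following is only one possible reading, not a definitive implementation. It assumes each song stores a distance to the prototype of each category it belongs to (0.0 means identical to the prototype), that the weight is `1 / (1 + distance)` so closer categories count more, and that a candidate missing from a category is given a default distance. The function names, the weight formula, and the `missing` default are all assumptions.

```python
def top_categories(distances, k=3):
    """Return the k categories where the song is closest to the prototype."""
    return sorted(distances, key=distances.get)[:k]


def dissimilarity(current, candidate, k=3, missing=1.0):
    """Weighted sum of squared distance differences; lower means more similar.

    current and candidate map category -> distance to that category's prototype.
    """
    total = 0.0
    for category in top_categories(current, k):
        # Assumed weight: larger when the current song is nearer the prototype.
        weight = 1.0 / (1.0 + current[category])
        # Candidates not placed in this category get the assumed default.
        difference = candidate.get(category, missing) - current[category]
        total += weight * difference ** 2
    return total


# Hypothetical fingerprints: category -> distance to the category prototype.
current_song = {"c1": 0.1, "c2": 0.3, "c3": 0.5, "c4": 0.9}
close_song = {"c1": 0.2, "c2": 0.3, "c3": 0.4}
far_song = {"c1": 0.8, "c4": 0.1}

candidates = {"close": close_song, "far": far_song}
ranked = sorted(candidates, key=lambda name: dissimilarity(current_song, candidates[name]))
print(ranked)  # ['close', 'far']
```

Because each song carries only a handful of category distances, the comparison can be restricted to songs indexed under the same top categories instead of scanning every pair.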

This way you only need to store a few items of metadata for each song rather than keeping the relationship between every pair of songs. If the main algorithm runs too slowly, you could keep cached pairwise data for the most common songs and fall back to the algorithmic comparison whenever a song is not in your cached data set.
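The cache-with-fallback idea could be sketched as follows. The class name `SimilarityLookup`, the set of "popular" song IDs, and the toy `compute` function are placeholders invented for this example. This sketch fills the cache lazily on first lookup, whereas precomputing the popular pairs ahead of time would work just as well.

```python
class SimilarityLookup:
    """Cache pairwise scores for popular songs; compute everything else on demand."""

    def __init__(self, compute, popular_ids):
        self._compute = compute            # function (a, b) -> similarity score
        self._popular = set(popular_ids)   # songs worth caching (assumed known)
        self._cache = {}                   # (low_id, high_id) -> score

    def similarity(self, a, b):
        key = (min(a, b), max(a, b))       # one cache entry per unordered pair
        if key in self._cache:
            return self._cache[key]
        score = self._compute(a, b)        # fall back to the algorithm
        if a in self._popular or b in self._popular:
            self._cache[key] = score
        return score


calls = []


def toy_compute(a, b):
    """Stand-in for the real comparison; records each call for demonstration."""
    calls.append((a, b))
    return 100 - abs(a - b)


lookup = SimilarityLookup(toy_compute, popular_ids=[1])
lookup.similarity(1, 2)   # computed, then cached because song 1 is popular
lookup.similarity(2, 1)   # served from the cache
lookup.similarity(3, 4)   # computed every time, since neither song is popular
lookup.similarity(3, 4)
print(len(calls))  # 3
```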