I've been thinking about recommendation engines lately and I want to improve in this area. I'm currently reading "Programming Collective Intelligence" from O'Reilly, which I think is the best book on this subject. However, I have no idea how to implement an engine. By "no idea" I mean "I don't know where to start". I have a project in mind that is similar to Last.fm.

  1. Where do I start creating a recommendation engine (should it be implemented on the database side or on the backend side)?
  2. What level of database knowledge will be needed?
  3. Are there any free ones I could use for help, or any other resources?
  4. What steps do I need to take?

I have built one for a video portal myself. The main idea was to collect data about everything:

  • Who submitted a video?
  • Who commented on a video?
  • Which tags were created?
  • Who visited the video? (also tracking anonymous visitors)
  • Who favorited a video?
  • Who rated a video?
  • Which channels was the video assigned to?
  • Text streams of title, description, tags, channels and comments are collected by a fulltext indexer which puts a weight on each of the data sources.

Next I created functions which return lists of (id, weight) tuples for each of the above points. Some only consider a limited number of videos (e.g. the last 50), some adjust the weight by e.g. rating or tag count (more frequently tagged = less expressive). There are functions that return the following lists:

  • Similar videos by fulltext search
  • Videos submitted by the same user
  • Other videos the users who commented on this video also commented on
  • Other videos the users who favorited this video also favorited
  • Other videos the raters of this video also rated (weighted)
  • Other videos within the same channels
  • Other videos with similar tags (weighted by "expressiveness" of tags)
  • Other videos played by people who played this video (XY latest plays)
  • Similar videos by comments fulltext
  • Similar videos by title fulltext
  • Similar videos by description fulltext
  • Similar videos by tags fulltext
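
As a rough illustration of one of these functions, here is a minimal sketch of the "also favorited" list. It works on a plain in-memory hash instead of the database, and the names `related_by_favorites`, `favorites` and `weight` are made up for this example, not taken from the actual project:

```ruby
# Returns [video_id, weight] pairs for every video that was favorited
# by a user who also favorited `video_id`.
# `favorites` is a hypothetical hash: user id => ids of favorited videos.
# A video appears once per user who co-favorited it, so summing the
# weights later rewards videos that share many fans.
def related_by_favorites(video_id, favorites, weight = 1.0)
  favorites.values
           .select { |ids| ids.include?(video_id) }
           .flat_map { |ids| ids - [video_id] }
           .map { |id| [id, weight] }
end

favorites = {
  1 => [10, 11, 12],
  2 => [10, 12],
  3 => [11, 13]
}
related = related_by_favorites(10, favorites)
```

Duplicate ids are left in on purpose; they collapse into one entry when the lists are merged by video id.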

All of these are combined into a single list simply by summing up the weights per video id, then sorting by weight. This works pretty well for around 1000 videos now. But you need background processing or extreme caching for this to be fast.
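
The merge step could look roughly like this. It is only a sketch in plain Ruby: `combine_recommendations` and the sample lists are invented for illustration, and breaking ties by the lower id is my own assumption:

```ruby
# Sums the weights per video id across any number of [id, weight] lists
# and returns [id, total_weight] pairs, highest total first.
# Ties are broken by the lower video id (an arbitrary choice).
def combine_recommendations(*lists)
  totals = Hash.new(0.0)
  lists.each do |list|
    list.each { |id, weight| totals[id] += weight }
  end
  totals.sort_by { |id, weight| [-weight, id] }
end

by_tags      = [[11, 0.5], [12, 0.25]]
by_favorites = [[12, 1.0], [13, 0.5]]
ranked = combine_recommendations(by_tags, by_favorites)
# ranked is [[12, 1.25], [11, 0.5], [13, 0.5]]
```

Since every source returns the same (id, weight) shape, a new signal only needs one more list passed to the merge.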

I'm hoping I can reduce this to a generic recommendation engine or similarity calculator soon and release it as a Rails/ActiveRecord plugin. Currently it is still a tightly integrated part of my project.

To give a little hint, in Ruby code it looks like this:

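def related_by_tags
  # Each tag contributes TAG_WEIGHT divided by (0.1 + log2 of its video count), so rare tags count more.
  tag_names.find(:all, :include => :videos).inject([]) { |result, t|
    result + t.video_ids.map { |v|
      [v, TAG_WEIGHT / (0.1 + Math.log(t.video_ids.length) / Math.log(2))]
    }
  }
end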
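
To see what the tag weighting does on its own, here is the same formula as a standalone sketch. The name `tag_weight` and the default base weight of 1.0 are assumptions for this example. `Math.log2(n)` is equivalent to `Math.log(n) / Math.log(2)`:

```ruby
# Weight contributed by a tag that is attached to `video_count` videos.
# The 0.1 offset avoids division by zero when the tag is on one video only.
def tag_weight(video_count, base_weight = 1.0)
  base_weight / (0.1 + Math.log2(video_count))
end

[1, 2, 1024].each do |count|
  puts "tag on #{count} videos: weight #{tag_weight(count).round(3)}"
end
```

A tag used on a single video gets ten times the base weight, while a tag spread over 1024 videos gets roughly a tenth of it, which is the "more frequently tagged = less expressive" idea.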

I'd be interested in how others approach such calculations.