Tuesday, November 25, 2014

Recommendation Engine using Python

What is a Recommendation System?


Well, if you are reading this, you probably already know what a recommendation is, but basically it is a suggestion given to the user based on certain parameters. For example, when you buy books on Amazon it suggests other books, and when you check a movie on IMDb it shows you recommended movies.



Now let's think: how is that possible?

Answer: it's science, or I should say mathematics.

Let's apply some simple logic here:

Let's consider the movie 'The Godfather'.

Now if somebody likes it, can we recommend a superhit comedy to the same user? Aah, nah... that doesn't fit. Naturally, we would like to recommend the next best movie in the Crime, Drama or Thriller genre.
So that's a recommendation.
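The genre rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from this post: the candidate list is a hand-picked subset of the movie table further down, the helper name `recommend_by_genre` is hypothetical, and "next best" is assumed to mean "highest IMDb rating".

```python
# Hypothetical candidate list: genres and IMDb ratings copied from the movie
# table in this post. The Godfather itself is listed on IMDb as Crime, Drama.
candidates = {
    "Jilla": ({"Action", "Drama", "Thriller"}, 6.4),
    "Yes Man": ({"Comedy", "Romance"}, 6.9),
    "The Butterfly Effect": ({"Sci-Fi", "Thriller"}, 7.7),
    "A Year Ago in Winter": ({"Drama"}, 7.2),
    "Ajab Prem Ki Ghazab Kahani": ({"Comedy", "Romance"}, 6.2),
}

def recommend_by_genre(wanted_genres, candidates):
    """Keep candidates sharing at least one wanted genre, best-rated first."""
    matches = [
        (title, rating)
        for title, (genres, rating) in candidates.items()
        if genres & wanted_genres  # non-empty set intersection = shared genre
    ]
    matches.sort(key=lambda m: m[1], reverse=True)  # "next best" = highest rating
    return [title for title, _ in matches]

# The user liked 'The Godfather', so look for Crime, Drama or Thriller.
print(recommend_by_genre({"Crime", "Drama", "Thriller"}, candidates))
# ['The Butterfly Effect', 'A Year Ago in Winter', 'Jilla']
```

The comedies drop out because they share no genre with the wanted set, which is exactly the "no superhit comedy" intuition.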

Hold on! As a human I can search and find such movies, but how is that possible using a programming language?

So now we have a point, and I'd like to take you all back to your school days:
'Euclidean Distance' and 'Manhattan Distance'.

Do those words seem familiar to you? At first sight you may think "What the hell is this guy talking about?" or "Why should I need mathematics here?"

Because it's all mathematics :-)

Manhattan distance is the distance between two points in a grid measured strictly along horizontal and/or vertical paths (that is, along the grid lines), as opposed to the diagonal or "as the crow flies" Euclidean distance. The Manhattan distance is the simple sum of the horizontal and vertical components, whereas the diagonal distance is computed by applying the Pythagorean theorem.
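Both distances fit in a couple of lines of Python. This is a generic sketch of the two formulas, not code from the original project; it assumes both points have the same number of coordinates.

```python
import math

def euclidean(p, q):
    """Straight-line ("as the crow flies") distance, via Pythagoras."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute differences along each axis (distance along grid lines)."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Two corners of a city grid: 3 blocks east and 4 blocks north apart.
print(manhattan((0, 0), (3, 4)))  # 7
print(euclidean((0, 0), (3, 4)))  # 5.0
```

Note that `zip` silently stops at the shorter input, so mismatched lengths are a caller error here rather than a raised exception.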

For more details, look them up on Google and learn the actual concepts behind them. That's important.


What Exactly I did with R and Python

I used R to do quick analytics on my data and Python to write the main algorithm.
I must say everything I did in Python could have been done in R as well, but I used Python because I am a little more comfortable with a general-purpose programming language than with a scientific one, i.e. R. That's my personal opinion, so please don't take my word for it :)


Concept behind the entire approach

If I have a collection of movies, I can easily extract details about each one, such as user ratings, reviews and genre. Once that is possible, I can use any distance algorithm, such as Euclidean distance, to measure the similarity between any two movies.
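To apply a distance, each movie first has to become a vector of numbers. The sketch below assumes one possible encoding (a 0/1 slot per genre plus the IMDb rating scaled to 0..1); the genre list and helper names are hypothetical, and the genres and ratings are copied from the movie table in this post.

```python
import math

# Hypothetical feature scheme: one 0/1 slot per genre, plus the IMDb rating.
GENRES = ["Action", "Comedy", "Drama", "Family", "Romance", "Sci-Fi", "Thriller"]

movies = {
    "The Karate Kid": (["Action", "Drama", "Family"], 7.2),
    "Jilla": (["Action", "Drama", "Thriller"], 6.4),
    "Yes Man": (["Comedy", "Romance"], 6.9),
}

def features(genres, imdb_rating):
    """Turn a movie's genres and rating into a numeric vector."""
    vector = [1.0 if g in genres else 0.0 for g in GENRES]
    vector.append(imdb_rating / 10.0)  # scale the 0..10 rating to 0..1
    return vector

def distance(title_a, title_b):
    """Euclidean distance between two movies' feature vectors."""
    a = features(*movies[title_a])
    b = features(*movies[title_b])
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Smaller distance = more similar: The Karate Kid is closer to Jilla than to Yes Man.
print(distance("The Karate Kid", "Jilla") < distance("The Karate Kid", "Yes Man"))  # True
```

Scaling the rating to 0..1 keeps it on the same footing as a single genre slot; without scaling, a one-point rating gap would outweigh a missing genre.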

Next, I pick the movies from each of my friends' lists on Facebook, compare them with the list of unwatched movies, and sort the unwatched movies by distance. That gives me a list of movie recommendations ordered from the best match to the worst.
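The ranking step can be sketched as follows. How the distances to several liked movies are combined is not specified above; this sketch assumes each unwatched movie is scored by its distance to the nearest liked movie (an average would be another reasonable choice). The function name and toy vectors are hypothetical.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def rank_unwatched(liked, unwatched):
    """Sort unwatched movies by distance to their nearest liked movie.

    liked, unwatched: dicts mapping title -> feature vector (same length).
    Returns (title, distance) pairs, best recommendation (smallest distance) first.
    """
    scored = []
    for title, vector in unwatched.items():
        nearest = min(euclidean(vector, liked_vec) for liked_vec in liked.values())
        scored.append((title, nearest))
    return sorted(scored, key=lambda pair: pair[1])

# Toy 2-D vectors, purely illustrative.
liked = {"A": (0, 0), "B": (10, 10)}
unwatched = {"X": (1, 1), "Y": (5, 5), "Z": (9, 10)}
print([title for title, _ in rank_unwatched(liked, unwatched)])  # ['Z', 'X', 'Y']
```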


Is it actually that easy?


Explanation based on Programming

First, I connected to Facebook using OAuth and the REST API so that I could get access to my friends and then to the movies my friends liked.


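 import facebook  # the facebook-sdk package  
 graph = facebook.GraphAPI(access_token)  # access_token obtained through the OAuth flow  
 profile = graph.get_object("me")  # the logged-in user's own profile  
 friends = graph.get_connections("me", 'friends')['data']  # list of friend objects  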

Then, for each friend, I fetched the movies that friend liked:



 # run once per friend in the friends list fetched above  
 allLikes = graph.get_connections(friend['id'], "movies")['data']  # movies this friend liked  
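The call above fetches one friend's movies. A hedged sketch of running it for every friend and keeping the results: `collect_movie_likes` and `FakeGraph` are hypothetical names, and the fake object only imitates the `get_connections(id, "movies")['data']` shape used above so the example runs offline. It assumes each liked movie has a `'name'` field and reads only the first page of results, although the real API may paginate.

```python
def collect_movie_likes(graph, friends):
    """Map each friend's id to the list of movie names that friend liked."""
    likes = {}
    for friend in friends:
        movies = graph.get_connections(friend['id'], "movies")['data']
        likes[friend['id']] = [movie['name'] for movie in movies]
    return likes

class FakeGraph:
    """Hypothetical offline stand-in for facebook.GraphAPI, for testing only."""
    def __init__(self, data):
        self.data = data

    def get_connections(self, object_id, connection_name, **kwargs):
        return {'data': self.data.get(object_id, [])}

fake = FakeGraph({"1": [{"name": "3 Idiots"}, {"name": "Yes Man"}],
                  "2": [{"name": "3 Idiots"}]})
print(collect_movie_likes(fake, [{"id": "1"}, {"id": "2"}]))
# {'1': ['3 Idiots', 'Yes Man'], '2': ['3 Idiots']}
```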


The next job is to get the details of each movie, such as its rating, and finally I got the following content:




No. | MOVIE | Year | Released | Genre | Director | Poster | imdbRating | imdbVotes | imdbID | tomatoRating | tomatoUserReviews | BoxOffice
2 | Jilla | 2014 | 10/01/14 | Action, Drama, Thriller | R.T. Neason | http://ia.media-imdb.com/images/M/MV5BOTUxNzExOTA0NF5BMl5BanBnXkFtZTgwMTUzNTAxMjE@._V1_SX300.jpg | 6.4 | 5108 | tt2678948 | N/A | 152 | N/A
3 | Pursuit of Happyness | 2005 | 16/07/05 | Documentary | Patrick McGuinn | http://ia.media-imdb.com/images/M/MV5BMTk4NjQ2NzI5Nl5BMl5BanBnXkFtZTcwOTIzNTM0MQ@@._V1_SX300.jpg | 6.8 | 35 | tt0375174 | N/A | 125 | N/A
4 | The Karate Kid | 1984 | 22/06/84 | Action, Drama, Family | John G. Avildsen | http://ia.media-imdb.com/images/M/MV5BMTkyNjE3MjM2MV5BMl5BanBnXkFtZTYwMzY5ODk4._V1_SX300.jpg | 7.2 | 99763 | tt0087538 | 6.9 | 314496 | N/A
5 | Yes Man | 2008 | 19/12/08 | Comedy, Romance | Peyton Reed | http://ia.media-imdb.com/images/M/MV5BNjYyOTkyMzg2OV5BMl5BanBnXkFtZTcwODAxNjk3MQ@@._V1_SX300.jpg | 6.9 | 231169 | tt1068680 | 5.3 | 316060 | $97.6M
6 | The Butterfly Effect | 2004 | 23/01/04 | Sci-Fi, Thriller | Eric Bress, J. Mackye Gruber | http://ia.media-imdb.com/images/M/MV5BMTI1ODkxNzg2N15BMl5BanBnXkFtZTYwMzQ2MTg2._V1_SX300.jpg | 7.7 | 291519 | tt0289879 | 4.8 | 621210 | $57.7M
7 | Unknown | 2011 | 18/02/11 | Action, Mystery, Thriller | Jaume Collet-Serra | http://ia.media-imdb.com/images/M/MV5BODA4NTk3MTQwN15BMl5BanBnXkFtZTcwNjUwMTMxNA@@._V1_SX300.jpg | 6.9 | 174047 | tt1401152 | 5.8 | 74879 | $63.7M
8 | A Year Ago in Winter | 2008 | 06/01/10 | Drama | Caroline Link | http://ia.media-imdb.com/images/M/MV5BMTQ4MTUzNTIwM15BMl5BanBnXkFtZTcwMTEzMjA0Mg@@._V1_SX300.jpg | 7.2 | 1038 | tt0452580 | N/A | 253 | N/A
10 | James Bond 007 | 1983 | N/A | Adventure, Animation, Action | N/A | N/A | 7.2 | 28 | tt0297197 | N/A | N/A | N/A
14 | Department | 2012 | 18/05/12 | Action | Ram Gopal Varma | N/A | 3.2 | 690 | tt2186731 | N/A | 862 | N/A
15 | Ajab Prem Ki Ghazab Kahani | 2009 | 06/11/09 | Comedy, Romance | Rajkumar Santoshi | http://ia.media-imdb.com/images/M/MV5BMjA0NjAwNzYxOV5BMl5BanBnXkFtZTcwNzA4NTk5Mw@@._V1_SX300.jpg | 6.2 | 5594 | tt1252596 | N/A | 2436 | N/A
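Every value in this table is text, and gaps are written as "N/A", so the numbers need cleaning before any distance can be computed. A minimal sketch, assuming records keyed by the column names above; the helper names are hypothetical and only the "M" (million) box-office suffix seen in the table is handled.

```python
def to_float(value):
    """'7.2' -> 7.2; 'N/A' or None -> None."""
    return None if value in (None, "N/A") else float(value)

def to_int(value):
    """'5,108' or '5108' -> 5108; 'N/A' or None -> None."""
    return None if value in (None, "N/A") else int(value.replace(",", ""))

def to_dollars(value):
    """'$97.6M' -> about 97.6 million; 'N/A' -> None."""
    if value in (None, "N/A"):
        return None
    value = value.lstrip("$").replace(",", "")
    if value.endswith("M"):
        return float(value[:-1]) * 1e6
    return float(value)

def clean_row(row):
    """Convert one table record of strings into typed Python values."""
    return {
        "title": row["MOVIE"],
        "genres": [] if row["Genre"] == "N/A"
                  else [g.strip() for g in row["Genre"].split(",")],
        "imdbRating": to_float(row["imdbRating"]),
        "imdbVotes": to_int(row["imdbVotes"]),
        "tomatoRating": to_float(row["tomatoRating"]),
        "BoxOffice": to_dollars(row["BoxOffice"]),
    }

yes_man = {"MOVIE": "Yes Man", "Genre": "Comedy, Romance", "imdbRating": "6.9",
           "imdbVotes": "231169", "tomatoRating": "5.3", "BoxOffice": "$97.6M"}
print(clean_row(yes_man)["genres"])  # ['Comedy', 'Romance']
```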



The entire code to get the data can be found in facebook_movies_dataset.

Now that I have the data, why not run some quick analytics on it and see whether I really did a good job?




From the graph above, we can easily see that 3 Idiots is the most-liked movie in my FB movies list and that Gaurav Shr.. has liked the most movies, so naturally he doesn't have any work :)
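The analytics above were done in R. For comparison, the same two summaries (the most-liked movie and the friend with the most likes) can be sketched in Python with `collections.Counter`. The `summarize` helper and the sample data are hypothetical; they are not the real list behind the graph.

```python
from collections import Counter

def summarize(likes):
    """likes: dict mapping friend -> list of liked movie titles (non-empty dict).

    Returns (most-liked movie, friend with the most likes).
    """
    movie_counts = Counter(title for titles in likes.values() for title in titles)
    top_movie, _ = movie_counts.most_common(1)[0]
    top_friend = max(likes, key=lambda friend: len(likes[friend]))
    return top_movie, top_friend

# Hypothetical sample data, for illustration only.
sample = {
    "friend_a": ["3 Idiots", "Yes Man", "Unknown"],
    "friend_b": ["3 Idiots"],
    "friend_c": ["3 Idiots", "Jilla"],
}
print(summarize(sample))  # ('3 Idiots', 'friend_a')
```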


Source: Movies_Analytics


So now I have a dataset that gives me some values, and I can work on my recommendation engine. If you have caught your breath, let's move on to the next important thing: what to code and how to shape the data.

'This is my first-ever program in Python, so I can't claim it's great code.'

Part 2




