Recommendation systems


The Internet is flooded with new information every day. Easy access to so much content, combined with the difficulty of judging its validity, leads to information overload. As a result, e-commerce applications and social media sites find it increasingly hard to attract new users and retain existing ones. Recommendation systems address this by suggesting items that are likely to interest or be of value to a user, based on that user's interests and preferences.
Recommendation systems help users cope with information overload by pointing them to relevant products and content. They also help businesses increase profits by selling more products.

Recommendation systems are a type of information filtering system that uses the preferences of a group of people to make recommendations to other people. Well-known applications include product recommendation for online shopping, social matching, and targeted content and advertising.

35% of product sales on Amazon result from recommendations. Recommendations generate 38% more click-throughs on Google News. Two-thirds of the movies rented on Netflix were recommended.

There are many approaches to Collective Intelligence (CI) today. They can be broadly classified as collaborative filtering, content-based filtering, and hybrid approaches.
Collaborative filtering recommends items that the user's friends, or users similar to the user, have preferred. It comes in two types: user-based and item-based. Content-based filtering recommends items based on what the user has preferred in the past.
A hybrid approach combines the two.
Let's focus on collaborative filtering for now. Its steps are: collecting preferences, similarity scoring, ranking, and making recommendations.




Collaborative Filtering
Collaborative filtering is a method of making automatic predictions for a user by collecting preference information from a large group of people and finding a smaller set whose tastes are similar to the user’s.
The workflow of a collaborative filtering system is:

  1. Collect user preferences or ratings of items.
  2. Find similar users by matching the user’s ratings against other users’ ratings of the same items.
  3. Rank the users with similar preferences by similarity score.
  4. Recommend items that the similar users rate highly but that this user has not yet rated (assuming the user is unfamiliar with an item the user has not rated).
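
The four workflow steps can be sketched in a few lines of Python. This is only an illustration under assumptions: the `ratings` data, the function names, and the choice of an inverse Euclidean distance as the similarity score are all hypothetical and not taken from any particular system.

```python
from math import sqrt

# Step 1: hypothetical explicit ratings collected from users (1-5 scale).
ratings = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob":   {"A": 5.0, "B": 3.0, "C": 4.0, "D": 5.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}

def similarity(ratings, u, v):
    """Step 2: inverse Euclidean distance over co-rated items, in (0, 1]."""
    shared = set(ratings[u]) & set(ratings[v])
    if not shared:
        return 0.0
    dist = sqrt(sum((ratings[u][i] - ratings[v][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + dist)

def rank_neighbors(ratings, user):
    """Step 3: other users sorted by descending similarity to `user`."""
    scores = [(similarity(ratings, user, v), v) for v in ratings if v != user]
    return sorted(scores, reverse=True)

def recommend(ratings, user):
    """Step 4: similarity-weighted average rating of items `user` has not rated."""
    totals, weights = {}, {}
    for sim, other in rank_neighbors(ratings, user):
        if sim <= 0:
            continue
        for item, r in ratings[other].items():
            if item not in ratings[user]:
                totals[item] = totals.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    return sorted(((totals[i] / weights[i], i) for i in totals), reverse=True)
```

With this toy data, `rank_neighbors(ratings, "alice")` puts bob first, and `recommend(ratings, "alice")` returns item D with a predicted rating dominated by bob's rating of 5.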

Methodology

Collaborative filtering is of two types: (i) User-based and (ii) Item-based

User-based collaborative filtering

User-based filtering identifies people whose rating patterns are similar to the active user’s and uses the ratings from these like-minded users to calculate a prediction for the active user. For instance, if person A and person B both like a product, A is more likely to buy another product that B purchased than one purchased by a randomly chosen person.
The user-based nearest neighbor algorithm applies this principle.
The distance between the target user and every other user is calculated, and the closest k users (k = 5, for example) are chosen as ‘neighbors’.

Item-based collaborative filtering

This approach is item-centric (users who bought x also bought y). An item-item matrix is built to capture relationships between pairs of items. The user’s preference for an item is then predicted from this matrix and the user’s own data.
The Slope One algorithm applies this principle.
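
As an illustration, here is a minimal sketch of the weighted Slope One scheme in Python. The function names and the sample data are assumptions made for this example. The item-item matrix here holds, for each pair of items, the average rating difference and the number of users who rated both.

```python
from collections import defaultdict

def slope_one_fit(ratings):
    """Build the item-item matrix.

    dev[j][i] is the mean of (rating of j - rating of i) over users who
    rated both items; count[j][i] is the number of such users.
    """
    diff = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i != j:
                    diff[j][i] += r_j - r_i
                    count[j][i] += 1
    dev = {j: {i: diff[j][i] / count[j][i] for i in diff[j]} for j in diff}
    return dev, count

def slope_one_predict(dev, count, user_ratings, target):
    """Predict the user's rating of `target` from the items the user rated."""
    num = den = 0.0
    for i, r_i in user_ratings.items():
        n = count.get(target, {}).get(i, 0)
        if i != target and n > 0:
            num += (dev[target][i] + r_i) * n  # weight by number of co-raters
            den += n
    return num / den if den else None

# Hypothetical ratings used only for illustration.
sample = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
dev, count = slope_one_fit(sample)
prediction = slope_one_predict(dev, count, sample["lucy"], "A")  # 13/3, about 4.33
```

The prediction for lucy on item A combines two estimates: one from item B (supported by two co-raters) and one from item C (supported by one), so the better-supported estimate counts twice as much.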

Collaborative filtering based on observations of user behavior

The data collected from user actions (clicks, page-view time, purchases, tagging, and page navigation) plays a significant role in the prediction process. The predictions have to be filtered through application logic that decides how the system responds to such actions in order to boost purchases. A/B testing can be used effectively to learn about user behavior.

Similarity measures

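Some of the mathematical formulae used to calculate the similarity between two items:

Cosine (vector) similarity
Items and their ratings are represented as vectors. The similarity is the cosine of the angle between these vectors:
$$ \mathrm{sim}(a, b) = \cos\theta = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert} $$
Pearson (correlation) similarity
For a given pair of items, the similarity is based on how far the ratings given by their common users deviate from the average ratings of those items:
$$ \mathrm{sim}(a, b) = \frac{\sum_{u \in U} (r_{u,a} - \bar{r}_a)(r_{u,b} - \bar{r}_b)}{\sqrt{\sum_{u \in U} (r_{u,a} - \bar{r}_a)^2} \, \sqrt{\sum_{u \in U} (r_{u,b} - \bar{r}_b)^2}} $$
where U is the set of users who rated both items a and b, r_{u,a} is the rating of user u for item a, and \bar{r}_a is the average rating of item a.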
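
Both measures are short to write down in code. The sketch below assumes that `x` and `y` are equal-length lists holding the ratings that the same users gave to two items, and it takes the item averages over those common users only. The function names are illustrative.

```python
from math import sqrt

def cosine_similarity(x, y):
    """Cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0

def pearson_similarity(x, y):
    """Correlation of the two vectors after subtracting each vector's mean."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0
```

Cosine similarity ignores how harshly or generously an item is rated on average, while Pearson similarity corrects for it by centering the ratings first. For example, the vectors [1, 2, 3] and [3, 2, 1] have a positive cosine similarity but a Pearson similarity of -1.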

User based nearest neighbor algorithms

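Find the similarity between each user and the active user, then select the ‘neighbors’ to use for recommendations.
The mean vote for user i:
$$ \bar{r}_i = \frac{1}{|I_i|} \sum_{j \in I_i} r_{ij} $$
where I_i is the set of items on which user i voted and r_{ij} is the vote of user i for item j.
Rating prediction for the active user a on item j:
$$ p_{a,j} = \bar{r}_a + \frac{\sum_{u} w_{a,u} \, (r_{uj} - \bar{r}_u)}{\sum_{u} |w_{a,u}|} $$
where the sums run over the selected neighbors u, and w_{a,u} is the similarity between the active user u_a and the neighbor u_u.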
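
A minimal Python sketch of this prediction step follows. It assumes the common mean-centered form: the active user's mean vote plus the similarity-weighted average of each neighbor's deviation from that neighbor's own mean vote. The names `mean_vote`, `predict_rating`, and the sample data are hypothetical.

```python
def mean_vote(user_ratings):
    """Average of all votes cast by one user."""
    return sum(user_ratings.values()) / len(user_ratings)

def predict_rating(ratings, weights, active, item):
    """Predict the active user's vote for `item`.

    `weights` maps each chosen neighbor to its similarity with the
    active user; neighbors that did not vote on `item` are skipped.
    """
    num = den = 0.0
    for neighbor, w in weights.items():
        if item in ratings[neighbor]:
            num += w * (ratings[neighbor][item] - mean_vote(ratings[neighbor]))
            den += abs(w)
    base = mean_vote(ratings[active])
    return base + num / den if den else base

# Hypothetical votes and precomputed similarities, for illustration only.
votes = {
    "a":  {"x": 4, "y": 2},
    "u1": {"x": 3, "y": 1, "z": 5},
    "u2": {"x": 2, "y": 5, "z": 2},
}
neighbor_weights = {"u1": 0.9, "u2": 0.1}
predicted = predict_rating(votes, neighbor_weights, "a", "z")
```

Centering on each user's mean vote compensates for users who rate everything high or everything low. If no neighbor has voted on the item, the sketch falls back to the active user's mean vote.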

Item based nearest neighbor algorithms

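The prediction is based on the user’s ratings for similar items:
$$ p_{i,j} = \frac{\sum_{k=1}^{m} W(k, j) \, r_{ik}}{\sum_{k=1}^{m} |W(k, j)|} $$
where W(k, j) is the cosine similarity between item k and the predicted item j.

This takes the ratings of user i for all m items and averages them, weighted by each item’s similarity to the predicted item.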
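
The item-based prediction can be sketched as a similarity-weighted average. In this hypothetical example, `item_sims` is a precomputed table mapping a pair (k, j) to the cosine similarity between items k and j. The function name and data layout are assumptions.

```python
def predict_item_based(user_ratings, item_sims, target):
    """Weighted average of the user's ratings, weighted by similarity to `target`."""
    num = den = 0.0
    for k, rating in user_ratings.items():
        if k == target:
            continue
        w = item_sims.get((k, target), 0.0)  # unknown pairs contribute nothing
        num += w * rating
        den += abs(w)
    return num / den if den else None

# Hypothetical data: the user rated k1 and k2; j is the item to predict.
user_ratings = {"k1": 4.0, "k2": 2.0}
item_sims = {("k1", "j"): 0.75, ("k2", "j"): 0.25}
predicted = predict_item_based(user_ratings, item_sims, "j")  # 3.5
```

Because k1 is three times as similar to j as k2 is, the prediction lands closer to the user's rating of k1.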

Challenges in Collaborative Filtering


  • Calculating a user’s exact neighborhood is expensive because it requires comparison against all other users. Techniques like sampling and clustering can reduce this cost.
  • The user-item matrix can be extremely large and sparse, which can hurt the quality of the recommendations.
  • New users need to rate enough items before the recommendation engine can make predictions for them, since recommendations are based on the user’s past preferences (the cold start problem).
  • As the dataset grows, the algorithm can face scalability problems.
  • Synonyms (the same or similar items listed under different names) are difficult to track and degrade the performance of the algorithm.

To reduce domain complexity, the dimensionality of the recommender system’s data can be reduced to a smaller number of underlying dimensions. Popular dimensionality reduction algorithms are singular value decomposition (SVD) and principal component analysis (PCA). The highlights of this approach are more accurate predictions, better run-time performance, and a larger number of co-rated dimensions.
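
As a rough illustration of the SVD variant, assume R is the M × N user-item rating matrix with missing entries filled in beforehand (for example with item averages; this fill-in step is an assumption of the sketch). Keeping only the k largest singular values gives a rank-k approximation:

```latex
R \approx R_k = U_k \Sigma_k V_k^{\mathsf{T}},
\qquad U_k \in \mathbb{R}^{M \times k},\quad
\Sigma_k \in \mathbb{R}^{k \times k},\quad
V_k \in \mathbb{R}^{N \times k},
\qquad \hat{r}_{ij} = \left( U_k \Sigma_k V_k^{\mathsf{T}} \right)_{ij}
```

Each user and each item is then described by k numbers instead of N or M, so two users can be compared in the reduced space even if they have rated few items in common.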

Enhancing Collaborative filtering with Social relationships

Recommendations drawn from social relationships (as on social networking sites like Facebook and LinkedIn) can be more effective than traditional collaborative filtering. The user knows the people involved and can judge their preferences better than those of a random person with similar interests. Social relationships can include friends, friends of friends, and other people of interest.
Studies have shown that friends give similar ratings in taste-related domains. Cliques show a higher similarity than mere friend pairs on average. Hence, cliques and friend pairs are suitable recommendation sources, as they share a common taste in the investigated domain. Social recommendations are also more transparent than traditional collaborative filtering, since the user knows why a certain recommendation was made (it is based on the preferences of the user’s social network).


User-based collaborative filtering
  • Generates recommendations based on similar customers.
  • Represents each customer as an N-dimensional vector of items.
  • Similarity is the cosine of the angle between the vectors.
  • Computationally expensive: O(M), where M is the number of customers and N is the number of items.

Cluster models
  • Treat recommendation as a classification problem.
  • Divide the customer base into segments.
  • Assign the user to the segment containing the most similar users.
  • Compute the user’s similarity to vectors that summarize each segment.
  • Better online scalability, but poor-quality recommendations.

Search-based methods
  • Build a search query from the user’s purchased items to find other popular items with similar keywords or subjects.
  • Perform well for users with few purchases.
  • For users with many purchases, the query becomes too large to be practical; using only a subset of the query reduces quality.

Item-to-item collaborative filtering
  • Matches the user’s purchased items to similar items.
  • Builds a similar-items table of items that customers tend to purchase together.
  • Each vector corresponds to an item; its dimensions correspond to customers.
  • Offline computation is time-intensive: O(NM).
  • Online computation takes sub-second processing time and depends only on the items purchased by the user.
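
The split between offline and online work in item-to-item filtering can be sketched as follows. This is a simplified illustration, not Amazon's implementation: it compares every pair of items directly, treats each item as a binary vector over customers, and uses hypothetical names and data.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def build_similar_items(purchases):
    """Offline step: cosine similarity between items over binary customer vectors."""
    buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            buyers[item].add(customer)
    table = defaultdict(dict)
    for a, b in combinations(sorted(buyers), 2):
        common = len(buyers[a] & buyers[b])
        if common:
            sim = common / sqrt(len(buyers[a]) * len(buyers[b]))
            table[a][b] = sim
            table[b][a] = sim
    return table

def recommend(table, purchased, top_n=3):
    """Online step: look up only the rows for items the user already bought."""
    scores = defaultdict(float)
    for item in purchased:
        for other, sim in table.get(item, {}).items():
            if other not in purchased:
                scores[other] += sim
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

# Hypothetical purchase histories.
purchases = {
    "c1": {"book", "lamp"},
    "c2": {"book", "lamp", "pen"},
    "c3": {"book", "pen"},
    "c4": {"mug"},
}
similar_items = build_similar_items(purchases)
suggestions = recommend(similar_items, {"lamp"})  # ["book", "pen"]
```

The expensive pairwise comparison happens once, offline. At request time the work depends only on how many items the user has purchased, which is what makes sub-second responses feasible.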

The system recommends a personalized set of videos based on the user’s recent actions on the site.
The ranking of these recommendations is based on relevance, user personalization, video quality, and diversity.
Data: (1) content data (video streams and their metadata) and (2) explicit and implicit user activity data.
Explicit activities are actions like subscribing, favoriting, and liking. Implicit activities are data generated as a result of users interacting with videos (e.g., watch duration).
As a batch process, the recommendation system constructs a mapping from each video v to a set of similar or related videos, represented as a graph. The mapping is computed using a technique known as association rule mining, or co-visitation counts. For each pair of videos (vi, vj), the number of times they were co-watched within a given time period is counted.
The relatedness score of video vj to base video vi is given by:
$$ r(v_i, v_j) = \frac{c_{ij}}{f(v_i, v_j)} $$
where cij is the co-visitation count and f(vi, vj) is a normalization function that takes the “global popularity” of both the seed and candidate videos into account. It is given by f(vi, vj) = ci · cj, where ci and cj are the total occurrence counts across all sessions for videos vi and vj respectively.
The related videos can be seen as a directed graph over the set of videos. For each pair of videos (vi, vj), there is an edge eij from vi to vj iff vj ∈ Ri, with the weight of this edge given by the above equation.
To obtain candidate recommendations C for a given seed set S, the seed set is expanded along the edges of the related videos graph:
$$ C(S) = \bigcup_{v_i \in S} R_i $$
where Ri is the set of related videos for each video vi. This set of candidate videos is ranked based on video quality, user specificity, and diversification.
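
The co-visitation scoring and the candidate expansion can be sketched as below. The function names, the `top_n` cutoff, and the sample sessions are hypothetical. The sketch scores each pair as cij divided by ci · cj, and it removes the seed videos from the candidate set, which is an added assumption.

```python
from collections import Counter
from itertools import combinations

def relatedness(sessions):
    """Score each ordered pair (vi, vj) as c_ij / (c_i * c_j)."""
    occurrences = Counter()
    co_visits = Counter()
    for session in sessions:
        videos = set(session)          # count each video once per session
        occurrences.update(videos)
        for vi, vj in combinations(sorted(videos), 2):
            co_visits[(vi, vj)] += 1
            co_visits[(vj, vi)] += 1
    return {
        (vi, vj): c / (occurrences[vi] * occurrences[vj])
        for (vi, vj), c in co_visits.items()
    }

def related_videos(scores, top_n=2):
    """R_i: the top_n highest-scoring related videos for each video vi."""
    by_video = {}
    for (vi, vj), score in scores.items():
        by_video.setdefault(vi, []).append((score, vj))
    return {
        vi: [vj for _, vj in sorted(pairs, key=lambda t: (-t[0], t[1]))[:top_n]]
        for vi, pairs in by_video.items()
    }

def candidates(related, seeds):
    """Union of R_i over the seed videos, excluding the seeds themselves."""
    result = set()
    for vi in seeds:
        result.update(related.get(vi, []))
    return result - set(seeds)

# Hypothetical watch sessions.
sessions = [["a", "b"], ["a", "b", "c"], ["b", "c"], ["a", "d"]]
scores = relatedness(sessions)
related = related_videos(scores)
picks = candidates(related, ["a"])  # {"b", "d"}
```

Dividing by ci · cj keeps globally popular videos from dominating. In the sample data, video d is watched only once but always together with a, so it scores higher for a than the more popular b does.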
A/B testing can be used to evaluate the performance of the recommendation system. The site traffic is split into groups, where one group acts as the baseline and the other is exposed to a new feature. The two groups are then compared against one another. Metrics considered for evaluating recommendation quality and performance include click-through rate, session length, and time until first long watch.

Google News employs a hybrid of content-based and collaborative methods on the live traffic of its site. Combining the two offers the advantages of both methods and shows improved performance. An accurate profile of users' current interests is critical for the success of content-based recommendation systems. These systems can construct profiles automatically from users' interaction with the system.
For each article the system calculates:
(1) a content-based recommendation score, CR(article), based on the article’s topic, the user’s click history (to capture the user’s interests), and current news trends derived from the click behavior of the general public, and
(2) a collaborative filtering score, CF(article), based on clustering dynamic datasets (MinHash clustering based on the proportional overlap between the sets of items users clicked).
The score for each recommendation candidate is given by: Rec(article) = CR(article) × CF(article)
Based on the evaluation of a live trial, the hybrid content-based and collaborative filtering (CB-CF) approach was shown to perform better than collaborative filtering (CF) alone.
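
To make the "proportional overlap" idea concrete, the sketch below estimates the overlap (Jaccard similarity) of two users' click sets with MinHash signatures. It only illustrates the underlying technique and is not the Google News clustering pipeline. The hash construction, the number of hash functions, and all names are assumptions, and the click sets must be non-empty.

```python
import hashlib

def _hash(item, seed):
    """Deterministic 64-bit hash of an item under a given seed."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_signature(items, num_hashes=256):
    """For each seeded hash function, keep the minimum hash over the set."""
    return [min(_hash(item, seed) for item in items) for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)

def jaccard(a, b):
    """Exact proportional overlap: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

# Hypothetical click histories, represented as sets of article ids.
clicks_u = set(range(0, 100))
clicks_v = set(range(50, 150))
estimate = estimated_jaccard(minhash_signature(clicks_u), minhash_signature(clicks_v))
exact = jaccard(clicks_u, clicks_v)  # 50 / 150
```

Two users agree at a given signature position with probability equal to their Jaccard similarity, so users with heavily overlapping click sets tend to share signature values. That property is what allows them to be grouped into the same cluster without comparing every pair of users.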
  

Content Matching
Content matching recommends users who are associated with similar content. It creates a bag-of-words representation of each user (from user profiles, status messages, and tags).
Two users are considered similar if they share many common words in their associated content.
The weight of a shared word in the similarity score increases if only a few users share that word.
On its own, performance is poor. The algorithm can be enhanced by combining content matching with social link information derived from the user’s social network structure.
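
One simple way to realize this weighting is an inverse-document-frequency style score, sketched below. Treating each user's bag of words as a set and using log(N / df) as the word weight are assumptions made for illustration. The names and sample profiles are hypothetical.

```python
from math import log

def word_weights(profiles):
    """Weight each word by log(N / df): rarer words get larger weights."""
    n = len(profiles)
    df = {}
    for words in profiles.values():
        for w in set(words):
            df[w] = df.get(w, 0) + 1
    return {w: log(n / count) for w, count in df.items()}

def content_similarity(profiles, weights, u, v):
    """Sum of the weights of the words two users have in common."""
    shared = set(profiles[u]) & set(profiles[v])
    return sum(weights[w] for w in shared)

# Hypothetical bags of words built from profiles, status messages and tags.
profiles = {
    "ann": {"python", "lucene"},
    "ben": {"python", "lucene"},
    "cat": {"python", "music"},
    "dan": {"python", "cooking"},
}
weights = word_weights(profiles)
ann_ben = content_similarity(profiles, weights, "ann", "ben")  # log(2)
ann_cat = content_similarity(profiles, weights, "ann", "cat")  # 0.0
```

The word "python" appears in every profile, so it carries no weight, while "lucene" is shared by only two users and therefore drives the similarity between ann and ben.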


 


The Cold Start Problem

A recommendation system cannot draw inferences for users or items about which it has insufficient information, e.g., new users of a new site or new items without tags. This is known as the cold start problem. A hybrid approach of content-based matching and collaborative filtering is adopted to reduce its effect.
New items are assigned a rating based on the ratings of other similar items, where similarity is judged by the items' content-based characteristics.
The new user's profile is updated automatically based on user activities such as click-through data, searches, browsing history, tagging, and the friends and communities the user belongs to (implicit feedback).
For items without tags, a hybrid with a content-based recommender is employed and content-based tag extraction is used. Collaborative filtering is then applied to enrich the tags. However, collaborative filtering on its own will not work for recently created users or items.

REFERENCES
T. Segaran. Programming Collective Intelligence.
J. Davidson, B. Liebald, J. Liu, P. Nandy & T. Van Vleet (2010). The YouTube Video Recommendation System. RecSys '10: Proceedings of the fourth ACM conference on Recommender systems. Doi: 10.1145/1864708.1864770
J. Liu, P. Dolan & E. Pedersen (2010). Personalized news recommendation based on click behavior. IUI '10: Proceedings of the 15th international conference on Intelligent user interfaces. Doi: 10.1145/1719970.1719976
A. Das, M. Datar, A. Garg & S. Rajaram (May 2007). Google News Personalization: Scalable Online Collaborative Filtering. WWW '07: Proceedings of the 16th international conference on World Wide Web. Doi: 10.1145/1242572.1242610
G. Linden, B. Smith & J. York. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. Industry Report.
R. Burke. Hybrid Recommender Systems: Survey and Experiments. California State University, Fullerton, Department of Information Systems and Decision Sciences.
G. Groh & C. Ehmig. Recommendations in Taste Related Domains: Collaborative Filtering vs. Social Filtering. TU München, Department of Informatics.
I. Guy, I. Ronen & E. Wilcox. Do You Know? Recommending People to Invite into Your Social Network. IBM Haifa Research Lab and IBM Almaden Research Center.
J. Chen, W. Geyer, C. Dugan, M. Muller & I. Guy. “Make New Friends, but Keep the Old” – Recommending People on Social Networking Sites. University of Minnesota, IBM T.J Watson Research, and IBM Haifa Research Lab.
