Collaborative Filtering Recommender System Use Case Scenario in OrientDB

Recommender System

In this post I am going to introduce you a way of using a graph database like OrientDB for building a recommendation system through a simple use case scenario.

RECOMMENDER SYSTEMS

Every day people take decisions based on suggestion coming from the most varied sources: books, magazines, web sites, travel guides, radio, television, papers, advertisement and so on. Such advices are considered the more trustworthy the better is the level of the awareness of the source by the person.

Recommender Systems (RS) were introduced for identifying the items that may be considered interesting from a specific user in domains that extend from musical tracks to movies, from points of interests to articles to buy online.
Xiao & Benbasat give the following definition in the paper “E-commerce product recommendation agents: Use, characteristics, and impact, MIS Quarterly 31 (2007)”

“RS are software agents that elicit the interests and preferences of individual consumers […] and make recommendations accordingly.
They have potential to support and improve the quality of the decisions that consumers make while searching for and selecting products online.”

A lot of firms are currently implementing a recommender system in their services. Amazon, Spotify, Netflix, Ebay, just to name a few, gained their popularity
thanks to them.

Collaborative Filtering: A Recommender System Technique

Various state of art techniques have been proposed for RS. Here I will introduce the collaborative filtering, the process of looking for information or pattern through methods that involve the collaboration among multiple agents, data sources, viewpoints and so on. The aim is to provide recommendations or to predict the unknown interests of a user using the known preferences of a group of people. The core assumption of this approach is the following one: if the person A has the same opinion of the person B on a determined question, then is more likely that A has the most similar idea of B about a different problem than another unknown individual.

Scenario

Let’s consider this scenario. The user Alessio has recently visited some places in his city and rated each one of his experiences in a 1 (the lowest) to 5 range.
The friends of Alessio did the same. Can a system infer his preferences and recommend him places that he didn’t visit yet and he may consider interesting?
You can solve this question using a collaborative filtering approach based on a graph model.
You can identify two type of nodes, Person and Place, and two kind of relationships, FOAF (Friend Of A Friend) and Visited, as in the Figure. Visited
has an attribute named rating.

example_recsys

We will use OrientDB, “the first Multi-Model Open Source NoSQL DBMS that brings together the power of graphs and the flexibility of documents into one scalable high-performance operational database”.
The version used is 2.0.8. Recommendation engines are general use case scenarios for graph databases. We will interact with the db via OrientStudio tool and query it using the OrientDB Sql Dialect.

Firstly we create the vertex classes Person and Place, then the edges classes FOAF and Visited.

CREATE CLASS Person EXTENDS V
CREATE CLASS Place EXTENDS V
Create class FOAF extends E
Create class Visited extends E

Now we populate the db as the Figure.

graph_orientdb

The syntax is pretty straightforward for users who knows Sql. In particular you can create edges using subqueries or directly rid, if you already know them.

The person vertexes

INSERT INTO Person (id, name) VALUES (1, 'Alessio'), (2, 'Alice'), (3, 'Bob')

The place vertexes

 INSERT INTO Place (id, name) VALUES (4, 'Colosseum'), (5, 'Circo Massimo'), (6, 'Trondheim'), (7, 'Cinque Terre')

The edges

create edge FOAF from (select from Person where name = 'Pippo') to (select from Person where name = 'Alice')
create edge FOAF from #12:0 to #12:2
create edge Visited from (select from person where name = 'Alessio') to (select from Place where name = 'Colosseum') set rating = '5'
create edge Visited from (select from person where name = 'Alessio') to (select from Place where name = 'Circo Massimo') set rating = '4'
create edge Visited from (select from person where name = 'Alice') to (select from Place where name = 'Circo Massimo') set rating = '3'
create edge Visited from (select from person where name = 'Alice') to (select from Place where name = 'Trondheim') set rating = '5'

edge for Bob using rid

create edge Visited from #12:2 to #13:0 set rating = '4'
create edge Visited from #12:2 to #13:1 set rating = '3'
create edge Visited from #12:2 to #13:2 set rating = '5'
create edge Visited from #12:2 to #13:3 set rating = '4'
create edge Visited from #12:2 to #13:3 set rating = '4'

The database has now been populated. It’s time to query it and unleash the power of collaborative filtering! The approach proposed here is the following one:

let’s consider we want to provide recommendations to the user Alessio about places he may like. Among all of his friends, we firstly look to the most similar user to Alessio, that is the one who has mostly visited the same places as Alessio. The system will thereby suggest him places visited by the most similar friend but not already visited by Alessio, sorted by ratings.

How can you translate this algorithm into an OrientDB query? Let’s break it down into smaller parts and go on step by step. For the sake of simplicity, consider we already know the rid of the user Alessio, id est #12:0 (it is easy to retrieve it anyways).

1) Find the most similar user. The similarity is the total number of the incoming edges to the places visited. We will proceeding incrementally:

Select the places visited from Alessio

SELECT expand(out('Visited')) FROM #12:0    query1

Select the person nodes that have a relationship with the places visited by Alessio

SELECT expand(out('Visited').in('Visited')) FROM #12:0
query2

Select the person nodes that have a Visited relationship with the places visited by Alessio and a FOAF relationship with the node Alessio. Count the occurrences as the friend similarity score, grouping by @rid and ordering by the similarity.

SELECT count(@rid) as friend_similarity, @rid as friend_id, name FROM  (SELECT expand(in('Visited')) FROM (SELECT expand(out('Visited')) FROM #12:0))
WHERE @rid <> #12:0 AND @rid in (SELECT out('FOAF') FROM #12:0)
GROUP BY @rid 
ORDER BY friend_similarity DESC
query3

So putting it all together, the most similar user can be retrieved this way

SELECT expand(friend_id) FROM
(SELECT count(@rid) as friend_similarity, @rid as friend_id, name FROM (SELECT expand(in('Visited')) FROM (SELECT expand(out('Visited')) FROM #12:0)) WHERE @rid <> #12:0 AND @rid in (SELECT out('FOAF') from #12:0) GROUP BY @rid ORDER BY friend_similarity DESC LIMIT 1)
query4

2) Recommend the user the places visited by the most similar friend and not by him, sorted by rating. The previous query is bind to the $most_similar_friend variable in the LET block.

SELECT in.name, rating, $most_similar_friend.name FROM
(SELECT expand(inE('Visited')) FROM
(SELECT expand(@rid)  FROM
(SELECT expand(out('Visited')) FROM
(select expand($most_similar_friend)
LET $most_similar_friend = (SELECT expand(friend_id) FROM
(SELECT count(@rid) as friend_similarity, @rid as friend_id, name FROM (SELECT expand(out('Visited').in('Visited')) FROM #12:0) WHERE @rid <> #12:0 AND @rid in (SELECT out('FOAF') from #12:0) GROUP BY @rid ORDER BY friend_similarity DESC LIMIT 1))))
WHERE @rid not in (select out('Visited') from #12:0)))
WHERE out in $most_similar_friend
ORDER BY rating DESC
query_final

We can use query against a bigger dataset: the italian cities visited by 100 friends. The graph is composed by 100 Person nodes, 8092 Place nodes (the ), 98 FOAF edges and 23917Visited edges (avg of 239 places visited by a single person ). Note that it has been used a 0-500 rating to have a better perception of the row ordering.

The top 3 most similar friends are

friend_similarity_big

and recommended places generated by the system

bigger_dataset_result

Conclusion and future works

The aim of this post is to show how to use a graph db as OrientDB to train a recommendation system scenario with just a few lines of query. This collaborative filtering approach can be for sure refined and optimized. For example you can consider the top N similar friends and not only the first one, ordering the places aggregating by the sum of the ranking attribute in the incoming links. In other blog posts I will probably explore other graph databases and techniques.

6 thoughts on “Collaborative Filtering Recommender System Use Case Scenario in OrientDB

  1. Hi, thanks for your nice post. In OrientDB your last query will not work.
    One thing I cant understand, why after filtering places that user didnt visited ( WHERE @rid not in (select out(‘Visited’) from #12:0)) you are going into this:

    (SELECT expand(inE(‘Visited’)) and then filtering it with friend visits? (WHERE out in $most_similar_friend) I just cant understand this step.

    Like

  2. Hello Eugene! Thank you for showing interest to the post! As you point it out with this block you can select the places the user didn’t visit:

    SELECT expand(@rid) FROM
    (SELECT expand(out(‘Visited’)) FROM
    (select expand($most_similar_friend)
    LET $most_similar_friend = (SELECT expand(friend_id) FROM
    (SELECT count(@rid) as friend_similarity, @rid as friend_id, name FROM (SELECT expand(out(‘Visited’).in(‘Visited’)) FROM #12:0) WHERE @rid #12:0 AND @rid in (SELECT out(‘FOAF’) from #12:0) GROUP BY @rid ORDER BY friend_similarity DESC LIMIT 1))))
    WHERE @rid not in (select out(‘Visited’) from #12:0)

    However you don’t want all the places visited by all the friends (and not by the user), but only those visited by the most similar user (and not by the user). So you need to know which person visited each place

    SELECT expand(inE(‘Visited’)) FROM
    (SELECT expand(@rid) FROM…WHERE @rid not in (select out(‘Visited’) from #12:0)

    in particular the places visited by the most similar friend -> WHERE out in $most_similar_friend
    rated by the rating in a desc order -> ORDER BY rating DESC.

    As projection you have in.name (the name of the place, because the place is the node where the edge Visited is incoming ), rating, $most_similar_friend.name (the name of the most similar friend, that you can substitute with out.name because for the node Person the edge Visited is OUTgoing).

    Hope this helped you understanding better the query 🙂 if you have more question don’t hesitate to ask me!

    Like

    1. hello,

      nice article.
      – are there any docs or info available on “content filtering?”
      – have you made any more tutorials regarding orientdb and filtering?

      thanks

      Like

    2. …are those queries above required to get the final result, or is the last query a culmination / optimized version of all the others?

      thanks

      Like

  3. Hello MR. ONOFFON,
    thanks for your interest.
    All the queries you can find in the first point of the article are just final query snippets.
    You can execute them step by step to get and understand better the logic behind the final one you can read in point 2.

    I have written two articles on filtering and orientdb so far (collaborative and community filtering).
    Concerning content based recommendation systems, I have not experimented them with orientdb yet.
    As far as I know, usually they are developed through pure machine learning techniques.
    However, if you model your domain as a graph, you can you think a way of using orientdb to achieve your purpose.
    Best Regards,
    Alessio

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s