The goal of this blog post is to build a simple movie recommendation engine using Apache Mahout.

The code is on my GitHub: Movie Recommendation Engine.

I first came across Apache Mahout a couple of years ago while researching machine learning libraries for a music application I was working on. Mahout’s goal, according to the official Apache web page, is “to build an environment for quickly creating scalable performant machine learning applications.” The latest version, 0.10, seems to offer just that and more, combined with Hadoop and several new features.

But today, I just want to look at a simple use case: building a movie recommendation API. The use case:


  • A list of users
  • A list of movies
  • Similarities between movies (movie 1 and movie 2, etc.)
  • User preferences for each movie

For a given user, recommend a number of movies.

This is similar to the feature we see all the time on Amazon after buying a book: “You might also be interested in that book”. It belongs to the family of collaborative-filtering algorithms, where we have a database of user preferences for certain items (movies, books, etc.) and the similarities between those items. Using that data, we can predict what a user would prefer.
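To make the prediction step concrete, here is a minimal plain-Java sketch of one common way to do item-based collaborative filtering: estimate a user's preference for an unseen movie as the similarity-weighted average of the preferences that user has already given. It does not use Mahout, and the class name `ItemBasedSketch`, its methods, and the `pairKey` helper are all hypothetical names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of Mahout or of the original project.
class ItemBasedSketch {

    // Builds an order-independent key for a pair of item IDs.
    static String pairKey(long a, long b) {
        return Math.min(a, b) + ":" + Math.max(a, b);
    }

    // Estimates the user's preference for `candidate` as the similarity-weighted
    // average of the preferences the user already expressed.
    // Returns NaN when no rated item is similar to the candidate.
    static double estimate(Map<Long, Double> userPrefs,
                           Map<String, Double> similarities,
                           long candidate) {
        double weighted = 0.0;
        double totalSimilarity = 0.0;
        for (Map.Entry<Long, Double> rated : userPrefs.entrySet()) {
            Double sim = similarities.get(pairKey(candidate, rated.getKey()));
            if (sim == null || sim <= 0.0) {
                continue; // unknown or non-positive similarity contributes nothing
            }
            weighted += sim * rated.getValue();
            totalSimilarity += sim;
        }
        return totalSimilarity == 0.0 ? Double.NaN : weighted / totalSimilarity;
    }

    // Ranks the items the user has not rated yet by estimated preference.
    static List<Long> recommend(Map<Long, Double> userPrefs,
                                Map<String, Double> similarities,
                                List<Long> allItems,
                                int howMany) {
        Map<Long, Double> scores = new HashMap<>();
        for (long item : allItems) {
            if (userPrefs.containsKey(item)) {
                continue; // skip movies the user already rated
            }
            double score = estimate(userPrefs, similarities, item);
            if (!Double.isNaN(score)) {
                scores.put(item, score);
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(Comparator.comparingDouble((Long id) -> scores.get(id)).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

A user who loved movie 1 and disliked movie 2 gets a high estimate for any movie that is much more similar to movie 1 than to movie 2, which is the intuition behind “You might also be interested in that book”.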

Data Model:

The data model is somewhat simple:

  • users table stores user info
  • items table stores movie details
  • taste_preferences table stores a user’s preference for a movie
  • taste_item_similarity table stores similarity between movies
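Each row of the preferences table boils down to a (user, item, preference) triple. As a rough sketch, the snippet below parses such triples from text lines into an in-memory map. The column order `userId,itemId,preference` and the class name `PreferenceRows` are assumptions made for this example, not something taken from the project's schema.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the column order userId,itemId,preference is an assumption.
class PreferenceRows {

    // Parses lines such as "1,10,5.0" into userId -> (itemId -> preference).
    static Map<Long, Map<Long, Double>> parse(List<String> lines) {
        Map<Long, Map<Long, Double>> byUser = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            String[] parts = trimmed.split(",");
            long userId = Long.parseLong(parts[0].trim());
            long itemId = Long.parseLong(parts[1].trim());
            double preference = Double.parseDouble(parts[2].trim());
            byUser.computeIfAbsent(userId, id -> new HashMap<>()).put(itemId, preference);
        }
        return byUser;
    }
}
```

Keeping the preferences grouped by user makes it cheap to answer the question the recommender asks most often: what has this user already rated?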

To start, I created an interface for the recommendation engine. While the data can be stored in a MySQL database, it can also live in a CSV file. Recommendations can also be user based or item based. With user-based recommendations, the idea is to look for similar users and see which items they like. Item-based recommendations, on the other hand, start from the items a user already has preferences for and look for items similar to those. Since both the data source and the strategy can vary, the interface can have multiple implementations.
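As an illustration of how one interface can sit in front of several strategies, here is a minimal plain-Java sketch with a user-based implementation. The names `RecommendationEngine` and `UserBasedEngineSketch` are hypothetical and are not taken from the repository. The similarity measure (cosine over co-rated items) is also just one possible choice, picked here for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: any strategy (user based, item based) or data source
// (MySQL, CSV) could implement it.
interface RecommendationEngine {
    List<Long> recommend(long userId, int howMany);
}

// User-based sketch: find users with similar tastes and suggest what they liked.
class UserBasedEngineSketch implements RecommendationEngine {
    private final Map<Long, Map<Long, Double>> prefsByUser;

    UserBasedEngineSketch(Map<Long, Map<Long, Double>> prefsByUser) {
        this.prefsByUser = prefsByUser;
    }

    // Cosine similarity computed over the items both users have rated.
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) {
                continue; // not co-rated
            }
            dot += e.getValue() * other;
            normA += e.getValue() * e.getValue();
            normB += other * other;
        }
        return dot == 0.0 ? 0.0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    @Override
    public List<Long> recommend(long userId, int howMany) {
        Map<Long, Double> mine = prefsByUser.getOrDefault(userId, Collections.emptyMap());
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> other : prefsByUser.entrySet()) {
            if (other.getKey() == userId) {
                continue; // do not compare the user with themselves
            }
            double sim = similarity(mine, other.getValue());
            if (sim <= 0.0) {
                continue;
            }
            // Credit each unseen item with the neighbour's preference, weighted by similarity.
            for (Map.Entry<Long, Double> pref : other.getValue().entrySet()) {
                if (!mine.containsKey(pref.getKey())) {
                    scores.merge(pref.getKey(), sim * pref.getValue(), Double::sum);
                }
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

An item-based implementation would expose the same `recommend` method but score candidates from item-to-item similarities instead, so callers of the API would not need to know which strategy is behind it.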


MySQL Item Based Recommendation Implementation:

Initializing the recommender:

Recommending movies:

Getting movie details:

Example Response: