I meant to write about this some time ago but forgot. When using Mahout to build a recommendation engine backed by a data source such as MySQL or PostgreSQL, one of the warnings you will usually encounter is this:

“You are not using ConnectionPoolDataSource. Make sure your DataSource pools connections to the database itself, or database performance will be severely reduced.”

This is a weird warning because it does not go away even when you are already using a connection pool.

Last week, I got very annoyed and decided to fix it. I dug through the source code on GitHub and found the check that emits the warning in AbstractJDBCInMemoryItemSimilarity.java.

To create an item-similarity-based recommender with data stored in a MySQL database, we would use MySQLJDBCInMemoryItemSimilarity.
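A minimal sketch of that wiring, assuming the single-argument constructors (which rely on Mahout's default table and column names) and a hypothetical helper class called `ItemRecommenderSketch`; it is not the original listing from this post:

```java
import java.util.List;
import javax.sql.DataSource;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.jdbc.MySQLJDBCInMemoryItemSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

// Hypothetical helper class, named here only for illustration.
public class ItemRecommenderSketch {

    // Builds an item-based recommender on top of a MySQL-backed DataSource
    // and returns up to howMany recommendations for the given user.
    public static List<RecommendedItem> recommend(DataSource dataSource, long userId, int howMany)
            throws TasteException {
        // Preferences are read from the database through the JDBC data model.
        DataModel model = new MySQLJDBCDataModel(dataSource);
        // Item-item similarities are loaded from the database and held in memory.
        // This is the class whose parent performs the data source type check.
        ItemSimilarity similarity = new MySQLJDBCInMemoryItemSimilarity(dataSource);
        Recommender recommender = new GenericItemBasedRecommender(model, similarity);
        return recommender.recommend(userId, howMany);
    }
}
```

Passing a plain data source to this method is what produces the warning quoted above.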

MySQLJDBCInMemoryItemSimilarity extends SQL92JDBCInMemoryItemSimilarity, which in turn extends the abstract class AbstractJDBCInMemoryItemSimilarity. That abstract class checks whether the data source is an instance of Mahout’s ConnectionPoolDataSource and logs the warning if it is not, which is why a data source pooled by any other class still triggers it.
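The effect of that type check can be reproduced in isolation. The sketch below uses only the JDK: `PlainDataSource` and the nested `ConnectionPoolDataSource` are hypothetical stand-ins (the real wrapper is a Mahout class), and `wouldWarn` mirrors the instanceof test rather than copying Mahout’s code.

```java
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.logging.Logger;
import javax.sql.DataSource;

public class PoolCheckSketch {

    // Stand-in for any DataSource that is not Mahout's wrapper, pooled or not.
    static class PlainDataSource implements DataSource {
        public Connection getConnection() throws SQLException {
            throw new SQLException("demo only, no database behind this");
        }
        public Connection getConnection(String user, String password) throws SQLException {
            throw new SQLException("demo only, no database behind this");
        }
        public PrintWriter getLogWriter() { return null; }
        public void setLogWriter(PrintWriter out) { }
        public void setLoginTimeout(int seconds) { }
        public int getLoginTimeout() { return 0; }
        public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
        public <T> T unwrap(Class<T> iface) throws SQLException {
            throw new SQLException("not a wrapper");
        }
        public boolean isWrapperFor(Class<?> iface) { return false; }
    }

    // Stand-in for the wrapper class: it takes the ordinary data source as a delegate.
    static class ConnectionPoolDataSource extends PlainDataSource {
        final DataSource delegate;
        ConnectionPoolDataSource(DataSource delegate) { this.delegate = delegate; }
    }

    // Mirrors the check: the warning depends only on the runtime type,
    // not on whether the data source actually pools connections.
    static boolean wouldWarn(DataSource dataSource) {
        return !(dataSource instanceof ConnectionPoolDataSource);
    }

    public static void main(String[] args) {
        DataSource pooledElsewhere = new PlainDataSource();
        System.out.println(wouldWarn(pooledElsewhere));                               // prints true
        System.out.println(wouldWarn(new ConnectionPoolDataSource(pooledElsewhere))); // prints false
    }
}
```

Because the test looks only at the class, wrapping the data source is enough to silence the warning.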

The fix is rather simple, but it takes some time digging through the code to figure it out. Here I’m using Spring to autowire my dataSource object; the configuration class, SpringDataConfig.java, also creates the tables and populates them with some initial data.

This requires the commons-dbcp dependency on the classpath.

The difference is that, instead of returning a dataSource that is an instance of DriverManagerDataSource, we return a new ConnectionPoolDataSource that wraps our normal data source. Mahout then takes care of the connection pooling.
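A sketch of what such a Spring configuration class might look like, not the original listing. The driver class, JDBC URL, and credentials are placeholder assumptions, and the table creation and seeding steps mentioned earlier are left out to keep the focus on the wrapping:

```java
import javax.sql.DataSource;

import org.apache.mahout.cf.taste.impl.common.jdbc.ConnectionPoolDataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
public class SpringDataConfig {

    @Bean
    public DataSource dataSource() {
        // Ordinary, non-pooling data source. All connection settings below
        // are placeholders; substitute your own.
        DriverManagerDataSource raw = new DriverManagerDataSource();
        raw.setDriverClassName("com.mysql.jdbc.Driver");
        raw.setUrl("jdbc:mysql://localhost:3306/recommender");
        raw.setUsername("user");
        raw.setPassword("secret");

        // Wrap it in Mahout's own ConnectionPoolDataSource so the type check
        // in AbstractJDBCInMemoryItemSimilarity passes and connections are pooled.
        return new ConnectionPoolDataSource(raw);
    }
}
```

Any bean that autowires a DataSource now receives the wrapped instance, so the similarity and data model classes no longer log the warning.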

That’s it.