Jay Paulynice

My scratchpad where I write about coding, startups, music, weight lifting, fitness, and life in general.

Category: Java

Building an OAuth2 Server and Resource

This is part of an authentication service I’ve been working on for a small app. The project is open source on my GitHub: oauth2-provider-resource. I open sourced it in the hope that it helps someone trying to build a similar service.

Fixing Apache Mahout’s Connection Pooling Datasource Warning

I meant to write about this some time ago, but forgot. When using Mahout to build a recommendation engine backed by a database like MySQL or PostgreSQL, one of the warnings you usually encounter is this:

“You are not using ConnectionPoolDataSource. Make sure your DataSource pools connections to the database itself, or database performance will be severely reduced.”

This warning is misleading: it does not go away even when you are already using a connection pool.

Last week, I got very annoyed and decided to fix it. I dug through the source code on GitHub and found this code in AbstractJDBCInMemoryItemSimilarity.java:

In order to create an item similarity based recommender with data stored in a MySQL database, we would do something like this:

MySQLJDBCInMemoryItemSimilarity extends SQL92JDBCInMemoryItemSimilarity, which in turn extends the abstract class AbstractJDBCInMemoryItemSimilarity. That abstract class checks whether the data source is a ConnectionPoolDataSource, as shown above, and logs the warning if it is not.

The fix is rather simple, but it takes some time to dig through the code and figure it out. Here I’m using Spring to autowire my dataSource object, creating the tables and populating them with some initial data in SpringDataConfig.java:

This requires the commons-dbcp dependency:

The difference is that instead of returning a dataSource that is an instance of DriverManagerDataSource, we return a new ConnectionPoolDataSource that wraps our normal data source. Mahout then takes care of the connection pooling as follows:
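To see why wrapping makes the warning go away, here is a dependency-free sketch of the type check. It is not Mahout's code: `SimpleDataSource`, `PlainDataSource`, `PoolingWrapper` and `PoolWarningSketch` are hypothetical stand-ins for `javax.sql.DataSource`, a plain data source, and Mahout's ConnectionPoolDataSource, and the sketch models only the check, not real pooling.

```java
import java.util.Objects;

// Hypothetical stand-in for javax.sql.DataSource, to keep the sketch dependency-free.
interface SimpleDataSource {
    String describe();
}

// Hypothetical stand-in for a plain data source such as DriverManagerDataSource.
final class PlainDataSource implements SimpleDataSource {
    @Override
    public String describe() {
        return "plain";
    }
}

// Hypothetical stand-in for the wrapper type that the check looks for.
final class PoolingWrapper implements SimpleDataSource {
    private final SimpleDataSource delegate;

    PoolingWrapper(SimpleDataSource delegate) {
        this.delegate = Objects.requireNonNull(delegate, "delegate");
    }

    @Override
    public String describe() {
        return "pooled(" + delegate.describe() + ")";
    }
}

final class PoolWarningSketch {
    // The check is on the type of the object, not on whether it really pools.
    static boolean wouldWarn(SimpleDataSource dataSource) {
        return !(dataSource instanceof PoolingWrapper);
    }
}
```

With this model, `PoolWarningSketch.wouldWarn(new PlainDataSource())` is true, while wrapping the same object in `PoolingWrapper` makes it false. That is the reason a pooled data source of any other type still triggers the warning.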

That’s it.

Detecting malicious web attacks with my simple server

Within the last week, I’ve noticed that my blog was frequently down with this error:

My first thought was that I must be getting a lot of visits and that everything would eventually go back to normal. So I restarted the MySQL server and Apache, and all was fine. A couple of hours later, I tried accessing my blog again and realized it was down for the umpteenth time.

With the small web server I’ve been working on, I decided to see what was going on and where the traffic was coming from. I stopped my Apache server and installed Git, Java and Gradle on my Ubuntu instance.

Once I got the code on the Ubuntu instance, I started up my simple web server on port 80 and noticed something weird…it seems that someone is running a denial-of-service attack against my WordPress blog. Instead of using several machines, the requests come from the same IP address but different source ports.

Request signature:

Payload:

The attacker is using IP 185.130.5.209 and trying to brute force a POST request with an XML payload to my WordPress xmlrpc.php page. Notice also that they’re accessing my blog by its IP address directly. It could be a targeted attack against DigitalOcean, where my Ubuntu instance lives, or the attacker just has a list of random IPs they’re trying to attack. The content length also varies but is always in the 250-300 byte range…283 bytes in the case above.

Knowing this info, I modified my code to match the signature and silently drop the request:

I added a request filter and defined these values:

The filter method:

Check if the signature matches:

Then I modified the code that handles the request to simply log the IP:
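A minimal sketch of such a check, based only on the signature described above (a POST to xmlrpc.php with a content length between 250 and 300 bytes). The class name `RequestSignatureFilter`, the constant names and the exact thresholds are assumptions for illustration, not the values from my repository.

```java
import java.util.Locale;

final class RequestSignatureFilter {
    // Assumed values, taken from the attack described in this post.
    private static final String BLOCKED_METHOD = "POST";
    private static final String BLOCKED_PATH = "/xmlrpc.php";
    private static final int MIN_CONTENT_LENGTH = 250;
    private static final int MAX_CONTENT_LENGTH = 300;

    private RequestSignatureFilter() {
    }

    // Returns true when the request looks like the attack and should be dropped.
    static boolean matches(String method, String path, int contentLength) {
        if (method == null || path == null) {
            return false;
        }
        boolean methodMatches = BLOCKED_METHOD.equals(method.toUpperCase(Locale.ROOT));
        boolean pathMatches = path.toLowerCase(Locale.ROOT).endsWith(BLOCKED_PATH);
        boolean lengthMatches = contentLength >= MIN_CONTENT_LENGTH
                && contentLength <= MAX_CONTENT_LENGTH;
        return methodMatches && pathMatches && lengthMatches;
    }
}
```

A request handler could call `RequestSignatureFilter.matches(...)` before doing any work and, on a match, log the remote address and close the connection without writing a response. Matching on all three properties together keeps legitimate POST requests to other pages unaffected.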

The resulting logs from my simple web server showing the logged request and that the response is dropped:

I will add some more details later, but that’s all for now.

Creating a simple web server in Java

This post is part of some ideas I have been working on. Initially, I was just curious about how to create a web server in Java…a simple server that accepts an HTTP request and returns a response. As expected, the Java API is extremely high level…hiding all the nitty-gritty details of network programming (TCP, IP, UDP, etc.). Fortunately, I found this book: Unix Network Programming, which is a definitive reference on sockets and networking. While its examples are written in C, I found them quite easy to read and understand.

As usual, the code is on my repository: simple-java-web-server

Main thread:

Initializing the server:

Implementing the run method:
As long as the server is running, continuously check whether a client wants to do something, i.e., accept, connect, read or write.

Accepting client connections and registering the client for reading:

Reading from and writing to the client:
This is a rather simple server, so reading and writing are done on the same thread. Here we read all the data from the client and, based on the request, write a response back to the client.
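The steps above (initialize, loop, accept, read and write on one thread) can be sketched with `java.nio` as follows. This is a simplified illustration and not the code from the repository: `TinyNioServer` is a hypothetical name, it reads only what arrives in a single read, and it answers every request with the same fixed response.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

final class TinyNioServer implements Runnable, AutoCloseable {
    private final Selector selector;
    private final ServerSocketChannel serverChannel;
    private volatile boolean running = true;

    // Port 0 asks the operating system for any free port.
    TinyNioServer(int port) throws IOException {
        selector = Selector.open();
        serverChannel = ServerSocketChannel.open();
        serverChannel.configureBlocking(false);
        serverChannel.bind(new InetSocketAddress("127.0.0.1", port));
        serverChannel.register(selector, SelectionKey.OP_ACCEPT);
    }

    int port() throws IOException {
        return ((InetSocketAddress) serverChannel.getLocalAddress()).getPort();
    }

    @Override
    public void run() {
        // Both resources are closed when the loop ends.
        try (Selector sel = selector; ServerSocketChannel ch = serverChannel) {
            while (running) {
                sel.select(200); // wait up to 200 ms for ready channels
                Iterator<SelectionKey> keys = sel.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isValid()) {
                        continue;
                    }
                    if (key.isAcceptable()) {
                        accept();
                    } else if (key.isReadable()) {
                        readAndRespond(key);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Accept the client and register it for reading.
    private void accept() throws IOException {
        SocketChannel client = serverChannel.accept();
        if (client != null) {
            client.configureBlocking(false);
            client.register(selector, SelectionKey.OP_READ);
        }
    }

    // Read the request and write a fixed response on the same thread.
    private void readAndRespond(SelectionKey key) throws IOException {
        SocketChannel client = (SocketChannel) key.channel();
        try {
            ByteBuffer in = ByteBuffer.allocate(8192);
            if (client.read(in) > 0) {
                String body = "hello";
                String response = "HTTP/1.1 200 OK\r\nContent-Length: " + body.length()
                        + "\r\nConnection: close\r\n\r\n" + body;
                ByteBuffer out = ByteBuffer.wrap(response.getBytes(StandardCharsets.US_ASCII));
                while (out.hasRemaining()) {
                    client.write(out); // fine for a tiny response; may spin on large ones
                }
            }
        } finally {
            client.close(); // closing the channel also cancels its key
        }
    }

    @Override
    public void close() {
        running = false;
        selector.wakeup(); // unblock select() so the loop can exit
    }
}
```

To try it, run `new Thread(new TinyNioServer(8080)).start()` and point a browser at port 8080. A real server would buffer partial reads until the full request has arrived and would register for write readiness instead of looping on `write`.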

Configuring our application with Spring annotations:

Finally the Main class responsible for running the server:

JaCoCo coverage for multi-project Gradle setup

This post is about setting up a Gradle multi-project build with JaCoCo to get an aggregate test coverage report. This is something I struggled with at first, but with some free time on my hands yesterday, I finally set out to make it work.

The example is part of a seed project I have been working on for a REST API using Jersey 2 and Spring.

See build.gradle for full example. The end results:

Coveralls integration:

Simple HTML reports:

Building A Simple Movie Recommendation Engine

The goal of this blog post is to build a simple movie recommendation engine using Apache Mahout.

The code is on my GitHub here: Movie Recommendation Engine.

I first came across Apache Mahout a couple years ago while researching machine learning libraries for a music application I was working on. Mahout’s goal, according to the official Apache web page, is “to build an environment for quickly creating scalable performant machine learning applications.” The latest version 0.10 seems to offer just that and more combined with Hadoop and several new features.

But today, I just want to look at a simple use case: building a movie recommendation API. The use case:

Given:

  • A list of users
  • A list of movies
  • Similarities between movies…movie 1 and movie 2, etc.
  • User preferences for each movie

For a given user, recommend a number of movies.

This is similar to the feature we see all the time on Amazon after buying a book: “You might also be interested in that book”. It falls under the collaborative filtering family of algorithms, whereby we have a database of user preferences for certain items (movies, books, etc.) and of similarities between the items. Using that data, we can predict what a user would prefer.

Data Model:

The data model is somewhat simple:

  • users table stores user info
  • items table stores movie details
  • taste_preferences stores user preference for a movie
  • taste_item_similarity table stores similarity between movies

To start, I created an interface for the recommendation engine. While the data can be stored in a MySQL database, it can also live in a CSV file. Recommendations can also be user based or item based. With user-based recommendations, the idea is to look for similar users and see what items they like. Item-based recommendations, on the other hand, start from the items a user already has preferences for and look for similar items. Therefore, we can have multiple implementations.

Interface:
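As a rough sketch of the idea, assuming a hypothetical `MovieRecommender` interface with a single method, here is a small in-memory, item-based implementation. It is neither the interface from my repository nor Mahout's implementation: it scores each unseen movie with a similarity-weighted average of the user's existing ratings, and all names in it are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: implementations could be MySQL or CSV backed, user or item based.
interface MovieRecommender {
    List<Long> recommend(long userId, int howMany);
}

final class InMemoryItemBasedRecommender implements MovieRecommender {
    private final Map<Long, Map<Long, Double>> preferences;  // userId -> (movieId -> rating)
    private final Map<Long, Map<Long, Double>> similarities; // movieId -> (movieId -> similarity)

    InMemoryItemBasedRecommender(Map<Long, Map<Long, Double>> preferences,
                                 Map<Long, Map<Long, Double>> similarities) {
        this.preferences = preferences;
        this.similarities = similarities;
    }

    @Override
    public List<Long> recommend(long userId, int howMany) {
        Map<Long, Double> rated = preferences.getOrDefault(userId, Collections.emptyMap());
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> similaritySum = new HashMap<>();
        for (Map.Entry<Long, Double> pref : rated.entrySet()) {
            Map<Long, Double> neighbours =
                    similarities.getOrDefault(pref.getKey(), Collections.emptyMap());
            for (Map.Entry<Long, Double> neighbour : neighbours.entrySet()) {
                long candidate = neighbour.getKey();
                if (rated.containsKey(candidate)) {
                    continue; // never recommend a movie the user already rated
                }
                weightedSum.merge(candidate, neighbour.getValue() * pref.getValue(), Double::sum);
                similaritySum.merge(candidate, Math.abs(neighbour.getValue()), Double::sum);
            }
        }
        // Estimated rating = similarity-weighted average of the user's ratings.
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Double> entry : weightedSum.entrySet()) {
            double total = similaritySum.get(entry.getKey());
            if (total > 0) {
                scores.put(entry.getKey(), entry.getValue() / total);
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a)); // highest first
            return byScore != 0 ? byScore : Long.compare(a, b);         // stable tie-break
        });
        int limit = Math.max(0, Math.min(howMany, ranked.size()));
        return new ArrayList<>(ranked.subList(0, limit));
    }
}
```

Because callers depend only on the interface, a MySQL-backed or CSV-backed implementation can be swapped in without changing the code that asks for recommendations.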

MySQL Item Based Recommendation Implementation:

Initializing the recommender:

Recommending movies:

Getting movie details:

Example Response:

© 2019 Jay Paulynice