
Moving data in Java. 3 ways of solving my data problem. Part 1

Problem

I've been working on a project at work to facilitate better logging and searching of events. Every time a user does something in Sakai, it gets logged to our MariaDB database; over time this can add up to tens of millions of rows (seriously).

Most of the time, we use this data to confirm that:

  • A student was in a site.
  • A student submitted a quiz.
  • A student uploaded a resource.

Along with a host of other options.

Even though the data is indexed, a query trying to look through that amount of data can bog down our system and take minutes to respond. That is definitely not what we want to happen on a production DB.

Solution

My solution for this was to offload the events/sessions/site info to a MongoDB installation we have. The data would still be put into our MariaDB installation, since some tools depend on it, but all of the work for logging/searching/filtering the data would be done off of the MongoDB installation.

I made this choice because we don't have much need for ACID guarantees or a relational schema for this data; it would not matter if we lost a few events out of every thousand. Mongo also excels at storing ginormous amounts of little pieces of data. Each record is stored as a document in a collection (the Mongo equivalent of a MySQL table) in its own database. There are a bunch of other differences, but that is beyond the scope of this post.
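To give a rough idea of what that looks like, here is a minimal sketch of inserting one event as a document with the 2.x MongoDB Java driver. The database, collection, and field names here are made up for illustration, not the real ones:

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

import java.util.Date;

public class EventDocumentExample {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        DBCollection events = client.getDB("sakai_logs").getCollection("events");

        // Each event becomes one document; no schema has to be declared up front.
        BasicDBObject doc = new BasicDBObject("event", "content.new")
                .append("userId", "student123")
                .append("siteId", "site456")
                .append("eventDate", new Date());
        events.insert(doc);

        client.close();
    }
}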

So while I had found a solution, I still needed to figure out how to implement it...
Here are 3 different ways of implementing my solution.

Implementation

After figuring out the what/why/where of my problem, I next needed to tackle the how. I realized that there were 3 main places I needed to edit the code to copy the data over: two in the Kernel for the Events/Sessions and one in our Site-Manage section for the site deletions. The different areas led me to 3 different ways of moving/inserting the data into the Mongo database.

Directly inserting the data

One way of inserting the data was to go directly to the section of code that inserts the event into the MariaDB database and change it to insert the event into the Sakai DB and then right into MongoDB.

private void catchEvent(Event ee) {
    postEvent(ee); // Put it into the Sakai database first.
    if (isMongoEnabled) { // If Mongo logging is enabled for this server, then put it in there too.
        sme.insertEvent(new SakaiEvent(ee)); // Insert the event into MongoDB.
    }
}

All of the exception/error handling is done inside the library that I created.
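For reference, here is roughly what that could look like inside the library. This is a sketch, not the real code; the class name, the logger, and SakaiEvent.toDBObject() are all assumptions:

import com.mongodb.DBCollection;
import com.mongodb.MongoException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SakaiMongoEvents {
    private static final Logger log = LoggerFactory.getLogger(SakaiMongoEvents.class);
    private final DBCollection eventCollection;

    public SakaiMongoEvents(DBCollection eventCollection) {
        this.eventCollection = eventCollection;
    }

    public void insertEvent(SakaiEvent event) {
        try {
            eventCollection.insert(event.toDBObject()); // assumed SakaiEvent -> DBObject conversion
        } catch (MongoException e) {
            // Losing the occasional event is acceptable for this data, so log
            // the failure and move on instead of letting it bubble up into Sakai.
            log.warn("Could not insert event into MongoDB", e);
        }
    }
}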

I was going to go through with it this way until I learned about the bulk inserting added in the new MongoDB Java driver, and until I talked with my boss about the possible performance hit on both servers from doing an insane amount of very tiny I/O all the time. Also, we did not want the prod code to have to wait on the MongoDB code to continue.

Using a timed write program

I have no idea what to actually call this one.

The way this one would work is to read the data from the Sakai DB in steps. For example, I would take items 1-15000 and write them, then 30 seconds later take items 15001 through the most recent, and so on. The class that reads from the DB would of course run on its own thread so as to not lock up the system.
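A rough sketch of that polling loop is below. The EventDao and MongoManager collaborators (and SakaiEvent.getId()) are hypothetical, since this approach never got built:

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TimedEventCopier {
    // Hypothetical collaborators: a DAO that reads events past a given ID,
    // and a manager that writes a batch of them into MongoDB.
    interface EventDao { List<SakaiEvent> findAfter(long lastId, int limit); }
    interface MongoManager { void insertBatch(List<SakaiEvent> batch); }

    private static final int BATCH_SIZE = 15000;

    private final EventDao eventDao;
    private final MongoManager mongoManager;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private long lastCopiedId = 0; // highest event ID copied so far

    public TimedEventCopier(EventDao eventDao, MongoManager mongoManager) {
        this.eventDao = eventDao;
        this.mongoManager = mongoManager;
    }

    public void start() {
        // Poll on a background thread every 30 seconds so Sakai itself never waits.
        scheduler.scheduleAtFixedRate(this::copyNextBatch, 0, 30, TimeUnit.SECONDS);
    }

    private void copyNextBatch() {
        // e.g. SELECT ... WHERE EVENT_ID > ? ORDER BY EVENT_ID LIMIT ?
        List<SakaiEvent> batch = eventDao.findAfter(lastCopiedId, BATCH_SIZE);
        if (!batch.isEmpty()) {
            mongoManager.insertBatch(batch);
            lastCopiedId = batch.get(batch.size() - 1).getId(); // getId() is assumed
        }
    }
}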

This one would not work too well because of the transactional locking it could cause on the DB. While it reads the events from the DB, it would need to lock the EVENT & SESSIONS tables so that no other events get added while it is reading from them.

Besides definitely causing a transactional lock on the DB, it could also cause all of the other records waiting to be written to the DB to pile up in memory. That would be no bueno for a production system, since we need that data in the DB ASAP and can't have huge locks happening on the database.

Using an ArrayBlockingQueue

This is the method that I am going to be implementing.

An ArrayBlockingQueue would be used to hold and serve up events destined for MongoDB. As Sakai generates events, they would be added to the queue until it reaches its limit.

private BlockingQueue<SakaiEvent> sharedQueue = new ArrayBlockingQueue<>(64);

...

if (sharedQueue.offer(event)) {
    // offer() just added the event to the queue; nothing more to do.
} else {
    // The queue is full, so dump its contents to the database...
    mongoManager.dumpToDB(sharedQueue);
    // ...and then add the event that didn't fit the first time.
    sharedQueue.offer(event);
}
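The nice thing about offer() over put() here is that it never blocks: it returns false immediately when the queue is full instead of making the event-posting thread wait, so the production code never stalls on the Mongo side.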

Once the ArrayBlockingQueue is full, I would then dump the queue into MongoDB using their UnorderedBulkOperation to write it into the database. I'll be creating a method that takes each type of queue (Events & Sessions) to cut down on fluff. There is no need to do an ordered write, since I don't care about the order the data is written in.
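Here is a sketch of what that dump method could look like with the 2.x Java driver's unordered bulk API. The collection name and the SakaiEvent.toDBObject() conversion are again assumptions for illustration:

import com.mongodb.BulkWriteOperation;
import com.mongodb.DB;
import com.mongodb.DBCollection;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

public class MongoManager {
    private final DB db;

    public MongoManager(DB db) {
        this.db = db;
    }

    public void dumpToDB(BlockingQueue<SakaiEvent> queue) {
        // drainTo() empties the queue into a local list in one shot,
        // so other threads can keep offering events while we write.
        List<SakaiEvent> batch = new ArrayList<>();
        queue.drainTo(batch);
        if (batch.isEmpty()) {
            return;
        }

        DBCollection events = db.getCollection("events");
        BulkWriteOperation bulk = events.initializeUnorderedBulkOperation();
        for (SakaiEvent event : batch) {
            bulk.insert(event.toDBObject()); // assumed SakaiEvent -> DBObject conversion
        }
        bulk.execute(); // one round trip instead of one insert per event
    }
}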

Database Operations

Once I write the code for dumping the queue to the database, I'll create a link to that post here.
