Wednesday, September 26, 2012

Implementing a Task Queue Service



Task Queue Service

Recently I have been looking into how to design a task queue service for a PaaS platform, and I realized how important a component it could be for our own platform as well. Task queues have long been used in large-scale web applications, but offering one as a service on a PaaS platform is quite challenging, and very few players are in this market. After a long search I found only a few that have implemented this kind of service, such as Google App Engine, Heroku, and IronWorker. If you know of others, please let me know.


The main idea behind a task queue is to avoid doing a resource-intensive task immediately and having to wait for it to complete. Instead, we schedule the task to be done later. We encapsulate a task as a message and send it to the queue. A worker process running in the background pops the tasks off the queue and eventually executes them. When you run many workers, the tasks are shared between them. This concept is especially useful in web applications, where it is impossible to handle a complex task within a short HTTP request window.
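The idea can be sketched in a few lines of plain Java. This is only an illustration under some assumptions: an in-memory BlockingQueue stands in for a real message broker, a thread stands in for the worker process, and the class name, task names, and STOP sentinel are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: an in-memory queue stands in for a real broker.
class TaskQueueSketch {

    // Sentinel message that tells the worker to stop (an assumption of this sketch).
    private static final String STOP = "__STOP__";

    static List<String> runDemo() throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> processed = Collections.synchronizedList(new ArrayList<>());

        // Background worker: blocks until a task message arrives, then executes it.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String task = queue.take();
                    if (task.equals(STOP)) {
                        return;
                    }
                    // The "resource-intensive" work would happen here.
                    processed.add("done:" + task);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();

        // The request handler only enqueues messages and returns immediately.
        queue.put("resize-image-1");
        queue.put("send-email-2");
        queue.put(STOP);

        worker.join();
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

The producer never waits for a task to finish; it only pays the cost of putting a small message on the queue.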

Task queues have all sorts of uses for offline processing, including periodically pulling data from third-party sources, computing aggregate statistics, and making decisions based on analysis. A basic advantage of using a task queue is the ability to parallelize work easily. Most message queues support this, which means we don't need to balance the load across workers ourselves; the message queue implementation handles it automatically. If a backlog of work is building up, we can simply add more workers and scale easily that way. On a sensor platform these kinds of tasks are very common, so a properly designed task queue service is required for a stable sensor-based platform.

How it should work:

A web application or an external job (the producer, or sender) puts jobs on a queue, or schedules them, with enough content for them to run. A group of worker processes (the consumers) in the background take the jobs off the queue and execute them. The results can be put on a reply queue or written to a database, depending on how you want to display the results back to the user. For a web application, writing the results to the database probably makes more sense.
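The producer, consumer, and reply queue flow described above might look roughly like this. It is a sketch under assumptions, not a real broker setup: two in-memory queues replace the message queue, threads replace worker processes, and the Job and Result classes and the squaring "work" are invented for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: several workers share one job queue and answer on a reply queue.
class ReplyQueueSketch {

    // A job carries enough content to be executed on its own.
    static final class Job {
        final int id;
        final int input;
        Job(int id, int input) { this.id = id; this.input = input; }
    }

    static final class Result {
        final int jobId;
        final int value;
        Result(int jobId, int value) { this.jobId = jobId; this.value = value; }
    }

    // Sentinel job used to shut a worker down (an assumption of this sketch).
    private static final Job STOP = new Job(-1, 0);

    static Map<Integer, Integer> runDemo(int workerCount, int jobCount) throws InterruptedException {
        BlockingQueue<Job> jobs = new LinkedBlockingQueue<>();
        BlockingQueue<Result> replies = new LinkedBlockingQueue<>();

        // Consumers: each worker takes whichever job is next, so the load is shared.
        Thread[] workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new Thread(() -> {
                try {
                    while (true) {
                        Job job = jobs.take();
                        if (job == STOP) {
                            return;
                        }
                        replies.put(new Result(job.id, job.input * job.input));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }

        // Producer: enqueue the jobs, then one stop marker per worker.
        for (int id = 1; id <= jobCount; id++) {
            jobs.put(new Job(id, id));
        }
        for (int i = 0; i < workerCount; i++) {
            jobs.put(STOP);
        }

        // Collect one reply per job; replies may arrive in any order, so key them by job id.
        Map<Integer, Integer> results = new TreeMap<>();
        for (int i = 0; i < jobCount; i++) {
            Result r = replies.take();
            results.put(r.jobId, r.value);
        }
        for (Thread w : workers) {
            w.join();
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(3, 5));
    }
}
```

Changing workerCount is the only thing needed to add capacity; the producer code stays the same. In a web application the reply step would more likely be a database write.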

Scheduling a job and executing a job are two related but independent tasks. Separating a job’s execution from its scheduling ensures the responsibilities of each component are clearly defined and results in a more structured and manageable system.
Use a job scheduler only to queue background work and not to perform it. Background workers then receive the work to be executed out of process from the scheduler.
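A minimal sketch of this separation, assuming a plain java.util.concurrent scheduler instead of the framework scheduler discussed later, could look like the following. The class name, job names, and the 10 ms interval are hypothetical, and the worker loop runs on the calling thread only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the scheduler only enqueues work, the worker only executes it.
class SchedulerSketch {

    static List<String> runDemo(int jobsToProcess) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        AtomicInteger counter = new AtomicInteger();

        // Scheduler: fires periodically and does nothing except put a job on the queue.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> queue.offer("job-" + counter.incrementAndGet()),
                0, 10, TimeUnit.MILLISECONDS);

        // Worker: takes jobs off the queue and performs the actual work.
        List<String> executed = new ArrayList<>();
        for (int i = 0; i < jobsToProcess; i++) {
            String job = queue.take();
            executed.add("executed " + job);
        }

        scheduler.shutdownNow();
        return executed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(3));
    }
}
```

Because the scheduled task is so small, a slow job never delays the next trigger; it only lengthens the queue.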
Fig 2: Sequence Diagram


So, based on the above discussion, we need 4 software components:
  1. A language framework which helps to create all the components below [not necessary]
  2. A job scheduler, probably a single instance
  3. A sophisticated queue
  4. A worker process: your task logic




Implementation Strategy:

A language framework: Web Development framework and core platform system.
We chose Java-based development, and we will be using Play 1.2.5 as our main development framework[???]. Play 2.0 implements the "Actor" concept for this kind of job and has adopted the Scala-based Akka system, which also has a Java API. But due to the steep learning curve and the lack of experienced resources, I decided to stick with Play 1.2.5. We might move to Play 2.0 or a later version in the future.

A Job scheduler:
There are many ways to schedule background jobs in Java applications. One popular method is to use the Quartz library along with RabbitMQ to create a scalable and reliable way of scheduling background jobs. Fortunately, the Play framework has good, convenient support for scheduling jobs. At its core, it uses the Quartz library. You can write a job like this:
import play.jobs.*;
 
@Every("1h")
public class Bootstrap extends Job {
    
    public void doJob() {
        // get the data and push to queue for worker
    }
    
}
This says that Play will trigger the job for you every hour. You can also specify the interval in minutes or seconds. If the @Every annotation is not enough, you can use the @On annotation to run your jobs using a CRON expression. For example:


/** Fire at 12pm (noon) every day **/ 
@On("0 0 12 * * ?")
Queue:

I have decided to use RabbitMQ. It supports all the features we wanted, and it is popular and widely adopted on PaaS platforms. Resque, Beanstalkd, and ActiveMQ are a few of the alternatives. RabbitMQ has long been used as an enterprise messaging bus. The tutorials on the RabbitMQ site should help us write the queue and worker process.


Worker:

I will discuss the worker in a subsequent post.
