Task Queue Service
Recently I have been digging into how to design a Task Queue service for a PaaS platform, and I realized it could be an important component of our platform as well. Task queues have long been used in large-scale web implementations, but offering one as a service on a PaaS platform is quite challenging, and very few players are in this market. After a long search, I found only a few, such as Google App Engine, Heroku, and IronWorker, that have implemented this kind of service. If you know of others, please let me know.
The main idea behind a task queue is to avoid doing a resource-intensive task immediately and having to wait for it to complete. Instead, we schedule the task to be done later. We encapsulate a task as a message and send it to the queue. A worker process running in the background pops the tasks off the queue and eventually executes them. When you run many workers, the tasks are shared between them. This concept is especially useful in web applications, where it's impossible to handle a complex task during a short HTTP request window.
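To make the idea concrete, here is a minimal in-memory sketch using only the Java standard library. It is an illustration under stated assumptions, not how a real task queue service would be built: the class name InMemoryTaskQueueSketch, the TaskMessage type, and the "resize-image" payloads are all hypothetical, and a BlockingQueue stands in for the real message broker.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory illustration only; a real service would use a broker.
class InMemoryTaskQueueSketch {

    // A task is encapsulated as a small message describing the work.
    static final class TaskMessage {
        final int id;
        final String payload;

        TaskMessage(int id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // Starts workerCount workers, enqueues taskCount tasks, and returns one
    // log line per executed task once every task has been handled.
    static List<String> runDemo(int workerCount, int taskCount) {
        final BlockingQueue<TaskMessage> queue = new LinkedBlockingQueue<>();
        final List<String> handled = Collections.synchronizedList(new ArrayList<String>());
        final CountDownLatch done = new CountDownLatch(taskCount);

        List<Thread> workers = new ArrayList<>();
        for (int w = 0; w < workerCount; w++) {
            final String name = "worker-" + w;
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        TaskMessage task = queue.take(); // blocks until a task arrives
                        handled.add(name + " executed task " + task.id + " (" + task.payload + ")");
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // interrupted: stop this worker
                }
            });
            worker.setDaemon(true);
            worker.start();
            workers.add(worker);
        }

        try {
            // Enqueueing is cheap: the producer only hands over messages.
            for (int i = 0; i < taskCount; i++) {
                queue.put(new TaskMessage(i, "resize-image-" + i));
            }
            // The demo waits here only so that it can return the results.
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            for (Thread worker : workers) {
                worker.interrupt();
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        for (String line : runDemo(3, 10)) {
            System.out.println(line);
        }
    }
}
```

Which worker picks up which task is not deterministic, so the output order varies from run to run. The point is only that the producer never has to know how many workers exist.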
Task queues have all sorts of uses for offline processing, including periodically pulling data from third-party sources, computing aggregate statistics, and making decisions based on analysis. A basic advantage of using a task queue is the ability to parallelize work easily. Most message queues support this feature, which means we don't need to balance the load across workers externally; the message queue implementation handles it automatically. If we are building up a backlog of work, we can just add more workers and scale easily that way. In a sensor platform, these kinds of tasks are very common, so designing a proper task queue service is required for a stable sensor-based platform.
How it should work:
A web application or an external job (the producer, or sender) puts or schedules jobs on a queue, with enough content for each job to run. A group of worker processes (the consumers) in the background take the jobs off the queue and execute them. The results can be put onto a reply queue or written into a database. It depends on how you want to display the results back to the user. For a web application, writing results into the database probably makes more sense.
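The reply-queue variant of this flow can be sketched as follows. This is a simplified, single-process illustration rather than the platform's actual design: the names ReplyQueueSketch, Job, and Result are hypothetical, the "work" (averaging a few sensor readings) is made up for the example, and two in-memory queues stand in for the broker.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one job queue, one reply queue, one consumer.
class ReplyQueueSketch {

    // The job carries enough content for the consumer to run it on its own.
    static final class Job {
        final String jobId;
        final double[] readings;

        Job(String jobId, double[] readings) {
            this.jobId = jobId;
            this.readings = readings;
        }
    }

    // The result carries the job id so the producer can match it to the request.
    static final class Result {
        final String jobId;
        final double average;

        Result(String jobId, double average) {
            this.jobId = jobId;
            this.average = average;
        }
    }

    static Result submitAndWait(Job job) {
        final BlockingQueue<Job> jobs = new ArrayBlockingQueue<>(16);
        final BlockingQueue<Result> replies = new ArrayBlockingQueue<>(16);

        // Consumer: takes one job off the queue, executes it, and replies.
        Thread consumer = new Thread(() -> {
            try {
                Job next = jobs.take();
                double sum = 0;
                for (double reading : next.readings) {
                    sum += reading;
                }
                replies.put(new Result(next.jobId, sum / next.readings.length));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.setDaemon(true);
        consumer.start();

        try {
            jobs.put(job); // producer side: enqueue the job
            Result result = replies.poll(5, TimeUnit.SECONDS); // wait for the reply
            if (result == null) {
                throw new IllegalStateException("no reply received in time");
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for reply", e);
        }
    }

    public static void main(String[] args) {
        Result r = submitAndWait(new Job("job-1", new double[] {20.0, 22.0, 24.0}));
        System.out.println(r.jobId + " average = " + r.average);
    }
}
```

In a web application the producer would normally not block on the reply like this. Writing the Result to a database and letting the page read it later fits the request/response cycle better, which is why the database option is preferred above.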
Scheduling a job and executing a job are two
related but independent tasks. Separating a job’s execution from
its scheduling ensures the responsibilities of each component are
clearly defined and results in a more structured and manageable
system.
Use a job scheduler only to queue
background work and not to perform it.
Background workers then receive the work to be executed out of
process from the scheduler.
Fig 2: Sequence Diagram
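The separation between scheduling and execution described above can be sketched with the standard library alone. This is only an assumption-laden illustration: SchedulerEnqueuesOnlySketch and the "collect-sensor-data" job names are hypothetical, a ScheduledExecutorService plays the role of the job scheduler, and an in-memory queue replaces the message broker. Note that the scheduler thread does nothing except enqueue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the scheduler only queues work, the worker performs it.
class SchedulerEnqueuesOnlySketch {

    // Runs until expectedJobs jobs have been executed, then returns the
    // number of jobs the worker actually executed.
    static int runFor(int expectedJobs, long periodMillis) {
        final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        final AtomicInteger scheduled = new AtomicInteger();
        final AtomicInteger executed = new AtomicInteger();
        final CountDownLatch done = new CountDownLatch(expectedJobs);

        // Scheduler: on every tick it only puts a job description on the queue.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> queue.offer("collect-sensor-data-" + scheduled.incrementAndGet()),
                0, periodMillis, TimeUnit.MILLISECONDS);

        // Worker: runs on a separate thread and performs the queued work.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        worker.execute(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    String job = queue.take();
                    // Real task logic would go here; the sketch only counts the job.
                    executed.incrementAndGet();
                    done.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shutdown requested
            }
        });

        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("jobs were not executed in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            scheduler.shutdownNow();
            worker.shutdownNow();
        }
        return executed.get();
    }

    public static void main(String[] args) {
        System.out.println("executed jobs: " + runFor(3, 50));
    }
}
```

Because the scheduler never runs task logic, a slow job cannot delay the next tick; it only lengthens the backlog, which more workers can absorb.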
So, based on the above discussion, we need four software components:
- A language framework that helps to create all the components below [not necessary]
- A job scheduler, probably a single instance
- A sophisticated queue
- A worker process: your task logic
Implementation Strategy:
A language framework: Web development framework and core platform system.
We chose Java-based development, and we will be using Play 1.2.5 as our main development framework[???]. Play 2.0 has implemented the "Actor" concept for this kind of job and adopted the Scala-based "Akka" system, which also has a Java API. But due to the steep learning curve and a lack of experienced resources, I decided to stick with Play 1.2.5. We might move to Play 2.0 or a later version in the future.
A Job scheduler:
There are many ways to schedule background jobs in Java applications. One popular method is to use the Quartz library along with RabbitMQ to create a scalable and reliable way of scheduling background jobs. Fortunately, the Play framework has good support for scheduling jobs in a handy way. At its core, it uses the Quartz library. You may write a job like this:
    import play.jobs.*;

    @Every("1h")
    public class Bootstrap extends Job {
        public void doJob() {
            // get the data and push to queue for worker
        }
    }
This says that Play will trigger this job for you every hour. You can also schedule in minutes or seconds. If the @Every annotation is not enough, you can use the @On annotation to run your jobs using a CRON expression, like this:
    /** Fire at 12pm (noon) every day **/
    @On("0 0 12 * * ?")
Queue:
I have decided to use RabbitMQ because it supports all the features we wanted and because of its popularity and adoption on PaaS platforms. Resque, Beanstalkd, and ActiveMQ are a few of the alternatives. RabbitMQ has long been used as an enterprise messaging bus. A nice tutorial on the RabbitMQ site will help us write the queue and worker process.
Worker:
I will discuss this in a subsequent post.
