Wednesday, September 26, 2012

Implementing a Task Queue Service



Task Queue Service

Recently I was digging into how to design a Task Queue service for a PaaS platform, and I realized it could be an important component of our platform as well. Task queues have long been used in large-scale web implementations, but offering one as a service on a PaaS platform is quite challenging, and very few players are in this market. After a long search, I found a few that have implemented this kind of service, such as Google App Engine, Heroku, and IronWorker. If you know of others, please let me know.


The main idea behind a task queue is to avoid doing a resource-intensive task immediately and having to wait for it to complete. Instead, we schedule the task to be done later. We encapsulate a task as a message and send it to the queue. A worker process running in the background pops tasks off the queue and eventually executes them. When you run many workers, the tasks are shared between them. This concept is especially useful in web applications, where it's impossible to handle a complex task during a short HTTP request window.

Task queues have all sorts of uses for offline processing, including periodically pulling data from third-party sources, computing aggregate statistics, and making decisions based on analysis. A basic advantage of using a task queue is the ability to parallelize work easily. Most message queues support this: they distribute messages among the consumers attached to a queue, so we don't need to balance the load across workers externally. If we are building up a backlog of work, we can just add more workers and scale easily that way. In a sensor platform, these kinds of tasks are very common, so a properly designed task queue service is required for a stable sensor-based platform.
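
To make the "just add more workers" idea concrete, here is a minimal in-memory sketch using only java.util.concurrent. It is not RabbitMQ; the class name SharedQueueSketch, the drain method, and the "__STOP__" sentinel are hypothetical names made up for this illustration. Each task message is taken by exactly one worker, whichever is free first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class SharedQueueSketch {
    // Sentinel message that tells a worker to stop (a convention assumed for this sketch;
    // real tasks must not use this value).
    private static final String STOP = "__STOP__";

    /** Drains the tasks with workerCount workers and returns how many tasks each worker handled. */
    static Map<String, Integer> drain(List<String> tasks, int workerCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Map<String, Integer> handledBy = new ConcurrentHashMap<>();
        ExecutorService workers = Executors.newFixedThreadPool(workerCount);

        for (int i = 0; i < workerCount; i++) {
            final String name = "worker-" + i;
            workers.execute(() -> {
                try {
                    while (true) {
                        String task = queue.take();      // blocks until a message is available
                        if (STOP.equals(task)) {
                            return;                      // this worker is done
                        }
                        // Real task logic would run here; we only count the task.
                        handledBy.merge(name, 1, Integer::sum);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            for (String task : tasks) {
                queue.put(task);                         // producer side: enqueue and move on
            }
            for (int i = 0; i < workerCount; i++) {
                queue.put(STOP);                         // one stop message per worker
            }
            workers.shutdown();
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handledBy;
    }

    public static void main(String[] args) {
        List<String> backlog = Arrays.asList("t1", "t2", "t3", "t4", "t5", "t6");
        // The split between workers varies from run to run; the total is always 6.
        System.out.println(drain(backlog, 3));
    }
}
```

Changing the worker count from 3 to 6 needs no change on the producer side, which is the property we want from the real queue as well.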

How it should work:

A web application or an external job (aka the producer or sender) puts or schedules jobs on a queue, each with enough content for a worker to run it. A group of worker processes (aka consumers) in the background takes the jobs off the queue and executes them. The results can be put onto a reply queue or written into a database. It depends on how you want to display the results back to the user. For a web application, writing results into the database probably makes more sense.
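
The flow above, with the reply-queue option, can be sketched like this. It is only an in-memory illustration under my own assumptions: the Job and Result classes, the word-count "task logic", and the negative id used as a stop signal are all hypothetical and stand in for whatever the real platform would carry in its messages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ReplyQueueSketch {
    /** A job message carrying enough content for a worker to run it. */
    static final class Job {
        final int id;
        final String text;
        Job(int id, String text) { this.id = id; this.text = text; }
    }

    /** A result message that a worker puts on the reply queue. */
    static final class Result {
        final int jobId;
        final int wordCount;
        Result(int jobId, int wordCount) { this.jobId = jobId; this.wordCount = wordCount; }
    }

    /** Producer side: enqueue one job per text, then collect one reply per job. */
    static List<Result> run(List<String> texts) {
        BlockingQueue<Job> jobs = new LinkedBlockingQueue<>();
        BlockingQueue<Result> replies = new LinkedBlockingQueue<>();

        // Consumer side: a single background worker taking jobs off the queue.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    Job job = jobs.take();
                    if (job.id < 0) {
                        return;                          // negative id = stop signal (sketch convention)
                    }
                    String trimmed = job.text.trim();
                    int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
                    replies.put(new Result(job.id, words));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();

        try {
            for (int i = 0; i < texts.size(); i++) {
                jobs.put(new Job(i, texts.get(i)));
            }
            jobs.put(new Job(-1, ""));
            List<Result> results = new ArrayList<>();
            for (int i = 0; i < texts.size(); i++) {
                results.add(replies.take());             // a web app might write these to a database instead
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (Result r : run(Arrays.asList("hello world", "one two three"))) {
            System.out.println("job " + r.jobId + " -> " + r.wordCount + " words");
        }
    }
}
```

Each result carries the job id so the producer can match a reply to the job it sent, which matters once more than one worker is replying.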

Scheduling a job and executing a job are two related but independent tasks. Separating a job’s execution from its scheduling ensures the responsibilities of each component are clearly defined and results in a more structured and manageable system.
Use a job scheduler only to queue background work and not to perform it. Background workers then receive the work to be executed out of process from the scheduler.
Fig 2: Sequence Diagram
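
Here is a small sketch of that separation, assuming a plain ScheduledExecutorService as a stand-in for the real scheduler (Play jobs or Quartz) and an in-memory queue as a stand-in for the message queue. The class name, the runFor method, and the "job-N" names are hypothetical. The scheduler thread only enqueues; the worker thread is the only place where job logic runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SchedulerSketch {
    /** Schedules `ticks` jobs, one every periodMillis, and returns the jobs the worker executed. */
    static List<String> runFor(int ticks, long periodMillis) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> executed = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(ticks);
        AtomicInteger scheduled = new AtomicInteger();

        // Scheduler: decides WHEN work is due and queues it. It never runs the work itself.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            int n = scheduled.incrementAndGet();
            if (n <= ticks) {
                queue.offer("job-" + n);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);

        // Worker: receives queued work and executes it, independently of the scheduler.
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String job = queue.take();
                    executed.add(job);                   // real task logic would run here
                    done.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();      // interrupted on shutdown
            }
        });
        worker.setDaemon(true);
        worker.start();

        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
            worker.interrupt();
        }
        return new ArrayList<>(executed);
    }

    public static void main(String[] args) {
        System.out.println(runFor(3, 100));
    }
}
```

Because the scheduler does nothing heavier than an enqueue, a slow job delays only the worker and never the next scheduled tick.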


So, based on the above discussion, we need 4 software components:
  1. A language framework which helps to create all the components below (not necessary)
  2. A job scheduler, probably a single instance
  3. A sophisticated queue
  4. A worker process – Your task logic




Implementation Strategy:

A language framework: web development framework and core platform system.
We chose Java-based development, and we will be using Play 1.2.5 as our main development framework[???]. Play 2.0 implements the "Actor" concept for this kind of job through the Scala-based Akka system, which also has a Java API. But due to the steep learning curve and lack of experienced resources, I decided to stick with Play 1.2.5. We might move to Play 2.0 or a later version in the future.

A Job scheduler:
There are many ways to schedule background jobs in Java applications. One popular method is to use the Quartz library along with RabbitMQ to create a scalable and reliable way of scheduling background jobs. Fortunately, the Play framework has good, handy support for scheduling jobs. At its core, it uses the Quartz library. You may write a job like this:
import play.jobs.*;
 
@Every("1h")
public class Bootstrap extends Job {
    
    public void doJob() {
        // get the data and push to queue for worker
    }
    
}
This says that Play will trigger this job every hour. You may also schedule in minutes or seconds. If the @Every annotation is not enough, you can use the @On annotation to run your jobs using a CRON expression, like:


/** Fire at 12pm (noon) every day **/ 
@On("0 0 12 * * ?")
Queue:

I have decided to use RabbitMQ. It supports all the features we wanted, and it is popular and widely adopted on PaaS platforms. Resque, Beanstalkd, and ActiveMQ are a few of the alternatives. RabbitMQ has been used for a long time as an enterprise messaging bus. A nice tutorial on the RabbitMQ site will help us write the queue and worker process.


Worker:

I will discuss this in a subsequent post.

Interesting Article on scalable web architecture

Currently I am very interested in learning scalable web system design.
I will post my findings and knowledge in subsequent posts.

I found this blog post discussing scalable web system design:

http://horicky.blogspot.in/2008/02/scalable-system-design.html


Tuesday, January 25, 2011

Search Technology

Let's talk about search technology. I will continue with search-related blog posts over time, so please visit frequently for updates.

Full-text search is a big factor for any successful website or application. It's all about speed, accuracy, and context sensitivity. Building a proper search architecture is a modern art, and that is what Google does so well. I am not going to talk about another web search engine, how Google does search, or how to design a new search algorithm. This is about the search technology behind applications (web or desktop applications that need search functionality).

Let's jump to the point. The basic idea is that you need to search data in a data store. It may be a relational database, a graph database, file storage, a distributed file system, a SAN store, a cluster, anywhere. We have already done this in several ways, but we are not always happy with the speed, accuracy, and context of the returned search results, and we always try to improve in these areas. Here I am going to share some recent trends in this practice. You may have heard of Apache Lucene, the Xapian search server, MySQL full-text search, or the many open source options in Java. I will be talking about Apache Lucene, as I am interested in Java and it has also proven to be a very fast, full-featured search engine.
"Apache Lucene(TM) is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform." - From Apache Lucene webSite.
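There are a lot of websites that have adopted Lucene as their primary search library. LinkedIn.com and twitter.com are among them. So what makes Lucene a popular search library? I tried to find the answer. Here is what I found (several of these features are delivered through Apache Solr, the search server built on top of Lucene):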

  • Advanced Full-Text Search Capabilities
  • Standards Based Open Interfaces - XML,JSON and HTTP
  • Extensible Plugin Architecture
  • Faceted Search and Filtering
  • Advanced, Configurable Text Analysis
  • Rich Document Parsing and Indexing (PDF, Word, HTML, etc) using Apache Tika

and many more...
From the next post onward, I will be talking more about Apache Lucene, with some examples.

Saturday, January 22, 2011

Welcome to high scalable web architecture using open source blog

Welcome to my blog. This blog keeps you updated on the latest highly scalable web architecture technology. Today's web is led by social networking sites like orkut.com, facebook.com, twitter.com, linkedin.com, etc. Most internet traffic comes from these types of sites. I just wonder, and want to know, how these sites have been built. What software do they use? What architecture makes these sites so scalable?
While researching and searching the web, I found that most of these sites' architectures are built using open source software. Whether it is search technology, data storage, caching, clustering, web servers, proxy servers, app servers, front-end technology, or message queues, open source software plays a major role everywhere. I will be talking about all of these areas with the help of great articles published on the internet and some hands-on coding wherever applicable, to the best of my knowledge.
I don't have much experience and knowledge in these fields, but I am learning. I thought it would be better to keep all these knowledge resources in one place so that anyone can learn (aka knowledge sharing). I would like to have comments from experts whenever I make a mistake. My English writing is very poor and I am trying to improve it. I hope I will do better in the future. Please point out those mistakes too.
Happy open learning and coding!!