Tuesday, January 25, 2011

Search Technology

Let's talk about search technology. I will be writing a series of search-related posts over time, so please visit frequently for updates. Full-text search is a big factor in the success of any web site or application. It is all about speed, accuracy and context sensitivity. Building a proper search architecture is a modern art, and it is what Google does so well. I am not going to talk about yet another web search engine, explain how Google does search, or design a new search algorithm. This is about the search technology behind applications (web applications or desktop applications that need search functionality).
Let's jump to the point. The basic idea is that you need to search data in a data store. It may be a relational database, a graph database, file storage, a distributed file system, a SAN store, a cluster, anywhere. We have already done this in several ways, but we are not always happy with the speed, accuracy and context of the search results returned, and we always try to improve in these areas. Here I am going to share some recent trends in this kind of practice. You may have heard about Apache Lucene, Xapian, MySQL full-text search, or the other open source search projects written in Java. I will be talking about Apache Lucene, as I am interested in Java and it has also proven to be a very fast and full-featured search engine.
"Apache Lucene(TM) is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform." - From the Apache Lucene website.
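There are a lot of websites that have adopted Lucene as their primary search library. LinkedIn.com and twitter.com are among them. So what makes Lucene such a popular search library? I tried to find the answer. Here is what I found (several of these features come with Apache Solr, the search server built on top of Lucene):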

  • Advanced Full-Text Search Capabilities
  • Standards Based Open Interfaces - XML, JSON and HTTP
  • Extensible Plugin Architecture
  • Faceted Search and Filtering
  • Advanced, Configurable Text Analysis
  • Rich Document Parsing and Indexing (PDF, Word, HTML, etc.) using Apache Tika

and a lot more.
OK, from the next post on I will be talking more about Apache Lucene, with some examples.

Saturday, January 22, 2011

Welcome to the highly scalable web architecture using open source blog

Welcome to my blog. This blog keeps track of the latest in highly scalable web architecture. Today's web is led by social networking sites like orkut.com, facebook.com, twitter.com, linkedin.com etc. A large share of internet traffic goes to these types of sites. I wonder, and want to know, how these sites have been built. What software do they use? What architecture makes these sites so scalable?
While researching and searching the web, I found that most of these sites' architectures are built using open source software. Whether it is search technology, data storage, caching, clustering, web servers, proxy servers, app servers, front-end technology, message queues and so on, open source software plays a major role everywhere. I will be talking about all of these areas with the help of great articles published on the internet and some hands-on coding wherever applicable, to the best of my knowledge.
I don't have much experience or knowledge in these fields, but I am learning. I thought it would be better to keep all these knowledge resources in one place so that anyone can learn (aka knowledge sharing). I would like comments from experts whenever I make a mistake. My English writing is very poor and I am trying to improve it. I hope I will do better in future. Please point that out to me too.
Happy open learning and coding!!