Sunday, September 21, 2014

On new search technologies

Organizing, searching and retrieving information have always been important aspects of computer science. Older management information systems stored and indexed data in databases and gave the programmer fairly robust and reliable ways to select and display the desired data based on certain criteria.

Depending on the technology and on the size and nature of the data, certain solutions delivered better performance (speed) and accuracy than others.

Since the rise (and fall) of different search engines (read rise for Google and fall for companies like WebCrawler, AOL and AltaVista), the topic of searching unstructured data has become more prominent.

There is an enormous amount of data on the web, most of it unstructured and a lot of it noisy. The issue of relevance in search and retrieval, especially in free text searches, is increasingly topical.

While companies like Google do a very good job, there are many smaller players who implement similar technologies to power their own search.

Let's take the case study of a business that matches service providers with their clients in a certain industry. They maintain a fairly large database and a free text search field on their homepage, and they want people to retrieve the companies most relevant to whatever they type into that search field.

The company chose Python with the Django framework to implement their search. A module called Haystack allows for plugging in powerful search services like Solr or Elasticsearch. These solutions implement high-performance, index-based, natural search on free text fields.
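
To give a rough feel for what "index based" means, here is a minimal sketch of an inverted index in plain Python. It is not how Solr or Elasticsearch are implemented (they add analysis, ranking and much more), and the sample companies and the function names are made up for illustration only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Start from the first token's matches, then intersect with the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample data: company id -> free text description.
companies = {
    1: "Miami plumbing and pipe repair",
    2: "Residential roof repair in Miami",
    3: "Commercial plumbing contractors",
}
index = build_index(companies)
print(search(index, "miami repair"))
```

The lookup never scans the descriptions themselves: it only intersects the precomputed sets of ids, which is why index-based search stays fast as the data grows.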

With Elasticsearch, for example, you can boost the weight of a field, giving priority to a match in the title over a match in the description.
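
As a small sketch of what such a boost can look like, the snippet below builds the body of an Elasticsearch `multi_match` query in which the `^` suffix raises the weight of one field. The field names `title` and `description`, the boost factor of 3 and the helper name `build_boosted_query` are assumptions for illustration; the snippet only constructs and prints the query body and does not talk to a server.

```python
import json


def build_boosted_query(text, title_boost=3):
    """Build a multi_match query body where title matches weigh more.

    The "field^N" notation multiplies the score contribution of that field
    by N. The field names and the default boost are illustrative assumptions.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": [f"title^{title_boost}", "description"],
            }
        }
    }


body = build_boosted_query("roof repair")
print(json.dumps(body, indent=2))
```

The right boost value depends on the data, so it is usually tuned by trying real queries and checking whether the ordering of the results looks sensible.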

Here are two very interesting tutorials.

Building a full-text search engine with django:


Haystack:


Adrian Corbuleanu
Miami Beach, FL
http://wittymobileapps.weebly.com