Let’s not kid ourselves; Hadoop is doing well.
Some would say very well; a recent eWeek article declared that 2013 would be the year that Hadoop beat out the big data analytics competition.
“The Apache Hadoop platform has been ascendant in recent years, thanks to its flexibility, rich developer ecosystem, and ability to suit the analytics needs of developers, web startups and enterprises,” wrote the Greenplum blog recently.
But that doesn’t mean Hadoop is fully polished and meeting all needs. One area where some organizations still struggle with Hadoop is search.
“Big data is everywhere. Today’s problem is less about storing data and more about being able to actually find what you’re looking for in the data,” Geoffrey Hendrey, chief technology officer for Vertascale, noted in a statement. “Everyone working with Big Data is challenged by the ‘I don’t know what I don’t know’ problem, and the prohibitively long iteration cycles.”
That’s partially because the existing ways to search big data sets with Hadoop rely on MapReduce batch processing or add-on search systems that are not able to nimbly handle the data.
SimpleSearch, a new product from Hadoop solutions startup Vertascale, looks to ease the problem of search.
“SimpleSearch lets you find, explore and export large data sets quickly and easily in a way that’s scalable and cost effective,” added Hendrey. “We believe that by speeding up the time-to-answer for engineers, business analysts and data scientists we can provide a critical advantage to organizations that deploy SimpleSearch.”
The idea behind SimpleSearch is a unique index that speeds up queries by a factor of 1,000, according to the company. It brings real-time search capabilities to semi-structured and mixed-structure data stored in either the Hadoop File System or Amazon’s S3 cloud storage service. It uses MapReduce to build a scalable, distributed index that supports real-time query and analysis.
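Vertascale has not published how its index works, so the following is only a minimal Python sketch of the general idea of building an index in a batch map/reduce pass and then answering queries with a lookup. It is not SimpleSearch's actual method. The function names (map_phase, reduce_phase, build_index, search) and the sample records are hypothetical, and the code runs in a single process rather than on a Hadoop cluster.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Emit one (term, doc_id) pair per distinct term, as a mapper would."""
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_phase(pairs):
    """Group document ids by term, standing in for the shuffle and reduce steps."""
    postings = defaultdict(set)
    for term, doc_id in pairs:
        postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_index(records):
    """Run the batch job once over all records to produce the index."""
    pairs = (
        pair
        for doc_id, text in records.items()
        for pair in map_phase(doc_id, text)
    )
    return reduce_phase(pairs)


def search(index, term):
    """Answer a query with a dictionary lookup instead of a new batch job."""
    return index.get(term.lower(), [])


# Hypothetical sample data, for illustration only.
records = {
    "log-1": "error disk full",
    "log-2": "user login ok",
    "log-3": "disk replaced ok",
}
index = build_index(records)
print(search(index, "disk"))
```

The design point is that the expensive scan of the data happens once, at index build time, while each later query only reads the stored index. That trades extra storage for faster answers.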
“Vertascale’s approach is intriguing because it’s an alternative to the slow, batch processing nature of MapReduce,” said Lee Paries, VP of sales for Teradata Aster and a Vertascale advisor. “Real-time Big Data query and analysis is a missing piece today.”
The company noted that traditional search techniques were developed more than 30 years ago and focused on building a concise index that relied on real-time computation to minimize storage requirements. The better approach, the company said, is the one SimpleSearch uses today: taking advantage of commodity storage and distributed computing to handle the volumes of data within a Hadoop data set.
Currently SimpleSearch is in private beta, according to the company.
Edited by Rachel Ramsey