MapR and Twitter recently demonstrated real-time Hadoop analytics at the Strata Conference at the end of February. By harnessing the power of the Twitter API, the two companies streamed the #strataconf hashtag directly into a cluster.
Two real-time tag clouds were shown. A word bubble with the most frequently used words in conference tweets shrank and expanded in real time as tweets were posted, while top tweeters' user names were displayed in another cloud. This impressed conference goers, and for good reason.
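To make the idea behind the two clouds concrete, here is a minimal Python sketch of the kind of running tally such a display needs. It is not the code used in the demo: the `TagCloud` class, its method names, and the sample tweets are all hypothetical, and a real deployment would distribute this counting across a cluster rather than hold it in one process.

```python
import re
from collections import Counter

# Matches plain words plus #hashtags and @mentions (an assumption about
# what should count as a "word" in the cloud).
WORD_RE = re.compile(r"[#@]?\w+")


class TagCloud:
    """Hypothetical running tally of words and users across a tweet stream."""

    def __init__(self):
        self.words = Counter()
        self.users = Counter()

    def add_tweet(self, user, text):
        # Update both tallies as each tweet arrives, so the display can
        # be redrawn immediately from the current counts.
        self.users[user] += 1
        self.words.update(w.lower() for w in WORD_RE.findall(text))

    def top_words(self, n=3):
        return self.words.most_common(n)

    def top_users(self, n=3):
        return self.users.most_common(n)


# Made-up sample tweets, standing in for the live #strataconf stream.
cloud = TagCloud()
for user, text in [
    ("alice", "Hadoop talk at #strataconf"),
    ("bob", "Real-time Hadoop demo #strataconf"),
    ("alice", "Storm and Hadoop #strataconf"),
]:
    cloud.add_tweet(user, text)
```

A bubble's size would then be driven by the counts returned from `top_words`, which is why the bubbles can grow or shrink with every new tweet.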
Real-time analytics are becoming more common in a number of scenarios, including social media, stock tick data, network sensors, payments and ad impressions, but there aren't many tools capable of fitting all the needs of the enterprise. Even the MapReduce framework isn't capable of providing this without very high latency, at least not on its own. Indeed, traditional Hadoop deployments usually need to be augmented with other systems to provide real-time streams.
That's where Storm comes in. Written by Nathan Marz at BackType/Twitter, Storm acts as a continuous, distributed stream computation engine for the large number of tweets that need to be processed. Storm is similar to Hadoop in that it hides the complexity of its underlying systems; it gets its data from queuing systems like Kafka or Kestrel. Typically, something like Storm would write raw data to Hadoop at the end of the real-time workflow for batch analysis.
However, since MapRFS is capable of interacting with lower-latency systems, it allows for publish-subscribe models within the data platform itself, as opposed to using a separate queuing system. Storm "tails" the file it wants to subscribe to, and new data is injected into the Storm topology as soon as it enters the file system, resulting in strong Storm/Hadoop interoperability.
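The "tailing" pattern itself is simple, and a rough Python sketch may help illustrate it. This is only an illustration under stated assumptions: an ordinary local file stands in for a file on MapRFS, the helper names `read_new_lines` and `demo` are hypothetical, and nothing here uses the Storm or MapR APIs. The subscriber remembers how far it has read and, on each poll, picks up only the complete lines appended since then.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (complete lines appended since `offset`, new offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline, so a line that is still
    # being written is left for the next poll.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end


def demo():
    """Simulate a publisher appending to a file and a subscriber polling it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tweets.log")  # hypothetical file name

        with open(path, "ab") as f:
            f.write(b"first tweet\n")
        batch1, offset = read_new_lines(path, 0)

        # The publisher appends one complete line and one partial line.
        with open(path, "ab") as f:
            f.write(b"second tweet\npartial")
        batch2, offset = read_new_lines(path, offset)

        return batch1, batch2
```

In a streaming setup, each batch returned by a poll would be handed to the processing topology, while the same file remains available on the file system for later batch analysis.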
Meanwhile, experts have been suggesting that real-time interactive queries will be a major focus for Hadoop going forward, particularly through the use of an SQL interface.
Edited by
Rachel Ramsey