Twitter to open source streaming data analyzer
06 August, 2011 03:07
Expanding the field of complex event processing software with another offering, Twitter will release Storm, its software for analyzing large-scale live data streams, as open source.
Twitter acquired the software when it purchased BackType in July. BackType offered a service that analyzed the impact of an organization's Twitter feed, summarizing how often Twitter messages were repeated by others.
Although the software has been compared to Hadoop, which processes stored data in batches, Storm is best suited for analyzing live data streams, such as millions of Twitter feeds.
This approach could offer a faster and more practical alternative to traditional real-time analysis, which can involve first storing the data in a database, data store or data warehouse. Its use is not limited to Twitter, however: Storm could be used to study other forms of unstructured, frequently updated data.
"The beauty of Storm is that it's able to solve such a wide variety of use cases with just a simple set of primitives," said Nathan Marz, in a blog posting announcing the pending release.
The user creates a query, or search term, that will continue to run against an ever-updating stream of data until the query is terminated. Because Storm can be distributed across multiple servers, it is capable of analyzing large amounts of data.
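The paragraph above describes a query that keeps running against an ever-updating stream until it is stopped, with the work spread over several servers. The sketch below illustrates that idea in plain Python under stated assumptions; it is not Storm's API. The names `standing_query` and `worker_for`, the case-insensitive substring match, and the crc32-based routing of messages to simulated workers are all hypothetical choices made for illustration.

```python
import zlib
from collections import Counter
from itertools import cycle, islice
from typing import Iterable, Iterator, Tuple


def worker_for(key: str, num_workers: int) -> int:
    """Hypothetical router: map a key to a worker index with a stable hash
    (crc32), so the same message always goes to the same worker."""
    return zlib.crc32(key.encode("utf-8")) % num_workers


def standing_query(
    stream: Iterable[str], term: str, num_workers: int = 3
) -> Iterator[Tuple[int, str, int]]:
    """Hypothetical continuous query: match `term` (case-insensitively) against
    each message as it arrives and yield (worker, message, running count).
    On an unbounded stream it never finishes by itself; it stops only when the
    stream ends or the caller stops iterating (the query is terminated)."""
    # One independent counter per simulated worker; on a real cluster each
    # counter would live on a separate server.
    counters = [Counter() for _ in range(num_workers)]
    needle = term.lower()
    for message in stream:
        if needle not in message.lower():
            continue  # non-matching messages are dropped, never stored
        worker = worker_for(message, num_workers)
        counters[worker][message] += 1
        yield worker, message, counters[worker][message]


if __name__ == "__main__":
    # A finite feed: exact repeats of a message are counted together.
    feed = ["storm is out", "hello", "Storm is out", "storm is out"]
    for worker, message, count in standing_query(feed, "storm"):
        print(worker, message, count)

    # An endless feed: the query keeps running until we stop after 3 matches.
    endless = cycle(["a storm is coming", "calm skies"])
    for result in islice(standing_query(endless, "storm"), 3):
        print(result)
```

Routing by a stable hash keeps every copy of the same message on one worker, so each running count stays correct without the workers sharing state. Python's built-in `hash()` is avoided here because it is randomized per process for strings, which would send the same message to different workers on different machines.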
This sort of analysis is sometimes called complex event processing (CEP). Oracle, StreamBase, SAP and other companies also offer CEP software. Unlike most of these products, however, Storm does not have a built-in storage layer, relying instead on external data stores, Marz pointed out.
Twitter will launch the software next month at the Strange Loop conference in St. Louis.