Informatica, MapR team for Hadoop streaming

Informatica and MapR tackle the problem of analyzing data in Hadoop in near-real time

Apache Hadoop users will soon be able to analyze data as it is streamed from its source, thanks to a partnership between data-warehouse software provider Informatica and Hadoop distributor MapR.

The companies are integrating their products so that big data analysis can work more easily with traditional data warehouse implementations.

Specifically, the companies are writing a connector that will ingest data streamed from Informatica's Ultra Messaging application into a MapR Hadoop implementation.

Ultra Messaging copies log file entries, transaction data and other forms of high-volume, continually updated content onto a messaging bus, so it can be reused and analyzed by other systems. Hadoop is a data processing platform, one that can be used to store and analyze large amounts of data in varying formats.

One disadvantage of Hadoop is that it is designed for batch processing, explained Jack Norris, MapR vice president of marketing. With the standard edition of Hadoop, the underlying file system, HDFS, requires that a data file be closed before it can be analyzed. This can be problematic when trying to analyze a flow of constantly updated data, because the administrator must pick arbitrary times to close the file for analysis. As a result, "You are knowingly dealing in old data," Norris said.

MapR's distribution, however, is unique in that it allows data to be read even while the file the data resides in is still open and being written to. By connecting MapR with Ultra Messaging, the combined system will offer the ability to analyze data in near-real time as it comes off the message bus.
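
The difference between the two approaches can be illustrated with an ordinary local file. The following is only a minimal sketch in Python, not the MapR or Informatica connector, whose API the companies have not described here; the function names `read_new_records` and `demo` and the file name `events.log` are made up for illustration. A reader polls a file that a writer still holds open, picking up each complete record as soon as it is flushed rather than waiting for the file to be closed.

```python
import os
import tempfile


def read_new_records(path, offset):
    """Return (records, new_offset) for complete lines appended since offset.

    Hypothetical helper: it reads whatever bytes have been flushed to the
    file so far, even though the writer has not closed it.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline, so a half-written record
    # is left in place for the next poll.
    end = chunk.rfind(b"\n")
    if end == -1:
        return [], offset
    complete = chunk[: end + 1]
    return complete.decode("utf-8").splitlines(), offset + len(complete)


def demo():
    """Write records to an open file and read them back before it is closed."""
    path = os.path.join(tempfile.mkdtemp(), "events.log")
    seen = []
    offset = 0
    with open(path, "a", encoding="utf-8") as writer:
        # Two complete records plus the start of a third.
        writer.write("txn-1\ntxn-2\npart")
        writer.flush()
        records, offset = read_new_records(path, offset)
        seen.append(records)
        # The writer finishes the third record; the file is still open.
        writer.write("ial\n")
        writer.flush()
        records, offset = read_new_records(path, offset)
        seen.append(records)
    return seen


if __name__ == "__main__":
    print(demo())
```

In a batch-only setup, by contrast, the reader would see nothing until the writer closed the file, which is the "old data" problem Norris describes.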

With Hadoop, users can then combine this live data with other types of data, providing a wider breadth of data to analyze. "With Hadoop, [analysis] is not just done on a single data source. It's the combination of different data sources," Norris said.

This combination of technologies would be handy for time-sensitive pattern recognition tasks, Norris said. One such task is fraud detection, in which a financial institution would need to spot the misuse of its credit cards as early as possible. While computer systems have long been used for fraud detection, using Hadoop in conjunction with a stream of live data provides more data sources to monitor, along with the ability to identify infractions more quickly. "You can look across an entire portfolio of transactions and detect small frauds earlier," Norris said.
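
As a rough illustration of the kind of check Norris describes, the sketch below combines a live stream of card transactions with a second data source, a table of each card's typical spend. It is a toy example in Python under stated assumptions, not a description of any real fraud system: the function `flag_suspicious`, the `window` and `factor` parameters, and the threshold rule are all hypothetical. A transaction is flagged when the card's last few charges together exceed a multiple of its usual spend, which can catch a run of small charges that no single transaction would reveal.

```python
from collections import defaultdict, deque


def flag_suspicious(transactions, typical_spend, window=3, factor=3.0):
    """Yield (card, amount) for transactions that push a card's recent
    spending above factor * its typical spend.

    transactions:  iterable of (card, amount) pairs, in arrival order
                   (stands in for the live message stream).
    typical_spend: dict of card -> usual transaction amount
                   (stands in for historical data already stored).
    """
    # Keep only the last `window` amounts per card.
    recent = defaultdict(lambda: deque(maxlen=window))
    for card, amount in transactions:
        history = recent[card]
        history.append(amount)
        baseline = typical_spend.get(card)
        if baseline is None:
            continue  # no reference data for this card, nothing to compare
        if sum(history) > factor * baseline:
            yield (card, amount)


if __name__ == "__main__":
    stream = [("A", 15), ("B", 500), ("A", 18), ("A", 40), ("A", 1)]
    usual = {"A": 20, "B": 400}
    for hit in flag_suspicious(stream, usual):
        print("suspicious:", hit)
```

Because the function is a generator, each transaction is evaluated as it arrives rather than after the whole stream has been collected.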

At least one other technology has been created to tackle the problem of real-time big data analysis. Last year, Twitter purchased BackType, and subsequently released as open source the company's Storm stream data analysis software. Twitter itself uses the software to spot emerging trends from its users.

In addition to the Ultra Messaging connector, the two companies are building connectors to other Informatica data warehousing tools, including bidirectional connectivity with Informatica's flagship PowerCenter and PowerExchange data warehouse applications. MapR data can also be backed up with Informatica Data Replication and Informatica FastClone, and the community edition of Informatica's HParser, a Hadoop file parser, will be bundled with the MapR distribution.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com

