IBM develops new clustered analytics processing platform

The GPFS-SNC distributed computing architecture supports POSIX

IBM today announced that it has created a new distributed computing architecture with a General Parallel File System technology that is twice as fast as existing clustered file systems and that provides management and advanced data-replication techniques.

Calling it the General Parallel File System-Shared Nothing Cluster (GPFS-SNC), IBM said the new architecture is designed to provide higher availability through advanced clustering technologies.

Prasenjit Sarkar, a master inventor in storage analytics and resiliency for IBM's research branch, said the system scales linearly, so that a file system with 40 nodes would have 12GB/sec. throughput, and a system with 400 nodes could achieve 120GB/sec. throughput.
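
As a rough illustration of the linear scaling Sarkar describes, the two quoted data points imply the same per-node rate. The sketch below is only arithmetic on those figures; the function name is hypothetical, and the idea that the same rate holds at other cluster sizes is an assumption rather than anything IBM has stated.

```python
# Reference point quoted in the article: 40 nodes deliver 12 GB/sec.
REFERENCE_NODES = 40
REFERENCE_THROUGHPUT_GB_PER_SEC = 12


def estimated_throughput_gb_per_sec(node_count):
    """Estimate aggregate throughput assuming perfectly linear scaling.

    Hypothetical helper: extrapolating beyond the two quoted cluster
    sizes is an assumption, not a measured result.
    """
    return node_count * REFERENCE_THROUGHPUT_GB_PER_SEC / REFERENCE_NODES


per_node = estimated_throughput_gb_per_sec(1)   # implied per-node rate
large = estimated_throughput_gb_per_sec(400)    # the 400-node case
```

Under that assumption each node contributes about 0.3GB/sec., and 400 nodes work out to the 120GB/sec. figure Sarkar cites.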

"It's very cost-effective bandwidth. You get 1MB/sec. per dollar," Sarkar said. "If you try to replicate that with a [storage area network], it gets very costly."

The new architecture is aimed at enabling applications that support high-performance analytics, data warehousing applications and cloud computing, he said.

Sarkar described GPFS's "shared nothing" cluster technology as one in which each node, a standard x86 server, has its own metadata, cache, data storage and management tools, while also having access to every other node in the cluster at the same time through Gigabit Ethernet ports.

"What we have done, in contrast to the Google file system, which has a single domain node, is we've distributed every aspect of the file system -- the metadata, the allocation, the lock management, the token management," he said. "Even if you take out a rack of servers [from the cluster], we'll still be able to continue to work."

By "sharing nothing," Sarkar said, new levels of availability, performance and scaling can be achieved with the clustered file system. Each node in the GPFS-SNC architecture is also self-sufficient. Tasks are divided up between these independent computers, and no one has to wait on another, Sarkar said.
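
The general shared-nothing idea can be sketched in a few lines: the data is split into disjoint shards, each worker touches only its own shard, and the partial results are combined at the end. This is a generic illustration, not GPFS-SNC code. The function names are hypothetical, and threads in one process stand in for independent servers.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, node_count):
    """Split records into node_count disjoint shards (round-robin)."""
    return [records[i::node_count] for i in range(node_count)]


def node_task(shard):
    """Work done by one 'node' using only its own shard; no shared state."""
    return sum(shard)


def run_cluster(records, node_count):
    """Run every node's task independently, then combine the partial results."""
    shards = partition(records, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(node_task, shards))
    return sum(partials)


total = run_cluster(list(range(100)), node_count=4)
```

Because no worker reads or writes another worker's shard, none of them has to wait on a lock held by a peer.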

The GPFS-SNC code also supports POSIX, which enables a wide range of traditional applications to run on top of the file system and to perform both reads and writes.
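
POSIX-style file semantics let an application read a file, append to it and overwrite a section in place. A minimal sketch of that pattern follows, using only Python's standard library against a temporary local file. Nothing here is specific to GPFS-SNC; the file name and helper names are made up for illustration.

```python
import os
import tempfile


def posix_demo(path):
    """Create, read, append to and partially overwrite one file."""
    # Create the file with some initial content.
    with open(path, "wb") as f:
        f.write(b"hello world")

    # Read the file back.
    with open(path, "rb") as f:
        first = f.read()

    # Append to the end of the file.
    with open(path, "ab") as f:
        f.write(b"!!")

    # Overwrite a section in the middle without rewriting the rest.
    with open(path, "r+b") as f:
        f.seek(6)
        f.write(b"WORLD")

    with open(path, "rb") as f:
        final = f.read()
    return first, final


def run_demo():
    """Run the demo in a throwaway directory and return both snapshots."""
    with tempfile.TemporaryDirectory() as tmp:
        return posix_demo(os.path.join(tmp, "demo.dat"))


first, final = run_demo()
```

Any application written against these ordinary file calls needs a file system that honors in-place updates, which is the compatibility POSIX support is meant to provide.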

"You can open a file, you can read a file, then you can append to the file and overwrite any section. With Google's Hadoop distributed file system, you cannot append to a file, you can't overwrite any sections, so you're very limited in what you can do," Sarkar said.

GPFS-SNC also supports the whole range of enterprise data storage features, such as snapshots, backup, archiving, information life-cycle management, data caching, WAN data replication, and management policies. The architecture has a single global domain namespace, allowing virtual machines to be moved between hypervisor nodes.

"So for example in our cluster, you can run Hadoop as well as a clustered DB2 or Oracle databases," Sarkar said. "This allows us to have a general-purpose file system that [can be used by] a wide range of users."

IBM would not say when the GPFS-SNC file system would make it out of the labs and into the marketplace, but Sarkar said that once it's available, it will be targeted at three use cases: data warehousing, Hadoop MapReduce applications and cloud computing.

"The cloud may not be intuitive of a parallel architecture, but we have [many] virtual machines on each hypervisor node, and we have a lot of hypervisor nodes in parallel. Each virtual machine is accessing its own storage independently of every other virtual machine. So in effect you're getting a lot of parallel access to storage," Sarkar said.

IBM's current GPFS technology offering is the core technology for the company's high-performance computing systems, Information Archive, Scale-Out Network-Attached Storage (SONAS), and Smart Business Compute Cloud.

The GPFS-SNC technology's ability to run real-time Hadoop applications on a cluster won IBM a first-place award at the Supercomputing 2010 conference in New Orleans this week.

Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld. Follow Lucas on Twitter at @lucasmearian or subscribe to Lucas's RSS feed. His e-mail address is lmearian@computerworld.com.

Lucas Mearian

Computerworld (US)