Hopkins to build data analysis super machine

Johns Hopkins redefines the rules for building supercomputers

Disregarding the supercomputing community's insatiable thirst for FLOPS (floating point operations per second), the Baltimore-based Johns Hopkins University is configuring its new machine to achieve the maximum number of IOPS (I/O operations per second) instead.

The novel design will be better suited to the kind of data-mining-oriented scientific workloads processed by today's supercomputers, argued Alexander Szalay, a computer scientist and astrophysicist at Johns Hopkins' Institute for Data Intensive Engineering and Science, who is leading the project.

"For the sciences, it is the I/O that is becoming the major bottleneck," he explained. "People are running larger and larger simulations, and they take up so much memory, it is difficult to write the output to disk."

The U.S. National Science Foundation (NSF) has provided US$2.1 million for the system, called Data-Scope. Hopkins itself is contributing $1 million as well.

Thus far, 20 research groups within Hopkins have indicated they could use the system to study problems in genomics, ocean circulation, turbulence, astrophysics and environmental science. The university will also allow outside organizations to use the machine. Data-Scope is expected to go live by next May.

FLOPS measures how many floating point calculations a computer can perform in a second, a key measure of its capacity for analyzing large amounts of data. IOPS, by contrast, measures how many input/output operations a computer can complete in a second, which governs how quickly data can be moved on and off the machine.
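Because IOPS counts operations rather than bytes, how much data actually moves depends on the size of each request. The sketch below illustrates that relationship. The device rating and request sizes are hypothetical values chosen for illustration and are not taken from the Data-Scope design.

```python
def throughput_bytes_per_s(iops: float, io_size_bytes: int) -> float:
    """Bytes per second implied by an IOPS rating at a fixed request size."""
    return iops * io_size_bytes

KIB = 1024
MIB = 1024 * KIB

# Hypothetical device rated at 10,000 IOPS, driven with two request sizes.
small = throughput_bytes_per_s(10_000, 4 * KIB)    # small, random-style reads
large = throughput_bytes_per_s(10_000, 64 * KIB)   # larger, streaming-style reads

print(f"4 KiB requests:  {small / MIB:.1f} MiB/s")
print(f"64 KiB requests: {large / MIB:.1f} MiB/s")
```

The same IOPS figure yields 16 times more data per second when each request is 16 times larger, which is why data-intensive designs also quote sustained bandwidth in gigabytes per second.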

By maximizing IOPS, the new system will "enable data analysis tasks that are simply not possible today," the researchers stated in the proposal.

Today, most researchers are limited to analyzing datasets of up to 10 terabytes, while larger datasets, of 100 terabytes or more, can be investigated only by a handful of the largest supercomputers. Hopkins' novel hardware configuration might offer a lower-cost way to analyze such big datasets, Szalay said.

Once built, the machine will have a total I/O bandwidth of 400 to 500 gigabytes per second, roughly twice that of Oak Ridge National Laboratory's Jaguar, the fastest computer on the Top 500 ranking of the world's most powerful computers.

Data-Scope, however, will offer a peak performance of only about 600 teraflops, far short of Jaguar's 1.75 petaflops.

In Hopkins' design, each server will have 24 dedicated hard disk drives and four solid state disks, which together can deliver 4.4 gigabytes per second across the chassis bus directly to two GPUs (graphics processing units), which will perform much of the computation.

Overall, the system will have about 100 of these servers and about five petabytes of storage in total.
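As a rough sanity check, the per-server figures can be multiplied out to see how they relate to the system-wide numbers quoted above. This back-of-the-envelope sketch uses only the article's approximate figures and assumes decimal units and perfectly parallel, full-rate reads on every server, which real workloads will not achieve.

```python
SERVERS = 100            # "about 100" servers
PER_SERVER_GBPS = 4.4    # GB/s from each server's disks to its GPUs
TOTAL_STORAGE_PB = 5     # "about five petabytes" in total

# Aggregate bandwidth if every server streams at its full rate at once.
aggregate_gbps = SERVERS * PER_SERVER_GBPS

# Average storage per server, in decimal terabytes (1 PB = 1000 TB).
storage_per_server_tb = TOTAL_STORAGE_PB * 1000 / SERVERS

# Time to read the whole store once at the aggregate rate
# (1 PB = 1e15 bytes, 1 GB = 1e9 bytes).
full_scan_s = TOTAL_STORAGE_PB * 1e15 / (aggregate_gbps * 1e9)

print(f"aggregate bandwidth: {aggregate_gbps:.0f} GB/s")
print(f"storage per server:  {storage_per_server_tb:.0f} TB")
print(f"full scan:           {full_scan_s / 3600:.1f} hours")
```

About 440 GB/s falls inside the 400 to 500 GB/s range quoted for the whole machine, and under these idealized assumptions the entire five petabytes could be streamed in a few hours.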

To guide the design, the team used a rule of thumb devised by computer scientist Gene Amdahl: a balanced computer, he posited, should deliver one bit of I/O per second for each instruction it executes per second.

Most supercomputer architects have disregarded this rule, reasoning that processor caches can hold data and have it ready when it is needed. Now that datasets have grown so large, Amdahl's rule should be reconsidered, Szalay argued.

A machine's Amdahl number is the ratio of its I/O bits per second to its instructions per second. A typical supercomputer has an Amdahl number of about 0.001, or a thousandth of the optimal balance, whereas Data-Scope should have an Amdahl number of about 0.6 or 0.7.
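The Amdahl number is simple to compute once the I/O rate and instruction rate are known. In the sketch below, both nodes are hypothetical, with figures picked only so the results land near the 0.001 and 0.6 to 0.7 values mentioned above; they are not Data-Scope's or any real machine's specifications, and the article does not say which instruction rate underlies its estimate.

```python
def amdahl_number(io_bytes_per_s: float, instructions_per_s: float) -> float:
    """Ratio of I/O bits per second to instructions per second.

    A value of 1.0 matches Amdahl's balanced-system rule of thumb.
    """
    return (io_bytes_per_s * 8) / instructions_per_s

# Hypothetical node A: fast processors fed by comparatively little I/O.
io_starved = amdahl_number(io_bytes_per_s=1e9, instructions_per_s=8e12)

# Hypothetical node B: modest processors fed by many local disks.
balanced = amdahl_number(io_bytes_per_s=4e9, instructions_per_s=5e10)

print(f"node A: {io_starved:.3f}")
print(f"node B: {balanced:.2f}")
```

The comparison shows why adding disks is not enough on its own: node B reaches a near-balanced ratio partly because its instruction rate is far lower, which mirrors the trade-off of I/O bandwidth against peak flops described for Data-Scope.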

The designers also plan to change the way databases are used. "We don't use the database just as dump storage but as an active computing environment," Szalay said. Instead of moving data from a database across a network to a cluster of servers, researchers can write user-defined functions that run inside the database itself.
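To illustrate the general idea of a user-defined function, the sketch below uses SQLite from Python's standard library as a small stand-in; the article does not say which database engines or UDF mechanisms Data-Scope uses. The table, column, and `magnitude` function are hypothetical. The function is registered with the database connection and then evaluated by the query engine, so filtering and transformation happen where the data lives and only the matching rows come back.

```python
import math
import sqlite3

# In-memory stand-in for a science database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, flux REAL)")
conn.executemany(
    "INSERT INTO objects (flux) VALUES (?)",
    [(1.0,), (10.0,), (100.0,), (1000.0,)],
)

def magnitude(flux):
    """Hypothetical UDF: convert a flux value to a logarithmic magnitude."""
    return -2.5 * math.log10(flux)

# Register the Python function so SQL statements can call it by name.
conn.create_function("magnitude", 1, magnitude)

# The engine applies the UDF row by row; only qualifying rows are returned.
rows = conn.execute(
    "SELECT id, magnitude(flux) FROM objects "
    "WHERE magnitude(flux) < -4 ORDER BY id"
).fetchall()
print(rows)
```

In a large deployment the same pattern means shipping a small function to the data rather than shipping terabytes of data to the function.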

Researchers can boot the system with one of three images: Windows Server 2008, Linux combined with MySQL, or an image running Hadoop.

Data-Scope will be housed in a new green data center on campus, which is being built with $1.3 million in funding from the NSF.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
