Hopkins to build data analysis super machine

Johns Hopkins redefines the rules for building supercomputers

Disregarding the supercomputing community's insatiable thirst for FLOPS (floating point operations per second), the Baltimore-based Johns Hopkins University is configuring its new machine to achieve the maximum number of IOPS (I/O operations per second) instead.

The novel design will be better suited to the kind of data-mining-oriented scientific workloads processed by today's supercomputers, argued Alexander Szalay, a computer scientist and astrophysicist at Johns Hopkins' Institute for Data Intensive Engineering and Science, who is leading the project.

"For the sciences, it is the I/O that is becoming the major bottleneck," he explained. "People are running larger and larger simulations, and they take up so much memory, it is difficult to write the output to disk."

The U.S. National Science Foundation (NSF) has provided US$2.1 million for the system, called Data-Scope. Hopkins itself is contributing $1 million as well.

Thus far, 20 research groups within Hopkins have indicated they could use the system to study problems in genomics, ocean circulation, turbulence, astrophysics and environmental science. The university will also allow outside organizations to use the machine. Data-Scope is expected to go live by next May.

FLOPS measures the number of floating point calculations a computer can perform each second, a capability essential for analyzing large amounts of data. IOPS, by contrast, measures how quickly data can be moved on and off a computer.

By maximizing IOPS, the new system will "enable data analysis tasks that are simply not possible today," the researchers stated in the proposal.

Today, most researchers are limited to analyzing datasets of up to 10 terabytes, while larger datasets, such as those of 100 terabytes or more, can be investigated only by a handful of the largest supercomputers. Hopkins' novel configuration of hardware might offer a lower-cost way to analyze such big datasets, Szalay said.

The machine, once built, will have a total I/O bandwidth of 400 to 500 gigabytes per second, more than twice that of Oak Ridge National Laboratory's Jaguar, the fastest computer on the Top 500 ranking of the world's most powerful computers.

Data-Scope, however, will offer a peak performance of only about 600 teraflops, far short of Jaguar's 1.75 petaflops.

In Hopkins' design, each server will have 24 dedicated hard disk drives as well as four solid state disks, which together can provide 4.4 gigabytes per second across the chassis bus directly to two GPUs (graphics processing units), which will perform many of the calculations.

Overall, the system will have about 100 of these machines and about five petabytes in storage total.
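As a rough consistency check, the per-server transfer rate and the server count quoted above can be multiplied out. The sketch below assumes decimal units (1 GB = 10^9 bytes), exactly 100 servers, and ideal scaling with no network or software overhead; none of these assumptions comes from the project itself, and the function names are illustrative only.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# Assumptions (not from the source): decimal units, exactly 100 servers,
# and perfectly parallel I/O with no overhead.

GB = 10**9
TB = 10**12


def aggregate_bandwidth(servers: int, per_server_bytes_per_s: float) -> float:
    """Total bytes per second if every server streams at its full rate."""
    return servers * per_server_bytes_per_s


def scan_seconds(dataset_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to read a dataset once at the given sustained bandwidth."""
    return dataset_bytes / bandwidth_bytes_per_s


total = aggregate_bandwidth(100, 4.4 * GB)
print(f"aggregate bandwidth: {total / GB:.0f} GB/s")
print(f"one pass over 100 TB: {scan_seconds(100 * TB, total) / 60:.1f} minutes")
```

Under these assumptions the aggregate comes to about 440 gigabytes per second, which falls inside the 400 to 500 gigabytes per second range cited for the whole system.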

To guide the design, the team used a rule of thumb devised by computer scientist Gene Amdahl. Ideally, Amdahl posited, a computer should have one bit of I/O ready for each instruction it executes.

Most supercomputer architects have disregarded this rule, claiming the processor caches can bank data and have it ready for use when needed. Now that datasets have grown so large, Amdahl's rule should be reconsidered, Szalay argued.

A typical supercomputer would have an Amdahl number of about 0.001, or a thousandth of the optimal balance, whereas Data-Scope should have an Amdahl number of about 0.6 or 0.7.
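The Amdahl number can be read as a simple ratio: bits of I/O per second divided by instructions executed per second. The sketch below illustrates that ratio with two hypothetical machines whose figures are invented for the example. It does not attempt to reproduce Data-Scope's quoted value, because the article does not say which instruction rate that figure is based on.

```python
# Amdahl number as a ratio of I/O rate (in bits) to instruction rate.
# The two machines below are hypothetical; their numbers are chosen only
# to show what an unbalanced and a balanced system look like.

def amdahl_number(io_bytes_per_s: float, instructions_per_s: float) -> float:
    """Bits of I/O per second divided by instructions per second."""
    return (io_bytes_per_s * 8) / instructions_per_s


# Hypothetical compute-heavy machine: 1 GB/s of I/O, 8 trillion instructions/s.
compute_heavy = amdahl_number(io_bytes_per_s=1e9, instructions_per_s=8e12)

# Hypothetical balanced machine: 1 GB/s of I/O, 8 billion instructions/s.
balanced = amdahl_number(io_bytes_per_s=1e9, instructions_per_s=8e9)

print(f"compute-heavy: {compute_heavy:.3f}")
print(f"balanced:      {balanced:.3f}")
```

A value near 1 means the machine can deliver roughly one bit of I/O for every instruction, while a value near 0.001 means the processors can run far ahead of the data being fed to them.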

The designers also plan to make some changes in the way databases are used. "We don't use the database just as dump storage but as an active computing environment," Szalay said. Instead of moving data from a database across a network to a cluster of servers, researchers can write user-defined functions that can run against the database itself.
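The idea of running a user-defined function inside the database, rather than shipping rows to a separate cluster, can be sketched with Python's built-in `sqlite3` module. SQLite is used here only because it is self-contained; it is not one of the systems named for Data-Scope, and the `particles` table and `speed` function are hypothetical.

```python
import math
import sqlite3


def speed(vx: float, vy: float, vz: float) -> float:
    """Magnitude of a velocity vector; evaluated by the database per row."""
    return math.sqrt(vx * vx + vy * vy + vz * vz)


conn = sqlite3.connect(":memory:")

# Register the Python function so SQL queries can call it by name.
conn.create_function("speed", 3, speed)

conn.execute(
    "CREATE TABLE particles (id INTEGER PRIMARY KEY, vx REAL, vy REAL, vz REAL)"
)
conn.executemany(
    "INSERT INTO particles (vx, vy, vz) VALUES (?, ?, ?)",
    [(3.0, 4.0, 0.0), (0.0, 0.0, 2.0), (1.0, 2.0, 2.0)],
)

# The filter and the aggregate both run inside the database engine;
# only the two summary values are returned to the caller.
fast_count, max_speed = conn.execute(
    "SELECT COUNT(*), MAX(speed(vx, vy, vz)) "
    "FROM particles WHERE speed(vx, vy, vz) > 2.5"
).fetchone()

print(fast_count, max_speed)
conn.close()
```

The point of the pattern is that only the small result leaves the database, whereas the raw rows stay where they are stored.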

Researchers can boot one of three images on the system: Windows Server 2008, a combination of Linux and MySQL, or a third image running Hadoop.

Data-Scope will be housed in a new campus green data center being built with $1.3 million in funding from the NSF.

Joab Jackson covers enterprise software and general technology breaking news for The IDG News Service. Follow Joab on Twitter at @Joab_Jackson. Joab's e-mail address is Joab_Jackson@idg.com
