CERN modernizes IT infrastructure with OpenStack and Puppet

But the research organization will remain faithful to tape storage

CERN's datacenter is modernizing with OpenStack and Puppet.

CERN is making the infrastructure that handles the data from the Large Hadron Collider (LHC) more flexible by upgrading it with OpenStack for virtualization and Puppet for configuration management.

The research organization's objective is to change how it provides services to scientists working at the LHC, which runs in a 27-kilometer circular tunnel about 100 meters beneath the Swiss-French border near Geneva.

"One of the things we have to contend with is how to scale our infrastructure fairly significantly with a fixed staff and fixed costs. With a fixed budget you can buy more and more equipment, but you can't provide more and more services with the same number of people," said Ian Bird, LHC computing grid project leader.

But that may be possible if you change the way things are done. CERN's goal is to become more efficient by moving toward infrastructure-as-a-service and platform-as-a-service with a private cloud, which would let it change how the infrastructure is used more dynamically. Right now the accelerator is shut down, so the CERN data center has a different workload from last year, when the LHC was running, according to Bird.

"Users also want to provision an analysis cluster with 50 machines themselves for an afternoon that then goes away again. It is about providing those kinds of services," Bird said.

CERN chose OpenStack because it seems to be the platform with the most traction behind it. OpenStack's popularity also makes it attractive from a staffing point of view, according to Bird.

"We have a transient staff, because not everybody has permanent contracts. So it's good to have people that come in with that expertise or can leave with it, and then sell it somewhere else," Bird said.

CERN is also moving away from the custom in-house software that manages the cluster itself, replacing it with tools such as Puppet.

"When we started scaling up the cluster for LHC, the large scale Googles and Amazons didn't really exist. So we invested quite a lot of effort in configuration management and monitoring, but a couple of years ago we decided to instead go with something that had a larger support community," Bird said.

CERN looked at Chef and Puppet, and chose the latter because it worked in a way that was closer to its own management model. The rollouts of Puppet and OpenStack are both underway.

Today CERN's infrastructure is distributed across about 160 data centers of different sizes located around the world.

"The reason behind that is twofold; one is given the size of the data center we have here there is no way we could have done all the computing for the LHC, and the other is political and sociological. We are given money to do computing, but it is preferred that the funding stays where it is coming from," said Bird.

CERN's own data center and a recently announced data center in Budapest are tier 0, and the next tier is made up of 11 data centers that are typically located at large national labs, such as Fermilab in the U.S., according to Bird. The last tier mostly consists of computing resources at universities.

To make OpenStack a better fit for CERN's distributed computing resources, the organization will collaborate with the community on data center federation.

"If we at CERN are running OpenStack and other of our grid centers are also running OpenStack we would like to federate the cloud parts ... So if you have your credentials at CERN, you ought to be able to let your work migrate to Fermilab, for example," Bird said.

Storage is a very important part of what CERN does, and the demands are huge. The two big detectors at the LHC -- CMS and ATLAS -- produce about 1 petabyte, or 1,000 terabytes, of data per second. The detectors track the motion and measure the energy and charge of particles thrown out in all directions after a collision in the accelerator. That data is then whittled down to a few hundred megabytes per second of the most interesting events by a farm of Linux machines with 15,000 processing cores located at each detector.
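
To put that filtering step in perspective, the short calculation below works out how much of the raw data survives. It is only a back-of-the-envelope sketch: the 300 megabytes per second is an assumed stand-in for "a few hundred megabytes per second", and the variable names are illustrative rather than anything CERN uses.

```python
# Back-of-the-envelope arithmetic only. The 300 MB/s value is an assumption
# standing in for "a few hundred megabytes per second".
PETABYTE = 10**15  # bytes, decimal units (1 PB = 1,000 TB)
MEGABYTE = 10**6   # bytes

raw_rate = 1 * PETABYTE      # bytes per second produced by the detectors
kept_rate = 300 * MEGABYTE   # assumed bytes per second kept after filtering

reduction_factor = raw_rate / kept_rate  # how many raw bytes per kept byte
kept_fraction = kept_rate / raw_rate     # share of the raw data that survives

print(f"Reduction factor: about {reduction_factor:,.0f} to 1")
print(f"Fraction kept: {kept_fraction:.7%}")
```

Under that assumption, only about one byte in every 3.3 million produced by the detectors is kept.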

Still, in 2012 about 30PB of data from the LHC was saved. The data is cached on disk, but then archived on tape. The archive stores about 100PB of data, of which about 70PB comes from the accelerator, according to Bird, who calls the archiving "a non-trivial exercise."

Bird is a big fan of tape storage for three main reasons: cost, error rates and power consumption.

Tape is still a factor of 10 cheaper than the equivalent space on disk. Hosted storage services such as Glacier from Amazon Web Services are much too expensive, Bird said. And the error rate on tapes is extremely low compared to the failure rate of disks, he said.

It's also important to keep down power consumption, which is a limiting factor in today's data centers. The data center in Budapest was added not because CERN ran out of space, but because it ran out of power. The tape robots use very little power compared to disks, according to Bird.

"Tape is quite significantly underrated. Probably for the last 15 years people have been saying that it is dead, and will be replaced by disk. But it hasn't gone away, and I don't see it going away any time soon. For large archives you can't really compete," Bird said.

But tape has to be managed well for it to work.

"You can't just put it on tape and leave it for 20 years. Tape media changes every two or three years, so we are continually reading it from one generation and copying it to the next generation. We also read it actively to make sure it is still readable," Bird said.

Mikael Ricknäs

IDG News Service