Why Dropbox left Amazon's cloud and built its own from scratch

Cloud storage service migrated 500 petabytes without users noticing as part of its ‘Magic Pocket’ project; Dropbox's storage team lead, Australian James Cowling, explains how

Dropbox's storage team lead James Cowling with VP of engineering, infrastructure Akhil Gupta

One in two Internet users in Australia use Dropbox. Worldwide, more than 1.2 billion files are saved to the cloud storage service every 24 hours. And until recently, all those files were sitting in Amazon Web Services’ public cloud.

In 2013 Dropbox decided it would store the bulk of them itself. Over the last three years the company has been busy building its own storage infrastructure and migrating 500 petabytes of data from Amazon’s cloud.

“This has been, as far as I know, the largest ever migration of a company from cloud services into managed infrastructure,” says Dropbox principal engineer and storage team lead James Cowling. “And happening at a time when the trend is definitely in the other direction. It’s a big investment. It’s a pretty bold investment too.”

The company had become big enough for the move to make economic sense. And with an eye on the enterprise market, it wanted to release slick and speedy productivity and collaboration tools. For that it needed to boost performance of its storage infrastructure.

“Eventually it became very clear that storage is core to our business, and we can provide a better experience for our users, innovate faster, by owning that part of the stack,” says Cowling.

Cowling is originally from Sydney but has lived and worked in San Francisco for the last 12 years. He did his PhD at MIT where he met Dropbox cofounder Drew Houston, who asked him to join the company four years ago.

“There was this opportunity at Dropbox, which at the time was a fairly small company, to build a big storage system,” he says. “So that drew me in and I’ve been there ever since.

“It’s grown, right? It’s grown and it’s grown up. When I joined, infrastructure was just seven people and it was kind of a small scrappy kind of start-up — with a very successful, compelling product. And we’re at the point right now where we have a storage team that I think is putting out the best storage system in the world, so at the very least it’s been a pretty rapid transition.”

Since its inception in 2007 Dropbox had operated a hybrid architecture. All the business logic, databases and Web services ran from its own data centres, while the bulk storage was kept in the cloud on Amazon’s S3.

When the likes of Google, Amazon and Facebook began, there were no cloud services to leverage. Those companies built their own infrastructure and were able to start small and expand when necessary. Dropbox, on the other hand, would have to build and launch a huge infrastructure setup from scratch.

They call this infrastructure Magic Pocket, a nod to what Cowling calls a “very cheesy” 2009 promotional video (one of the company’s first).

The project had four phases, Cowling says: “Build the system. Prove it correct. Scale it up and then optimise the hell out of it.”

'Build the system'

“First you start from nothing,” says Cowling. “Let’s work out what we need to build, what our requirements are and also accept that we’re not going to know the requirements a few years down the track. We were designing something for a small company, knowing we were going to be a big company.”

His team took around six months to create the initial code using Python. During testing on standard hardware, the team rewrote the entire system in Go, with some elements in Rust, for greater efficiency and a smaller memory footprint.

“Designing a distributed storage system is a big challenge, but it’s much harder to build one that operates reliably at scale, and supports all of the monitoring and verification systems and tooling that will ensure it’s running correctly. It’s also incredibly important to make technical decisions that are the right solution to the right problem, not just because they’re cool and novel.”

'Prove it correct'

“How could we guarantee to the company and to ourselves that this was correct?” says Cowling. “You can’t just launch something haphazardly.”

Having built the prototype, they then put it through its paces – injecting software failures and trying to simulate hardware failures.

“[We put] people on a plane to data centres to pull out circuit breakers on racks and we got a rack and boxed it up and waited for it to overheat and fail and made sure it’d come back up with the data. It was fun! Software’s complicated but hardware fails in much more unexpected ways.”

To be confident in their new system, the team began a 180-day countdown on a screen overlooking their San Francisco office work pod. The idea was to run the system without issue for the duration. They were on track, until day 40.

“We had a staging cluster – a copy of our test cluster – that we’d test out new code on. And we found a bug. It was pretty close. It didn’t actually break the rules, but we wouldn’t have felt good about ourselves launching.”

They started again. And this time they made it. “We all clapped and we drank some champagne,” says Cowling. “And then we were like – ‘what’s the next thing we’re going to do?’ – and we were straight back to work.”

'Scale it up'

“Magic Pocket had to grow from our initial double-digit-petabyte prototypes to a multi-exabyte behemoth within the span of around six months – a fairly unprecedented transition,” says Cowling. “We called it base jump. The metaphor is base jumping; you jump a cliff – we had to get our storage over this cliff with very little time to open the parachute and if not, it’s not pretty!”

In April last year Cowling and his team began the race to install additional servers in three locations fast enough to keep up with the flow of data migrating from AWS.

They built a high-performance network which allowed them to transfer data at a peak rate of over half a terabit per second. At the same time, they were scrambling to get racks into data centres quickly enough.
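
To put that peak rate in perspective, here is a rough back-of-envelope calculation, not a figure from Dropbox. It assumes decimal units (1 PB = 10^15 bytes, 1 Tbit = 10^12 bits) and pretends the whole 500 petabytes moved at a sustained 0.5 terabits per second. Since that rate was a peak rather than an average, the result is only a lower bound on the transfer time. The function and constant names are made up for illustration.

```python
# Back-of-envelope sketch; decimal units are an assumption, not from the article.
PETABYTE_BYTES = 10**15
TERABIT_BITS = 10**12
SECONDS_PER_DAY = 86_400


def min_transfer_days(data_petabytes: float, rate_terabits_per_s: float) -> float:
    """Days needed to move the data if the link ran at the given rate nonstop."""
    total_bits = data_petabytes * PETABYTE_BYTES * 8
    seconds = total_bits / (rate_terabits_per_s * TERABIT_BITS)
    return seconds / SECONDS_PER_DAY


# 500 PB at a constant 0.5 Tbit/s (the article's peak rate, treated as sustained).
days = min_transfer_days(500, 0.5)
print(f"{days:.1f} days")  # prints "92.6 days"
```

In other words, even with the network saturated around the clock, the migration could not have taken much less than three months, which is consistent with the roughly six-month window Cowling describes.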

“We were bringing up 30 or 40 racks of hardware every day. I knew how many racks could fit in the loading dock at any given time,” says Cowling.

On two consecutive days, trucks carrying racks crashed. Despite this, plus network outages and hardware errors, they hit the deadline with a month to spare.

'Optimise the hell out of it'

The resulting system holds more than 90 per cent of customer data (the company still employs AWS for a significant portion of its global infrastructure) and is “three to five times faster against all the latency percentiles we track right now,” says Cowling.

It’s also extremely reliable: “The system was built with so many safeguards and so much redundancy. We can lose the entire east coast and still serve data because we have an entire copy elsewhere and that’s very much in the design. We can lose racks and rows and entire data centres and keep running.”

Cowling is now looking to eke out every efficiency and further improve performance for Dropbox’s 500 million users.

He’s focused on the storage of cold, less frequently used data. He’s also exploring how to improve performance and reduce latency for users who are based further away from storage locations. Plus there is a slew of new products that will put extra demand on the infrastructure.

“There’s no end in sight,” says Cowling. “It doesn’t stop, the game keeps going. It’s like giving birth to a child – you’ve got to raise the child.”

George Nott
