Microsoft, university researchers break DNA data storage record

University of Washington and Microsoft researchers believe they have broken the record for storing and retrieving data in DNA molecules.

Researchers said the impressive part about reaching the 200MB milestone is not just how much data they were able to encode onto synthetic DNA and then decode, but also how little space the data occupied.

Once encoded, the data occupied a spot in a test tube "much smaller than the tip of a pencil," Douglas Carmean, the partner architect at Microsoft overseeing the project, said.

The DNA storage also has a half-life of 500 years, even in harsh conditions. The half-life of DNA -- just as with radioactive material -- determines its rate of decay or the length of time it takes half of its strand bonds to break.
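The half-life figure can be turned into a rough estimate of how much of a sample survives over time. The sketch below assumes simple exponential decay, as with radioactive material; the function name `fraction_intact` is hypothetical, and the 500-year default is the figure quoted above, not a measured constant for any particular storage condition.

```python
def fraction_intact(years: float, half_life_years: float = 500.0) -> float:
    """Fraction of strand bonds still intact after `years`.

    Assumes plain exponential decay: every half-life, half of the
    remaining bonds break.
    """
    return 0.5 ** (years / half_life_years)


for years in (100, 500, 1000):
    print(f"after {years:>4} years: {fraction_intact(years):.1%} intact")
```

Under this model, half of the bonds remain after one half-life and a quarter after two, which is why error correction and redundancy matter for long-term archives.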

Overall, this is a huge step forward. "Think of the amount of data in a big data center compressed into a few sugar cubes. Or all the publicly accessible data on the Internet slipped into a shoebox. That is the promise of DNA storage -- once scientists are able to scale the technology and overcome a series of technical hurdles," Microsoft stated in a blog post.

The data stored in the DNA included digital versions of works of art, including a high-definition music video by the band OK Go, the Universal Declaration of Human Rights in more than 100 languages, the top 100 books of Project Gutenberg and the nonprofit Crop Trust's seed database.

DNA is attractive as a storage medium because the world's data is growing exponentially and molecular-level storage is far denser than hard drives, solid state drives (SSDs) or even up-and-coming technologies such as phase-change memory.

"Those systems also degrade after a few years or decades, while DNA can reliably preserve information for centuries," the University of Washington (UW) researchers stated in a news release. "DNA is best suited for archival applications, rather than instances where files need to be accessed immediately."

UW Associate Professor Luis Henrique Ceze, in blue, and research scientist Lee Organick prepare DNA containing digital data for sequencing, which allows them to read and retrieve the original files. (Photo: Tara Brown Photography/University of Washington)

The UW and Microsoft researchers are one of two teams nationwide that have also demonstrated the ability to perform random access of data from a pool of molecules, which they described as a task similar to reassembling one chapter of a story from a library of torn books.

The researchers said they developed "a novel approach" to convert the long strings of ones and zeroes in digital data into the four basic building blocks of DNA sequences -- adenine, guanine, cytosine and thymine -- represented as As, Gs, Cs and Ts.
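The article does not describe the researchers' encoding, so the following is only the simplest possible illustration of mapping bits to bases, not their "novel approach". It assumes a fixed two-bits-per-base table (00=A, 01=C, 10=G, 11=T); the table and the function names are hypothetical. Practical schemes typically add constraints, such as avoiding long runs of the same base, that this sketch ignores.

```python
BASES = "ACGT"  # assumed mapping: 00 -> A, 01 -> C, 10 -> G, 11 -> T


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> bytes:
    """Inverse of encode: read four bases at a time back into one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


print(encode(b"Hi"))          # CAGACGGC
print(decode(encode(b"Hi")))  # b'Hi'
```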

The digital data is broken down into pieces and stored by synthesizing it as a massive number of tiny DNA molecules, which can be dehydrated and preserved for long-term storage.

While advances in DNA storage rely on techniques pioneered by the biotechnology industry, they also require lessons learned from information technology. For example, the Microsoft and UW team's encoding approach uses error correction schemes commonly used in computer memory.

"This is an example where we're borrowing something from nature -- DNA -- to store information. But we're using something we know from computers -- how to correct memory errors -- and applying that back to nature," said Luis Henrique Ceze, a UW associate professor of computer science and engineering and the university's principal researcher on the project.
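The article does not say which error correction scheme the team uses. As a hedged illustration of the general idea, the sketch below uses XOR parity, one of the simplest redundancy techniques in computing: two equal-length payloads are stored along with their XOR, so either payload can be rebuilt if it is lost. The function names are hypothetical, and this is not presented as the researchers' scheme.

```python
from typing import Optional, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length payloads."""
    if len(a) != len(b):
        raise ValueError("payloads must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def recover(a: Optional[bytes], b: Optional[bytes],
            parity: bytes) -> Tuple[bytes, bytes]:
    """Rebuild a missing payload (passed as None) from the other plus parity."""
    if a is None and b is None:
        raise ValueError("cannot recover when both payloads are missing")
    if a is None:
        return xor_bytes(b, parity), b
    if b is None:
        return a, xor_bytes(a, parity)
    return a, b


first, second = b"chunk-01", b"chunk-02"
parity = xor_bytes(first, second)      # stored as a third, redundant payload
print(recover(None, second, parity))   # the lost first payload is rebuilt
```

The cost of this design is one extra payload per pair, and it tolerates the loss of only one of the three.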

To access the stored data, the researchers encode the equivalent of zip codes and street addresses into the DNA sequences. Polymerase Chain Reaction (PCR) techniques -- commonly used in molecular biology -- help them more easily identify the zip codes they are looking for.

Using DNA sequencing techniques, the researchers can then read the data and convert it back to a video, image or document file by using the street addresses to reorder the data.
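The zip code and street address analogy can be sketched in software. In the hypothetical model below, each fragment carries a file key (the "zip code") and a position index (the "street address"); filtering the pool by key stands in for the PCR selection step, and sorting by index stands in for reordering after sequencing. The `Fragment` type and function names are assumptions for illustration, not the researchers' design.

```python
import random
from typing import List, NamedTuple


class Fragment(NamedTuple):
    key: str        # "zip code": which file the fragment belongs to
    index: int      # "street address": position within that file
    payload: bytes  # the slice of file data carried by this fragment


def split_into_fragments(key: str, data: bytes, size: int = 4) -> List[Fragment]:
    """Cut a file into fixed-size pieces, tagging each with key and position."""
    return [Fragment(key, i // size, data[i:i + size])
            for i in range(0, len(data), size)]


def retrieve(pool: List[Fragment], key: str) -> bytes:
    """Select one file's fragments from the mixed pool and put them in order."""
    selected = [f for f in pool if f.key == key]   # stand-in for PCR selection
    selected.sort(key=lambda f: f.index)           # reorder by street address
    return b"".join(f.payload for f in selected)


# All fragments from all files share one unordered pool, like a test tube.
pool = (split_into_fragments("video", b"music video bytes")
        + split_into_fragments("text", b"declaration of human rights"))
random.shuffle(pool)

print(retrieve(pool, "text"))
```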

Most of the world's data today is stored on magnetic and optical media. Tape technology has recently seen significant density improvements, with tape cartridges as large as 185TB, and is the densest form of storage available commercially today, at about 10GB per cubic millimeter (mm³). Recent research reported the feasibility of optical discs capable of storing 1PB, yielding a density of about 100GB/mm³. Despite this improvement, storing zettabytes of data would still take millions of units and use significant physical space.

A depiction of a DNA double helix. (Image: National Human Genome Research Institute)

DNA has a theoretical limit above one exabyte per cubic millimeter, which is eight orders of magnitude denser than tape. DNA-based storage also has the benefit of eternal relevance: As long as there is DNA-based life, there will be strong reasons to read and manipulate DNA, the researchers stated in an April research paper.
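The density comparison can be checked with a few lines of arithmetic. This sketch treats the quoted densities as volumetric (per cubic millimeter) and uses decimal units (1TB = 10^12 bytes); the variable names are hypothetical, and the DNA figure is the theoretical limit quoted above, not a demonstrated capacity.

```python
import math

GB, TB, PB, EB, ZB = 10**9, 10**12, 10**15, 10**18, 10**21  # bytes, decimal

tape_density = 10 * GB   # bytes per cubic millimeter, as quoted for tape
dna_density = 1 * EB     # bytes per cubic millimeter, theoretical limit

# How many powers of ten separate DNA's theoretical limit from tape.
orders_of_magnitude = math.log10(dna_density / tape_density)

# Units needed to hold one zettabyte on each medium.
cartridges_per_zb = ZB / (185 * TB)   # 185TB tape cartridges
discs_per_zb = ZB // PB               # 1PB optical discs
dna_mm3_per_zb = ZB // dna_density    # cubic millimeters of DNA at the limit

print(f"DNA vs. tape: about {orders_of_magnitude:.0f} orders of magnitude")
print(f"tape cartridges per ZB: about {cartridges_per_zb:,.0f}")
print(f"optical discs per ZB: {discs_per_zb:,}")
print(f"DNA volume per ZB at the limit: {dna_mm3_per_zb:,} cubic millimeters")
```

A single zettabyte works out to more than five million tape cartridges or a million 1PB discs, which is the "millions of units" the comparison refers to.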

According to the ongoing "Digital Universe" study by IDC and EMC, the amount of data is forecast to grow to over 16 zettabytes (ZB) in 2017. The Internet of Things, in large part, will be responsible for doubling digital data every two years, resulting in 44 trillion gigabytes (44ZB) by 2020.
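As a rough consistency check on those forecasts, the sketch below projects data volume under the stated assumption that it doubles every two years, starting from the 16ZB figure for 2017. The function name and parameters are hypothetical, and real growth need not follow a clean doubling curve.

```python
def projected_zettabytes(year: int, base_year: int = 2017,
                         base_zb: float = 16.0,
                         doubling_years: float = 2.0) -> float:
    """Data volume in ZB, assuming it doubles every `doubling_years`."""
    return base_zb * 2 ** ((year - base_year) / doubling_years)


for year in (2017, 2019, 2020):
    print(f"{year}: about {projected_zettabytes(year):.1f}ZB")
```

Doubling every two years from 16ZB in 2017 gives roughly 45ZB for 2020, close to the 44ZB forecast.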

"A significant fraction of this data is in archival form; for example, Facebook recently built an entire data center dedicated to 1 exabyte of cold storage," the scientists stated in their research paper.

Researchers have been experimenting with DNA as a data-storage medium for more than a dozen years, and the field has progressed quickly. In 1999, DNA-based storage involved encoding and recovering just a 23-character message.

In April, Microsoft and UW researchers were able to store three image files, which were synthesized and sequenced in DNA. (Image: Microsoft)

By 2013, scientists from the U.K.-based EMBL-European Bioinformatics Institute claimed they'd encoded an MP3 version of Martin Luther King's "I Have a Dream" speech in DNA.

In April, Microsoft and UW researchers released their paper detailing how synthetic DNA could be used as a form of archival storage.

"DNA is an amazing information storage molecule that encodes data about how a living system works. We're repurposing that capacity to store digital data -- pictures, videos, documents," Ceze said. "This is one important example of the potential of borrowing from nature to build better computer systems."

Lucas Mearian

Computerworld (US)