MP3s in molecules – how our data could be stored in the future

Chemistry meets computing as firms look to DNA for their long-term data backup needs

Data backup is about to become a problem. A recorded CD or DVD can have a shelf life of up to 25 years, although anyone who has tried to access documents archived to disc only five years ago knows this isn’t always the case. The exposed surface of the disc is susceptible to scratches, despite advances in the polymer coating designed to protect it, and “disc rot”, in which the reflective aluminium layer corrodes, can leave a disc unreadable. The M-disc, a Blu-ray-compatible disc made from carbon that attempts to address this, claims a 1,000-year life, but that is difficult to test, since the technology has only been around for 16 years. Magnetic tape, still popular with businesses as a backup solution, can last 30 years, while spinning hard drives last about five years and flash drives up to ten.

Capacity is also an issue. Sony’s largest magnetic tape can hold 185TB of data, but a Blu-ray-based backup system archives significantly less. If your business churns out lots of data that you absolutely have to keep, that’s going to mean a lot of expense and a very large cupboard. What’s needed is a storage medium that can hold many, many terabytes of data and last an extremely long time. If that medium were also small, that would be a bonus.

Enter deoxyribonucleic acid, DNA, the molecule that is at the heart of all our cells and is to blame for your nose. This complex biological data carrier, the structure of which was discovered in 1953 by Rosalind Franklin, Maurice Wilkins, Francis Crick and Jim Watson, is being put to new uses as we learn how to manipulate it – and one of these uses is long-term data storage.

Last month, Microsoft bought ten million strands of synthetic DNA for use in data storage experiments, hoping to encode binary data using the four “bases” – C, T, A and G – that make up the molecule. The company turned to biotech startup Twist Bioscience, which creates synthetic DNA at its San Francisco base. Don’t think of it as an enormous manufacturing plant with bubbling vats of chemicals, however: the machine Twist uses to create its synthetic DNA is only about as big as two telephone boxes.
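
To make the idea of writing binary data in four bases concrete, here is a minimal sketch in Python. The article does not describe the encoding Microsoft uses, so the mapping below (two bits per base, with 00, 01, 10 and 11 assigned to A, C, G and T) is purely an illustrative assumption, and the function names are hypothetical. Practical schemes are likely to add error correction and avoid awkward sequences such as long runs of a single base.

```python
# Hypothetical 2-bits-per-base mapping, chosen for illustration only.
# It is NOT the scheme used by Microsoft or Twist.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"MP3")
    print(strand)          # four bases per byte
    print(decode(strand))  # round-trips to the original bytes
```

Under this naive mapping every byte becomes four bases, so the three bytes of the text “MP3” would need a strand of 12 bases.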

“The chemistry of synthesising DNA is very well known,” says Emily Leproust, Twist’s CEO and co-founder. “It was first published in 1982 by a professor called Marvin Caruthers from the University of Colorado. The chemistry is extremely efficient and so good that it’s still the same process that everybody uses today.

“What Marvin and his team did is show how to make one piece of DNA, but at Twist we don’t need one piece or one sequence – we need thousands of them. In the case of Microsoft, they wanted 10 million different, unique pieces of DNA. So, what we’ve done is build an engineering system to enable us to fabricate millions of pieces of DNA at the same time.

“You have to control the chemistry, so in one place you’re making an A, in another a C, and so on, all in parallel on the medium. The medium we’re using is silicon, and the wafers we’ve developed have 10,000 wells on them. In each of those wells we have another 100 locations, and in each of those locations we make one unique molecule.”
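
The figures Leproust quotes can be turned into a quick back-of-the-envelope calculation. The sketch below assumes every location on a wafer yields exactly one usable, unique molecule and that the ten-million-strand order is spread evenly across wafers; neither assumption comes from Twist, and the variable names are illustrative.

```python
import math

# Figures quoted in the article.
WELLS_PER_WAFER = 10_000
LOCATIONS_PER_WELL = 100
ORDER_SIZE = 10_000_000  # unique strands in the Microsoft order

# Assumption: one usable, unique molecule per location (perfect yield).
molecules_per_wafer = WELLS_PER_WAFER * LOCATIONS_PER_WELL
wafers_needed = math.ceil(ORDER_SIZE / molecules_per_wafer)

print(f"Unique molecules per wafer: {molecules_per_wafer:,}")
print(f"Wafers needed for the order: {wafers_needed}")
```

On those assumptions a single wafer carries a million unique molecules, so the order would correspond to roughly ten wafers’ worth of synthesis.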

To get an idea of the scale, a DNA strand is 2.5 nanometres in diameter, while Twist’s silicon wells are 600 nanometres across. A sheet of paper is about 100,000 nanometres thick.
