One day, more than your genes will be stored in DNA
Work at the New York Genome Centre represents a big step towards DNA-based information storage. Andrew Masterson reports.
As anyone old enough to remember videotapes will testify, one problem with information storage is that, after a while, the machines needed to retrieve the data no longer exist.
A pair of New York scientists, however, are working to resolve the issue by adapting a storage medium that has been around for billions of years: DNA.
DNA is, of course, an exceptionally versatile and reliable information storage system. Using just four nucleobases – cytosine, guanine, adenine and thymine – it effectively encapsulates, and replicates, the building and operating instructions for every living thing on earth.
A complete set of chromosomes in a single human cell has been estimated to contain 1.5 gigabytes’ worth of data. That data is replicated in every non-sex cell in the body, but if it weren’t – if every cell were able to encode different information – the average human’s total DNA could store around 150 trillion gigabytes.
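These figures can be reproduced with a few lines of arithmetic. The Python sketch below assumes a haploid human genome of roughly 3.1 billion base pairs, two sets of chromosomes per non-sex cell and two bits per base. The genome size and copy number are assumptions, not figures from the article. It also reads the 150-trillion-gigabyte total as 1.5 gigabytes multiplied by an assumed 100 trillion cells.

```python
# Back-of-envelope check of the per-cell and whole-body figures.
# Assumed inputs (not from the article): genome size, copy number, cell count.
HAPLOID_BASE_PAIRS = 3.1e9  # approximate length of one set of human chromosomes
COPIES_PER_CELL = 2         # non-sex cells carry two sets of chromosomes
BITS_PER_BASE = 2           # four possible bases -> 2 bits each


def gigabytes_per_cell() -> float:
    """Raw information content of one cell's DNA, in decimal gigabytes."""
    bits = HAPLOID_BASE_PAIRS * COPIES_PER_CELL * BITS_PER_BASE
    return bits / 8 / 1e9   # bits -> bytes -> gigabytes


GB_PER_CELL = 1.5           # the article's rounded per-cell estimate
CELLS = 100e12              # assumed cell count implied by the article's total
total_gb = GB_PER_CELL * CELLS

print(f"{gigabytes_per_cell():.2f} GB per cell")   # 1.55 GB per cell
print(f"{total_gb / 1e12:.0f} trillion GB")        # 150 trillion GB
```

The derived 1.55 GB sits close to the quoted 1.5 GB, which suggests the estimate simply counts two bits for every base in a cell’s DNA.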
Scientists Yaniv Erlich and Dina Zielinski of the New York Genome Centre have built on earlier experiments that used artificially synthesised DNA as a storage medium. Using a coding strategy they call the DNA Fountain, they report storing and retrieving a full computer operating system, a movie and several other files in genetic material.
In a paper posted on the preprint server bioRxiv, meaning the research has not yet been peer reviewed, Erlich and Zielinski note that, because there are only four bases to choose from, two bits of information is the maximum that can be stored by a single nucleobase and its attached sugar and phosphate – collectively called a nucleotide. However, once a few biochemical constraints and the need to set aside some space for indexing are taken into account, the practical maximum falls to just over 1.8 bits per nucleotide.
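The two-bit ceiling follows from there being four possible bases: log2(4) = 2. The Python sketch below shows it with the simplest possible scheme, writing every pair of bits as one base and reading the bases back again. The particular assignment (00 to A, 01 to C, 10 to G, 11 to T) and the function names are hypothetical choices for illustration, not the encoding used in the paper.

```python
import math

BASES = "ACGT"  # hypothetical assignment: 00 -> A, 01 -> C, 10 -> G, 11 -> T

# With four symbols to choose from, one nucleotide holds log2(4) = 2 bits.
BITS_PER_NUCLEOTIDE = math.log2(len(BASES))


def bytes_to_dna(data: bytes) -> str:
    """Turn each byte into four bases, two bits per base, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Reverse of bytes_to_dna: read each group of four bases back into a byte."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


print(BITS_PER_NUCLEOTIDE)        # 2.0
print(bytes_to_dna(b"Hi"))        # CAGACGGC
print(dna_to_bytes("CAGACGGC"))   # b'Hi'
```

A real encoder cannot use this mapping unfiltered: a zero byte becomes AAAA, and long runs of a single base are known to cause errors in DNA synthesis and sequencing. Screening out sequences like that is one of the constraints that pulls the practical figure below two bits.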
The DNA Fountain works by breaking binary code – the code that stores data on computers and digital discs – into short segments, then preparing them for conversion into DNA nucleotides with a type of algorithm, known as a fountain code, that is already widely used to prevent data loss in mobile phone and broadcast transmissions.
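Fountain codes, which give the DNA Fountain its name, are easiest to grasp in miniature. The Python sketch below is a toy Luby transform (LT) style fountain code. It cuts data into equal segments and builds each "droplet" by XOR-ing a pseudo-randomly chosen handful of those segments. Only the random seed is stored alongside each droplet, so the decoder can regenerate the same choice. The degree distribution, segment size, function names and simulated loss pattern are all illustrative assumptions. The sketch also leaves out the conversion to nucleotides, the screening for biochemical constraints and the extra error correction a real DNA pipeline would need, and it is not Erlich and Zielinski’s implementation.

```python
import random
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split(data: bytes, size: int) -> list[bytes]:
    """Cut data into equal-length segments, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def neighbours(seed: int, k: int) -> list[int]:
    """Indices of the segments mixed into the droplet with this seed.

    Encoder and decoder both call this, so only the seed has to be stored
    with each droplet. The degree list is a toy placeholder, not the robust
    soliton distribution a production LT code would use.
    """
    rng = random.Random(seed)
    degree = min(rng.choice([1, 1, 2, 2, 2, 3, 4]), k)
    return rng.sample(range(k), degree)


def make_droplet(segments: list[bytes], seed: int) -> tuple[int, bytes]:
    """XOR the chosen segments together and tag the result with its seed."""
    chosen = neighbours(seed, len(segments))
    return seed, reduce(xor_bytes, (segments[i] for i in chosen))


def decode(droplets: list[tuple[int, bytes]], k: int) -> Optional[list[bytes]]:
    """Peeling decoder: repeatedly resolve droplets with one unknown segment."""
    pending = [(set(neighbours(seed, k)), payload) for seed, payload in droplets]
    known: dict[int, bytes] = {}
    progress = True
    while progress and len(known) < k:
        progress = False
        for chosen, payload in pending:
            unknown = chosen - set(known)
            if len(unknown) != 1:
                continue
            (i,) = unknown
            for j in chosen - {i}:  # XOR out segments recovered earlier
                payload = xor_bytes(payload, known[j])
            known[i] = payload
            progress = True
    return [known[i] for i in range(k)] if len(known) == k else None


# Usage: simulate a lossy channel and keep collecting droplets until decoding works.
data = b"Store me in DNA, one droplet at a time."
segments = split(data, 8)
k = len(segments)

received: list[tuple[int, bytes]] = []
recovered = None
for seed in range(10_000):
    if seed % 3 == 0:
        continue  # pretend every third droplet is lost
    received.append(make_droplet(segments, seed))
    recovered = decode(received, k)
    if recovered is not None:
        break

assert recovered is not None
print(b"".join(recovered)[: len(data)])  # b'Store me in DNA, one droplet at a time.'
```

The key property is that the decoder never asks for any particular droplet: any sufficiently large collection will do. That is why the approach copes with missing pieces, whether those are dropped broadcast packets or, plausibly, DNA strands that fail to be synthesised or read.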
The information, once ensconced in the lab-generated DNA, was easily retrieved using existing gene-sequencing technology, after which it was converted back into usable binary. Erlich and Zielinski recovered all of their data without a single error.
Although DNA has a half-life of roughly 500 years, it is still a far more robust archival storage medium than anything else invented so far. (Think of scratched vinyl albums or now-useless Jaz drive cartridges.) Furthermore, the scientists note, DNA’s storage density is phenomenal: a single gram could in principle hold 215 petabytes of data, where one petabyte is 1,000,000,000,000,000 bytes. That’s a lot of movies.
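Both numbers in that paragraph can be turned into quick calculations. The Python sketch below uses the article’s 500-year half-life in the simple exponential-decay formula a half-life implies, to estimate how much of the original material would remain intact after a given time. Real DNA decay depends strongly on temperature and storage conditions, so this is only a rough guide. The sketch also converts 215 petabytes per gram into a movie count, assuming a hypothetical 5 GB film.

```python
PETABYTE = 10**15                  # bytes, as defined in the article
CAPACITY_BYTES = 215 * PETABYTE    # the quoted estimate for one gram of DNA
MOVIE_BYTES = 5 * 10**9            # hypothetical: a 5 GB feature film

HALF_LIFE_YEARS = 500              # the article's rough figure


def fraction_remaining(years: float, half_life: float = HALF_LIFE_YEARS) -> float:
    """Share of the original material left intact under simple exponential decay."""
    return 0.5 ** (years / half_life)


movies_per_gram = CAPACITY_BYTES // MOVIE_BYTES
print(f"{movies_per_gram:,} movies per gram")   # 43,000,000 movies per gram

for years in (100, 500, 1000):
    print(years, round(fraction_remaining(years), 3))
```

Halving every 500 years means a quarter of the material is left after a millennium. Redundancy schemes such as the fountain code described earlier in the article are one way to tolerate that kind of gradual loss.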