Scientific European (SCIEU)
Monthly Popular Science Magazine

DNA as a medium to store vast computer data: a reality very soon?

A breakthrough study takes a significant step forward in the quest to develop a DNA-based storage
system for digital data.


 

scieu

Digital data is growing at an exponential rate because of our dependence on gadgets, and it requires robust long-term storage. Data storage is becoming a challenge because current digital technology cannot keep pace. For example, more digital data has been created in the past two years than in all the preceding history of computing; about 2.5 quintillion bytes of data {2.5 quintillion bytes = 2,500,000 Terabytes (TB) = 2,500,000,000 Gigabytes (GB)} is created every day worldwide. This includes data on social networking sites, online banking transactions, records of companies and organizations, and data from satellites, surveillance, research and development. This data is huge and unstructured. Meeting these storage requirements and keeping up with their exponential growth is therefore a major challenge, especially for organizations and corporations that need robust long-term storage.
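The size of that daily figure is easier to grasp when the unit conversion is written out. The short Python check below is only an illustration; it assumes decimal (SI) units, in which 1 TB is 10^12 bytes and 1 GB is 10^9 bytes, and the variable names are ours.

```python
# Assumption: decimal (SI) units, 1 TB = 10**12 bytes and 1 GB = 10**9 bytes.
QUINTILLION = 10**18

# 2.5 quintillion bytes per day, the figure quoted in the article.
daily_bytes = 25 * QUINTILLION // 10

daily_tb = daily_bytes // 10**12  # whole terabytes per day
daily_gb = daily_bytes // 10**9   # whole gigabytes per day

print(f"{daily_tb:,} TB per day")
print(f"{daily_gb:,} GB per day")
```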

The options currently available are hard disks, optical discs (CDs), memory sticks, flash drives, and the more advanced tape drives or optical Blu-ray discs, which store roughly up to 10 Terabytes (TB) of data. Such storage devices, though commonly used, have some obvious disadvantages. Firstly, they have a low-to-medium shelf life and must be kept under ideal temperature and humidity conditions to last many decades, so they require specially designed physical storage space. Almost all of them consume a lot of power, are bulky and impractical, and can be damaged by a simple fall. Some are very expensive or are often plagued by data errors, and thus are not robust enough. An option that has been widely adopted by organizations is cloud computing: an arrangement in which a company hires an “outside” server, referred to as the “cloud”, to handle all its IT and data storage requirements. The primary disadvantages of cloud computing are security and privacy issues and vulnerability to attack by hackers. Other issues include high costs, limited control by the parent organization and platform dependency. Nevertheless, cloud computing is still seen as a good alternative for long-term storage. However, the digital information being generated worldwide appears to be outpacing our ability to store it, and even more robust solutions are needed to cope with this data deluge while scaling to meet future storage needs.

 

Can DNA help in computer storage?

DNA (deoxyribonucleic acid) is being considered as an exciting alternative medium for digital data storage. DNA is the self-replicating material present in nearly all living organisms, and it carries our genetic information. Artificial, or synthetic, DNA is a durable material that can be made using commercially available oligonucleotide synthesis machines. A primary benefit of DNA is its longevity: DNA lasts 1000 times longer than silicon (the material of the silicon chips used to build computers). Amazingly, just a single cubic millimetre of DNA can hold a quintillion bytes of data! DNA is also an ultracompact material that degrades very slowly and can be stored in a cool, dry place for hundreds of centuries. The idea of using DNA for storage has been around for a long time, going back to 1994. The main reason is the similar way in which information is stored in a computer and in our DNA, since both store blueprints of information. A computer stores all data as 0s and 1s, and DNA stores all the data of a living organism using four bases: thymine (T), guanine (G), adenine (A) and cytosine (C). Therefore, DNA could be treated as a standard storage device, just like a computer, if these bases are used to represent 0s (bases A and C) and 1s (bases T and G). DNA is quite tough and long-lasting, the simplest evidence being that our genetic code, the blueprint of all our information stored in DNA, is efficiently transmitted from one generation to the next, again and again. Major software and hardware companies are keen on using synthetic DNA to store vast amounts of data and so solve the problem of long-term archival. The idea is to first convert the computer code of 0s and 1s into the DNA code (A, C, T, G); the converted DNA code is then used to produce synthetic strands of DNA, which can be put into cold storage.
Whenever required, the DNA strands can be removed from cold storage and their information read using a DNA sequencing machine; the DNA sequence is finally translated back into the binary format of 1s and 0s to be read on a computer.
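The conversion between binary and DNA code can be sketched in a few lines of Python. This is only an illustration of the mapping described above (A or C for 0, T or G for 1), not the encoding used in any particular study. The function names are hypothetical, and the rule for choosing between the two candidate bases (avoid repeating the previous base) is our own assumption.

```python
# Mapping from the article: 0 -> A or C, 1 -> T or G.
BASES_FOR_BIT = {"0": ("A", "C"), "1": ("T", "G")}
BIT_FOR_BASE = {"A": "0", "C": "0", "T": "1", "G": "1"}


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as a string of bases, one base per bit."""
    bits = "".join(f"{byte:08b}" for byte in data)
    strand = []
    for bit in bits:
        first, second = BASES_FOR_BIT[bit]
        # Assumption: if the first candidate would repeat the previous
        # base, use the second, so no base appears twice in a row.
        strand.append(second if strand and strand[-1] == first else first)
    return "".join(strand)


def dna_to_bytes(strand: str) -> bytes:
    """Decode a string of bases back into the original bytes."""
    bits = "".join(BIT_FOR_BASE[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"Hi"
strand = bytes_to_dna(message)    # the step a synthesis machine would "write"
restored = dna_to_bytes(strand)   # the step after sequencing "reads" it back
print(strand, restored)
```

With one base per bit, each byte becomes eight bases. Real systems pack data more densely and add error correction, which this sketch leaves out.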

It has been shown [1] that just a few grams of DNA can store a quintillion bytes of data and keep it intact for up to 2000 years. However, this simple idea has faced some challenges. Firstly, writing data to DNA, i.e. actually converting 0s and 1s into the DNA bases (A, T, C, G), is quite expensive and painfully slow. Secondly, once the data is “written” onto the DNA, it is challenging to find and retrieve files. Retrieval requires DNA sequencing, a process that determines the precise order of bases within a DNA molecule, after which the data is decoded back to 0s and 1s.

In a recent breakthrough [2], scientists from Microsoft Research and the University of Washington achieved “random access” on DNA storage. Random access is very important because it means that information can be transferred to or from a store (generally a memory) in which every location, no matter where it sits in the sequence, can be accessed directly. With this technique, files can be retrieved from DNA storage selectively, whereas earlier an entire DNA dataset had to be sequenced and decoded to find and extract the few files one wanted. Random access matters even more as the amount of data grows, because it reduces the amount of sequencing that needs to be done. This is the first time random access has been shown at such a large scale. The researchers also developed algorithms for writing and reading DNA sequences and for decoding and restoring data that are more efficient, more robust and more tolerant of errors, which also makes the sequencing procedure faster. More than 13 million synthetic DNA oligonucleotides were synthesized in this study, encoding 200MB of data in 35 files (containing video, audio, images and text) ranging in size from 29kB to 44MB. These files were retrieved individually with no errors. The study, published in Nature Biotechnology, is a major advance that demonstrates a viable, large-scale system for DNA storage and retrieval.
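The difference between reading everything and reading selectively can be pictured with a small software analogy. The article does not describe how the study addresses individual files, so the Python sketch below simply assumes that every strand carries a short file-specific address tag. The pool contents, tags and function names are all hypothetical.

```python
# Hypothetical pool of strands: (address tag, payload).
# Tags and payloads are invented for illustration only.
POOL = [
    ("ACGT", "photo-part-0"),
    ("TGCA", "song-part-0"),
    ("ACGT", "photo-part-1"),
    ("GATC", "text-part-0"),
    ("TGCA", "song-part-1"),
]


def sequence_everything(pool):
    """Without random access: every strand in the pool has to be read."""
    return list(pool)


def random_access(pool, address):
    """With random access: only strands carrying the address are read."""
    return [strand for strand in pool if strand[0] == address]


all_reads = sequence_everything(POOL)
photo_reads = random_access(POOL, "ACGT")

print(len(all_reads), "strands read without random access")
print(len(photo_reads), "strands read with random access")
```

In this toy pool the saving is small, but the gap widens as the pool grows, which is why selective retrieval matters most for very large archives.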

DNA storage looks very appealing because it offers high data density and high stability and is very easy to store, but it clearly faces many challenges before it can be universally adopted. Among them are the time- and labour-intensive decoding of the DNA (the sequencing) and the synthesis of DNA. The technique also needs greater accuracy and broader coverage. Even though advances have been made in this area, the exact format in which data will be stored long-term as DNA is still evolving. Microsoft has vowed to improve the production of synthetic DNA and address the various challenges in order to design a fully operational DNA storage system by 2020.

 

Sources:

  1. Yaniv Erlich and Dina Zielinski, 2017, ‘DNA Fountain enables a robust and efficient storage architecture’, Science, vol. 355, no. 6328, p. 950, DOI:10.1126/science.aaj2038
  2. Lee Organick et al., 2018, ‘Random access in large-scale DNA data storage’, Nature Biotechnology, vol. 36, pp. 242–248, DOI:10.1038/nbt.4079
