June 14, 2021

New Technique Lets Users Preview Files Stored in DNA Data Storage

Filed under: Computer Hardware,Emerging Tech,Science Related — Suramya @ 7:45 AM

Using DNA for storage is an idea that has been around for a while; it was first postulated by Richard P. Feynman in 1959. It remained mostly a theoretical exercise until 1988, when researchers from Harvard and the artist Joe Davis stored an image of an ancient Germanic rune representing life and the female Earth in the DNA sequence of E. coli. Then, in November 2016 (a lot more happened between those two dates, and you can read all about it on the Wiki page), a company called Catalog encoded 144 words from Robert Frost’s famous poem “The Road Not Taken” into strands of DNA. Pretty soon after that, in June 2019, scientists reported that all 16 GB of text from Wikipedia’s English-language version had been encoded into synthetic DNA.

DNA storage has been getting easier and cheaper over time, with more and more companies jumping on the bandwagon. Even Microsoft has a DNA Storage Research project. However, despite all the advances so far, a lot more work is needed before it is stable, cheap and reliable enough to be a commercial product. One problem with DNA storage so far has been that there was no way to preview the stored data: you had to open the entire file to find out what was in it. Think of trying to browse an image gallery without thumbnails: you would have to open each file to see what it was while looking for a particular one.

Researchers from North Carolina State University have developed a way to preview a stored data file, much like a thumbnail works for an image file. They took advantage of the fact that when files have similar names, the PCR process used to retrieve a file will copy pieces of multiple files. Until now this was considered a problem, but the researchers figured out how to use this behavior to open either the entire file or just a subset of it.

“The advantage to our technique is that it is more efficient in terms of time and money,” says Kyle Tomek, lead author of a paper on the work and a Ph.D. student at NC State. “If you are not sure which file has the data you want, you don’t have to sequence all of the DNA in all of the potential files. Instead, you can sequence much smaller portions of the DNA files to serve as previews.”

Here’s a quick overview of how this works.

Users “name” their data files by attaching sequences of DNA called primer-binding sequences to the ends of DNA strands that are storing information. To identify and extract a given file, most systems use polymerase chain reaction (PCR). Specifically, they use a small DNA primer that matches the corresponding primer-binding sequence to identify the DNA strands containing the file you want. The system then uses PCR to make lots of copies of the relevant DNA strands, then sequences the entire sample. Because the process makes numerous copies of the targeted DNA strands, the signal of the targeted strands is stronger than the rest of the sample, making it possible to identify the targeted DNA sequence and read the file.
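To make the retrieval step concrete, here is a toy Python simulation. It is a minimal sketch, not how any real system is implemented: the file names, the 8-base binding sites and the helpers `make_pool`, `pcr` and `sequence` are all hypothetical, and PCR is idealized as perfect doubling of exactly matching strands on every cycle. It shows why amplifying the target makes its strands dominate the reads once the whole sample is sequenced.

```python
import random
from collections import Counter

# Hypothetical 8-base primer-binding sites acting as "file names".
FILE_SITES = {
    "photo.jpg": "ACGTTGCA",
    "notes.txt": "TTGACCGT",
    "song.mp3": "GCATAGCT",
}

def make_pool(strands_per_file: int) -> Counter:
    """Mixed DNA pool: maps (file, binding site) to its number of strand copies."""
    return Counter({(name, site): strands_per_file for name, site in FILE_SITES.items()})

def pcr(pool: Counter, primer: str, cycles: int) -> Counter:
    """Idealized PCR: strands whose site exactly matches the primer double every
    cycle; all other strands stay at their original copy number."""
    return Counter({
        (name, site): n * (2 ** cycles if site == primer else 1)
        for (name, site), n in pool.items()
    })

def sequence(sample: Counter, reads: int, seed: int = 0) -> Counter:
    """Draw random reads in proportion to copy number and tally each read's source file."""
    rng = random.Random(seed)
    keys = list(sample)
    drawn = rng.choices(keys, weights=[sample[k] for k in keys], k=reads)
    return Counter(name for name, _ in drawn)

pool = make_pool(strands_per_file=100)
sample = pcr(pool, primer=FILE_SITES["notes.txt"], cycles=10)
print(sequence(sample, reads=1000).most_common())
```

With 100 strands per file and 10 cycles, the target file's strands outnumber everything else by roughly 500 to 1, so nearly all of the 1,000 reads come from notes.txt.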

However, one challenge that DNA data storage researchers have grappled with is that if two or more files have similar file names, the PCR will inadvertently copy pieces of multiple data files. As a result, users have to give files very distinct names to avoid getting messy data.
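A crude way to picture this cross-talk is to treat primer binding as a string match that tolerates a few mismatches. In the hypothetical sketch below, the `max_mismatches` parameter stands in for how permissive the reaction is; real primer annealing depends on thermodynamics and on where the mismatches fall, which this model ignores. The file names and sequences are made up for illustration.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

def amplified_files(sites: dict[str, str], primer: str, max_mismatches: int) -> list[str]:
    """Files whose binding site is close enough to the primer to get copied."""
    return sorted(
        name for name, site in sites.items() if hamming(site, primer) <= max_mismatches
    )

def min_pairwise_distance(sites: dict[str, str]) -> int:
    """Smallest Hamming distance between any two file names; larger is safer."""
    return min(hamming(a, b) for a, b in combinations(sites.values(), 2))

# Hypothetical 10-base file names.
SITES = {
    "report_v1": "ACGTTGCAGT",
    "report_v2": "ACGTTGCAGA",  # differs from report_v1 at one position
    "vacation": "TGCAACGTCA",   # differs from report_v1 at every position
}

print(amplified_files(SITES, SITES["report_v1"], max_mismatches=0))  # ['report_v1']
print(amplified_files(SITES, SITES["report_v1"], max_mismatches=1))  # ['report_v1', 'report_v2']
print(min_pairwise_distance(SITES))  # 1
```

In this model, insisting on a large minimum distance between names is the "very distinct names" workaround: it keeps a permissive reaction from dragging in a neighbouring file.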

“At some point it occurred to us that we might be able to use these non-specific interactions as a tool, rather than viewing it as a problem,” says Albert Keung, co-corresponding author of a paper on the work and an assistant professor of chemical and biomolecular engineering at NC State.

Specifically, the researchers developed a technique that makes use of similar file names to let them open either an entire file or a specific subset of that file. It relies on a specific naming convention for the file and for a given subset of it. They can choose whether to open the entire file or just the “preview” version by manipulating several parameters of the PCR process: the temperature, the concentration of DNA in the sample, and the types and concentrations of reagents in the sample.
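The preview idea can be illustrated with the same kind of mismatch model. The scheme below is an assumption made for illustration, not the paper's actual sequence design: every fifth chunk of a file carries a binding site that exactly matches the file's primer (the preview), while the remaining chunks carry a site with one deliberate mismatch. A stringent reaction, modeled as `max_mismatches=0` and loosely corresponding to conditions such as a higher annealing temperature, copies only the preview chunks; a more permissive one (`max_mismatches=1`) copies the whole file. The `Strand` type, the primers and all helper functions are hypothetical.

```python
from dataclasses import dataclass

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

@dataclass(frozen=True)
class Strand:
    file: str   # which file the strand belongs to
    chunk: int  # index of the data chunk it carries
    site: str   # primer-binding sequence ("file name") on the strand

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def with_mismatch(site: str, pos: int) -> str:
    """Copy of `site` with the base at `pos` replaced by its complement."""
    return site[:pos] + COMPLEMENT[site[pos]] + site[pos + 1:]

def encode_file(name: str, primer: str, n_chunks: int, preview_every: int) -> list[Strand]:
    """Hypothetical naming convention: every `preview_every`-th chunk gets the exact
    primer site (the preview); all other chunks get a one-mismatch variant."""
    variant = with_mismatch(primer, len(primer) // 2)
    return [
        Strand(name, i, primer if i % preview_every == 0 else variant)
        for i in range(n_chunks)
    ]

def retrieve(pool: list[Strand], primer: str, max_mismatches: int) -> list[tuple[str, int]]:
    """(file, chunk) pairs copied when the primer tolerates `max_mismatches` mismatches."""
    return sorted(
        (s.file, s.chunk) for s in pool if hamming(s.site, primer) <= max_mismatches
    )

PRIMER_GALLERY = "ACGTTGCAGT"  # hypothetical
PRIMER_ARCHIVE = "TGCAACGTCA"  # hypothetical, far from PRIMER_GALLERY

pool = (encode_file("gallery", PRIMER_GALLERY, n_chunks=10, preview_every=5)
        + encode_file("archive", PRIMER_ARCHIVE, n_chunks=10, preview_every=5))

print(retrieve(pool, PRIMER_GALLERY, max_mismatches=0))  # [('gallery', 0), ('gallery', 5)]
print(retrieve(pool, PRIMER_GALLERY, max_mismatches=1))  # all ten gallery chunks
```

Because the archive's name is far from the gallery's primer, none of its chunks show up in either read; only the chosen stringency decides between the preview and the full file.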

The new technique is compatible with the DNA Enrichment and Nested Separation (DENSe) system, which is designed to make DNA storage systems more scalable. The researchers are looking for industry partners to explore its commercial viability. If things work out, maybe in the near future we could start storing data in biological samples (like spit). That said, handling spit and other biological matter when searching for saved data does sound a bit gross.

Source: New Twist on DNA Data Storage Lets Users Preview Stored Files
Paper: Promiscuous molecules for smarter file operations in DNA-based data storage

– Suramya

