Over billions of years of evolution, living organisms on Earth have developed a recording mechanism for passing genetic information from one generation to the next: DNA strands built from sequences of four nitrogenous bases, adenine (A), guanine (G), cytosine (C) and thymine (T). Four coding units are better than the two (0 and 1) used in binary, but that is not the limit, say scientists who have synthesized seven more organic compounds to serve as additional bases.
Expanding the “alphabet” for encoding data in DNA from 4 to 11 characters will at least double the already remarkable capacity of this storage method, the researchers say. The approach should also increase the speed at which data is written into a DNA sequence, which is currently considered a serious bottleneck for work in this area. There is a catch: current DNA sequencing methods cannot detect the synthesized nitrogenous bases, so reading them requires new tools and reactions. These are all solvable problems, according to researchers from the University of Illinois at Urbana-Champaign.
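To illustrate why a larger alphabet shortens the strand needed for the same data, here is a minimal sketch that writes a byte string as a number in base 4 and in base 11. The letters chosen for the seven synthetic bases and the `encode`/`decode` helpers are hypothetical, and the scheme ignores error correction and any biochemical constraints, so it should not be read as the researchers' actual encoding.

```python
NATURAL = "ACGT"
# Hypothetical one-letter labels for the seven synthetic bases (assumption).
EXPANDED = NATURAL + "BDEFHIJ"


def encode(data: bytes, alphabet: str) -> str:
    """Write data as a big integer in base len(alphabet)."""
    base = len(alphabet)
    # A leading 0x01 sentinel byte preserves any leading zero bytes in data.
    n = int.from_bytes(b"\x01" + data, "big")
    symbols = []
    while n:
        n, r = divmod(n, base)
        symbols.append(alphabet[r])
    return "".join(reversed(symbols))


def decode(strand: str, alphabet: str) -> bytes:
    """Invert encode(): rebuild the integer, then drop the sentinel byte."""
    base = len(alphabet)
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    n = 0
    for symbol in strand:
        n = n * base + index[symbol]
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]


message = b"DNA storage"
short = encode(message, EXPANDED)
long_ = encode(message, NATURAL)
print(len(long_), "bases with 4 letters;", len(short), "bases with 11 letters")
```

The same message needs fewer symbols in the 11-letter alphabet, which is the intuition behind both the density and the write-speed claims.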
To read the data back, the DNA strand is passed through a nanopore in a specially designed protein, which can detect individual nitrogenous bases whether they are natural or synthetic. Machine learning algorithms then decode the stored information. AI is indispensable here because the encoding and decoding processes are so complex. As the technology matures, the process should become much simpler.
Today, using only the four natural nitrogenous bases to encode data, up to 215 PB can be stored in one gram of DNA. Eleven bases would double this density, and even that is not the limit.
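A rough way to sanity-check the density claim is to count the bits each base can carry, assuming an idealized code in which every symbol is equally likely and an alphabet of n symbols therefore carries log2(n) bits per position. That assumption, and the `bits_per_base` helper, are illustrative only and do not come from the study.

```python
import math


def bits_per_base(alphabet_size: int) -> float:
    """Idealized information content of one position in the strand."""
    return math.log2(alphabet_size)


natural = bits_per_base(4)    # 2 bits per base
expanded = bits_per_base(11)  # about 3.46 bits per base
gain = expanded / natural     # about 1.73

print(f"{natural:.2f} bits/base -> {expanded:.2f} bits/base (x{gain:.2f})")
```

Under this simple count the gain from the alphabet size alone is about 1.73 times, in the neighborhood of the doubling quoted above. The exact figure would depend on details of the encoding that the article does not describe.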
“We tried 77 different combinations of 11 nitrogenous bases, and our method was able to distinguish each of them perfectly,” said Chao Pan, co-author of the study. “The deep learning mechanism used in our method to identify various nucleotides is versatile, which allows us to extend our approach to many other applications.”
Source: New Atlas