Title: Machine Learning Recognition of Artificial DNA Sequence with Quantum Tunneling Nanogap Junction
Authors: Milan Kumar Jena, Sneha Mittal, Biswarup Pathak
Journal: The Journal of Physical Chemistry B
Publication date: 2025-01-09
DOI: 10.1021/acs.jpcb.4c06270 (https://doi.org/10.1021/acs.jpcb.4c06270)
Citations: 0
Abstract
Artificially synthesized DNA holds significant promise for addressing fundamental biochemical questions and driving advances in biotechnology, genetics, and DNA digital data storage. Rapid and precise electrical identification of these artificial DNA strands is crucial for their effective application. Herein, we present a comprehensive investigation into the electrical recognition of eight artificially synthesized DNA (xDNA and yDNA) nucleobases using quantum tunneling transport and machine learning (ML) techniques. By embedding these nucleobases within a solid-state nanogap junction, we calculated their fingerprint transmission and current readouts and analyzed the influence of electronic coupling and molecular orbital delocalization on these properties. The trained ML model achieved a predictive basecalling accuracy of up to 100% for xDNA nucleobases and 99.80% for yDNA transmission readout data sets. An ML explainability study revealed that normalized descriptors have a greater impact on nucleobase prediction than the original transmission function and are more effective in disentangling overlapping artificial DNA nucleobase signals. Quaternary classification results showed higher recognition accuracy for xDNA nucleobases than for yDNA nucleobases. Furthermore, precise calling of complementary, purine, and pyrimidine base pair combinations was demonstrated with high sensitivity and F1 score. Our findings demonstrate the feasibility of highly sensitive and precise electrical recognition of artificial DNA nucleobases, which could transform genetic research and spur advances in genetic data storage, synthetic biology, and diagnostics.
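The sensitivity and F1 score mentioned in the abstract can be made concrete with a minimal sketch of how such per-class metrics are commonly computed for a four-class (quaternary) basecalling problem. This is not the authors' code: the function name `per_class_metrics`, the label names (`xA`, `xG`, `xC`, `xT`), and the toy predictions are assumptions invented for illustration, and none of the values come from the paper.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (sensitivity, f1)} for a multiclass prediction.

    Sensitivity (recall) = TP / (TP + FN); F1 is the harmonic mean of
    precision and sensitivity. Hypothetical helper, not from the paper.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # the true class was missed
            fp[pred] += 1   # the predicted class was called wrongly

    metrics = {}
    for label in labels:
        recall_den = tp[label] + fn[label]
        prec_den = tp[label] + fp[label]
        sensitivity = tp[label] / recall_den if recall_den else 0.0
        precision = tp[label] / prec_den if prec_den else 0.0
        pr_sum = precision + sensitivity
        f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
        metrics[label] = (sensitivity, f1)
    return metrics


# Toy labels, invented for illustration only; not data from the paper.
y_true = ["xA", "xA", "xG", "xG", "xC", "xC", "xT", "xT"]
y_pred = ["xA", "xG", "xG", "xG", "xC", "xC", "xT", "xT"]
metrics = per_class_metrics(y_true, y_pred)

for label, (sensitivity, f1) in metrics.items():
    print(f"{label}: sensitivity={sensitivity:.3f}, F1={f1:.3f}")
```

Reporting both values per class is useful when signals overlap: a base that is often mistaken for another shows low sensitivity, while the base it is mistaken for shows reduced precision and therefore a lower F1 score.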
About the journal:
An essential criterion for acceptance of research articles in the journal is that they provide new physical insight. Please refer to the New Physical Insights virtual issue on what constitutes new physical insight. Manuscripts that are essentially reporting data or applications of data are, in general, not suitable for publication in JPC B.