High fault-tolerant DNA image storage system based on VAE
Authors: Yuyang Lu, Zhihao Zhang, Jing Yang, Cheng Zhang
Journal: IEEE Transactions on NanoBioscience (Impact Factor 3.7; JCR Q1, Biochemical Research Methods)
Publication type: Journal Article
Publication date: 2025-02-21
DOI: 10.1109/TNB.2025.3544401 (https://doi.org/10.1109/TNB.2025.3544401)
Citations: 0
Abstract
DNA-based storage has emerged as a promising storage paradigm because of its immense storage potential. However, the error-prone nature of DNA synthesis and sequencing limits this potential. Image data is typically compressed before storage, and even a single mismatch can propagate catastrophically during decompression, rendering the image unrecoverable. To reduce the error rate when compressed images are stored in DNA, we designed a highly fault-tolerant DNA image storage system. The system improves both the image compression ratio and resilience through three key innovations: 1) using a Variational Autoencoder (VAE) to compress the image into uniformly sized latent variable blocks, followed by further compression via Singular Value Decomposition (SVD); 2) quantizing the floating-point numbers in the latent variable blocks and applying rotational coding to the resulting ternary sequences, which enforces the constraints on homopolymer run length and GC content; 3) optimizing the error-correction scheme so that each type of error is recovered as well as possible, with the affected value quantized back to its original value. The compression ratio is adjusted through image scaling, and comparative image compression simulations demonstrate the proposed model's advantages in fault tolerance and storage density.
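The abstract does not give the quantizer or the rotation table, so the following is only a minimal sketch of the second step (quantize, convert to ternary, apply a rotating code), not the authors' implementation. The value range `[-1, 1]`, the 243 levels (five trits per value), the `ROTATION` table, and all function names are assumptions made for illustration. The rotating code shown is the common scheme in which each trit selects one of the three bases that differ from the previous base, so no base is ever repeated; it does not by itself guarantee a particular GC content.

```python
# Illustrative sketch only: range, level count, and rotation table are assumed,
# not taken from the paper.

# For each previous base, the three allowed next bases, indexed by trit 0/1/2.
ROTATION = {"A": "CGT", "C": "GTA", "G": "TAC", "T": "ACG"}

TRITS_PER_VALUE = 5            # 3**5 = 243 quantization levels (assumption)
LEVELS = 3 ** TRITS_PER_VALUE


def quantize(values, lo=-1.0, hi=1.0):
    """Uniformly map floats in [lo, hi] to integer indices 0..LEVELS-1."""
    step = (hi - lo) / (LEVELS - 1)
    return [min(LEVELS - 1, max(0, round((v - lo) / step))) for v in values]


def dequantize(indices, lo=-1.0, hi=1.0):
    """Map integer indices back to the centre value of their quantization bin."""
    step = (hi - lo) / (LEVELS - 1)
    return [lo + i * step for i in indices]


def to_trits(n, width=TRITS_PER_VALUE):
    """Fixed-width base-3 representation, most significant trit first."""
    digits = []
    for _ in range(width):
        n, r = divmod(n, 3)
        digits.append(r)
    return digits[::-1]


def from_trits(trits):
    """Inverse of to_trits."""
    n = 0
    for t in trits:
        n = n * 3 + t
    return n


def rotate_encode(trits, start="A"):
    """Each trit picks a base different from the previous one (no repeats)."""
    prev, out = start, []
    for t in trits:
        prev = ROTATION[prev][t]
        out.append(prev)
    return "".join(out)


def rotate_decode(dna, start="A"):
    """Recover trits by looking up each base in the row of its predecessor."""
    prev, trits = start, []
    for base in dna:
        trits.append(ROTATION[prev].index(base))
        prev = base
    return trits


def encode_block(values):
    """Floats -> quantized indices -> trits -> DNA string."""
    trits = [t for q in quantize(values) for t in to_trits(q)]
    return rotate_encode(trits)


def decode_block(dna):
    """DNA string -> trits -> quantized indices -> approximate floats."""
    trits = rotate_decode(dna)
    groups = [trits[i:i + TRITS_PER_VALUE]
              for i in range(0, len(trits), TRITS_PER_VALUE)]
    return dequantize([from_trits(g) for g in groups])
```

Calling `decode_block(encode_block(values))` returns each value to within half a quantization step. Because a decoded value depends only on its own five bases and the base before them, a single substitution in this sketch disturbs a bounded number of values rather than the whole image, which is the kind of locality the abstract's fault-tolerance argument relies on.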
About the journal:
The IEEE Transactions on NanoBioscience reports on original, innovative and interdisciplinary work on all aspects of molecular systems, cellular systems, and tissues (including molecular electronics). Topics covered in the journal span a broad spectrum, covering both foundations and applications. Specifically, the journal covers methods and techniques, experimental aspects, design and implementation, instrumentation and laboratory equipment, clinical aspects, hardware and software data acquisition and analysis, and computer-based modelling (based on traditional or high-performance computing, such as parallel computers or computer networks).