FrameD: framework for DNA-based data storage design, verification, and validation.

Bioinformatics (IF 4.4; JCR Q1, Biochemical Research Methods) · Pub Date: 2023-10-03 · DOI: 10.1093/bioinformatics/btad572

Kevin D Volkel, Kevin N Lin, Paul W Hook, Winston Timp, Albert J Keung, James M Tuck



Abstract

Motivation: DNA-based data storage is a rapidly growing field that aims to harness the massive theoretical information density of DNA molecules to produce a competitive next-generation storage medium for archival data. In recent years, many DNA-based storage system designs have been proposed. Because no common infrastructure exists for simulating these storage systems, comparing many different designs under many different error models is increasingly difficult. To address this challenge, we introduce FrameD, a simulation infrastructure for DNA storage systems that leverages the underlying modularity of these designs, providing a framework in which different designs can be expressed while reusing common components.

Results: We demonstrate the utility of FrameD and the need for a common simulation platform using a case study. The case study compares designs that use strand copies differently: some align the copies using multiple sequence alignment algorithms, and others do not. We found that whether multiple sequence alignment belongs in the pipeline depends on the error rate and the type of errors being injected, and that including it is not always beneficial. In addition to supporting a wide range of designs, FrameD provides transparent parallelism to handle the large number of reads produced by sequencing and the many fault injection iterations a study requires. We believe that FrameD fills a void in the tools publicly available to the DNA storage community by providing a modular and extensible framework with support for massive parallelism. As a result, it will help accelerate the design process of future DNA-based storage systems.
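To make the fault-injection idea concrete, the following is a minimal sketch of the kind of experiment the case study describes. It is not FrameD's API: every name here (`inject_errors`, `positional_consensus`, `run_trial`, `success_rate`) is hypothetical, the error model is a simple per-base model chosen for illustration, and the decoder is a plain per-position majority vote with no alignment step. Each trial is independent, which is why such workloads parallelize easily; a thread pool is used only to keep the sketch short.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

BASES = "ACGT"


def inject_errors(strand, rate, rng, kinds=("sub",)):
    """Return a noisy copy of `strand` (hypothetical per-base error model).

    Each base is hit with probability `rate`; a hit base is substituted,
    followed by a random inserted base, or deleted, depending on `kinds`.
    """
    out = []
    for base in strand:
        if rng.random() >= rate:
            out.append(base)  # base survives unchanged
            continue
        kind = rng.choice(kinds)
        if kind == "sub":
            out.append(rng.choice([b for b in BASES if b != base]))
        elif kind == "ins":
            out.append(base)
            out.append(rng.choice(BASES))
        elif kind == "del":
            pass  # drop the base
        else:
            raise ValueError(f"unknown error kind: {kind}")
    return "".join(out)


def positional_consensus(copies, length):
    """Majority vote at each index, with no alignment of the copies."""
    result = []
    for i in range(length):
        votes = Counter(c[i] for c in copies if i < len(c))
        # "N" is a placeholder when no copy reaches this index.
        result.append(votes.most_common(1)[0][0] if votes else "N")
    return "".join(result)


def run_trial(job):
    """One fault-injection iteration: corrupt the copies, then decode."""
    strand, rate, kinds, n_copies, seed = job
    rng = random.Random(seed)  # per-trial seed keeps trials reproducible
    copies = [inject_errors(strand, rate, rng, kinds) for _ in range(n_copies)]
    return positional_consensus(copies, len(strand)) == strand


def success_rate(strand, rate, kinds, n_copies=9, trials=200, workers=4):
    """Fraction of independent trials that recover `strand` exactly."""
    jobs = [(strand, rate, kinds, n_copies, seed) for seed in range(trials)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(run_trial, jobs)) / trials


if __name__ == "__main__":
    original = "ACGTTGCAAGCTTAGGCTAACGTA"
    for kinds in (("sub",), ("ins", "del")):
        print(kinds, success_rate(original, 0.05, kinds))
```

Running the script prints the exact-recovery rate for substitution-only errors and for insertion/deletion errors at the same per-base rate. Insertions and deletions shift every later base, which a positional vote cannot undo; that is the situation where an alignment step might help, and whether it does in practice depends on the error rate and error mix, as the case study reports. In CPython, threads will not speed up this CPU-bound loop; swapping in a process pool would be the usual next step.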

Availability and implementation: The source code for FrameD, along with the data generated during the demonstration of FrameD, is available in a public GitHub repository at https://github.com/dna-storage/framed (https://dx.doi.org/10.5281/zenodo.7757762).



