Claudio Arbib, Andrea D'ascenzo, Fabrizio Rossi, Daniele Santoni
Journal of Computational Biology, pp. 416-428. Journal Article. Published 2024-05-01 (Epub 2024-04-30). DOI: 10.1089/cmb.2023.0166 (https://doi.org/10.1089/cmb.2023.0166)
Abstract
An Integer Linear Programming Model to Optimize Coding DNA Sequences By Joint Control of Transcript Indicators.
A Coding DNA Sequence (CDS) is a fraction of DNA whose nucleotides are grouped into consecutive triplets called codons, each one encoding an amino acid. Because most amino acids can be encoded by more than one codon, the same amino acid chain can be obtained by a very large number of different CDSs. These synonymous CDSs show different features that, also depending on the organism the transcript is expressed in, could affect translational efficiency and yield. The identification of optimal CDSs with respect to given transcript indicators is in general a challenging task, but it has been observed in recent literature that integer linear programming (ILP) can be a very flexible and efficient way to achieve it. In this article, we add evidence to this observation by proposing a new ILP model that simultaneously optimizes different well-grounded indicators. With this model, we efficiently find solutions that dominate those returned by six existing codon optimization heuristics.
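The abstract's point that one amino acid chain admits "a very large number" of synonymous CDSs can be made concrete with a short sketch. The code below is not the authors' ILP model: it only counts and enumerates synonymous CDSs under the standard genetic code, and then picks the one closest to a target GC fraction. GC fraction is used here purely as a hypothetical stand-in indicator, and the function names (`count_synonymous`, `enumerate_synonymous`, `gc_fraction`, `best_by_gc`) are illustrative assumptions rather than anything taken from the paper.

```python
from itertools import product
from math import prod

# Standard genetic code: amino acid (one-letter) -> synonymous DNA codons.
CODONS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "N": ["AAT", "AAC"],
    "D": ["GAT", "GAC"],
    "C": ["TGT", "TGC"],
    "Q": ["CAA", "CAG"],
    "E": ["GAA", "GAG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "H": ["CAT", "CAC"],
    "I": ["ATT", "ATC", "ATA"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "K": ["AAA", "AAG"],
    "M": ["ATG"],
    "F": ["TTT", "TTC"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "W": ["TGG"],
    "Y": ["TAT", "TAC"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
}


def count_synonymous(protein: str) -> int:
    """Number of distinct CDSs encoding `protein` (product of codon degeneracies)."""
    return prod(len(CODONS[aa]) for aa in protein)


def enumerate_synonymous(protein: str):
    """Yield every synonymous CDS; only feasible for very short peptides."""
    for choice in product(*(CODONS[aa] for aa in protein)):
        yield "".join(choice)


def gc_fraction(cds: str) -> float:
    """Fraction of G or C nucleotides (a hypothetical example indicator)."""
    return sum(base in "GC" for base in cds) / len(cds)


def best_by_gc(protein: str, target: float) -> str:
    """Exhaustively pick the CDS whose GC fraction is closest to `target`."""
    return min(enumerate_synonymous(protein),
               key=lambda cds: abs(gc_fraction(cds) - target))
```

For the tripeptide `MLS`, `count_synonymous` gives 1 × 6 × 6 = 36 candidates. Because the count is a product over positions, it grows exponentially with chain length, so exhaustive enumeration of this kind stops being practical almost immediately. That growth is the reason an exact method such as ILP, or a heuristic, is needed for realistic sequences.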
About the journal:
Journal of Computational Biology is the leading peer-reviewed journal in computational biology and bioinformatics, publishing in-depth statistical, mathematical, and computational analysis of methods, as well as their practical impact. Available only online, this is an essential journal for scientists and students who want to keep abreast of developments in bioinformatics.
Journal of Computational Biology coverage includes:
-Genomics
-Mathematical modeling and simulation
-Distributed and parallel biological computing
-Designing biological databases
-Pattern matching and pattern detection
-Linking disparate databases and data
-New tools for computational biology
-Relational and object-oriented database technology for bioinformatics
-Biological expert system design and use
-Reasoning by analogy, hypothesis formation, and testing by machine
-Management of biological databases