Algorithms for Molecular Biology: Latest Articles

Automated design of dynamic programming schemes for RNA folding with pseudoknots.
IF 1 | Zone 4 (Biology) | Q4 Biochemical Research Methods | Pub Date: 2023-12-01 | DOI: 10.1186/s13015-023-00229-z
Bertrand Marchand, Sebastian Will, Sarah J Berkemer, Yann Ponty, Laurent Bulteau

Although RNA secondary structure prediction is a textbook application of dynamic programming (DP) and a routine task in RNA structure analysis, it remains challenging whenever pseudoknots come into play. Since the prediction of pseudoknotted structures by minimizing (realistically modelled) energy is NP-hard, specialized algorithms have been proposed for restricted conformation classes that capture the most frequently observed configurations. To achieve good performance, these methods rely on specific and carefully hand-crafted DP schemes. In contrast, we generalize and fully automate the design of DP pseudoknot prediction algorithms. For this purpose, we formalize the problem of designing DP algorithms for an (infinite) class of conformations, modeled by (a finite number of) fatgraphs, and automatically build DP schemes minimizing their algorithmic complexity. We propose an algorithm for the problem, based on the tree-decomposition of a well-chosen representative structure, which we simplify and reinterpret as a DP scheme. The algorithm is fixed-parameter tractable for the treewidth tw of the fatgraph, and its output represents a [Formula: see text] algorithm (and even possibly [Formula: see text] in simple energy models) for predicting the MFE folding of an RNA of length n. We demonstrate, for the most common pseudoknot classes, that our automatically generated algorithms achieve the same complexities as reported in the literature for hand-crafted schemes. Our framework supports general energy models, partition function computations, recursive substructures and partial folding, and could pave the way for algebraic dynamic programming beyond the context-free case.
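
For orientation, the "textbook" pseudoknot-free case that the abstract contrasts against can be sketched with Nussinov-style base-pair maximization. This is not the fatgraph-based scheme of the paper, and it maximizes pair counts rather than minimizing a realistic energy; the function name `nussinov_max_pairs` and the `min_loop` parameter are illustrative assumptions.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested (pseudoknot-free) base pairs in an RNA string.

    Illustrative O(n^3) time, O(n^2) space DP. `min_loop` is the minimum
    number of unpaired bases enclosed by a pair (an assumed default of 3).
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = max number of pairs within seq[i..j] (inclusive)
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    inner = best[k + 1][j - 1]
                    score = max(score, left + 1 + inner)
            best[i][j] = score
    return best[0][n - 1]
```

Because every pair (k, j) splits the interval into an independent left part and an enclosed part, the recursion only ever needs interval indices; pseudoknots break exactly this independence, which is why they call for richer DP schemes.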

Algorithms for Molecular Biology 18(1), 18. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691146/pdf/
Citations: 1
New algorithms for structure informed genome rearrangement.
IF 1 | Zone 4 (Biology) | Q4 Biochemical Research Methods | Pub Date: 2023-12-01 | DOI: 10.1186/s13015-023-00239-x
Eden Ozeri, Meirav Zehavi, Michal Ziv-Ukelson

We define two new computational problems in the domain of perfect genome rearrangements, and propose three algorithms to solve them. The rearrangement scenarios modeled by the problems consider Reversal and Block Interchange operations, and a PQ-tree is utilized to guide the allowed operations and to compute their weights. In the first problem, [Formula: see text] ([Formula: see text]), we define the basic structure-informed rearrangement measure. Here, we assume that the gene order members of the gene cluster from which the PQ-tree is constructed are permutations. The PQ-tree representing the gene cluster is ordered such that the series of gene IDs spelled by its leaves is equivalent to that of the reference gene order. Then, a structure-informed genome rearrangement distance is computed between the ordered PQ-tree and the target gene order. The second problem, [Formula: see text] ([Formula: see text]), generalizes [Formula: see text], where the gene order members are not necessarily permutations and the structure informed rearrangement measure is extended to also consider up to [Formula: see text] and [Formula: see text] gene insertion and deletion operations, respectively, when modelling the PQ-tree informed divergence process from the reference gene order to the target gene order. The first algorithm solves [Formula: see text] in [Formula: see text] time and [Formula: see text] space, where [Formula: see text] is the maximum number of children of a node, n is the length of the string and the number of leaves in the tree, and [Formula: see text] and [Formula: see text] are the number of P-nodes and Q-nodes in the tree, respectively. If one of the penalties of [Formula: see text] is 0, then the algorithm runs in [Formula: see text] time and [Formula: see text] space. 
The second algorithm solves [Formula: see text] in [Formula: see text] time and [Formula: see text] space, where [Formula: see text] is the maximum number of children of a node, n is the length of the string, m is the number of leaves in the tree, [Formula: see text] and [Formula: see text] are the number of P-nodes and Q-nodes in the tree, respectively, and allowing up to [Formula: see text] deletions from the tree and up to [Formula: see text] deletions from the string. The third algorithm is intended to reduce the space complexity of the second algorithm. It solves a variant of the problem (where one of the penalties of [Formula: see text] is 0) in [Formula: see text] time and [Formula: see text] space. The algorithm is implemented as a software tool, denoted MEM-Rearrange, and applied to the comparative and evolutionary analysis of 59 chromosomal gene clusters extracted from a dataset of 1487 prokaryotic genomes.

Algorithms for Molecular Biology 18(1), 17. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10691145/pdf/
Citations: 0
Relative timing information and orthology in evolutionary scenarios.
IF 1 | Zone 4 (Biology) | Q4 Biochemical Research Methods | Pub Date: 2023-11-08 | DOI: 10.1186/s13015-023-00240-4
David Schaller, Tom Hartmann, Manuel Lafond, Peter F Stadler, Nicolas Wieseke, Marc Hellmuth

Background: Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative timing of the last common ancestors of two extant genes (leaves of T) and the last common ancestors of the two species (leaves of S) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestors coincide with a corresponding speciation event. The relative timing information of gene and species divergences is captured by three colored graphs that have the extant genes as vertices and the species in which the genes are found as vertex colors: the equal-divergence-time (EDT) graph, the later-divergence-time (LDT) graph and the prior-divergence-time (PDT) graph, which together form an edge partition of the complete graph.

Results: Here we give a complete characterization in terms of informative and forbidden triples that can be read off the three graphs and provide a polynomial time algorithm for constructing an evolutionary scenario that explains the graphs, provided such a scenario exists. While both LDT and PDT graphs are cographs, this is not true for the EDT graph in general. We show that every EDT graph is perfect. While the information about LDT and PDT graphs is necessary to recognize EDT graphs in polynomial-time for general scenarios, this extra information can be dropped in the HGT-free case. However, recognition of EDT graphs without knowledge of putative LDT and PDT graphs is NP-complete for general scenarios. In contrast, PDT graphs can be recognized in polynomial-time. We finally connect the EDT graph to the alternative definitions of orthology that have been proposed for scenarios with horizontal gene transfer. With one exception, the corresponding graphs are shown to be colored cographs.
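
As background for the cograph statements above: a graph is a cograph exactly when every induced subgraph on at least two vertices is disconnected or has a disconnected complement (equivalently, it contains no induced path on four vertices). The sketch below tests that property by recursive decomposition. It is a simple polynomial-time illustration, not the recognition algorithm of the paper, and it ignores vertex colors; the function name `is_cograph` and its input format are assumptions made here.

```python
def is_cograph(vertices, edges):
    """Return True if the undirected graph (vertices, edges) is a cograph."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def components(nodes, complement):
        # Connected components of the induced subgraph on `nodes`,
        # or of its complement when `complement` is True.
        nodes = set(nodes)
        comps = []
        while nodes:
            start = nodes.pop()
            comp, stack = {start}, [start]
            while stack:
                u = stack.pop()
                nbrs = (nodes - adj[u]) if complement else (nodes & adj[u])
                nodes -= nbrs
                comp |= nbrs
                stack.extend(nbrs)
            comps.append(comp)
        return comps

    def check(nodes):
        if len(nodes) <= 1:
            return True
        for complement in (False, True):
            comps = components(nodes, complement)
            if len(comps) > 1:
                # Split into (co-)components and recurse on each part.
                return all(check(c) for c in comps)
        # Both the subgraph and its complement are connected.
        return False

    return check(set(vertices))
```

The path a-b-c-d is the smallest graph rejected by this check, since both it and its complement are connected.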

Algorithms for Molecular Biology 18(1), 16. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634191/pdf/
Citations: 1
Embedding gene trees into phylogenetic networks by conflict resolution algorithms
IF 1 | Zone 4 (Biology) | Q4 Biochemical Research Methods | Pub Date: 2022-05-19 | DOI: 10.1186/s13015-022-00218-8
Marcin Wawerka, D. Dabkowski, Natalia Rutecka, Agnieszka Mykowiecka, P. Górecki
Citations: 2
Bi-alignments with affine gaps costs
IF 1 | Zone 4 (Biology) | Q4 Biochemical Research Methods | Pub Date: 2022-05-16 | DOI: 10.1186/s13015-022-00219-7
Peter F. Stadler, S. Will
Citations: 1
Adding hydrogen atoms to molecular models via fragment superimposition
IF 1 | Zone 4 (Biology) | Q4 Biochemical Research Methods | Pub Date: 2022-03-29 | DOI: 10.1186/s13015-022-00215-x
Patrick Kunzmann, Jacob Marcel Anter, K. Hamacher
Citations: 2
Parsimonious Clone Tree Integration in cancer
IF 1 | Zone 4 (Biology) | Q4 Biochemical Research Methods | Pub Date: 2022-03-14 | DOI: 10.1186/s13015-022-00209-9
P. Sashittal, Simone Zaccaria, M. El-Kebir
Citations: 5
Tree diet: reducing the treewidth to unlock FPT algorithms in RNA bioinformatics
IF 1 | Zone 4 (Biology) | Q4 Biochemical Research Methods | Pub Date: 2021-05-04 | DOI: 10.1186/s13015-022-00213-z
Bertrand Marchand, Y. Ponty, L. Bulteau
Citations: 1
Adjacency-constrained hierarchical clustering of a band similarity matrix with application to genomics.
IF 1 | Zone 4 (Biology) | Q4 Biochemical Research Methods | Pub Date: 2019-11-15 | eCollection Date: 2019-01-01 | DOI: 10.1186/s13015-019-0157-4
Christophe Ambroise, Alia Dehman, Pierre Neuvial, Guillem Rigaill, Nathalie Vialaneix

Background: Genomic data analyses such as Genome-Wide Association Studies (GWAS) or Hi-C studies are often faced with the problem of partitioning chromosomes into successive regions based on a similarity matrix of high-resolution, locus-level measurements. An intuitive way of doing this is to perform a modified Hierarchical Agglomerative Clustering (HAC), where only adjacent clusters (according to the ordering of positions within a chromosome) are allowed to be merged. But a major practical drawback of this method is its quadratic time and space complexity in the number of loci, which is typically of the order of $10^4$ to $10^5$ for each chromosome.

Results: By assuming that the similarity between physically distant objects is negligible, we are able to propose an implementation of adjacency-constrained HAC with quasi-linear complexity. This is achieved by pre-calculating specific sums of similarities, and storing candidate fusions in a min-heap. Our illustrations on GWAS and Hi-C datasets demonstrate the relevance of this assumption, and show that this method highlights biologically meaningful signals. Thanks to its small time and memory footprint, the method can be run on a standard laptop in minutes or even seconds.

Availability and implementation: Software and sample data are available as an R package, adjclust, that can be downloaded from the Comprehensive R Archive Network (CRAN).
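
The idea of keeping candidate fusions in a min-heap can be illustrated with a deliberately simplified sketch. It is not the adjclust implementation: instead of a band similarity matrix it clusters an ordered list of scalar values, using a Ward-style merge cost between neighbouring clusters and lazy deletion of stale heap entries. The function name `adjacency_constrained_hac` and its return format are assumptions made for this example.

```python
import heapq


def adjacency_constrained_hac(values):
    """Merge only neighbouring clusters of an ordered list of numbers.

    Returns the merge history as (left_id, right_id, cost) tuples. Original
    items have ids 0..n-1; each merge creates a fresh id n, n+1, ...
    """
    n = len(values)
    size = {i: 1 for i in range(n)}
    total = {i: float(values[i]) for i in range(n)}
    prev = {i: (i - 1 if i > 0 else None) for i in range(n)}
    nxt = {i: (i + 1 if i < n - 1 else None) for i in range(n)}
    alive = set(range(n))

    def cost(a, b):
        # Ward-style increase in within-cluster variance for scalar data.
        mean_a, mean_b = total[a] / size[a], total[b] / size[b]
        return size[a] * size[b] / (size[a] + size[b]) * (mean_a - mean_b) ** 2

    # Candidate fusions: one heap entry per pair of adjacent clusters.
    heap = [(cost(i, i + 1), i, i + 1) for i in range(n - 1)]
    heapq.heapify(heap)
    merges, next_id = [], n
    while heap:
        c, a, b = heapq.heappop(heap)
        if a not in alive or b not in alive:
            continue  # stale entry: one side was already merged away
        new, next_id = next_id, next_id + 1
        size[new], total[new] = size[a] + size[b], total[a] + total[b]
        prev[new], nxt[new] = prev[a], nxt[b]
        alive -= {a, b}
        alive.add(new)
        # Re-link neighbours and push the two new candidate fusions.
        if prev[new] is not None:
            nxt[prev[new]] = new
            heapq.heappush(heap, (cost(prev[new], new), prev[new], new))
        if nxt[new] is not None:
            prev[nxt[new]] = new
            heapq.heappush(heap, (cost(new, nxt[new]), new, nxt[new]))
        merges.append((a, b, c))
    return merges
```

Each merge pushes at most two new heap entries, so this scalar version performs O(n log n) heap work in total; in the band-matrix setting of the abstract, the cost evaluation would instead rely on the pre-calculated sums of similarities.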

Algorithms for Molecular Biology 14, 22.
Citations: 21
Using a constraint-based regression method for relative quantification of somatic mutations in pyrosequencing signals: a case for NRAS analysis
IF 1 | Zone 4 (Biology) | Q4 Biochemical Research Methods | Pub Date: 2016-09-15 | DOI: 10.1186/s13015-016-0086-4
J. Ambroise, Jamal Badir, Louise Nienhaus, Annie Robert, A. Dekairelle, J. Gala
Citations: 1