Journal of Bioinformatics and Computational Biology: Latest Articles

Mining sponge phenomena in RNA expression data.
IF 1, CAS Zone 4 (Biology), Q3 Computer Science. Pub Date: 2022-02-01. Epub Date: 2021-11-18. DOI: 10.1142/S0219720021500220
Fabrizio Angiulli, Teresa Colombo, Fabio Fassetti, Angelo Furfaro, Paola Paci

In the last few years, the interactions among competing endogenous RNAs (ceRNAs) have been recognized as a key post-transcriptional regulatory mechanism in cell differentiation, tissue development, and disease. Notably, such sponge phenomena, which subtract active microRNAs from their silencing targets, have been recognized as having a potential oncosuppressive or oncogenic role in several cancer types. Hence, the ability to predict sponges from the analysis of large expression data sets (e.g. from international cancer projects) has become an important data mining task in bioinformatics. We present a technique designed to mine sponge phenomena whose presence or absence may discriminate between healthy and unhealthy populations of samples in tumoral or normal expression data sets, thus providing lists of candidates potentially relevant to the pathology. To this end, we search for pairs of elements acting as ceRNAs for a given miRNA; namely, we aim to discover miRNA-RNA pairs involved in phenomena that are clearly present in one population and almost absent in the other. The results on tumoral expression data, covering five different cancer types, confirmed the effectiveness of the approach in mining interesting knowledge. Indeed, 32 of the 33 miRNAs and 22 of the 25 protein-coding genes identified as top scoring in our analysis are corroborated by independent studies that similarly associate them with cancer processes. In addition, the subset of miRNAs selected by the sponge analysis shows a significant enrichment of annotation for the KEGG pathway "microRNAs in cancer" when tested with the commonly used bioinformatics resource DAVID. Moreover, the cancer datasets in which our sponge analysis identified a miRNA as top scoring often match those already reported in the relevant literature.
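
The core idea can be illustrated as a discriminative-pattern search: keep miRNA-RNA pairs whose sponge-like pattern is present in one population and almost absent in the other. The sketch below is hypothetical and much simpler than the authors' technique. The predicate sponge_present, its thresholds rna_min and mirna_max, the cut-off min_gap, and the gene names are all assumptions made for illustration. Each pair is scored by the difference between the fractions of samples supporting it in the two populations.

```python
from itertools import product

def sponge_present(sample, mirna, rna, rna_min, mirna_max):
    # Toy predicate (assumption, not the authors' definition): the candidate
    # sponge RNA is highly expressed while its miRNA partner is low.
    return sample[rna] >= rna_min and sample[mirna] <= mirna_max

def support(samples, mirna, rna, rna_min, mirna_max):
    # Fraction of samples in one population where the toy predicate holds.
    hits = sum(sponge_present(s, mirna, rna, rna_min, mirna_max) for s in samples)
    return hits / len(samples)

def discriminative_pairs(pop_a, pop_b, mirnas, rnas,
                         rna_min=5.0, mirna_max=1.0, min_gap=0.6):
    # Keep miRNA-RNA pairs whose support differs strongly between the two
    # populations, i.e. present in one and (almost) absent in the other.
    results = []
    for mirna, rna in product(mirnas, rnas):
        gap = (support(pop_a, mirna, rna, rna_min, mirna_max)
               - support(pop_b, mirna, rna, rna_min, mirna_max))
        if abs(gap) >= min_gap:
            results.append((mirna, rna, gap))
    return sorted(results, key=lambda t: -abs(t[2]))

# Hypothetical expression profiles (samples as gene -> expression dicts).
tumour = [{"miR-A": 0.5, "GENE1": 8.0, "GENE2": 2.0}] * 3
normal = [{"miR-A": 3.0, "GENE1": 8.0, "GENE2": 2.0}] * 3
print(discriminative_pairs(tumour, normal, ["miR-A"], ["GENE1", "GENE2"]))
# [('miR-A', 'GENE1', 1.0)]
```

The sign of the gap shows which population carries the pattern. A real analysis would also need a statistical model of the sponge effect and a correction for the many pairs tested.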

Citations: 0
O-glycosylation site prediction for Homo sapiens by combining properties and sequence features with support vector machine.
IF 1, CAS Zone 4 (Biology), Q3 Computer Science. Pub Date: 2022-02-01. Epub Date: 2021-11-19. DOI: 10.1142/S0219720021500293
Yan Zhu, Shuwan Yin, Jia Zheng, Yixia Shi, Cangzhi Jia

O-glycosylation is a posttranslational protein modification that plays an important regulatory role in almost all cells and is related to a large number of physiological and pathological phenomena. Recognizing O-glycosylation sites is key to further investigating the molecular mechanisms of posttranslational protein modification. This study aimed to collect a reliable dataset for Homo sapiens and to develop an O-glycosylation site predictor for Homo sapiens, named Captor, that combines multiple features. Random undersampling and the synthetic minority oversampling technique (SMOTE) were employed to deal with imbalanced data. In addition, the Kruskal-Wallis (K-W) test was adopted to optimize the feature vectors and improve the performance of the model. After a comprehensive comparison of various classifiers from traditional machine learning and deep learning, a support vector machine, which performed best, was used to train and optimize the final prediction model. On the independent test set, Captor outperformed the existing O-glycosylation prediction tool, suggesting that Captor could provide more useful guidance for further experimental research on O-glycosylation. The source code and datasets are available at https://github.com/YanZhu06/Captor/.
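
The imbalance handling and Kruskal-Wallis feature selection described above can be sketched with the standard library alone. This is an illustrative sketch, not Captor's code:

- random_undersample shrinks every class to the size of the smallest one. SMOTE, the oversampling half of the scheme, is omitted.
- rank_features orders feature columns by their Kruskal-Wallis H statistic. H is computed with average ranks for ties and the usual tie correction.

How many top-ranked features to keep, and the downstream SVM, are left out. Both are assumptions a real pipeline would have to fix.

```python
import random
from collections import Counter, defaultdict

def average_ranks(values):
    # 1-based ranks; tied values share the mean of the ranks they span.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(values, labels):
    # H statistic across the classes in `labels`, with the standard tie correction.
    n = len(values)
    if n < 2:
        return 0.0
    rank_sums, counts = defaultdict(float), Counter(labels)
    for r, label in zip(average_ranks(values), labels):
        rank_sums[label] += r
    h = 12.0 / (n * (n + 1)) * sum(rank_sums[g] ** 2 / counts[g] for g in counts) - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(values).values())
    correction = 1 - ties / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0

def rank_features(X, y):
    # Feature indices ordered from most to least class-discriminative.
    scores = [kruskal_wallis_h([row[f] for row in X], y) for f in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda f: -scores[f])

def random_undersample(X, y, seed=0):
    # Randomly shrink every class to the size of the smallest class.
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    smallest = min(len(rows) for rows in by_class.values())
    X_bal, y_bal = [], []
    for label in sorted(by_class):
        for xi in rng.sample(by_class[label], smallest):
            X_bal.append(xi)
            y_bal.append(label)
    return X_bal, y_bal

X = [[1, 5], [2, 5], [3, 6], [4, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
print(rank_features(X, y))  # [0, 1]: column 0 separates the two classes
```

Because H is rank-based, the ranking is insensitive to monotone rescaling of a feature. This is one reason the test suits heterogeneous physicochemical and sequence-derived features.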

Citations: 2
Amino acid environment affinity model based on graph attention network.
IF 1, CAS Zone 4 (Biology), Q3 Computer Science. Pub Date: 2022-02-01. Epub Date: 2021-11-13. DOI: 10.1142/S0219720021500323
Xueheng Tong, Shuqi Liu, Jiawei Gu, Chunguo Wu, Yanchun Liang, Xiaohu Shi

Proteins are engines involved in almost all functions of life. They have specific spatial structures formed by the twisting and folding of one or more polypeptide chains composed of amino acids. Protein sites are microenvironments within a protein structure that can be identified by the three-dimensional location and local neighborhood in which a structure or function exists. Understanding amino acid environment affinity is essential for further structural or functional studies of proteins, such as mutation analysis and functional site detection. In this study, an amino acid environment affinity model based on the graph attention network was developed. First, we constructed a protein graph according to the distances between amino acid pairs. Then, we extracted a set of structural features for each node. Finally, the protein graph and its associated node feature set were fed into the graph attention network model to obtain the amino acid affinities. Numerical results show that our proposed method significantly outperforms a recent 3DCNN-based method, by almost 30%.
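
The graph construction and attention steps can be sketched as follows. This is a hypothetical illustration, not the authors' model. It makes three assumptions:

- Residues are represented by 3D coordinates (for example C-alpha atoms).
- Residues closer than an 8 Å cutoff are connected; the cutoff value is assumed.
- One single-head layer follows the original graph attention network (GAT) formulation, so each residue's new representation is an attention-weighted average of its own projected features and its neighbours'.

The weight matrix W and attention vector a would be learned in practice. Here they are plain inputs.

```python
import math

def contact_graph(coords, cutoff=8.0):
    # Residues are nodes; connect pairs whose coordinates lie within `cutoff`.
    n = len(coords)
    neighbours = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbours[i].append(j)
                neighbours[j].append(i)
    return neighbours

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(features, neighbours, W, a):
    # z_i = W h_i; e_ij = LeakyReLU(a . [z_i || z_j]); alpha_ij is a softmax
    # over j in the neighbourhood of i (self-loop included);
    # output_i = sum_j alpha_ij z_j.
    z = [[sum(w * v for w, v in zip(row, h)) for row in W] for h in features]
    out = []
    for i in range(len(features)):
        nbrs = [i] + neighbours[i]
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j]))) for j in nbrs]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * z[j][k] for w, j in zip(weights, nbrs)) / total
                    for k in range(len(z[i]))])
    return out

coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
graph = contact_graph(coords)  # residue 3 is isolated
print(gat_layer([[1.0], [2.0], [3.0], [4.0]], graph, W=[[1.0]], a=[0.0, 0.0]))
# [[2.0], [2.0], [2.0], [4.0]]  (a zero attention vector gives uniform weights)
```

Building the graph with all pairwise distances is quadratic in protein length. For large structures a spatial index would be the usual replacement.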

Citations: 0
EdClust: A heuristic sequence clustering method with higher sensitivity.
IF 1, CAS Zone 4 (Biology), Q3 Computer Science. Pub Date: 2022-02-01. Epub Date: 2021-12-23. DOI: 10.1142/S0219720021500360
Ming Cao, Qinke Peng, Ze-Gang Wei, Fei Liu, Yi-Fan Hou

The development of high-throughput technologies has produced increasing amounts of sequence data and an increasing need for efficient clustering algorithms that can process massive volumes of sequencing data for downstream analysis. Heuristic clustering methods are widely applied to sequence clustering because of their low computational complexity. Although numerous heuristic clustering methods have been developed, they suffer from two limitations: overestimation of the number of inferred clusters and low clustering sensitivity. To address these issues, we present a new sequence clustering method, edClust, which groups similar sequences using Edlib, a C/C++ library for fast, exact semi-global sequence alignment. edClust was tested on three large-scale sequence databases and compared with several classic heuristic clustering methods, such as UCLUST, CD-HIT, and VSEARCH. Evaluations based on the metrics of cluster number and seed sensitivity (SS) demonstrate that edClust produces fewer clusters than the other methods and achieves a higher SS. The source code of edClust is available from https://github.com/zhang134/EdClust.git under the GNU GPL license.
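
The abstract does not spell out edClust's clustering loop. The following is therefore a hypothetical sketch of the general greedy, seed-based scheme used by heuristic tools such as CD-HIT and UCLUST, with a semi-global edit distance standing in for Edlib. Sequences are processed from longest to shortest. Each one joins the best-matching existing seed if its identity (1 - distance / length) reaches a threshold, and otherwise becomes a new seed. The default threshold of 0.97 is an arbitrary assumption. The distance lets the query align to any substring of the seed at no cost, which corresponds to Edlib's infix ("HW") mode. In practice one would call the real library (for example edlib.align(query, target, mode="HW") in its Python binding) instead of this quadratic pure-Python loop.

```python
def semi_global_distance(query, target):
    # Edit distance of the whole query against its best-matching substring of
    # target: gaps before and after the match in target are free (infix mode).
    prev = [0] * (len(target) + 1)
    for i in range(1, len(query) + 1):
        cur = [i] + [0] * len(target)
        for j in range(1, len(target) + 1):
            cur[j] = min(prev[j - 1] + (query[i - 1] != target[j - 1]),  # (mis)match
                         prev[j] + 1,       # query base absent from target
                         cur[j - 1] + 1)    # extra target base inside the match
        prev = cur
    return min(prev)

def greedy_cluster(seqs, min_identity=0.97):
    # Longest sequences first; each joins its best seed or founds a new cluster.
    order = sorted(range(len(seqs)), key=lambda i: -len(seqs[i]))
    seeds, clusters = [], {}
    for idx in order:
        best = None
        for seed in seeds:
            identity = 1 - semi_global_distance(seqs[idx], seqs[seed]) / len(seqs[idx])
            if identity >= min_identity and (best is None or identity > best[0]):
                best = (identity, seed)
        if best is None:
            seeds.append(idx)
            clusters[idx] = [idx]
        else:
            clusters[best[1]].append(idx)
    return clusters

seqs = ["ACGTACGTAC", "ACGTACGTAA", "GGGGCCCCTT"]
print(greedy_cluster(seqs, min_identity=0.85))  # {0: [0, 1], 2: [2]}
```

Comparing each sequence with every seed is what makes such methods sensitive but slow. Production tools add k-mer prefilters before the exact alignment.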

Citations: 2
Bioinformatics and Computational Biology: A Primer for Biologists
IF 1, CAS Zone 4 (Biology), Q3 Computer Science. Pub Date: 2022-01-01. DOI: 10.1007/978-981-16-4241-8
B. Tiwary
Citations: 2
A Novel Method for Predicting DNA N4-Methylcytosine Sites Based on Deep Forest Algorithm
IF 1, CAS Zone 4 (Biology), Q3 Computer Science. Pub Date: 2022-01-01. DOI: 10.2139/ssrn.4062895
Yonglin Zhang, Mei Hu, Qi Mo, Wenli Gan, Jiesi Luo
Citations: 0
Introduction to the Special Issue of the 18th Annual International RECOMB Satellite Workshop on Comparative Genomics.
IF 1, CAS Zone 4 (Biology), Q3 Computer Science. Pub Date: 2021-12-01. Epub Date: 2021-12-13. DOI: 10.1142/S0219720021020030
Rohan B H Williams, Louxin Zhang
Citations: 0
Small parsimony for natural genomes in the DCJ-indel model.
IF 1, CAS Zone 4 (Biology), Q3 Computer Science. Pub Date: 2021-12-01. Epub Date: 2021-11-19. DOI: 10.1142/S0219720021400096
Daniel Doerr, Cedric Chauve

The Small Parsimony Problem (SPP) aims at finding the gene orders at the internal nodes of a given phylogenetic tree such that the overall genome rearrangement distance along the tree branches is minimized. This problem is intractable in most genome rearrangement models, especially when gene duplication and loss are considered. In this work, we describe an algorithm based on Integer Linear Programming (ILP) that solves the SPP for natural genomes, i.e. genomes that contain conserved, unique, and duplicated markers. The evolutionary model we consider is the DCJ-indel model, which includes the Double-Cut and Join rearrangement operation and the insertion and deletion of genome segments. We evaluate our algorithm on simulated data and show that it reconstructs ancestral gene orders very efficiently and accurately under a very comprehensive evolutionary model.
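
The paper's ILP handles the full DCJ-indel model with duplicated markers. The sketch below only illustrates the objective being minimized, in the simplest setting: circular genomes over the same set of unique genes, with no indels and no duplicates. For that setting the classical DCJ distance is N - C, where N is the number of genes and C is the number of cycles in the adjacency graph. parsimony_score evaluates one fixed labelling of the tree nodes; the SPP searches over labellings of the internal nodes to minimize it. All function names here are illustrative assumptions.

```python
def adjacencies(genome):
    # genome: list of circular chromosomes, each a list of signed gene ids.
    # Gene g has a tail (g, "t") and a head (g, "h"); +g is read tail->head,
    # -g is read head->tail. Consecutive extremities (wrapping around the
    # circle) form one adjacency each.
    adj = []
    for chrom in genome:
        ends = [((abs(g), "t"), (abs(g), "h")) if g > 0 else ((abs(g), "h"), (abs(g), "t"))
                for g in chrom]
        for k in range(len(ends)):
            adj.append(frozenset((ends[k][1], ends[(k + 1) % len(ends)][0])))
    return adj

def dcj_distance(genome_a, genome_b):
    # DCJ distance N - C for circular genomes over the same unique genes.
    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    where_b = {ext: j for j, pair in enumerate(adj_b) for ext in pair}
    parent = list(range(len(adj_a) + len(adj_b)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link each adjacency of A to the adjacency of B sharing an extremity;
    # every vertex has degree 2, so each connected component is a cycle.
    for i, pair in enumerate(adj_a):
        for ext in pair:
            ra, rb = find(i), find(len(adj_a) + where_b[ext])
            parent[ra] = rb
    cycles = len({find(x) for x in range(len(parent))})
    return sum(len(chrom) for chrom in genome_a) - cycles

def parsimony_score(tree_edges, labels):
    # Total DCJ distance along the branches for one fixed node labelling.
    return sum(dcj_distance(labels[u], labels[v]) for u, v in tree_edges)

print(dcj_distance([[1, 2, 3, 4]], [[1, 2], [3, 4]]))  # 1 (a single fission)
labels = {"anc": [[1, 2, 3]], "x": [[1, 2, 3]], "y": [[1, -2, 3]]}
print(parsimony_score([("anc", "x"), ("anc", "y")], labels))  # 1
```

Linear chromosomes need telomere handling (the N - (C + I/2) form of the formula), and duplicated or missing markers are what make the real problem hard.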

Citations: 2
A symmetry-inclusive algebraic approach to genome rearrangement.
IF 1, CAS Zone 4 (Biology), Q3 Computer Science. Pub Date: 2021-12-01. Epub Date: 2021-11-19. DOI: 10.1142/S0219720021400151
Venta Terauds, Joshua Stevenson, Jeremy Sumner

Of the many modern approaches to calculating evolutionary distance via models of genome rearrangement, most are tied to a particular set of genomic modeling assumptions and to a restricted class of allowed rearrangements. The "position paradigm", in which genomes are represented as permutations signifying the position (and orientation) of each region, enables a refined model-based approach, where one can select biologically plausible rearrangements and assign to them relative probabilities/costs. Here, one must further incorporate any underlying structural symmetry of the genomes into the calculations and ensure that this symmetry is reflected in the model. In our recently introduced framework of genome algebras, each genome corresponds to an element that simultaneously incorporates all of its inherent physical symmetries. The representation theory of these algebras then provides a natural model of evolution via rearrangement as a Markov chain. Whilst using this framework to calculate distances for genomes with "practical" numbers of regions is currently computationally infeasible, we consider it to be a significant theoretical advance: one can incorporate different genomic modeling assumptions, calculate various genomic distances, and compare the results under different rearrangement models. The aim of this paper is to demonstrate some of these features.
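
The symmetry requirement above can be made concrete for the simplest case. For a circular genome, rotating the positions or reading the circle from the other side gives the same physical molecule; reading it backwards reverses the order and flips every orientation. The sketch below is a hypothetical illustration, not the paper's genome-algebra construction. It assumes a single circular chromosome with n distinct signed regions and the dihedral group of order 2n acting on positions. It maps each arrangement to a canonical representative, so that arrangements describing the same physical genome compare equal.

```python
from itertools import permutations, product

def dihedral_images(genome):
    # All 2n re-readings of a circular genome (tuple of signed region labels):
    # n rotations, each also read in the opposite direction, which reverses
    # the order and flips every region's orientation.
    n = len(genome)
    images = []
    for k in range(n):
        rotated = genome[k:] + genome[:k]
        images.append(rotated)
        images.append(tuple(-x for x in reversed(rotated)))
    return images

def canonical(genome):
    # Lexicographically smallest image: identical for all readings of one genome.
    return min(dihedral_images(tuple(genome)))

def all_arrangements(n):
    # Every signed arrangement of regions 1..n (2^n * n! of them).
    for perm in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            yield tuple(s * p for s, p in zip(signs, perm))

print(canonical((2, 3, 1)) == canonical((-1, -3, -2)))   # True
print(len({canonical(g) for g in all_arrangements(3)}))  # 8
```

With distinct regions no non-trivial symmetry fixes an arrangement. Each physical genome therefore corresponds to exactly 2n arrangements, and for n = 3 the 48 signed arrangements collapse to 8 genomes.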

Citations: 2
The monoploid chromosome complement of reconstructed ancestral genomes in a phylogeny.
IF 1, CAS Zone 4 (Biology), Q3 Computer Science. Pub Date: 2021-12-01. Epub Date: 2021-11-19. DOI: 10.1142/S0219720021400084
Qiaoji Xu, Xiaomeng Zhang, Yue Zhang, Chunfang Zheng, James H Leebens-Mack, Lingling Jin, David Sankoff

Using RACCROCHE, a method for reconstructing the gene content and order of ancestral chromosomes from a phylogeny of extant genomes represented by the gene orders on their chromosomes, we study the evolution of three orders of woody plants. The method retrieves the monoploid complement of each ancestor in a phylogeny, consisting of a complete set of distinct chromosomes, even though some of the extant genomes have been recently or historically polyploidized. The three orders are the Sapindales, the Fagales and the Malvales. All three are independently estimated to have the ancestral monoploid number [Formula: see text].
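
The abstract does not describe RACCROCHE's internals. The following is only a generic, hypothetical stand-in for the idea of assembling ancestral gene order from extant gene orders. It counts how many extant chromosomes support each gene adjacency, then greedily accepts the best-supported adjacencies. Each gene may have at most two neighbours and no cycle may be closed. Finally it reads off linear contigs. The sketch ignores gene orientation, polyploidy and the phylogeny itself, and it stops at contigs rather than grouping them into chromosomes.

```python
from collections import Counter, defaultdict

def adjacency_support(genomes):
    # genomes: list of extant genomes; genome: list of chromosomes;
    # chromosome: list of gene family ids. Orientation is ignored here.
    support = Counter()
    for genome in genomes:
        for chromosome in genome:
            for a, b in zip(chromosome, chromosome[1:]):
                if a != b:
                    support[tuple(sorted((a, b)))] += 1
    return support

def greedy_contigs(genomes):
    support = adjacency_support(genomes)
    genes = sorted({g for genome in genomes for chrom in genome for g in chrom})
    parent = {g: g for g in genes}
    neighbours = defaultdict(list)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Accept adjacencies from best to worst supported, keeping every gene at
    # degree <= 2 and never closing a cycle, so the result is a set of paths.
    for (a, b), _count in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
        if len(neighbours[a]) == 2 or len(neighbours[b]) == 2:
            continue
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        parent[ra] = rb
        neighbours[a].append(b)
        neighbours[b].append(a)

    # Walk each path from one of its ends to read off a contig.
    contigs, seen = [], set()
    for g in genes:
        if g in seen or len(neighbours[g]) == 2:
            continue
        contig, prev, cur = [], None, g
        while cur is not None:
            contig.append(cur)
            seen.add(cur)
            nxt = [n for n in neighbours[cur] if n != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        contigs.append(contig)
    return contigs

extant = [[[1, 2, 3], [4, 5]], [[1, 3, 2], [4, 5]], [[1, 2, 3], [5, 4]]]
print(greedy_contigs(extant))  # [[1, 2, 3], [4, 5]]
```

In this example the weakly supported adjacency 1-3 is rejected because it would close a cycle with the better-supported 1-2 and 2-3.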

Citations: 0
Copyright © 2023 Book学术 All rights reserved.