
Journal of Bioinformatics and Computational Biology: Latest Articles

The monoploid chromosome complement of reconstructed ancestral genomes in a phylogeny.
IF 1 | Zone 4, Biology | Q4 Mathematical & Computational Biology | Pub Date: 2021-12-01 | Epub Date: 2021-11-19 | DOI: 10.1142/S0219720021400084
Qiaoji Xu, Xiaomeng Zhang, Yue Zhang, Chunfang Zheng, James H Leebens-Mack, Lingling Jin, David Sankoff

Using RACCROCHE, a method for reconstructing gene content and order of ancestral chromosomes from a phylogeny of extant genomes represented by the gene orders on their chromosomes, we study the evolution of three orders of woody plants. The method retrieves the monoploid complement of each Ancestor in a phylogeny, consisting of a complete set of distinct chromosomes, despite some of the extant genomes being recently or historically polyploidized. The three orders are the Sapindales, the Fagales and the Malvales. All of these are independently estimated to have ancestral monoploid number [Formula: see text].

Journal of Bioinformatics and Computational Biology, vol. 19, no. 6, 2140008.
Citations: 0
Incorporating intergenic regions into reversal and transposition distances with indels.
IF 1 | Zone 4, Biology | Q4 Mathematical & Computational Biology | Pub Date: 2021-12-01 | Epub Date: 2021-11-13 | DOI: 10.1142/S0219720021400114
Alexsandro Oliveira Alexandrino, Andre Rodrigues Oliveira, Ulisses Dias, Zanoni Dias

Problems in the genome rearrangement field are often formulated in terms of pairwise genome comparison: given two genomes [Formula: see text] and [Formula: see text], find the minimum number of genome rearrangements that may have occurred during the evolutionary process. This broad definition lacks at least two important considerations: the first being which features are extracted from genomes to create a useful mathematical model, and the second being which types of genome rearrangement events should be represented. Regarding the first consideration, seminal works in the genome rearrangement field solely used gene order to represent genomes as permutations of integers, neglecting many important aspects like gene duplication, intergenic regions, and complex interactions between genes. Regarding the second consideration, some rearrangement events, such as reversals and transpositions, are widely studied. In this paper, we shed light on the first consideration and create a model that takes into account gene order and the number of nucleotides in intergenic regions. In addition, we consider events of reversals, transpositions, and indels (insertions and deletions) of genomic material. We present a 4-approximation algorithm for reversals and indels, a [Formula: see text]-approximation algorithm for transpositions and indels, and a 6-approximation for reversals, transpositions, and indels.
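To make the model in this abstract more concrete, here is a minimal Python sketch of a genome stored as a gene order plus intergenic region sizes, with one reversal applied to it. The abstract does not give the formal definition, so the representation (unsigned genes, `regions[k]` being the nucleotides just before gene `k`) and the function name `intergenic_reversal` are assumptions made for illustration, not the authors' implementation.

```python
def intergenic_reversal(genes, regions, i, j, x, y):
    """Reverse genes[i..j] (0-indexed, inclusive) in a genome with intergenic sizes.

    Assumed layout: regions[k] is the number of nucleotides before genes[k],
    and regions[len(genes)] is the region after the last gene.
    The left breakpoint cuts regions[i] after x nucleotides; the right
    breakpoint cuts regions[j + 1] after y nucleotides.
    """
    n = len(genes)
    if len(regions) != n + 1:
        raise ValueError("need exactly one more intergenic region than genes")
    if not (0 <= i <= j < n):
        raise ValueError("invalid gene interval")
    if not (0 <= x <= regions[i] and 0 <= y <= regions[j + 1]):
        raise ValueError("cut positions must lie inside their intergenic regions")

    # Gene order: the segment i..j is flipped.
    new_genes = genes[:i] + genes[i:j + 1][::-1] + genes[j + 1:]

    # Regions strictly inside the segment are flipped along with the genes.
    inner = regions[i + 1:j + 1][::-1]
    # The two cut regions exchange their inner pieces.
    left = x + y
    right = (regions[i] - x) + (regions[j + 1] - y)
    new_regions = regions[:i] + [left] + inner + [right] + regions[j + 2:]
    return new_genes, new_regions


genes = [1, 2, 3, 4]
regions = [5, 3, 2, 4, 1]
new_genes, new_regions = intergenic_reversal(genes, regions, i=1, j=2, x=1, y=3)
print(new_genes, new_regions)
```

In this sketch a reversal only redistributes nucleotides between the two cut regions, so the total intergenic length is preserved; changing that total would take an indel, which is one reason the abstract considers indels alongside rearrangements.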

Journal of Bioinformatics and Computational Biology, vol. 19, no. 6, 2140011.
Citations: 3
DNN-Boost: Somatic mutation identification of tumor-only whole-exome sequencing data using deep neural network and XGBoost.
IF 1 | Zone 4, Biology | Q4 Mathematical & Computational Biology | Pub Date: 2021-12-01 | Epub Date: 2021-12-13 | DOI: 10.1142/S0219720021400175
Firda Aminy Maruf, Rian Pratama, Giltae Song

Detection of somatic mutation in whole-exome sequencing data can help elucidate the mechanism of tumor progression. Most computational approaches require exome sequencing for both tumor and normal samples. However, it is more common to sequence exomes for tumor samples only without the paired normal samples. To include these types of data for extensive studies on the process of tumorigenesis, it is necessary to develop an approach for identifying somatic mutations using tumor exome sequencing data only. In this study, we designed a machine learning approach using Deep Neural Network (DNN) and XGBoost to identify somatic mutations in tumor-only exome sequencing data and we integrated this into a pipeline called DNN-Boost. The XGBoost algorithm is used to extract the features from the results of variant callers and these features are then fed into the DNN model as input. The XGBoost algorithm resolves issues of missing values and overfitting. We evaluated our proposed model and compared its performance with other existing benchmark methods. We noted that the DNN-Boost classification model outperformed the benchmark method in classifying somatic mutations from paired tumor-normal exome data and tumor-only exome data.

Journal of Bioinformatics and Computational Biology, vol. 19, no. 6, 2140017.
Citations: 3
Comparing the topology of phylogenetic network generators.
IF 1 | Zone 4, Biology | Q4 Mathematical & Computational Biology | Pub Date: 2021-12-01 | Epub Date: 2021-12-06 | DOI: 10.1142/S0219720021400126
Remie Janssen, Pengyu Liu

Phylogenetic networks represent evolutionary history of species and can record natural reticulate evolutionary processes such as horizontal gene transfer and gene recombination. This makes phylogenetic networks a more comprehensive representation of evolutionary history compared to phylogenetic trees. Stochastic processes for generating random trees or networks are important tools in evolutionary analysis, especially in phylogeny reconstruction where they can be utilized for validation or serve as priors for Bayesian methods. However, as more network generators are developed, there is a lack of discussion or comparison for different generators. To bridge this gap, we compare a set of phylogenetic network generators by profiling topological summary statistics of the generated networks over the number of reticulations and comparing the topological profiles.
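The profiling described above indexes generated networks by their number of reticulations. As a hedged illustration, the sketch below counts reticulations in a network given as a list of directed edges, using the common definition (each node contributes its in-degree minus one when it has two or more parents). The edge-list representation and the function name `reticulation_number` are assumptions for this example; the abstract does not say which data structures or summary statistics the authors used.

```python
from collections import Counter


def reticulation_number(edges):
    """Return the reticulation number of a rooted network given as (parent, child) edges.

    A node with in-degree d >= 2 is a reticulation and contributes d - 1.
    For binary networks this equals the number of reticulation nodes.
    """
    indegree = Counter(child for _parent, child in edges)
    return sum(d - 1 for d in indegree.values() if d >= 2)


# A small network: node "h" has two parents ("a" and "b"), so it is a reticulation.
network = [
    ("root", "a"), ("root", "b"),
    ("a", "h"), ("b", "h"),
    ("a", "x"), ("b", "y"), ("h", "z"),
]
tree = [("root", "a"), ("root", "b"), ("a", "x"), ("a", "y")]

print(reticulation_number(network), reticulation_number(tree))
```

A tree has reticulation number zero, so grouping any other statistic by this value gives a profile that starts from ordinary trees and moves toward increasingly reticulate networks.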

Journal of Bioinformatics and Computational Biology, vol. 19, no. 6, 2140012.
Citations: 11
The potential of family-free rearrangements towards gene orthology inference.
IF 1 | Zone 4, Biology | Q4 Mathematical & Computational Biology | Pub Date: 2021-12-01 | Epub Date: 2021-11-13 | DOI: 10.1142/S021972002140014X
Diego P Rubert, Daniel Doerr, Marília D V Braga

Recently, we proposed an efficient ILP formulation [Rubert DP, Martinez FV, Braga MDV, Natural family-free genomic distance, Algorithms Mol Biol 16:4, 2021] for exactly computing the rearrangement distance of two genomes in a family-free setting. In such a setting, neither prior classification of genes into families, nor further restrictions on the genomes are imposed. Given two genomes, the mentioned ILP computes an optimal matching of the genes taking into account simultaneously local mutations, given by gene similarities, and large-scale genome rearrangements. Here, we explore the potential of using this ILP for inferring groups of orthologs across several species. More precisely, given a set of genomes, our method first computes all pairwise optimal gene matchings, which are then integrated into gene families in the second step. Our approach is implemented into a pipeline incorporating the pre-computation of gene similarities. It can be downloaded from gitlab.ub.uni-bielefeld.de/gi/FFGC. We obtained promising results with experiments on both simulated and real data.
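The second step above, integrating pairwise gene matchings into gene families, can be pictured with a small Python sketch. The abstract does not say how the integration is done, so this example assumes the simplest possibility: families are the connected components of the graph whose edges are the matched gene pairs, computed with union-find. The function name `families_from_matchings` and the gene labels are hypothetical, and the ILP that produces the matchings is not reproduced here.

```python
def families_from_matchings(pairwise_matchings):
    """Group genes into families as connected components of matched pairs.

    pairwise_matchings: iterable of (gene_a, gene_b) pairs, pooled over all
    genome pairs. This is an assumed integration rule, not necessarily the
    one used in the pipeline described above.
    """
    parent = {}

    def find(gene):
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for a, b in pairwise_matchings:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    families = {}
    for gene in list(parent):
        families.setdefault(find(gene), set()).add(gene)
    return sorted(sorted(members) for members in families.values())


# Hypothetical gene identifiers: species letter plus gene index.
matchings = [("A1", "B1"), ("B1", "C1"), ("A2", "C2")]
print(families_from_matchings(matchings))
```

Connected components are the most permissive rule: one spurious matching can merge two families, so a stricter criterion may be preferable in practice.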

Journal of Bioinformatics and Computational Biology, vol. 19, no. 6, 2140014.
Citations: 3
Colorful orthology clustering in bounded-degree similarity graphs.
IF 1 | Zone 4, Biology | Q4 Mathematical & Computational Biology | Pub Date: 2021-12-01 | Epub Date: 2021-11-13 | DOI: 10.1142/S0219720021400102
Alitzel López Sánchez, Manuel Lafond

Clustering genes in similarity graphs is a popular approach for orthology prediction. Most algorithms group genes without considering their species, which results in clusters that contain several paralogous genes. Moreover, clustering is known to be problematic when in-paralogs arise from ancient duplications. Recently, we proposed a two-step process that avoids these problems. First, we infer clusters of only orthologs (i.e. with only genes from distinct species), and second, we infer the missing inter-cluster orthologs. In this paper, we focus on the first step, which leads to a problem we call Colorful Clustering. In general, this is as hard as classical clustering. However, in similarity graphs, the number of species is usually small, as is the neighborhood size of genes in other species. We therefore study the problem of clustering in which the number of colors is bounded by [Formula: see text], and each gene has at most [Formula: see text] neighbors in another species. We show that the well-known cluster editing formulation remains NP-hard even when [Formula: see text] and [Formula: see text]. We then propose a fixed-parameter algorithm in [Formula: see text] to find the single best cluster in the graph. We implemented this algorithm and included it in the aforementioned two-step approach. Experiments on simulated data show that this approach compares favorably with applying only an unconstrained clustering step.
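The constraint behind Colorful Clustering is that a cluster may contain at most one gene per species, with species playing the role of colors. The short Python sketch below only checks that constraint for a candidate cluster; it is not the fixed-parameter algorithm from the paper. The function name `is_colorful` and the gene-to-species mapping are assumptions made for illustration.

```python
def is_colorful(cluster, species_of):
    """Return True if no two genes in the cluster come from the same species.

    cluster: iterable of gene identifiers.
    species_of: dict mapping each gene identifier to its species (its "color").
    """
    seen = set()
    for gene in cluster:
        species = species_of[gene]
        if species in seen:
            return False  # two genes of one species: the cluster holds paralogs
        seen.add(species)
    return True


# Hypothetical genes and species labels.
species_of = {"g1": "human", "g2": "mouse", "g3": "mouse", "g4": "zebrafish"}
print(is_colorful(["g1", "g2", "g4"], species_of))
print(is_colorful(["g1", "g2", "g3"], species_of))
```

A clustering step that enforces this check on every cluster returns only candidate orthologs, which is the goal of the first step described above.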

Journal of Bioinformatics and Computational Biology, vol. 19, no. 6, 2140010.
Citations: 0
Involving repetitive regions in scaffolding improvement.
IF 1 | Zone 4, Biology | Q4 Mathematical & Computational Biology | Pub Date: 2021-12-01 | Epub Date: 2021-12-17 | DOI: 10.1142/S0219720021400163
Quentin Delorme, Rémy Costa, Yasmine Mansour, Anna-Sophie Fiston-Lavier, Annie Chateau

In this paper, we investigate, through a preliminary study, the influence of repeat elements during the assembly process. We analyze the link between the presence and nature of one type of repeat element, the transposable element (TE), and misassembly events in genome assemblies. We propose to improve assemblies by taking into account the presence of repeat elements, including TEs, during the scaffolding step. We analyze the results and relate the misassemblies to TEs before and after correction.

Journal of Bioinformatics and Computational Biology, vol. 19, no. 6, 2140016.
Citations: 0
BOPAL 2.0 and a study of tRNA and rRNA gene evolution in Clostridium.
IF 1 | Zone 4, Biology | Q4 Mathematical & Computational Biology | Pub Date: 2021-12-01 | Epub Date: 2021-11-13 | DOI: 10.1142/S0219720021400072
Meghan Chua, Anthony Tan, Olivier Tremblay-Savard

We present BOPAL 2.0, an improved version of the BOPAL algorithm for the evolutionary history inference of tRNA and rRNA genes in bacterial genomes. Our approach can infer complete evolutionary scenarios and ancestral gene orders on a phylogeny and considers a wide range of events such as duplications, deletions, substitutions, inversions and transpositions. It is based on the fact that tRNA and rRNA genes are often organized in operons/clusters in bacteria, and this information is used to help identify orthologous genes for each genome comparison. BOPAL 2.0 introduces new features, such as a triple-wise alignment step, context-aware singleton matching and a second pass of the algorithm. Evaluation on simulated datasets shows that BOPAL 2.0 outperforms the original BOPAL in terms of the accuracy of inferred events and ancestral genomes. We also present a study of the tRNA/rRNA gene evolution in the Clostridium genus, in which the organization of these genes is very divergent. Our results indicate that tRNA and rRNA genes in Clostridium have evolved through numerous duplications, losses, transpositions and substitutions, but very few inversions were inferred.

Journal of Bioinformatics and Computational Biology, vol. 19, no. 6, 2140007.
Citations: 1
Population-specific adaptation in malaria-endemic regions of Asia.
IF 1 | Zone 4, Biology | Q4 Mathematical & Computational Biology | Pub Date: 2021-12-01 | Epub Date: 2021-11-10 | DOI: 10.1142/S0219720021400060
Elena S Gusareva, Paolo Alberto Lorenzini, Nurul Adilah Binte Ramli, Amit Gourav Ghosh, Hie Lim Kim

Evolutionary mechanisms of adaptation to malaria are understudied in Asian endemic regions despite a high prevalence of malaria in the region. In our research, we performed a genome-wide screening for footprints of natural selection against malaria by comparing eight Asian population groups from malaria-endemic regions with two non-endemic population groups from Europe and Mongolia. We identified 285 adaptive genes showing robust selection signals across three statistical methods, iHS, XP-EHH, and PBS. Interestingly, most of the identified genes (82%) were found to be under selection in a single population group, while adaptive genes shared across populations were rare. This is likely due to the independent adaptation history in different endemic populations. The gene ontology (GO) analysis for the 285 adaptive genes highlighted their functional processes linked to neuronal organizations or nervous system development. These genes could be related to cerebral malaria and may reduce the inflammatory response and the severity of malaria symptoms. Remarkably, our novel population genomic approach identified population-specific adaptive genes potentially against malaria infection without the need for patient samples or individual medical records.
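Of the three statistics named above, PBS (population branch statistic) has a closed form that is easy to show. The sketch below uses the standard definition: each pairwise F_ST is converted to a branch length T = -log(1 - F_ST), and the branch leading to the target population is (T_AB + T_AC - T_BC) / 2. The abstract does not describe how the authors computed PBS or which populations served as references, so the function names and the example F_ST values are illustrative assumptions only.

```python
import math


def branch_length(fst):
    """Convert a pairwise F_ST value (0 <= fst < 1) into a divergence-time-like length."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("F_ST must be in [0, 1)")
    return -math.log(1.0 - fst)


def pbs(fst_target_ref1, fst_target_ref2, fst_ref1_ref2):
    """Population branch statistic for the target population.

    Large values indicate allele-frequency change specific to the target
    branch, which is the kind of signal read as a footprint of selection.
    """
    t_ab = branch_length(fst_target_ref1)
    t_ac = branch_length(fst_target_ref2)
    t_bc = branch_length(fst_ref1_ref2)
    return (t_ab + t_ac - t_bc) / 2.0


# Made-up F_ST values: the target differs from both references,
# while the two references are identical to each other at this locus.
value = pbs(0.5, 0.5, 0.0)
print(round(value, 4))
```

In this example the two references do not differ from each other, so all of the divergence is assigned to the target branch and the result equals -log(0.5).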

尽管亚洲疟疾流行率很高,但该地区对疟疾适应的进化机制研究不足。在我们的研究中,我们通过比较来自疟疾流行地区的8个亚洲人群与来自欧洲和蒙古的两个非疟疾流行人群,对自然选择对疟疾的影响进行了全基因组筛选。我们通过三种统计方法(iHS、XP-EHH和PBS)鉴定出285个适应性基因,显示出强大的选择信号。有趣的是,大多数已鉴定的基因(82%)被发现在单一种群中处于选择状态,而跨种群共享的适应性基因则很少见。这可能是由于不同地方性种群的独立适应历史。285个适应性基因的基因本体(GO)分析突出了它们与神经元组织或神经系统发育相关的功能过程。这些基因可能与脑型疟疾有关,并可能减轻炎症反应和疟疾症状的严重程度。值得注意的是,我们的新种群基因组方法在不需要患者样本或个人医疗记录的情况下确定了可能对抗疟疾感染的种群特异性适应性基因。
Elena S Gusareva, Paolo Alberto Lorenzini, Nurul Adilah Binte Ramli, Amit Gourav Ghosh, Hie Lim Kim. Population-specific adaptation in malaria-endemic regions of Asia. Journal of Bioinformatics and Computational Biology 19(6): 2140006. Published 2021-12-01. https://doi.org/10.1142/S0219720021400060
Citations: 2
Evidence for exon shuffling is sensitive to model choice.
IF 1 · Zone 4 (Biology) · Q4 Mathematical & Computational Biology · Pub Date: 2021-12-01 · Epub Date: 2021-11-19 · DOI: 10.1142/S0219720021400138
Xiaoyue Cui, Maureen Stolzer, Dannie Durand

The exon shuffling theory posits that intronic recombination creates new domain combinations, facilitating the evolution of novel protein function. This theory predicts that introns will be preferentially situated near domain boundaries. Many studies have sought evidence for exon shuffling by testing the correspondence between introns and domain boundaries against chance intron positioning. Here, we present an empirical investigation of how the choice of null model influences significance. Although genome-wide studies have exclusively used a uniform null model, more realistic null models have been proposed for single-gene studies. We extended these models for genome-wide analyses and applied them to 21 metazoan and fungal genomes. Our results show that, compared with the other two models, the uniform model does not recapitulate genuine exon lengths, dramatically underestimates the probability of chance agreement, and overestimates the significance of intron-domain correspondence by as much as 100 orders of magnitude. Model choice had a much greater impact on the assessment of exon shuffling in fungal genomes than in metazoa, leading to different evolutionary conclusions in seven of the 16 fungal genomes tested. Genome-wide studies that use this overly permissive null model may exaggerate the importance of exon shuffling as a general mechanism of multidomain evolution.
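
To make the idea of testing intron-domain correspondence against chance positioning concrete, here is a rough Monte Carlo sketch of a uniform null model for a single gene. It is an illustration under stated assumptions, not the authors' procedure: the abstract does not give the test statistic, the tolerance window, or the two more realistic models, so the function names, the `window` parameter, and the coordinates in the usage note are all hypothetical.

```python
import random


def count_near_boundaries(introns, boundaries, window):
    """Count introns lying within `window` positions of any domain boundary."""
    return sum(any(abs(i - b) <= window for b in boundaries) for i in introns)


def uniform_null_pvalue(cds_length, introns, boundaries,
                        window=5, trials=10_000, seed=0):
    """Empirical p-value for intron-boundary agreement under a uniform null.

    Assumption: each simulated gene keeps the observed number of introns
    but places them uniformly at random along the coding sequence, with
    no constraint on the resulting exon lengths.
    """
    rng = random.Random(seed)
    observed = count_near_boundaries(introns, boundaries, window)
    hits = 0
    for _ in range(trials):
        # Draw distinct intron positions uniformly from 1 .. cds_length - 1.
        simulated = rng.sample(range(1, cds_length), len(introns))
        if count_near_boundaries(simulated, boundaries, window) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)
```

For example, `uniform_null_pvalue(500, [100, 205, 400], [200, 410], window=10)` counts two of the three introns as boundary-associated and estimates how often uniform placement does at least as well. Because this null ignores realistic exon lengths, it is the kind of permissive model the abstract warns can overstate significance; a stricter null would resample intron positions subject to an exon-length distribution.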

Xiaoyue Cui, Maureen Stolzer, Dannie Durand. Evidence for exon shuffling is sensitive to model choice. Journal of Bioinformatics and Computational Biology 19(6): 2140013. Published 2021-12-01. https://doi.org/10.1142/S0219720021400138
Citations: 0