Phylogenomic prediction of interaction networks in the presence of gene duplication
Tony C Gatts, Chris deRoux, Linnea E Lane, Monica Berggren, Elizabeth A Rehmann, Emily N Zak, Trinity Bartel, Luna L’Argent, Daniel B. Sloan, Evan S. Forsythe
bioRxiv preprint, 2024-08-08. DOI: 10.1101/2024.08.06.606904 (https://doi.org/10.1101/2024.08.06.606904)
Abstract
Assigning gene function from genome sequences is a rate-limiting step in molecular biology research. A protein’s position within an interaction network can provide insights into its molecular mechanisms. Phylogenetic analyses of evolutionary rate covariation (ERC) have been shown to be effective for large-scale prediction of functional interactions from protein sequence data. However, gene duplication, gene loss, and other sources of phylogenetic incongruence are barriers to analyzing ERC on a genome-wide basis. Here, we developed ERCnet, a bioinformatic program designed to overcome these challenges, facilitating efficient all-vs-all ERC analyses for large protein sequence datasets. We compiled a sample set of 35 angiosperm genomes to test the performance of ERCnet, including its sensitivity to user-defined analysis parameters such as input dataset size, branch-length measurement strategy, and significance threshold for defining ERC hits. We find that our novel ‘branch-by-branch’ length measurements outperform ‘root-to-tip’ approaches in most cases, offering a valuable new strategy for performing ERC even in the presence of extensive gene duplication. Further, we demonstrate that the number of genomes and the species composition both have profound effects on the genes that are predicted to interact. Our systematic exploration of the performance of ERCnet provides a roadmap for the design of future ERC analyses to predict functional interactions in a wide array of genomic datasets. ERCnet code is freely available at https://github.com/EvanForsythe/ERCnet.
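
As a rough illustration of the general idea behind ERC, the sketch below correlates the branch lengths of two gene trees over the branches they share and flags the pair as a hit when the correlation passes a threshold. It is a minimal sketch under stated assumptions, not ERCnet's implementation: the branch labels, the branch-length values, the function names `pearson` and `erc_score`, the choice of Pearson correlation, and the cutoff `R_THRESHOLD` are all hypothetical and chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def erc_score(branches_a, branches_b):
    """Correlate branch lengths over the branches present in both gene trees."""
    shared = sorted(set(branches_a) & set(branches_b))
    return pearson([branches_a[k] for k in shared],
                   [branches_b[k] for k in shared])


# Hypothetical branch lengths, keyed by an assumed shared branch label.
gene_a = {"b1": 0.10, "b2": 0.30, "b3": 0.05, "b4": 0.20}
gene_b = {"b1": 0.12, "b2": 0.33, "b3": 0.04, "b4": 0.25, "b5": 0.40}
gene_c = {"b1": 0.30, "b2": 0.05, "b3": 0.28, "b4": 0.10}

R_THRESHOLD = 0.8  # hypothetical cutoff for calling a pair an ERC hit

for name, other in (("gene_b", gene_b), ("gene_c", gene_c)):
    r = erc_score(gene_a, other)
    print(name, round(r, 3), "hit" if r >= R_THRESHOLD else "no hit")
```

In this toy data, `gene_a` and `gene_b` speed up and slow down on the same branches, so their correlation is high, while `gene_c` shows the opposite pattern. A real analysis would also need a significance test and multiple-testing control across all gene pairs, which this sketch leaves out.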