
Bioinformatics Advances: latest articles

survivalContour: visualizing predicted survival via colored contour plots.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-25 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae105
Yushu Shi, Liangliang Zhang, Kim-Anh Do, Robert R Jenq, Christine B Peterson

Summary: Advances in survival analysis have facilitated unprecedented flexibility in data modeling, yet there remains a lack of tools for illustrating the influence of continuous covariates on predicted survival outcomes. We propose the utilization of a colored contour plot to depict the predicted survival probabilities over time. Our approach is capable of supporting conventional models, including the Cox and Fine-Gray models. However, its capability shines when coupled with cutting-edge machine learning models such as random survival forests and deep neural networks.

Availability and implementation: We provide a Shiny app at https://biostatistics.mdanderson.org/shinyapps/survivalContour/ and an R package available at https://github.com/YushuShi/survivalContour as implementations of this tool.
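
The quantity behind such a contour plot can be sketched in a few lines: a grid of predicted survival probabilities indexed by one continuous covariate and by time. The sketch below is not the survivalContour implementation. It assumes a Cox-type model with a single covariate and a constant baseline hazard, and the function name `cox_survival_grid`, the coefficient, and the baseline rate are all hypothetical values chosen for illustration.

```python
import math


def cox_survival_grid(beta, baseline_rate, covariate_values, times):
    """Return grid[i][j] = S(times[j] | x = covariate_values[i]).

    Assumption (not from the paper): a Cox-type model with a constant
    baseline hazard, so the cumulative baseline hazard is
    H0(t) = baseline_rate * t and S(t | x) = exp(-H0(t) * exp(beta * x)).
    """
    grid = []
    for x in covariate_values:
        relative_risk = math.exp(beta * x)
        row = [math.exp(-baseline_rate * t * relative_risk) for t in times]
        grid.append(row)
    return grid


# Hypothetical covariate values, time points, and model parameters.
covariates = [0.0, 0.5, 1.0, 1.5, 2.0]
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
grid = cox_survival_grid(beta=0.7, baseline_rate=0.1,
                         covariate_values=covariates, times=times)

# Each row is one covariate value; each column is one time point.
for x, row in zip(covariates, grid):
    print(f"x={x:.1f}: " + " ".join(f"{s:.2f}" for s in row))
```

A plotting library would then color this grid with time on one axis and the covariate on the other. For a machine learning model, the closed-form expression would be replaced by the model's own predicted survival function evaluated at each grid point.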

Citations: 0
Genotype imputation in F2 crosses of inbred lines.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-23 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae107
Saul Pierotti, Bettina Welz, Mireia Osuna-López, Tomas Fitzgerald, Joachim Wittbrodt, Ewan Birney

Motivation: Crosses among inbred lines are a fundamental tool for the discovery of genetic loci associated with phenotypes of interest. In organisms for which large reference panels or SNP chips are not available, imputation from low-pass whole-genome sequencing is an effective method for obtaining genotype data from a large number of individuals. To date, a structured analysis of the conditions required for optimal genotype imputation has not been performed.

Results: We report a systematic exploration of the effect of several design variables on imputation performance in F2 crosses of inbred medaka lines using the imputation software STITCH. We determined that, depending on the number of samples, imputation performance reaches a plateau when increasing the per-sample sequencing coverage. We also systematically explored the trade-offs between cost, imputation accuracy, and sample numbers. We developed a computational pipeline to streamline the process, enabling other researchers to perform a similar cost-benefit analysis on their population of interest.

Availability and implementation: The source code for the pipeline is available at https://github.com/birneylab/stitchimpute. While our pipeline has been developed and tested for an F2 population, the software can also be used to analyse populations with a different structure.
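
The cost side of such a trade-off can be illustrated with a toy enumeration of study designs under a fixed budget. This sketch is not part of the stitchimpute pipeline. The linear cost model (a per-sample library preparation cost plus a per-sample cost proportional to coverage), the function names, and every number below are assumptions made for illustration only.

```python
def design_cost(n_samples, coverage, prep_cost, cost_per_x):
    """Total cost under an assumed linear model: each sample pays a fixed
    library preparation cost plus a sequencing cost proportional to coverage."""
    return n_samples * (prep_cost + coverage * cost_per_x)


def affordable_designs(budget, sample_sizes, coverages, prep_cost, cost_per_x):
    """List every (n_samples, coverage, cost) combination within the budget."""
    designs = []
    for n in sample_sizes:
        for c in coverages:
            cost = design_cost(n, c, prep_cost, cost_per_x)
            if cost <= budget:
                designs.append((n, c, cost))
    return designs


# Hypothetical prices and candidate designs, in arbitrary currency units.
designs = affordable_designs(
    budget=10_000,
    sample_sizes=[100, 200, 400],
    coverages=[0.5, 1.0, 2.0],
    prep_cost=20,
    cost_per_x=10,
)

for n, c, cost in designs:
    print(f"{n} samples at {c}x coverage: {cost:.0f}")
```

Each affordable design would then be paired with an imputation accuracy measured empirically, since accuracy is reported to plateau with coverage in a way that depends on the number of samples.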

Citations: 0
Mining drug-target interactions from biomedical literature using chemical and gene descriptions-based ensemble transformer model.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-22 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae106
Jehad Aldahdooh, Ziaurrehman Tanoli, Jing Tang

Motivation: Drug-target interactions (DTIs) play a pivotal role in drug discovery, which aims to identify potential drug targets and elucidate their mechanisms of action. In recent years, the application of natural language processing (NLP), particularly when combined with pre-trained language models, has gained considerable momentum in the biomedical domain, with the potential to mine vast amounts of text and facilitate the efficient extraction of DTIs from the literature.

Results: In this article, we approach DTI extraction as an entity-relationship extraction problem, utilizing different pre-trained transformer language models, such as BERT, to extract DTIs. Our results indicate that an ensemble approach that combines gene descriptions from the Entrez Gene database with chemical descriptions from the Comparative Toxicogenomics Database (CTD) is critical for achieving optimal performance. The proposed model achieves an F1 score of 80.6 on the hidden DrugProt test set, which is the top-ranked performance among all the submitted models in the official evaluation. Furthermore, we conduct a comparative analysis to evaluate the effectiveness of various gene textual descriptions sourced from the Entrez Gene and UniProt databases to gain insights into their impact on performance. Our findings highlight the potential of NLP-based text mining using gene and chemical descriptions to improve drug-target extraction tasks.

Availability and implementation: Datasets utilized in this study are accessible at https://dtis.drugtargetcommons.org/.
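
For readers unfamiliar with the reported metric, the following sketch shows how precision, recall, and F1 can be computed when relation extraction output is treated as a set of (chemical, relation, gene) triples. It is a generic illustration, not the official DrugProt scorer, whose exact protocol is not described in the abstract. The function name and the toy triples are hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 over sets of relation triples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy triples for illustration only; they are not taken from DrugProt.
gold = {
    ("aspirin", "INHIBITOR", "PTGS1"),
    ("aspirin", "INHIBITOR", "PTGS2"),
    ("imatinib", "INHIBITOR", "ABL1"),
    ("imatinib", "INHIBITOR", "KIT"),
    ("metformin", "ACTIVATOR", "PRKAA1"),
}
predicted = {
    ("aspirin", "INHIBITOR", "PTGS1"),
    ("imatinib", "INHIBITOR", "ABL1"),
    ("imatinib", "INHIBITOR", "KIT"),
    ("metformin", "INHIBITOR", "PRKAA1"),  # wrong relation type
}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

The function returns values on a 0 to 1 scale, whereas the score of 80.6 quoted above is on a percentage scale.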

Citations: 0
Correction to: Quantitative transcriptomic and epigenomic data analysis: a primer.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-18 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae091

[This corrects the article DOI: 10.1093/bioadv/vbae019.].

Citations: 0
ScyNet: Visualizing interactions in community metabolic models.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-17 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae104
Michael Predl, Kilian Gandolf, Michael Hofer, Thomas Rattei

Motivation: Genome-scale community metabolic models are used to gain mechanistic insights into interactions between community members. However, existing tools for visualizing metabolic models only cater to the needs of single-organism models.

Results: ScyNet is a Cytoscape app for visualizing community metabolic models, generating networks with reduced complexity by focusing on interactions between community members. ScyNet can incorporate the state of a metabolic model via fluxes or flux ranges, as shown using a previously published simplified cystic fibrosis airway community model.

Availability and implementation: ScyNet is freely available under an MIT licence and can be retrieved via the Cytoscape App Store (apps.cytoscape.org/apps/scynet). The source code is available on GitHub (github.com/univieCUBE/ScyNet).

Citations: 0
Knowledge graph embeddings in the biomedical domain: are they useful? A look at link prediction, rule learning, and downstream polypharmacy tasks.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-17 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae097
Aryo Pradipta Gema, Dominik Grabarczyk, Wolf De Wulf, Piyush Borole, Javier Antonio Alfaro, Pasquale Minervini, Antonio Vergari, Ajitha Rajan

Summary: Knowledge graphs (KGs) are powerful tools for representing and organizing complex biomedical data. They empower researchers, physicians, and scientists by facilitating rapid access to biomedical information, enabling the discernment of patterns or insights, and fostering the formulation of decisions and the generation of novel knowledge. To automate these activities, several KG embedding algorithms have been proposed to learn from and complete KGs. However, the efficacy of these embedding algorithms appears limited when applied to biomedical KGs, prompting questions about whether they can be useful in this field. To that end, we explore several widely used KG embedding models and evaluate their performance and applications using a recent biomedical KG, BioKG. We also demonstrate that by using recent best practices for training KG embeddings, it is possible to improve performance over BioKG. Additionally, we address interpretability concerns that naturally arise with such machine learning methods. In particular, we examine rule-based methods that aim to address these concerns by making interpretable predictions using learned rules, achieving comparable performance. Finally, we discuss a realistic use case where a pretrained BioKG embedding is further trained for a specific task, in this case, four polypharmacy scenarios where the goal is to predict missing links or entities in other downstream KGs. We conclude that in the right scenarios, biomedical KG embeddings can be effective and useful.

Availability and implementation: Our code and data are available at https://github.com/aryopg/biokge.
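
To make the link prediction task concrete, the sketch below scores candidate tail entities for a (head, relation, ?) query using a TransE-style translation score. TransE is used here only as one well-known example of a KG embedding model; the abstract does not state which models were evaluated. The function names, entity names, and two-dimensional embeddings are hypothetical toy values, not learned from BioKG.

```python
def transe_score(head, relation, tail):
    """TransE-style plausibility: negative L1 distance between (head + relation)
    and tail. Higher (closer to zero) means a more plausible triple."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def rank_tails(head, relation, candidates):
    """Rank candidate tail entities from most to least plausible."""
    scored = [(name, transe_score(head, relation, vector))
              for name, vector in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 2-dimensional embeddings, invented for illustration.
drug = [0.1, 0.2]
targets_relation = [0.3, -0.1]
candidate_proteins = {
    "protein_A": [0.4, 0.1],
    "protein_B": [0.9, 0.9],
    "protein_C": [0.5, 0.0],
}

ranking = rank_tails(drug, targets_relation, candidate_proteins)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In an evaluation setting, the rank of the true tail entity within such a list is what ranking metrics for link prediction are computed from.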

Citations: 0
TemBERTure: advancing protein thermostability prediction with deep learning and attention mechanisms.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-13 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae103
Chiara Rodella, Symela Lazaridi, Thomas Lemmin

Motivation: Understanding protein thermostability is essential for numerous biotechnological applications, but traditional experimental methods are time-consuming, expensive, and error-prone. Recently, deep learning (DL) techniques from natural language processing (NLP) have been extended to the field of biology, since the primary sequence of a protein can be viewed as a string of amino acids that follows a physicochemical grammar.

Results: In this study, we developed TemBERTure, a DL framework that predicts thermostability class and melting temperature from protein sequences. Our findings emphasize the importance of data diversity for training robust models, especially by including sequences from a wider range of organisms. Additionally, we suggest using attention scores from DL models to gain deeper insights into protein thermostability. Analyzing these scores in conjunction with the 3D protein structure can enhance understanding of the complex interactions among amino acid properties, their positioning, and the surrounding microenvironment. By addressing the limitations of current prediction methods and introducing new exploration avenues, this research paves the way for more accurate and informative protein thermostability predictions, ultimately accelerating advancements in protein engineering.

Availability and implementation: The TemBERTure model and data are available at https://github.com/ibmm-unibe-ch/TemBERTure.

Citations: 0
loco-pipe: an automated pipeline for population genomics with low-coverage whole-genome sequencing.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-11 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae098
Zehua T Zhou, Gregory L Owens, Wesley A Larson, Runyang Nicolas Lou, Peter H Sudmant

Summary: We developed loco-pipe, a Snakemake pipeline that seamlessly streamlines a set of essential population genomic analyses for low-coverage whole genome sequencing (lcWGS) data. loco-pipe is highly automated, easily customizable, massively parallelized, and thus is a valuable tool for both new and experienced users of lcWGS.

Availability and implementation: loco-pipe is published under the GPLv3. It is freely available on GitHub (github.com/sudmantlab/loco-pipe) and archived on Zenodo (doi.org/10.5281/zenodo.10425920).

Citations: 0
INCAWrapper: a Python wrapper for INCA for seamless data import, -export, and -processing.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-07-04 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae100
Matthias Mattanovich, Viktor Hesselberg-Thomsen, Annette Lien, Dovydas Vaitkus, Victoria Sara Saad, Douglas McCloskey

Motivation: INCA is a powerful tool for metabolic flux analysis; however, importing and exporting data and results can be tedious, which limits the use of INCA in automated workflows.

Results: The INCAWrapper enables the use of INCA purely through Python, which allows the use of INCA in common data science workflows.

Availability and implementation: The INCAWrapper is implemented in Python and can be found at https://github.com/biosustain/incawrapper. It is freely available under an MIT License. To run INCA, the user needs their own MATLAB and INCA licenses. INCA is freely available for noncommercial use at mfa.vueinnovations.com.

Citations: 0
A customizable secure DIY web application for accessing, sharing, and browsing aggregate experimental results and metadata.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-06-28 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae087
Jaewoo Lee, Mehita Achuthan, Lucas Chen, Paulina Carmona-Mora

Summary: A problem common to many research fields is that processed data and research results are often scattered, which makes data access, analysis, extraction, and team sharing more challenging. We have developed a platform for researchers to easily manage tabular data with features like browsing, bookmarking, and linking to external open knowledge bases. The source code, originally designed for genomics research, is customizable for use by other fields or data, providing a no- to low-cost DIY system for research teams.

Availability and implementation: The source code of our DIY app is available at https://github.com/Carmona-MoraUCD/Human-Genomics-Browser. It can be downloaded and run by anyone with a web browser, Python 3, and Node.js on their machine. The web application is licensed under the MIT license.

Citations: 0