
Bioinformatics Advances: latest articles

CAPTVRED: an automated pipeline for viral tracking and discovery from capture-based metagenomics samples.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-10-08 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae150
Maria Tarradas-Alemany, Sandra Martínez-Puchol, Cristina Mejías-Molina, Marta Itarte, Marta Rusiñol, Sílvia Bofill-Mas, Josep F Abril

Summary: Target Enrichment Sequencing, or capture-based metagenomics, has emerged as an approach of interest for viral metagenomics in complex samples. However, these datasets are usually analyzed with standard downstream bioinformatics analyses. CAPTVRED (Capture-based metagenomics Analysis Pipeline for tracking ViRal species from Environmental Datasets) has been designed to assess the virome present in complex samples, with a particular focus on those obtained by the Target Enrichment Sequencing approach. This work aims to provide a user-friendly tool that complements this sequencing approach for total or partial virome description, especially from environmental matrices. It includes a setup module that allows the pipeline to be prepared and adjusted for any capture panel directed at a set of species of interest. The tool also aims to reduce time and computational cost, and to provide comprehensive, reproducible, and accessible results while being easy to customize, set up, and install.

Availability and implementation: Source code and test datasets are freely available in a GitHub repository: https://github.com/CompGenLabUB/CAPTVRED.git.

Bioinformatics Advances, vol. 4, no. 1, vbae150. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11495672/pdf/
Citations: 0
DynProfiler: a Python package for comprehensive analysis and interpretation of signaling dynamics leveraged by deep learning techniques.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-10-07 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae145
Masato Tsutsui, Mariko Okada

Summary: Signaling dynamics encode important features and regulatory mechanisms of biological systems, and recent studies have reported the use of simulated signaling dynamics with mechanistic modeling as biomarkers for human diseases. Given the success of deep learning techniques, it is expected that they can extract informative patterns from simulation results more effectively than traditional approaches involving manual feature selection, which can be used for subsequent analyses, such as patient stratification and survival prediction. Here, we propose DynProfiler, which utilizes the entire signaling dynamics, including intermediate variables, as input and leverages deep learning techniques to extract informative features without requiring any labels. Furthermore, DynProfiler incorporates a modern explainable AI solution to provide quantitative time-dependent importance scores for each dynamics. Using simulated dynamics of patients with breast cancer as an example, we demonstrate DynProfiler's ability to extract high-quality features that can predict mortality risk and identify important dynamics, highlighting upregulated phosphorylated GSK3β as a biomarker for poor prognosis. Overall, this tool can be useful for clinical application, as well as for elucidating biological system dynamics.

Availability and implementation: The DynProfiler Python library is available on GitHub at https://github.com/okadalabipr/DynProfiler.

Bioinformatics Advances, vol. 4, no. 1, vbae145. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11464416/pdf/
Citations: 0
Correction to: Utilizing biological experimental data and molecular dynamics for the classification of mutational hotspots through machine learning.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-10-04 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae140

[This corrects the article DOI: 10.1093/bioadv/vbae125.].

Bioinformatics Advances, vol. 4, no. 1, vbae140. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11453097/pdf/
Citations: 0
tidysbml: R/Bioconductor package for SBML extraction into dataframes.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-10-03 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae148
Veronica Paparozzi, Christine Nardini

Summary: We present tidysbml, an R package that extracts compartment, species, and reaction data from Systems Biology Markup Language (SBML) documents (up to Level 3) into tabular data structures (i.e. R dataframes), making the rich biological information easy to access and handle. Thanks to its output format, the package facilitates data manipulation, enabling manageable construction, and therefore analysis, of custom networks, as well as data retrieval, by means of R packages such as igraph, RCy3, and biomaRt. Exemplar data (i.e. SBML files) are extracted from Reactome.

Availability and implementation: The tidysbml R package is distributed under the CC BY 4.0 License and is publicly available from Bioconductor (https://bioconductor.org/packages/tidysbml) and on GitHub (https://github.com/veronicapaparozzi/tidysbml).
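
The idea of turning SBML entities into tables can be illustrated with a minimal sketch. This is not the tidysbml API (tidysbml is an R package); it uses only the Python standard library, and the toy SBML document, the `extract_table` helper, and all identifiers in it are assumptions made for illustration. The toy document is not validated against the SBML specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal SBML Level 3 fragment used only for illustration.
SBML_TEXT = """
<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cytosol" name="Cytosol" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glc" name="Glucose" compartment="cytosol"/>
      <species id="g6p" name="Glucose 6-phosphate" compartment="cytosol"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="hexokinase" name="Hexokinase" reversible="false"/>
    </listOfReactions>
  </model>
</sbml>
"""

# Namespace of SBML Level 3 Version 1 Core, needed to address the elements.
NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}


def extract_table(root, list_tag, item_tag):
    """Return one dict of XML attributes per entity (one table row each)."""
    path = f"sbml:model/sbml:{list_tag}/sbml:{item_tag}"
    return [dict(element.attrib) for element in root.findall(path, NS)]


root = ET.fromstring(SBML_TEXT.strip())
compartments = extract_table(root, "listOfCompartments", "compartment")
species = extract_table(root, "listOfSpecies", "species")
reactions = extract_table(root, "listOfReactions", "reaction")

for row in species:
    print(row["id"], row["name"], row["compartment"])
```

Each list of dicts plays the role of one dataframe: rows are entities and keys are columns, so the tables can be joined on shared identifiers such as the compartment id.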

Bioinformatics Advances, vol. 4, no. 1, vbae148. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11479578/pdf/
Citations: 0
A modern multi-omics data exploration experience with Panomicon.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-10-03 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae147
Rodolfo S Allendes Osorio, Yuji Kosugi, Johan T Nyström-Persson, Kenji Mizuguchi, Yayoi Natsume-Kitatani

Summary: To address the challenges of storing, sharing, and analyzing multi-omics data, we introduce the newest version of Panomicon. It improves the underlying data model, introduces a new registration and access control service, and integrates seamlessly with other services (such as TargetMine for data enrichment analysis), all within a completely new, more user-friendly web application.

Availability and implementation: Panomicon is available online at https://panomicon.nibiohn.go.jp. Unregistered users can access the publicly available data uploaded to Panomicon using the following account: user: guest, password: anonymous. Source code for the application is also freely available under a GNU license at https://github.com/Toxygates/Panomicon/. A brief user guide for the new features of Panomicon is provided as supplementary material online.

Bioinformatics Advances, vol. 4, no. 1, vbae147. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520228/pdf/
Citations: 0
iTraNet: a web-based platform for integrated trans-omics network visualization and analysis.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-09-30 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae141
Hikaru Sugimoto, Keigo Morita, Dongzi Li, Yunfan Bai, Matthias Mattanovich, Shinya Kuroda

Motivation: Visualization and analysis of biological networks play crucial roles in understanding living systems. Biological networks include diverse types, from gene regulatory networks and protein-protein interactions to metabolic networks. Metabolic networks include substrates, products, and enzymes, which are regulated by allosteric mechanisms and gene expression. However, the analysis of these diverse omics types is challenging due to the diversity of databases and the complexity of network analysis.

Results: We developed iTraNet, a web application that visualizes and analyses trans-omics networks involving four types of networks: gene regulatory networks, protein-protein interactions, metabolic networks, and metabolite exchange networks. Using iTraNet, we found that in wild-type mice, hub molecules within the network tended to respond to glucose administration, whereas in ob/ob mice, this tendency disappeared. With its ability to facilitate network analysis, we anticipate that iTraNet will help researchers gain insights into living systems.

Availability and implementation: iTraNet is available at https://itranet.streamlit.app/.
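
As a rough illustration of what identifying hub molecules in a network involves, the sketch below counts node degrees in an undirected edge list and reports nodes at or above a threshold. The abstract does not say how iTraNet defines hubs, so the degree-based criterion, the `find_hubs` helper, the threshold, and the toy edges are all assumptions.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def find_hubs(edges, min_degree):
    """Return, in sorted order, the nodes whose degree is at least min_degree."""
    degree = node_degrees(edges)
    return sorted(node for node, count in degree.items() if count >= min_degree)


# Hypothetical toy network; node names are placeholders, not iTraNet output.
EDGES = [
    ("Gck", "G6P"),
    ("G6P", "F6P"),
    ("G6P", "G1P"),
    ("G6P", "6PG"),
    ("Pfk", "F6P"),
]

hubs = find_hubs(EDGES, min_degree=3)
print(hubs)
```

A comparison such as the one described in the Results would then ask whether the molecules returned here also appear among those that respond to a perturbation.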

Bioinformatics Advances, vol. 4, no. 1, vbae141. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11493990/pdf/
Citations: 0
RNAelem: an algorithm for discovering sequence-structure motifs in RNA bound by RNA-binding proteins.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-09-28 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae144
Hiroshi Miyake, Risa Karakida Kawaguchi, Hisanori Kiryu

Motivation: RNA-binding proteins (RBPs) play a crucial role in the post-transcriptional regulation of RNA. Given their importance, analyzing the specific RNA patterns recognized by RBPs has become a significant research focus in bioinformatics. Deep Neural Networks have enhanced the accuracy of prediction for RBP-binding sites, yet understanding the structural basis of RBP-binding specificity from these models is challenging due to their limited interpretability. To address this, we developed RNAelem, which combines profile context-free grammar and the Turner energy model for RNA secondary structure to predict sequence-structure motifs in RBP-binding regions.

Results: RNAelem exhibited superior detection accuracy compared to existing tools for RNA sequences with structural motifs. Upon applying RNAelem to the eCLIP database, we were not only able to reproduce many known primary sequence motifs in the absence of secondary structures, but also discovered many secondary structural motifs that contained sequence-nonspecific insertion regions. Furthermore, the high interpretability of RNAelem yielded insightful findings such as long-range base-pairing interactions in the binding region of the U2AF protein.

Availability and implementation: The code is available at https://github.com/iyak/RNAelem.

Bioinformatics Advances, vol. 4, no. 1, vbae144. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471262/pdf/
Citations: 0
FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-09-28 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae143
Thomas Cheng Li, Hufeng Zhou, Vineet Verma, Xiangru Tang, Yanjun Shao, Eric Van Buren, Zhiping Weng, Mark Gerstein, Benjamin Neale, Shamil R Sunyaev, Xihong Lin

Motivation: Functional Annotation of genomic Variants Online Resources (FAVOR) offers multi-faceted, whole genome variant functional annotations, which are essential for Whole Genome and Exome Sequencing (WGS/WES) analysis and the functional prioritization of disease-associated variants. There is a need for a versatile chatbot that facilitates informative interpretation and interactive, user-centric summaries of the whole genome variant functional annotation data in the FAVOR database.

Results: We have developed FAVOR-GPT, a generative natural language interface powered by integrating large language models (LLMs) and FAVOR. It is developed based on the Retrieval Augmented Generation (RAG) approach, and complements the original FAVOR portal, enhancing usability for users, especially those without specialized expertise. FAVOR-GPT simplifies raw annotations by providing interpretable explanations and result summaries in response to the user's prompt. It shows high accuracy when cross-referencing with the FAVOR database, underscoring the robustness of the retrieval framework.

Availability and implementation: Researchers can access FAVOR-GPT at FAVOR's main website (https://favor.genohub.org).
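
The general shape of a Retrieval Augmented Generation step can be sketched as follows: look up the records relevant to a question, then place them in the prompt given to a language model. This is not FAVOR-GPT's implementation. The variant identifier format, the annotation fields and values, and the `retrieve` and `build_prompt` helpers are all invented for illustration, and the language model call itself is left out.

```python
import re

# Hypothetical annotation records; field names and values are invented
# and do not reflect the FAVOR database schema.
ANNOTATIONS = {
    "1-1000-A-G": {"gene": "GENE1", "category": "missense", "score": 21.5},
    "2-2000-C-T": {"gene": "GENE2", "category": "intronic", "score": 3.2},
}

# Assumed identifier format: chromosome-position-reference-alternate.
VARIANT_PATTERN = re.compile(r"\b\d+-\d+-[ACGT]+-[ACGT]+\b")


def retrieve(question):
    """Retrieval step: return stored records for variant ids in the question."""
    ids = VARIANT_PATTERN.findall(question)
    return {vid: ANNOTATIONS[vid] for vid in ids if vid in ANNOTATIONS}


def build_prompt(question):
    """Augmentation step: combine the retrieved records with the question."""
    records = retrieve(question)
    lines = [f"{vid}: {fields}" for vid, fields in sorted(records.items())]
    context = "\n".join(lines) if lines else "(no matching records)"
    return (
        "Answer using only the annotation records below.\n"
        f"Records:\n{context}\n"
        f"Question: {question}"
    )


prompt = build_prompt("What is known about 1-1000-A-G?")
print(prompt)
```

Grounding the prompt in retrieved records is what allows the generated answer to be cross-checked against the database afterwards.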

Bioinformatics Advances, vol. 4, no. 1, vbae143. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461909/pdf/
Citations: 0
Meeting the challenge of genomic analysis: a collaboratively developed workshop for pangenomics and topological data analysis.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-09-27 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae139
Haydeé Contreras-Peruyero, Shaday Guerrero-Flores, Claudia Zirión-Martínez, Paulina M Mejía-Ponce, Marisol Navarro-Miranda, J Abel Lovaco-Flores, José M Ibarra-Rodríguez, Anton Pashkov, Cuauhtémoc Licona-Cassani, Nelly Sélem-Mojica

Motivation: As genomics data analysis becomes increasingly intricate, researchers face the challenge of mastering various software tools. The rise of Pangenomics analysis, which examines the complete set of genes in a group of genomes, is particularly transformative in understanding genetic diversity. Our interdisciplinary team of biologists and mathematicians developed a short Pangenomics Workshop covering Bash, Python scripting, Pangenome, and Topological Data Analysis. These skills provide deeper insights into genetic variations and their implications in Evolutionary Biology. The workshop uses a Conda environment for reproducibility and accessibility. Developed in The Carpentries Incubator infrastructure, the workshop aims to equip researchers with essential skills for Pangenomics research. By emphasizing the role of a community of practice, this work underscores its significance in empowering multidisciplinary professionals to collaboratively develop training that adheres to best practices.

Results: Our workshop delivers tangible outcomes by enhancing the skill sets of Computational Biology professionals. Participants gain hands-on experience using real data from the first described pangenome. We share our paths toward creating an open-source, multidisciplinary, and public resource where learners can develop expertise in Pangenomic Analysis. This initiative goes beyond advancing individual capabilities, aligning with the broader mission of addressing educational needs in Computational Biology.

Availability and implementation: https://carpentries-incubator.github.io/pangenomics-workshop/.
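
The notion of a pangenome as the complete set of genes in a group of genomes can be made concrete with a small sketch using Python sets. The genome names, gene names, and the `summarize_pangenome` helper are hypothetical and are not taken from the workshop materials; the core/accessory split shown is the common convention of genes shared by all genomes versus the rest.

```python
def summarize_pangenome(genomes):
    """Split the gene families of several genomes into pangenome, core, accessory.

    genomes maps a genome name to the set of gene families it contains.
    """
    gene_sets = list(genomes.values())
    if not gene_sets:
        raise ValueError("at least one genome is required")
    pangenome = set().union(*gene_sets)      # every gene seen in any genome
    core = set.intersection(*gene_sets)      # genes present in all genomes
    accessory = pangenome - core             # genes missing from some genome
    return pangenome, core, accessory


# Hypothetical presence data for three genomes.
GENOMES = {
    "genome_A": {"geneA", "geneB", "geneC"},
    "genome_B": {"geneA", "geneB", "geneD"},
    "genome_C": {"geneA", "geneC", "geneD"},
}

pangenome, core, accessory = summarize_pangenome(GENOMES)
print(sorted(pangenome), sorted(core), sorted(accessory))
```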

Citations: 0
How pairs of insertion mutations impact protein structure: an exhaustive computational study.
IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2024-09-27 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae138
Changrui Li, Yang Zheng, Filip Jagodzinski

Summary: Understanding how amino acid insertion mutations affect protein structure can inform pharmaceutical efforts targeting diseases that are caused by protein mutants. In silico simulation of mutations complements experiments performed on physical proteins, which are time- and cost-prohibitive. We have computationally generated the exhaustive sets of two amino acid insertion mutations for five protein structures in the Protein Data Bank. To probe and identify how pairs of insertions affect structural stability and flexibility, we tally the count of hydrogen bonds and analyze a variety of metrics of each mutant. We identify hotspots where pairs of insertions have a pronounced effect, and study how amino acid properties such as size and type, and insertion into alpha helices, affect a protein's structure. The findings show that although some residues, specifically Proline and Tryptophan, have a significant impact on the protein's structure when inserted, there is also a great deal of variance in the effects of the exhaustive insertions, both for any single protein and across the five proteins. This suggests that computational or otherwise quantitative efforts should consider large representative sample sizes, especially when training models to make predictions about the effects of insertions.

Availability and implementation: The data underlying this article is available at https://multimute.cs.wwu.edu.
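
To give a sense of the scale of an exhaustive set of two-insertion mutants, the following is a minimal sketch under stated assumptions, not the authors' method. It assumes each insertion adds a single residue drawn from the 20 standard amino acids, and that the two insertions may fall in any pair of gaps in the original sequence, including the same gap. The names `insertion_pairs` and `count_insertion_pairs` are hypothetical.

```python
from itertools import combinations_with_replacement, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes


def insertion_pairs(sequence):
    """Yield (i, a, j, b, mutant) for every pair of single-residue insertions.

    i <= j are gap indices in the original sequence (0 is before the first
    residue, len(sequence) is after the last). Residue a goes into gap i and
    residue b into gap j. When i == j the two residues end up adjacent, in
    the order a, b.
    """
    gaps = range(len(sequence) + 1)
    for i, j in combinations_with_replacement(gaps, 2):
        for a, b in product(AMINO_ACIDS, repeat=2):
            mutant = sequence[:i] + a + sequence[i:j] + b + sequence[j:]
            yield i, a, j, b, mutant


def count_insertion_pairs(length):
    """Closed-form count of (gap pair, residue pair) combinations."""
    n_gaps = length + 1
    return 400 * n_gaps * (n_gaps + 1) // 2


# A 100-residue protein already yields over two million combinations.
print(count_insertion_pairs(100))
```

Under these assumptions the count grows quadratically with sequence length, which is consistent with the abstract's point that physical experiments at this scale are impractical. Different (i, a, j, b) combinations can produce the same mutant sequence, so the number of distinct sequences is somewhat smaller than this count.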

Citations: 0