Pub Date: 2024-10-08; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae150
Maria Tarradas-Alemany, Sandra Martínez-Puchol, Cristina Mejías-Molina, Marta Itarte, Marta Rusiñol, Sílvia Bofill-Mas, Josep F Abril
Summary: Target Enrichment Sequencing, or capture-based metagenomics, has emerged as an approach of interest for viral metagenomics in complex samples. However, these datasets are usually analyzed with standard downstream bioinformatics analyses. CAPTVRED (Capture-based metagenomics Analysis Pipeline for tracking ViRal species from Environmental Datasets) has been designed to assess the virome present in complex samples, with a particular focus on those obtained by the Target Enrichment Sequencing approach. This work aims to provide a user-friendly tool that complements this sequencing approach for total or partial virome description, especially from environmental matrices. It includes a setup module that allows the pipeline to be prepared and adjusted for any capture panel directed at a set of species of interest. The tool also aims to reduce time and computational cost, and to provide comprehensive, reproducible, and accessible results while being easy to customize, set up, and install.
Availability and implementation: Source code and test datasets are freely available in the GitHub repository at https://github.com/CompGenLabUB/CAPTVRED.git.
Title: CAPTVRED: an automated pipeline for viral tracking and discovery from capture-based metagenomics samples.
Journal: Bioinformatics Advances, 4(1), vbae150. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11495672/pdf/
Pub Date: 2024-10-07; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae145
Masato Tsutsui, Mariko Okada
Summary: Signaling dynamics encode important features and regulatory mechanisms of biological systems, and recent studies have reported the use of simulated signaling dynamics with mechanistic modeling as biomarkers for human diseases. Given the success of deep learning techniques, it is expected that they can extract informative patterns from simulation results more effectively than traditional approaches involving manual feature selection, which can be used for subsequent analyses, such as patient stratification and survival prediction. Here, we propose DynProfiler, which utilizes the entire signaling dynamics, including intermediate variables, as input and leverages deep learning techniques to extract informative features without requiring any labels. Furthermore, DynProfiler incorporates a modern explainable AI solution to provide quantitative time-dependent importance scores for each dynamics. Using simulated dynamics of patients with breast cancer as an example, we demonstrate DynProfiler's ability to extract high-quality features that can predict mortality risk and identify important dynamics, highlighting upregulated phosphorylated GSK3β as a biomarker for poor prognosis. Overall, this tool can be useful for clinical application, as well as for elucidating biological system dynamics.
Availability and implementation: The DynProfiler Python library is available on GitHub at https://github.com/okadalabipr/DynProfiler.
Title: DynProfiler: a Python package for comprehensive analysis and interpretation of signaling dynamics leveraged by deep learning techniques.
Journal: Bioinformatics Advances, 4(1), vbae145. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11464416/pdf/
Pub Date: 2024-10-04; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae140
[This corrects the article DOI: 10.1093/bioadv/vbae125.]
Title: Correction to: Utilizing biological experimental data and molecular dynamics for the classification of mutational hotspots through machine learning.
Journal: Bioinformatics Advances, 4(1), vbae140. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11453097/pdf/
Pub Date: 2024-10-03; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae148
Veronica Paparozzi, Christine Nardini
Summary: We present tidysbml, an R package that extracts compartments, species, and reactions data from Systems Biology Markup Language (SBML) documents (up to Level 3) into tabular data structures (i.e. R dataframes), making the rich biological information easy to access and handle. Thanks to its output format, the package facilitates data manipulation, enabling manageable construction, and therefore analysis, of custom networks, as well as data retrieval, by means of R packages such as igraph, RCy3, and biomaRt. Exemplar data (i.e. SBML files) are extracted from Reactome.
Availability and implementation: The tidysbml R package is distributed under the CC BY 4.0 License and is publicly available on Bioconductor (https://bioconductor.org/packages/tidysbml) and on GitHub (https://github.com/veronicapaparozzi/tidysbml).
Title: tidysbml: R/Bioconductor package for SBML extraction into dataframes.
Journal: Bioinformatics Advances, 4(1), vbae148. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11479578/pdf/
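To make the idea of turning SBML elements into table rows concrete, here is a minimal sketch in Python. It is not the tidysbml API (tidysbml is an R package, and its functions are not shown in this listing). The toy document, `local_name`, and `extract_rows` are all hypothetical names introduced for illustration, and only element attributes are extracted.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy SBML Level 3 document, written for this illustration only.
SBML_EXAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cytosol" name="cytosol" constant="true"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="s1" name="glucose" compartment="cytosol"/>
      <species id="s2" name="glucose-6-phosphate" compartment="cytosol"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1" name="hexokinase" reversible="false"/>
    </listOfReactions>
  </model>
</sbml>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_rows(sbml_text, item_tag):
    """Return one dict of attributes per element named item_tag (one table row each)."""
    root = ET.fromstring(sbml_text)
    return [
        dict(elem.attrib)
        for elem in root.iter()
        if local_name(elem.tag) == item_tag
    ]


species_rows = extract_rows(SBML_EXAMPLE, "species")
compartment_rows = extract_rows(SBML_EXAMPLE, "compartment")
reaction_rows = extract_rows(SBML_EXAMPLE, "reaction")
```

Each list of dicts could be passed to a tabular library of choice. Matching on the local tag name keeps the sketch independent of the SBML level and version namespace, at the cost of ignoring nested content such as annotations.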
Pub Date: 2024-10-03; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae147
Rodolfo S Allendes Osorio, Yuji Kosugi, Johan T Nyström-Persson, Kenji Mizuguchi, Yayoi Natsume-Kitatani
Summary: To address the challenges of storing, sharing, and analyzing multi-omics data, we introduce the newest version of Panomicon. It includes an improved underlying data model, a new registration and access control service, and seamless integration with other services (such as TargetMine for data enrichment analysis), all delivered in a completely new, more user-friendly web application.
Availability and implementation: Panomicon is available online at https://panomicon.nibiohn.go.jp. Unregistered users can access the publicly available data uploaded to Panomicon using the following account: user: guest, password: anonymous. Source code for the application is also freely available under a GNU license at https://github.com/Toxygates/Panomicon/. A brief user guide for the new features of Panomicon is provided as supplementary material online.
Title: A modern multi-omics data exploration experience with Panomicon.
Journal: Bioinformatics Advances, 4(1), vbae147. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11520228/pdf/
Motivation: Visualization and analysis of biological networks play crucial roles in understanding living systems. Biological networks include diverse types, from gene regulatory networks and protein-protein interactions to metabolic networks. Metabolic networks include substrates, products, and enzymes, which are regulated by allosteric mechanisms and gene expression. However, the analysis of these diverse omics types is challenging due to the diversity of databases and the complexity of network analysis.
Results: We developed iTraNet, a web application that visualizes and analyses trans-omics networks involving four types of networks: gene regulatory networks, protein-protein interactions, metabolic networks, and metabolite exchange networks. Using iTraNet, we found that in wild-type mice, hub molecules within the network tended to respond to glucose administration, whereas in ob/ob mice, this tendency disappeared. With its ability to facilitate network analysis, we anticipate that iTraNet will help researchers gain insights into living systems.
Availability and implementation: iTraNet is available at https://itranet.streamlit.app/.
Title: iTraNet: a web-based platform for integrated trans-omics network visualization and analysis.
Authors: Hikaru Sugimoto, Keigo Morita, Dongzi Li, Yunfan Bai, Matthias Mattanovich, Shinya Kuroda
Pub Date: 2024-09-30; DOI: 10.1093/bioadv/vbae141
Journal: Bioinformatics Advances, 4(1), vbae141. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11493990/pdf/
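The notion of a hub molecule can be illustrated with a small degree count. The sketch below is an assumption-laden toy, not iTraNet's method: the abstract does not say how hubs are defined, so a hub is taken here to be any node whose degree meets a chosen threshold. The edge list and the function names `degree_table` and `hubs` are hypothetical.

```python
from collections import Counter

# Hypothetical undirected toy network; the edges are illustrative only.
TOY_EDGES = [
    ("Glucose", "G6P"),
    ("G6P", "F6P"),
    ("G6P", "6PG"),
    ("G6P", "G1P"),
    ("F6P", "F16BP"),
]


def degree_table(edges):
    """Count how many edges touch each node in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def hubs(edges, min_degree):
    """Return nodes with degree >= min_degree (the assumed hub definition)."""
    return sorted(
        node for node, d in degree_table(edges).items() if d >= min_degree
    )


toy_degrees = degree_table(TOY_EDGES)
toy_hubs = hubs(TOY_EDGES, min_degree=3)
```

With a hub list in hand, one could then compare the fraction of hubs that respond to a stimulus across conditions, which is the kind of comparison the abstract reports for wild-type and ob/ob mice.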
Motivation: RNA-binding proteins (RBPs) play a crucial role in the post-transcriptional regulation of RNA. Given their importance, analyzing the specific RNA patterns recognized by RBPs has become a significant research focus in bioinformatics. Deep Neural Networks have enhanced the accuracy of prediction for RBP-binding sites, yet understanding the structural basis of RBP-binding specificity from these models is challenging due to their limited interpretability. To address this, we developed RNAelem, which combines profile context-free grammar and the Turner energy model for RNA secondary structure to predict sequence-structure motifs in RBP-binding regions.
Results: RNAelem exhibited superior detection accuracy compared to existing tools for RNA sequences with structural motifs. Upon applying RNAelem to the eCLIP database, we were not only able to reproduce many known primary sequence motifs in the absence of secondary structures, but also discovered many secondary structural motifs that contained sequence-nonspecific insertion regions. Furthermore, the high interpretability of RNAelem yielded insightful findings such as long-range base-pairing interactions in the binding region of the U2AF protein.
Availability and implementation: The code is available at https://github.com/iyak/RNAelem.
Title: RNAelem: an algorithm for discovering sequence-structure motifs in RNA bound by RNA-binding proteins.
Authors: Hiroshi Miyake, Risa Karakida Kawaguchi, Hisanori Kiryu
Pub Date: 2024-09-28; DOI: 10.1093/bioadv/vbae144
Journal: Bioinformatics Advances, 4(1), vbae144. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471262/pdf/
Pub Date: 2024-09-28; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae143
Thomas Cheng Li, Hufeng Zhou, Vineet Verma, Xiangru Tang, Yanjun Shao, Eric Van Buren, Zhiping Weng, Mark Gerstein, Benjamin Neale, Shamil R Sunyaev, Xihong Lin
Motivation: Functional Annotation of genomic Variants Online Resources (FAVOR) offers multi-faceted, whole genome variant functional annotations, which are essential for Whole Genome and Exome Sequencing (WGS/WES) analysis and the functional prioritization of disease-associated variants. A versatile chatbot designed to facilitate informative interpretation and interactive, user-centric summary of the whole genome variant functional annotation data in the FAVOR database is needed.
Results: We have developed FAVOR-GPT, a generative natural language interface powered by integrating large language models (LLMs) and FAVOR. It is developed based on the Retrieval Augmented Generation (RAG) approach, and complements the original FAVOR portal, enhancing usability for users, especially those without specialized expertise. FAVOR-GPT simplifies raw annotations by providing interpretable explanations and result summaries in response to the user's prompt. It shows high accuracy when cross-referencing with the FAVOR database, underscoring the robustness of the retrieval framework.
Availability and implementation: Researchers can access FAVOR-GPT at FAVOR's main website (https://favor.genohub.org).
Title: FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations.
Journal: Bioinformatics Advances, 4(1), vbae143. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461909/pdf/
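The Retrieval Augmented Generation pattern mentioned in the Results can be sketched in a few lines: retrieve the stored records most relevant to a question, then place them in the prompt given to a language model. The example below is a toy under stated assumptions and does not reflect FAVOR-GPT's implementation. The records are placeholders rather than real FAVOR annotations, retrieval uses simple word overlap rather than whatever the authors use, and no language model is called. All names (`tokenize`, `retrieve`, `build_prompt`) are hypothetical.

```python
# Placeholder records; these are NOT real FAVOR annotations.
TOY_RECORDS = [
    {"id": "variant-A", "text": "placeholder annotation about a coding missense variant"},
    {"id": "variant-B", "text": "placeholder annotation about an intergenic regulatory variant"},
]


def tokenize(text):
    """Lowercase a string and split it into a set of whitespace-separated words."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())


def retrieve(query, records, top_k=1):
    """Rank records by word overlap with the query and keep the top_k."""
    query_words = tokenize(query)
    ranked = sorted(
        records,
        key=lambda record: len(query_words & tokenize(record["text"])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, retrieved):
    """Assemble the text that would be sent to a language model (not called here)."""
    context = "\n".join(f"- {r['id']}: {r['text']}" for r in retrieved)
    return (
        "Answer using only the annotations below.\n"
        f"{context}\n"
        f"Question: {query}"
    )


question = "Is the missense variant in a coding region?"
top_records = retrieve(question, TOY_RECORDS)
prompt = build_prompt(question, top_records)
```

Grounding the prompt in retrieved records is what allows answers to be cross-referenced against the underlying database, which is the accuracy check the abstract describes.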
Pub Date: 2024-09-27; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae139
Haydeé Contreras-Peruyero, Shaday Guerrero-Flores, Claudia Zirión-Martínez, Paulina M Mejía-Ponce, Marisol Navarro-Miranda, J Abel Lovaco-Flores, José M Ibarra-Rodríguez, Anton Pashkov, Cuauhtémoc Licona-Cassani, Nelly Sélem-Mojica
Motivation: As genomics data analysis becomes increasingly intricate, researchers face the challenge of mastering various software tools. The rise of Pangenomics analysis, which examines the complete set of genes in a group of genomes, is particularly transformative in understanding genetic diversity. Our interdisciplinary team of biologists and mathematicians developed a short Pangenomics Workshop covering Bash, Python scripting, Pangenome, and Topological Data Analysis. These skills provide deeper insights into genetic variations and their implications in Evolutionary Biology. The workshop uses a Conda environment for reproducibility and accessibility. Developed in The Carpentries Incubator infrastructure, the workshop aims to equip researchers with essential skills for Pangenomics research. By emphasizing the role of a community of practice, this work underscores its significance in empowering multidisciplinary professionals to collaboratively develop training that adheres to best practices.
Results: Our workshop delivers tangible outcomes by enhancing the skill sets of Computational Biology professionals. Participants gain hands-on experience using real data from the first described pangenome. We share our paths toward creating an open-source, multidisciplinary, and public resource where learners can develop expertise in Pangenomic Analysis. This initiative goes beyond advancing individual capabilities, aligning with the broader mission of addressing educational needs in Computational Biology.
Availability and implementation: https://carpentries-incubator.github.io/pangenomics-workshop/.
Title: Meeting the challenge of genomic analysis: a collaboratively developed workshop for pangenomics and topological data analysis.
Journal: Bioinformatics Advances, 4(1), vbae139. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525208/pdf/
Pub Date: 2024-09-27; eCollection Date: 2024-01-01; DOI: 10.1093/bioadv/vbae138
Changrui Li, Yang Zheng, Filip Jagodzinski
Summary: Understanding how amino acid insertion mutations affect protein structure can inform pharmaceutical efforts targeting diseases that are caused by protein mutants. In silico simulation of mutations complements experiments performed on physical proteins, which are time- and cost-prohibitive. We have computationally generated the exhaustive sets of two amino acid insertion mutations for five protein structures in the Protein Data Bank. To probe and identify how pairs of insertions affect structural stability and flexibility, we tally the count of hydrogen bonds and analyze a variety of metrics of each mutant. We identify hotspots where pairs of insertions have a pronounced effect, and study how amino acid properties such as size and type, and insertion into alpha helices, affect a protein's structure. The findings show that although some residues, Proline and Tryptophan specifically, have a significant impact on the protein's structure when inserted, there is also a great deal of variance in the effects of the exhaustive insertions, both for any single protein and across the five proteins. This suggests that computational or otherwise quantitative efforts should consider large representative sample sizes, especially when training models to make predictions about the effects of insertions.
Availability and implementation: The data underlying this article is available at https://multimute.cs.wwu.edu.
Title: How pairs of insertion mutations impact protein structure: an exhaustive computational study.
Journal: Bioinformatics Advances, 4(1), vbae138. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639182/pdf/
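To give a feel for why exhaustive pairs of insertions grow quickly, the sketch below enumerates them at the sequence level. It rests on assumptions the abstract does not confirm: each mutant is taken to carry two single-residue insertions at two distinct gaps of the original sequence, drawn from the 20 standard amino acids. It operates on plain strings, not on Protein Data Bank structures, and `insertion_pairs` is a hypothetical name.

```python
from itertools import combinations, product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def insertion_pairs(sequence):
    """Yield (i, a, j, b, mutant) for every pair of single-residue insertions.

    i < j are gap indices in the original sequence (0 = before the first
    residue, len(sequence) = after the last); a and b are the inserted residues.
    """
    gaps = range(len(sequence) + 1)
    for i, j in combinations(gaps, 2):
        for a, b in product(AMINO_ACIDS, repeat=2):
            # Insert at the later gap first so the earlier index stays valid.
            mutant = sequence[:j] + b + sequence[j:]
            mutant = mutant[:i] + a + mutant[i:]
            yield i, a, j, b, mutant


toy_mutants = list(insertion_pairs("MK"))
```

Under these assumptions a sequence of length n yields C(n + 1, 2) × 400 mutation specifications, so even the two-residue toy sequence produces 1200. Different specifications can give identical sequences (for example, inserting a residue next to an identical one), so the number of unique mutants is smaller.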