Bioinformatics advances: latest articles, page 3

Sphae: an automated toolkit for predicting phage therapy candidates from sequencing data.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics advances

Pub Date : 2025-01-17 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf004

Bhavya Papudeshi, Michael J Roach, Vijini Mallawaarachchi, George Bouras, Susanna R Grigson, Sarah K Giles, Clarice M Harker, Abbey L K Hutton, Anita Tarasenko, Laura K Inglis, Alejandro A Vega, Cole Souza, Lance Boling, Hamza Hajama, Ana Georgina Cobián Güemes, Anca M Segall, Elizabeth A Dinsdale, Robert A Edwards

Motivation: Phage therapy offers a viable alternative treatment for bacterial infections amid rising antimicrobial resistance. Its success relies on selecting safe and effective phage candidates, which requires comprehensive genomic screening to identify potential risks. However, this screening is often labor-intensive and time-consuming, hindering rapid clinical deployment.

Results: We developed Sphae, an automated bioinformatics pipeline that streamlines the assessment of a phage's therapeutic potential, completing it in under 10 minutes. Built on the Snakemake workflow manager, Sphae integrates tools for quality control, assembly, genome assessment, and annotation tailored specifically to phage biology. Sphae automates the detection of key genomic markers that could preclude therapeutic use, including virulence factors, antimicrobial resistance genes, and lysogeny indicators such as integrase, recombinase, and transposase. Among the 65 phage sequences analyzed, 28 showed therapeutic potential, 8 failed due to low sequencing depth, 22 contained prophage or virulence markers, and 23 had multiple phage genomes. The workflow produces a report for quickly assessing phage safety and suitability for therapy. Sphae is scalable and portable, so it can be deployed efficiently on most high-performance computing and cloud platforms, accelerating genomic evaluation.
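To make the screening logic concrete, the sketch below flags a phage assembly as unsuitable when its depth is too low, when more than one genome is present, or when any annotated gene product matches a lysogeny or risk keyword. This is a simplified illustration, not Sphae's code: the keyword lists, the `screen_phage` function and the 10x depth cut-off are hypothetical, and Sphae itself relies on dedicated annotation tools and databases rather than string matching.

```python
from dataclasses import dataclass, field

# Hypothetical keyword lists; Sphae's real criteria come from its annotation tools.
LYSOGENY_KEYWORDS = ("integrase", "recombinase", "transposase")
RISK_KEYWORDS = ("virulence factor", "antimicrobial resistance", "toxin")


@dataclass
class ScreenResult:
    suitable: bool
    reasons: list = field(default_factory=list)


def screen_phage(products, mean_depth, n_genomes=1, min_depth=10.0):
    """Flag an assembly using hypothetical depth, genome-count and keyword rules."""
    reasons = []
    if mean_depth < min_depth:
        reasons.append(f"low sequencing depth ({mean_depth}x < {min_depth}x)")
    if n_genomes > 1:
        reasons.append(f"multiple phage genomes detected ({n_genomes})")
    for product in products:
        text = product.lower()
        for keyword in LYSOGENY_KEYWORDS + RISK_KEYWORDS:
            if keyword in text:
                reasons.append(f"marker '{keyword}' in '{product}'")
    # A candidate is only "suitable" when no rule fired.
    return ScreenResult(suitable=not reasons, reasons=reasons)


candidate = screen_phage(["major capsid protein", "tyrosine integrase"], mean_depth=150.0)
print(candidate.suitable)  # False
print(candidate.reasons)
```

Collecting every reason instead of stopping at the first hit mirrors the idea of a report: a reviewer sees all the grounds on which a phage was rejected.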

Availability and implementation: Sphae source code is freely available at https://github.com/linsalrob/sphae, with installation supported via Conda, PyPI, and Docker containers.

Citation count: 0

estiMAge: development of a DNA methylation clock to estimate the methylation age of single cells.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics advances

Pub Date : 2025-01-16 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf005

Zoe Saßmannshausen, Lisa Blank, Llorenç Solé-Boldo, Frank Lyko, Günter Raddatz

Motivation: Since their introduction about 10 years ago, methylation clocks have provided broad insights into biological age across species and tissues, as well as in the context of aging and several diseases. However, applying them to single-cell methylation data remains a major challenge because such data are inherently sparse: many CpG sites are not covered. A methylation clock applicable at the single-cell level could help further disentangle the processes that drive the ticking of epigenetic clocks.

Results: We have developed estiMAge ("estimation of Methylation Age"), a framework that exploits redundancy in methylation data to substitute for clock CpGs that are missing in single cells. Using Euclidean distance as a similarity measure, we determine which CpGs covary with the CpG sites required by an epigenetic clock and can therefore serve as surrogates for clock CpGs not covered in single-cell experiments. estiMAge can thus be applied to standard epigenetic clocks built on elastic net regression to achieve both bulk and single-cell resolution. We show that estiMAge accurately predicts the ages of young and old hepatocytes and can be used to generate single-cell versions of publicly available epigenetic clocks.
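The surrogate idea can be sketched in a few lines. Assuming a bulk reference table of methylation levels per CpG, the code below replaces each clock CpG that a cell lacks with the covered CpG whose reference profile is closest in Euclidean distance, then evaluates the linear model that an elastic-net clock produces (intercept plus weighted sum). The data, CpG names and the single-nearest-neighbour rule are hypothetical simplifications; estiMAge's actual selection procedure may differ.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length methylation profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_surrogate(clock_cpg, covered, reference):
    """Return the covered CpG whose reference profile is closest to clock_cpg's.

    reference maps a CpG id to its methylation levels (0..1) across bulk samples.
    """
    target = reference[clock_cpg]
    candidates = [c for c in covered if c in reference and c != clock_cpg]
    if not candidates:
        raise ValueError(f"no covered CpG can stand in for {clock_cpg}")
    return min(candidates, key=lambda c: euclidean(reference[c], target))


def predict_age(intercept, weights, cell_values, reference):
    """Linear clock (as fitted by elastic net): intercept + sum(w_i * beta_i).

    Clock CpGs missing from the cell are replaced by their nearest covered surrogate.
    """
    covered = list(cell_values)
    age = intercept
    for cpg, weight in weights.items():
        if cpg not in cell_values:
            cpg = pick_surrogate(cpg, covered, reference)
        age += weight * cell_values[cpg]
    return age


# Toy data (hypothetical): three CpGs profiled in three bulk reference samples.
reference = {
    "cg_a": [0.1, 0.5, 0.9],
    "cg_b": [0.2, 0.5, 0.8],  # tracks cg_a closely
    "cg_c": [0.9, 0.5, 0.1],  # anti-correlated with cg_a
}
cell = {"cg_b": 1.0, "cg_c": 0.0}  # this cell lacks coverage of cg_a
print(pick_surrogate("cg_a", list(cell), reference))  # cg_b
print(predict_age(2.0, {"cg_a": 10.0}, cell, reference))  # 12.0
```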

Availability and implementation: The source code and instructions for usage of estiMAge are available at https://github.com/DivEpigenetics/estiMAge.

Citation count: 0

Integrative co-registration of elemental imaging and histopathology for enhanced spatial multimodal analysis of tissue sections through TRACE.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics advances

Pub Date : 2025-01-07 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf001

Yunrui Lu, Serin Han, Aruesha Srivastava, Neha Shaik, Matthew Chan, Alos Diallo, Naina Kumar, Nishita Paruchuri, Hrishikesh Deosthali, Vismay Ravikumar, Kevin Cornell, Elijah Stommel, Tracy Punshon, Brian Jackson, Fred Kolling, Linda Vahdat, Louis Vaickus, Jonathan Marotti, Sunita Ho, Joshua Levy

Summary: Elemental imaging provides detailed profiles of metal bioaccumulation and, by targeting specific tissue areas, offers more precision than bulk analysis. However, accurately identifying comparable tissue regions from elemental maps is challenging and requires integrating hematoxylin and eosin (H&E) slides for effective comparison. TRACE facilitates streamlined co-registration of whole slide images (WSI) and elemental maps, enhancing the analysis of tissue regions and elemental abundance across various pathological conditions. Delivered as an interactive, containerized web application, TRACE offers real-time annotation editing, advanced statistical tools, and data export, supporting comprehensive spatial analysis. Notably, it allows elemental abundances to be compared across annotated tissue structures and, through WSI co-registration, enables integration with other spatial data types.
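The abstract does not say how TRACE computes its co-registration. As a generic illustration only, the sketch below fits a 2D affine transform by least squares from matched landmark pairs (for example, points marked on both an elemental map and the H&E whole slide image) and uses it to map elemental-map coordinates into slide coordinates. The landmarks and function names are hypothetical, and TRACE may well use a different, possibly non-rigid, method.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src landmarks onto dst landmarks.

    src, dst: sequences of matched (x, y) points, at least 3 non-collinear pairs.
    Returns a 2x3 matrix A with dst ~ A @ [x, y, 1].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])  # homogeneous coordinates, (n, 3)
    coeffs, *_ = np.linalg.lstsq(design, dst, rcond=None)  # (3, 2) solution
    return coeffs.T


def apply_affine(A, points):
    """Map (n, 2) points through a 2x3 affine matrix."""
    points = np.asarray(points, dtype=float)
    return points @ A[:, :2].T + A[:, 2]


# Hypothetical landmarks: the slide frame is scaled 2x and shifted by (10, -5).
elemental_pts = [[0, 0], [1, 0], [0, 1], [1, 1]]
slide_pts = [[10, -5], [12, -5], [10, -3], [12, -3]]
A = fit_affine(elemental_pts, slide_pts)
print(np.round(A, 6))
print(apply_affine(A, [[0.5, 0.5]]))
```

An affine model covers translation, rotation, scaling and shear only; tissue sections often deform non-linearly, so a landmark-based affine fit is best treated as a starting point.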

Availability and implementation: Available on the following platforms: GitHub: jlevy44/trace_app; PyPI: trace_app; Docker: joshualevy44/trace_app; Singularity: docker://joshualevy44/trace_app.


Citation count: 0

VAREANT: a bioinformatics application for gene variant reduction and annotation.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics advances

Pub Date : 2024-12-31 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbae210

Rishabh Narayanan, William DeGroat, Elizabeth Peker, Saman Zeeshan, Zeeshan Ahmed

Motivation: Analyzing high-quality genomic variant data may offer a more complete understanding of the human genome, enabling researchers to identify novel biomarkers, stratify patients by disease risk factors, and decipher underlying biological pathways. Although the availability of genomic data has increased sharply in recent years, accessible bioinformatics tools to help prepare these data are still lacking. The main challenges in processing genomic data are its large volume, the associated computational and storage costs, and the difficulty of identifying targeted, relevant information.

Results: We present VAREANT, an accessible and configurable bioinformatics application that prepares variant data in a usable, analysis-ready format. VAREANT consists of three standalone modules: (i) Pre-processing, (ii) Variant Annotation, and (iii) AI/ML Data Preparation. Pre-processing supports fine-grained filtering of complex variant datasets to remove extraneous data. Variant Annotation adds variant metadata from the latest public annotation databases for subsequent analysis and interpretation. AI/ML Data Preparation helps the user create AI/ML-ready datasets suitable for immediate analysis with minimal further pre-processing. We have tested and validated the tool on numerous datasets of varying size and applied VAREANT in two case studies involving patients with cardiovascular diseases.
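As an illustration of what fine-grained pre-processing can mean for VCF input, the sketch below streams VCF lines and keeps only calls on selected chromosomes with a minimum QUAL and a `PASS` filter, while passing header lines through so the output remains valid VCF. The function and its thresholds are hypothetical and do not reflect VAREANT's actual options; column positions follow the VCF specification (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, ...).

```python
def filter_vcf_lines(lines, min_qual=30.0, chromosomes=None, require_pass=True):
    """Yield VCF lines that pass simple, hypothetical filters.

    Header lines (starting with '#') are always kept so the output stays valid VCF.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, qual, filt = fields[0], fields[5], fields[6]
        if chromosomes is not None and chrom not in chromosomes:
            continue  # outside the region of interest
        if qual == "." or float(qual) < min_qual:
            continue  # missing or low-quality call
        if require_pass and filt != "PASS":
            continue  # failed an upstream filter
        yield line


vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "chr1\t200\t.\tC\tT\t10\tPASS\t.\n",
    "chr2\t300\t.\tG\tA\t99\tLowQual\t.\n",
    "chrX\t400\t.\tT\tC\t60\tPASS\t.\n",
]
kept = list(filter_vcf_lines(vcf, chromosomes={"chr1", "chr2"}))
print("".join(kept))
```

Using a generator keeps memory use flat, which matters for the large variant files the motivation describes.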

Availability and implementation: The open-source code of VAREANT is available on GitHub: https://github.com/drzeeshanahmed/Gene_VAREANT.

Citation count: 0

Predicting CRISPR-Cas9 off-target effects in human primary cells using bidirectional LSTM with BERT embedding.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics advances

Pub Date : 2024-12-30 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbae184

Orhan Sari, Ziying Liu, Youlian Pan, Xiaojian Shao

Motivation: The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas9 system is a groundbreaking genome editing tool that has revolutionized cell and gene therapies. An essential factor in its success is the design of an optimal single-guide RNA (sgRNA) with high on-target cleavage efficiency and low off-target effects. This design is challenging because many conditions must be considered, and empirically testing every design is time-consuming and costly. In silico prediction using machine learning models offers high-performance alternatives.

Results: We present CrisprBERT, a deep learning model that predicts the off-target effects of sgRNAs using only the sgRNAs and their paired DNA sequences. It combines a Bidirectional Encoder Representations from Transformers (BERT) architecture, which provides a high-dimensional embedding of paired sgRNA and DNA sequences, with Bidirectional Long Short-Term Memory networks for learning. We propose doublet stack encoding to capture the local energy configuration of Cas9 binding and apply the BERT model to learn contextual embeddings of the doublet pairs. Our results show that the new model outperforms state-of-the-art deep learning models in single-split and leave-one-sgRNA-out cross-validation, as well as in independent testing.
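The abstract does not spell out the doublet stack encoding. One plausible reading, assumed here, is that each token joins the sgRNA dinucleotide and the aligned DNA dinucleotide at positions i and i+1, in the spirit of nearest-neighbour base stacking in duplex thermodynamics; the resulting integer ids could then feed an embedding layer such as BERT's. The vocabulary, token format and function names are hypothetical and may differ from CrisprBERT's implementation.

```python
from itertools import product

BASES = "ACGT"  # sgRNA written in the DNA alphabet (T instead of U)

# Hypothetical vocabulary: every (sgRNA dinucleotide, DNA dinucleotide) pair -> id.
VOCAB = {
    f"{a}{b}|{c}{d}": i
    for i, (a, b, c, d) in enumerate(product(BASES, repeat=4))
}


def doublet_tokens(sgrna, dna):
    """Tokenize aligned sgRNA/DNA sequences into overlapping doublet-stack tokens.

    Token i combines positions i and i+1 of both sequences, so a mismatch shows up
    as differing sgRNA and DNA halves in up to two neighbouring tokens.
    """
    if len(sgrna) != len(dna):
        raise ValueError("sgRNA and DNA must be aligned to the same length")
    sgrna, dna = sgrna.upper(), dna.upper()
    return [f"{sgrna[i:i + 2]}|{dna[i:i + 2]}" for i in range(len(sgrna) - 1)]


def encode(sgrna, dna):
    """Map doublet tokens to integer ids, e.g. as input to an embedding layer."""
    return [VOCAB[token] for token in doublet_tokens(sgrna, dna)]


print(doublet_tokens("ACG", "ACT"))  # ['AC|AC', 'CG|CT']
print(encode("ACG", "ACT"))  # [17, 103]
```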

Availability and implementation: CrisprBERT is available on GitHub: https://github.com/OSsari/CrisprBERT.


Citation count: 0

Genal: a Python toolkit for genetic risk scoring and Mendelian randomization.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics advances

Pub Date : 2024-12-24 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbae207

Cyprien A Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N Sheth, Guido J Falcone, Julian N Acosta

Motivation: The growing volume of genetic association data from genome-wide association studies has increased the importance of methods such as Polygenic Risk Scores (PRS) and Mendelian Randomization (MR) in genetic epidemiology. However, their application is often impeded by complex, multi-step workflows that require specialized expertise and disparate tools with differing data-formatting requirements. Existing solutions are frequently standalone packages or command-line based (largely because they depend on tools such as PLINK), which limits accessibility for researchers without computational experience. Given Python's popularity and ease of use, there is a need for an integrated, user-friendly Python toolkit to streamline PRS and MR analyses.

Results: We introduce Genal, a Python package that consolidates SNP-level data handling, cleaning, clumping, PRS computation, and MR analyses into a single, cohesive toolkit. By wrapping PLINK and removing the need for multiple R packages and for command-line interaction, Genal lowers the barrier for medical scientists to perform complex genetic epidemiology studies. Genal draws on concepts from several well-established tools, giving users access to rigorous statistical techniques within the intuitive Python environment. Additionally, Genal uses parallel processing for MR methods, including MR-PRESSO, significantly reducing the computational time these analyses require.
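Genal's own API is not described in the abstract, so the sketch below does not use it. It shows, in plain Python, the two calculations at the core of PRS and MR: a score as the dosage-weighted sum of GWAS effect sizes, and the fixed-effect inverse-variance weighted (IVW) MR estimate with its standard error. Clumping, allele harmonization and MR-PRESSO are omitted, and the SNP identifiers and numbers are made up.

```python
import math


def polygenic_risk_score(dosages, weights):
    """PRS = sum over SNPs of (effect-allele dosage, 0..2) x (GWAS effect size)."""
    missing = sorted(set(weights) - set(dosages))
    if missing:
        raise KeyError(f"no genotype for: {missing}")
    return sum(dosages[snp] * beta for snp, beta in weights.items())


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate and its standard error.

    Equivalent to weighting each SNP's Wald ratio (beta_outcome / beta_exposure)
    by beta_exposure**2 / se_outcome**2.
    """
    num = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


prs = polygenic_risk_score({"rs_1": 2, "rs_2": 1}, {"rs_1": 0.5, "rs_2": 0.25})
print(prs)  # 1.25
estimate, se = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.02])
print(round(estimate, 6), round(se, 6))
```

Both SNPs in the MR example have a Wald ratio of 0.5, so the IVW estimate is 0.5 regardless of the weights, which is a quick sanity check on the formula.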

Availability and implementation: The package is available on PyPI (https://pypi.org/project/genal-python/); the code is openly available on GitHub with a tutorial (https://github.com/CypRiv/genal); and the documentation is hosted on Read the Docs (https://genal.rtfd.io).


Citation count: 0

QOMIC: quantum optimization for motif identification.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics advances

Pub Date : 2024-12-24 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbae208

Hoang M Ngo, Tamim Khatib, My T Thai, Tamer Kahveci

Motivation: The network motif identification (MI) problem aims to find topological patterns in biological networks. Identifying disjoint motifs is computationally challenging on classical computers. Quantum computers enable the solution of high-complexity problems that do not scale on classical computers. In this article, we develop the first quantum solution to the MI problem, called QOMIC (Quantum Optimization for Motif IdentifiCation). QOMIC reformulates the MI problem as an integer model, which serves as the foundation of our quantum solution. Using this model, we develop and implement a quantum circuit that finds motif locations in a given network.
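The abstract does not give QOMIC's integer model. A common way to cast "pick as many non-overlapping motif occurrences as possible" for quantum optimizers, assumed here, is a QUBO: one binary variable per candidate occurrence, a reward of -1 for each selected occurrence, and a penalty for every selected pair that shares a node. The sketch builds that QUBO and solves it by exhaustive search as a classical stand-in for quantum hardware; the instances are hypothetical and QOMIC's formulation and circuit may differ.

```python
from itertools import combinations, product


def build_qubo(instances, penalty=2.0):
    """QUBO for picking a maximum set of node-disjoint motif instances.

    instances: list of frozensets of node ids (candidate motif occurrences).
    Energy E(x) = -sum_i x_i + penalty * sum_{i<j, overlapping} x_i * x_j.
    With penalty > 1, dropping one of two overlapping picks always lowers the
    energy, so every minimum selects pairwise-disjoint instances.
    """
    Q = {(i, i): -1.0 for i in range(len(instances))}
    for i, j in combinations(range(len(instances)), 2):
        if instances[i] & instances[j]:
            Q[(i, j)] = penalty
    return Q


def energy(Q, x):
    """Evaluate the QUBO energy of a 0/1 assignment x."""
    return sum(coef * x[i] * x[j] for (i, j), coef in Q.items())


def brute_force_minimum(Q, n):
    """Exhaustive search over 2**n assignments; only feasible for small n."""
    return list(min(product((0, 1), repeat=n), key=lambda x: energy(Q, x)))


# Hypothetical triangle occurrences in a small network.
instances = [
    frozenset({1, 2, 3}),
    frozenset({3, 4, 5}),  # overlaps instances 0 and 2
    frozenset({5, 6, 7}),
    frozenset({8, 9, 10}),
]
Q = build_qubo(instances)
print(brute_force_minimum(Q, len(instances)))  # [1, 0, 1, 1]
```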

Results: Our experiments demonstrate that QOMIC outperforms existing solutions developed for classical computers in terms of motif counts. We also observe that QOMIC can efficiently find motifs in human regulatory networks associated with five neurodegenerative diseases: Alzheimer's, Parkinson's, Huntington's, Amyotrophic Lateral Sclerosis, and Motor Neurone Disease.

Availability and implementation: Our implementation is available at https://github.com/ngominhhoang/Quantum-Motif-Identification.git.


Citations: 0

Identification of promising dipeptidyl peptidase-4 and protein tyrosine phosphatase 1B inhibitors from selected terpenoids through molecular modeling.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics advances

Pub Date : 2024-12-19 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbae205

Oludare M Ogunyemi, Gideon A Gyebi, Femi Olawale, Ibrahim M Ibrahim, Opeyemi Iwaloye, Modupe M Fabusiwa, Stephen Omowaye, Omotade I Oloyede, Charles O Olaiya

Motivation: Investigating novel drug-target interactions is crucial for expanding the chemical space of emerging therapeutic targets in human diseases. Herein, we explored the interactions of dipeptidyl peptidase-4 and protein tyrosine phosphatase 1B with selected terpenoids from African antidiabetic plants.

Results: Using molecular docking, molecular dynamics simulations, molecular mechanics with generalized Born and surface area solvation (MM/GBSA) free energy calculations, and density functional theory analyses, the study revealed dipeptidyl peptidase-4 as a promising target. Cucurbitacin B, 6-oxoisoiguesterin, and 20-epi-isoiguesterinol were identified as potential dipeptidyl peptidase-4 inhibitors with strong binding affinities. These triterpenoids interacted with key catalytic and hydrophobic pockets of dipeptidyl peptidase-4 and showed structural stability and flexibility under dynamic conditions, as indicated by the molecular dynamics simulation parameters. The free energy analysis further supported these binding affinities in a dynamic environment. Quantum mechanical calculations revealed favorable highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital (LUMO) energy profiles, indicating the suitability of the hits as proton donors and acceptors, which likely enhance their molecular interactions with the targets. Moreover, the terpenoids showed desirable drug-like properties, suggesting their potential as safe and effective dipeptidyl peptidase-4 inhibitors. These findings may pave the way for the development of novel antidiabetic agents and nutraceuticals based on these promising in silico hits.
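For orientation, the end-point binding free energy that MM/GBSA estimates is conventionally decomposed as below. This is the generic textbook form, not necessarily the exact protocol of this study; the abstract does not state which terms (for example, the entropy term) or which solvation parameters the authors used.

```latex
\begin{aligned}
\Delta G_{\text{bind}} &= \langle G_{\text{complex}} \rangle - \langle G_{\text{receptor}} \rangle - \langle G_{\text{ligand}} \rangle \\
G &= E_{\text{MM}} + G_{\text{GB}} + G_{\text{SA}} - TS \\
E_{\text{MM}} &= E_{\text{bonded}} + E_{\text{elec}} + E_{\text{vdW}} \\
G_{\text{SA}} &= \gamma \cdot \mathrm{SASA} + b
\end{aligned}
```

Here the angle brackets denote averages over molecular dynamics snapshots, G_GB is the polar solvation term from the generalized Born model, and G_SA is the nonpolar term estimated from the solvent-accessible surface area (SASA). The entropy term is frequently omitted in practice, and more negative ΔG_bind values indicate stronger predicted binding.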

Availability and implementation: Not applicable.

Citations: 0

A comparative analysis of gene expression profiling by statistical and machine learning approaches.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics advances

Pub Date : 2024-12-18 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbae199

Myriam Bontonou, Anaïs Haget, Maria Boulougouri, Benjamin Audit, Pierre Borgnat, Jean-Michel Arbona

Motivation: Many machine learning (ML) models developed to classify phenotypes from gene expression data provide interpretations of their decisions, with the aim of understanding biological processes. For many models, including neural networks, these interpretations are lists of genes ranked by their importance for the predictions, with top-ranked genes likely linked to the phenotype. In this article, we discuss the limitations of such approaches, using integrated gradients, an explainability method developed for neural networks, as an example.

Results: Experiments are performed on RNA sequencing data from public cancer databases. A collection of ML models, including multilayer perceptrons and graph neural networks, is trained to classify samples by cancer type. Gene rankings from integrated gradients are compared with genes highlighted by statistical feature selection methods such as DESeq2 and by other learning methods that measure global feature contributions. Experiments show that a small set of top-ranked genes is sufficient to achieve good classification. However, similar performance can be achieved with lower-ranked genes, although larger sets are required. Moreover, significant differences in top-ranked genes, especially between statistical and learning methods, prevent a comprehensive biological understanding. In conclusion, while these methods identify pathology-specific biomarkers, the completeness of the gene sets selected by explainability techniques for understanding biological processes remains uncertain.
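Integrated gradients attributes a prediction to each input feature by accumulating the model's gradient along a straight path from a baseline input to the actual input, then ranking features by attribution magnitude. The sketch below is a minimal illustration only, assuming a toy logistic-regression classifier with an analytic gradient, an all-zero baseline, and hypothetical gene names, weights, and expression values; it is not the authors' code, which applies the method to neural networks.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model_output(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Positive-class probability of a toy logistic-regression model (assumed model)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_gradient(x: Sequence[float], w: Sequence[float], b: float) -> List[float]:
    """Analytic gradient dF/dx_i = p * (1 - p) * w_i for the toy model."""
    p = model_output(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x: Sequence[float], baseline: Sequence[float],
                         w: Sequence[float], b: float, steps: int = 200) -> List[float]:
    """Midpoint Riemann approximation of integrated gradients along baseline -> x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [bl + alpha * (xi - bl) for xi, bl in zip(x, baseline)]
        for i, g in enumerate(model_gradient(point, w, b)):
            total[i] += g
    # Scale the averaged gradient by the input-minus-baseline difference.
    return [(xi - bl) * g / steps for xi, bl, g in zip(x, baseline, total)]


def rank_genes(names: Sequence[str], attributions: Sequence[float]) -> List[Tuple[str, float]]:
    """Sort genes by absolute attribution, largest first."""
    return sorted(zip(names, attributions), key=lambda t: abs(t[1]), reverse=True)


# Hypothetical example data (not from the study).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
weights = [2.0, -0.5, 0.0, 1.0]
bias = -0.3
sample = [1.5, 2.0, 3.0, 0.2]
baseline = [0.0] * len(sample)

attributions = integrated_gradients(sample, baseline, weights, bias)
for name, score in rank_genes(genes, attributions):
    print(f"{name}\t{score:+.4f}")
```

A useful sanity check is the completeness property: the attributions should sum, up to discretization error, to the difference between the model output at the input and at the baseline. Note that GENE_C receives zero attribution despite a high expression value because the toy model ignores it, which mirrors the abstract's point that a ranking reflects the model rather than the biology.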

Availability and implementation: Python code and datasets are available at https://github.com/mbonto/XAI_in_genomics.

Citations: 0

Exploring the role of the Rab network in epithelial-to-mesenchymal transition.

IF 2.4 Q2 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Bioinformatics advances

Pub Date : 2024-12-14 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbae200

Unmani Jaygude, Graham M Hughes, Jeremy C Simpson

Motivation: Rab GTPases (Rabs) are crucial for membrane trafficking within mammalian cells, and their dysfunction is implicated in many diseases. This gene family plays a role in several key cellular processes. Network analyses can uncover the complete repertoire of interaction patterns across the Rab network, informing disease research and opening new opportunities for therapeutic interventions.

Results: We examined Rabs and their interactors in the context of epithelial-to-mesenchymal transition (EMT), an indicator of cancer metastasizing to distant organs. A Rab network was first established from a literature analysis and then gradually expanded. Our Python module, resnet, assessed its network resilience and selected an optimally sized, resilient Rab network for further analyses. Pathway enrichment confirmed the network's role in EMT. We then identified 73 candidate genes showing strong up- or down-regulation across 10 cancer types in patients with metastasized tumours compared with patients with primary-site tumours only. We suggest that their encoded proteins might play a critical role in EMT; further in vitro studies are needed to confirm their role as predictive markers of cancer metastasis. The systematic analysis approach described here, including resnet, can easily be applied to assess other gene families and their role in biological events of interest.
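The abstract does not specify how resnet scores resilience. One common way to quantify network resilience, assumed here purely for illustration, is a targeted-attack curve: remove nodes in order of decreasing degree and track the fraction of the original nodes that remain in the largest connected component. The function names and toy graphs below are hypothetical and do not reflect resnet's actual API.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def largest_component_size(g: Graph, removed: Set[str]) -> int:
    """Size of the largest connected component, ignoring removed nodes (BFS)."""
    seen: Set[str] = set()
    best = 0
    for start in g:
        if start in removed or start in seen:
            continue
        size = 0
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nb in g[node]:
                if nb not in removed and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def targeted_attack_curve(g: Graph) -> List[float]:
    """Largest-component fraction after each removal, highest degree first.

    Degrees are taken from the intact network; ties are broken by node name.
    """
    n = len(g)
    order = sorted(g, key=lambda node: (-len(g[node]), node))
    removed: Set[str] = set()
    curve = [largest_component_size(g, removed) / n]
    for node in order:
        removed.add(node)
        curve.append(largest_component_size(g, removed) / n)
    return curve


def resilience_score(curve: List[float]) -> float:
    """Mean of the curve: higher means the network stays connected longer."""
    return sum(curve) / len(curve)


# Hypothetical toy networks: a hub-and-spoke star and a five-node ring.
star = build_graph([("H", "A"), ("H", "B"), ("H", "C"), ("H", "D")])
ring = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")])
print("star:", resilience_score(targeted_attack_curve(star)))
print("ring:", resilience_score(targeted_attack_curve(ring)))
```

Under this measure the star collapses as soon as its hub is removed, whereas the ring degrades gradually, so the ring scores higher. Recomputing components after every removal is quadratic in network size, which is acceptable for literature-curated networks of modest size.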

Availability and implementation: Source code for resnet is freely available at https://github.com/Unmani199/resnet.

Citations: 0