Pub Date: 2025-01-17 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf004
Bhavya Papudeshi, Michael J Roach, Vijini Mallawaarachchi, George Bouras, Susanna R Grigson, Sarah K Giles, Clarice M Harker, Abbey L K Hutton, Anita Tarasenko, Laura K Inglis, Alejandro A Vega, Cole Souza, Lance Boling, Hamza Hajama, Ana Georgina Cobián Güemes, Anca M Segall, Elizabeth A Dinsdale, Robert A Edwards
Motivation: Phage therapy offers a viable alternative treatment for bacterial infections amid rising antimicrobial resistance. Its success relies on selecting safe and effective phage candidates, which requires comprehensive genomic screening to identify potential risks. However, this screening is often labor-intensive and time-consuming, hindering rapid clinical deployment.
Results: We developed Sphae, an automated bioinformatics pipeline that streamlines the assessment of a phage's therapeutic potential, completing it in under 10 minutes. Using the Snakemake workflow manager, Sphae integrates tools for quality control, assembly, genome assessment, and annotation tailored specifically to phage biology. Sphae automates the detection of key genomic markers that could preclude therapeutic use, including virulence factors, antimicrobial resistance genes, and lysogeny indicators such as integrase, recombinase, and transposase. Among the 65 phage sequences analyzed, 28 showed therapeutic potential, 8 failed due to low sequencing depth, 22 contained prophage or virulence markers, and 23 had multiple phage genomes. The workflow produces a report for quickly assessing phage safety and suitability for therapy. Sphae is scalable and portable, so it can be deployed efficiently on most high-performance computing and cloud platforms, accelerating genomic evaluation.
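The screening logic described above can be pictured as a simple triage over each assembly's QC and annotation results. The sketch below is hypothetical: the `PhageAssembly` fields, the `triage` function, and the 20x depth cutoff are illustrative assumptions, not Sphae's code, rules, or thresholds. The counts reported above sum to 81 for 65 sequences, so in the study a sequence can presumably fall into more than one category; this sketch simplifies by returning only the first failing check.

```python
from dataclasses import dataclass, field

# Hypothetical lysogeny keywords; Sphae's actual marker lists may differ.
LYSOGENY_MARKERS = ("integrase", "recombinase", "transposase")

@dataclass
class PhageAssembly:
    name: str
    mean_depth: float      # average read depth over the assembly
    n_genomes: int         # number of distinct phage genomes assembled
    gene_products: list = field(default_factory=list)  # annotated product names
    amr_hits: int = 0      # antimicrobial resistance gene hits
    virulence_hits: int = 0  # virulence factor hits

def triage(a: PhageAssembly, min_depth: float = 20.0) -> str:
    """Assign one screening outcome per assembly (illustrative only)."""
    if a.mean_depth < min_depth:  # assumed depth cutoff
        return "low depth"
    if a.n_genomes > 1:
        return "multiple genomes"
    products = [p.lower() for p in a.gene_products]
    has_lysogeny = any(m in p for p in products for m in LYSOGENY_MARKERS)
    if has_lysogeny or a.amr_hits or a.virulence_hits:
        return "unsafe markers"
    return "therapeutic candidate"
```

Ordering the checks this way means a low-coverage sample is never judged on an unreliable annotation, which is one plausible design choice among several.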
Availability and implementation: Sphae source code is freely available at https://github.com/linsalrob/sphae, with installation supported via Conda, PyPI, and Docker containers.
Title: Sphae: an automated toolkit for predicting phage therapy candidates from sequencing data. | Journal: Bioinformatics Advances 5(1): vbaf004 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11783317/pdf/
Pub Date: 2025-01-16 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf005
Zoe Saßmannshausen, Lisa Blank, Llorenç Solé-Boldo, Frank Lyko, Günter Raddatz
Motivation: Since their introduction about 10 years ago, methylation clocks have provided broad insights into biological age across different species and tissues, and in the context of several diseases and aging. However, applying them to single-cell methylation data remains a major challenge because such data are inherently sparse: many CpG sites are not covered. A methylation clock applicable at the single-cell level could help to further disentangle the processes that drive the ticking of epigenetic clocks.
Results: We developed estiMAge ("estimation of Methylation Age"), a framework that exploits redundancy in methylation data to substitute for missing CpGs of trained methylation clocks in single cells. Using Euclidean distance as a similarity measure, we determine which CpGs covary with the required CpG sites of an epigenetic clock and can therefore serve as surrogates for clock CpGs not covered in single-cell experiments. estiMAge can thus be applied to standard epigenetic clocks built on elastic net regression, at both bulk and single-cell resolution. We show that estiMAge accurately predicts the ages of young and old hepatocytes and can be used to generate single-cell versions of publicly available epigenetic clocks.
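The surrogate idea can be sketched in a few lines: for each clock CpG missing in a cell, pick the covered CpG whose methylation profile across reference samples is closest in Euclidean distance, then apply the linear elastic-net clock. This is a minimal sketch under stated assumptions; `find_surrogates` and `predict_age` are hypothetical names, not estiMAge's API, and the sketch ignores any output transformation a published clock may apply to its linear predictor.

```python
import numpy as np

def find_surrogates(train: np.ndarray, clock_idx: list[int], covered: np.ndarray) -> dict[int, int]:
    """Map each clock CpG to itself if covered, else to its nearest covered CpG.

    train: samples x CpGs matrix of reference methylation levels (0..1).
    covered: boolean mask over CpGs, True where the single cell has a call.
    """
    cand = np.flatnonzero(covered)
    mapping = {}
    for j in clock_idx:
        if covered[j]:
            mapping[j] = j  # measured directly, no surrogate needed
            continue
        # Euclidean distance between CpG j's profile and every covered CpG's profile
        d = np.linalg.norm(train[:, cand] - train[:, [j]], axis=0)
        mapping[j] = int(cand[np.argmin(d)])
    return mapping

def predict_age(cell, clock_idx, weights, intercept, mapping) -> float:
    """Elastic-net clocks are linear: age = intercept + sum_j w_j * beta_j."""
    vals = np.array([cell[mapping[j]] for j in clock_idx])
    return float(intercept + np.dot(weights, vals))
```

Usage: `predict_age(cell, clock_idx, w, b, find_surrogates(train, clock_idx, ~np.isnan(cell)))` scores a cell whose uncovered sites are stored as `NaN`.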
Availability and implementation: The source code and instructions for usage of estiMAge are available at https://github.com/DivEpigenetics/estiMAge.
Title: estiMAge: development of a DNA methylation clock to estimate the methylation age of single cells. | Journal: Bioinformatics Advances 5(1): vbaf005 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11769677/pdf/
Pub Date: 2025-01-07 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf001
Yunrui Lu, Serin Han, Aruesha Srivastava, Neha Shaik, Matthew Chan, Alos Diallo, Naina Kumar, Nishita Paruchuri, Hrishikesh Deosthali, Vismay Ravikumar, Kevin Cornell, Elijah Stommel, Tracy Punshon, Brian Jackson, Fred Kolling, Linda Vahdat, Louis Vaickus, Jonathan Marotti, Sunita Ho, Joshua Levy
Summary: Elemental imaging provides detailed profiling of metal bioaccumulation and, by targeting specific tissue areas, offers more precision than bulk analysis. However, accurately identifying comparable tissue regions from elemental maps alone is challenging and requires integrating hematoxylin and eosin (H&E) slides for effective comparison. By streamlining the co-registration of whole slide images (WSI) and elemental maps, TRACE enhances the analysis of tissue regions and elemental abundance across various pathological conditions. Delivered as an interactive containerized web application, TRACE features real-time annotation editing, advanced statistical tools, and data export, supporting comprehensive spatial analysis. Notably, it allows elemental abundances to be compared across annotated tissue structures and enables integration with other spatial data types through WSI co-registration.
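Once an elemental map has been co-registered to the H&E annotations, comparing abundances across tissue structures reduces to summarizing pixels per annotated label. The sketch below assumes registration is already done and that annotations are rasterized into an integer mask; `region_abundance` and the label scheme are hypothetical and not part of TRACE's interface.

```python
import numpy as np

def region_abundance(element_map: np.ndarray, labels: np.ndarray,
                     names: dict[int, str]) -> dict[str, float]:
    """Mean abundance of one element within each annotated region.

    element_map: 2D array of element intensities, already co-registered
                 to the annotation grid (assumption).
    labels: 2D integer mask of the same shape; 0 = unannotated background.
    names: maps label ids to region names (hypothetical annotation scheme).
    """
    if element_map.shape != labels.shape:
        raise ValueError("element map and label mask must share one grid")
    out = {}
    for lab, name in names.items():
        px = element_map[labels == lab]  # pixels belonging to this region
        out[name] = float(px.mean()) if px.size else float("nan")
    return out
```

Regions with no pixels return `NaN` rather than raising, so a missing annotation in one section does not halt a batch comparison.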
Availability and implementation: Available on the following platforms: GitHub: jlevy44/trace_app; PyPI: trace_app; Docker: joshualevy44/trace_app; Singularity: docker://joshualevy44/trace_app.
Title: Integrative co-registration of elemental imaging and histopathology for enhanced spatial multimodal analysis of tissue sections through TRACE. | Journal: Bioinformatics Advances 5(1): vbaf001 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11742137/pdf/
Pub Date: 2024-12-31 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbae210
Rishabh Narayanan, William DeGroat, Elizabeth Peker, Saman Zeeshan, Zeeshan Ahmed
Motivation: Analyzing high-quality genomic variant data may offer a more complete understanding of the human genome, enabling researchers to identify novel biomarkers, stratify patients by disease risk factors, and decipher underlying biological pathways. Although the availability of genomic data has increased sharply in recent years, accessible bioinformatic tools to help prepare these data are still lacking. The main obstacles to processing genomic data are its large volume, the associated computational and storage costs, and the difficulty of identifying targeted, relevant information.
Results: We present VAREANT, an accessible and configurable bioinformatic application that prepares variant data in a usable, analysis-ready format. VAREANT comprises three standalone modules: (i) Pre-processing, (ii) Variant Annotation, and (iii) AI/ML Data Preparation. Pre-processing supports fine-grained filtering of complex variant datasets to eliminate extraneous data. Variant Annotation adds variant metadata from the latest public annotation databases for subsequent analysis and interpretation. AI/ML Data Preparation helps the user create AI/ML-ready datasets suitable for immediate analysis with minimal additional pre-processing. We have tested and validated the tool on numerous datasets of varying size and applied VAREANT in two case studies involving patients with cardiovascular diseases.
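To make the Pre-processing and AI/ML Data Preparation steps concrete, the sketch below filters VCF-like records by region and quality and then encodes genotypes as alternate-allele counts in a samples-by-variants matrix. Everything here is an assumption for illustration: the `Variant` record, `prefilter`, `to_matrix`, and the QUAL cutoff of 30 are hypothetical and not VAREANT's code. It also treats every site as biallelic.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float
    genotypes: dict[str, str]  # sample id -> GT string such as "0/1" or "1|1"

def prefilter(variants, chrom, start, end, min_qual=30.0):
    """Keep variants on one chromosome, within [start, end], at or above a QUAL cutoff."""
    return [v for v in variants
            if v.chrom == chrom and start <= v.pos <= end and v.qual >= min_qual]

def to_matrix(variants, samples):
    """Encode genotypes as alt-allele counts (0, 1, 2); missing calls become -1."""
    def dosage(gt):
        alleles = gt.replace("|", "/").split("/")  # treat phased and unphased alike
        if "." in alleles:
            return -1
        return sum(a != "0" for a in alleles)  # biallelic assumption
    header = [f"{v.chrom}:{v.pos}:{v.ref}>{v.alt}" for v in variants]
    rows = [[dosage(v.genotypes.get(s, "./.")) for v in variants] for s in samples]
    return header, rows
```

Encoding missing calls as `-1` keeps them distinguishable from homozygous-reference calls; a downstream model would still need an explicit imputation or masking step.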
Availability and implementation: The open-source code of VAREANT is available at GitHub: https://github.com/drzeeshanahmed/Gene_VAREANT.
Title: VAREANT: a bioinformatics application for gene variant reduction and annotation. | Journal: Bioinformatics Advances 5(1): vbae210 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11802749/pdf/
Pub Date: 2024-12-30 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbae184
Orhan Sari, Ziying Liu, Youlian Pan, Xiaojian Shao
Motivation: The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas9 system is a ground-breaking genome editing tool that has revolutionized cell and gene therapies. Essential to its success is the design of an optimal single-guide RNA (sgRNA) with high on-target cleavage efficiency and low off-target effects. This is challenging because many conditions must be considered, and empirically testing every design is time-consuming and costly. In silico prediction using machine learning models offers a high-performance alternative.
Results: We present CrisprBERT, a deep learning model that predicts the off-target effects of sgRNAs using only the sgRNAs and their paired DNA sequences. It combines a Bidirectional Encoder Representations from Transformers (BERT) architecture, which provides a high-dimensional embedding of paired sgRNA and DNA sequences, with Bidirectional Long Short-Term Memory networks for learning. We propose a doublet stack encoding to capture the local energy configuration of Cas9 binding and apply the BERT model to learn contextual embeddings of the doublet pairs. The new model outperformed state-of-the-art deep learning models in single-split and leave-one-sgRNA-out cross-validation as well as in independent testing.
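One way to read "doublet stack encoding" is as a tokenizer that slides a two-nucleotide window over the aligned sgRNA/DNA pair, so each token captures two adjacent base pairings (the unit that nearest-neighbor stacking energies are defined over). The sketch below is a hypothetical interpretation, not CrisprBERT's exact scheme: it assumes equal-length, gap-free alignments written in the DNA alphabet, and `doublet_tokens`, `encode`, and `VOCAB` are illustrative names.

```python
from itertools import product

BASES = "ACGT"
# One token per (2-nt sgRNA window, 2-nt DNA window) combination: 4**4 = 256 tokens.
VOCAB = {a + b + "_" + c + d: i for i, (a, b, c, d) in enumerate(product(BASES, repeat=4))}

def doublet_tokens(sgrna: str, dna: str) -> list[str]:
    """Slide a 2-nt window (stride 1) over the aligned sgRNA/DNA pair."""
    if len(sgrna) != len(dna):
        raise ValueError("sgRNA and target DNA must be aligned to the same length")
    sgrna, dna = sgrna.upper(), dna.upper()
    return [sgrna[i:i + 2] + "_" + dna[i:i + 2] for i in range(len(sgrna) - 1)]

def encode(sgrna: str, dna: str) -> list[int]:
    """Map doublet tokens to integer ids, e.g. as input to an embedding layer."""
    return [VOCAB[t] for t in doublet_tokens(sgrna, dna)]
```

A 23-nt sgRNA plus PAM pair yields 22 overlapping doublets; a real implementation would also need tokens for bulges or gaps, which this sketch does not handle.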
Availability and implementation: CrisprBERT is available on GitHub: https://github.com/OSsari/CrisprBERT.
Title: Predicting CRISPR-Cas9 off-target effects in human primary cells using bidirectional LSTM with BERT embedding. | Journal: Bioinformatics Advances 5(1): vbae184 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11696696/pdf/
Pub Date: 2024-12-24 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbae207
Cyprien A Rivier, Santiago Clocchiatti-Tuozzo, Shufan Huo, Victor Torres-Lopez, Daniela Renedo, Kevin N Sheth, Guido J Falcone, Julian N Acosta
Motivation: The expansion of genetic association data from genome-wide association studies has increased the importance of methodologies such as Polygenic Risk Scores (PRS) and Mendelian Randomization (MR) in genetic epidemiology. However, their application is often impeded by complex, multi-step workflows that require specialized expertise and disparate tools with varying data formatting requirements. Existing solutions are frequently standalone packages or command-line based (largely due to dependencies on tools such as PLINK), which limits accessibility for researchers without computational experience. Given Python's popularity and ease of use, an integrated, user-friendly Python toolkit is needed to streamline PRS and MR analyses.
Results: We introduce Genal, a Python package that consolidates SNP-level data handling, cleaning, clumping, PRS computation, and MR analyses into a single, cohesive toolkit. By wrapping PLINK and removing the need for multiple R packages and command-line interaction, Genal lowers the barrier for medical scientists to perform complex genetic epidemiology studies. Genal draws on concepts from several well-established tools, giving users access to rigorous statistical techniques in the intuitive Python environment. Additionally, Genal uses parallel processing for MR methods, including MR-PRESSO, significantly reducing the computational time these analyses require.
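Two of the core computations mentioned above have compact textbook forms: a PRS is a weighted sum of effect-allele dosages over (clumped) SNPs, and the fixed-effect inverse-variance weighted (IVW) estimator is a standard MR method that combines per-SNP Wald ratios. The sketch below implements these standard formulas with NumPy; it is not Genal's API, and `polygenic_score` and `ivw_estimate` are hypothetical names.

```python
import numpy as np

def polygenic_score(dosages: np.ndarray, betas: np.ndarray) -> np.ndarray:
    """PRS_i = sum_j beta_j * dosage_ij (dosage = effect-allele count, 0..2)."""
    return dosages @ betas

def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error."""
    bx, by, sy = map(np.asarray, (beta_exp, beta_out, se_out))
    ratio = by / bx            # per-SNP Wald ratio
    w = bx**2 / sy**2          # inverse of the ratio's first-order variance
    est = np.sum(w * ratio) / np.sum(w)
    se = 1.0 / np.sqrt(np.sum(w))
    return float(est), float(se)
```

The IVW estimate equals a weighted regression of outcome on exposure effects through the origin; robust methods such as MR-PRESSO address the pleiotropic outliers that this simple estimator does not.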
Availability and implementation: The package is available on PyPI (https://pypi.org/project/genal-python/), the code is openly available on GitHub with a tutorial (https://github.com/CypRiv/genal), and the documentation can be found on Read the Docs (https://genal.rtfd.io).
Title: Genal: a Python toolkit for genetic risk scoring and Mendelian randomization. | Journal: Bioinformatics Advances 5(1): vbae207 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11706532/pdf/
Pub Date: 2024-12-24 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbae208
Hoang M Ngo, Tamim Khatib, My T Thai, Tamer Kahveci
Motivation: The network motif identification (MI) problem aims to find topological patterns in biological networks. Identifying disjoint motifs is computationally challenging on classical computers. Quantum computers make it possible to tackle high-complexity problems that do not scale on classical computers. In this article, we develop the first quantum solution to the MI problem, called QOMIC (Quantum Optimization for Motif IdentifiCation). QOMIC transforms the MI problem into an integer model, which serves as the foundation of our quantum solution. Using this model, we develop and implement a quantum circuit that finds motif locations in a given network.
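A common way to pose "select as many disjoint motif occurrences as possible" for quantum optimizers is as a QUBO (quadratic unconstrained binary optimization): one binary variable per candidate occurrence, a reward for selecting it, and a penalty for selecting two occurrences that share a node. The sketch below is this generic maximum-independent-set formulation, not necessarily QOMIC's integer model or circuit; `build_qubo`, `energy`, and `brute_force` are hypothetical names, and the exhaustive solver stands in for a quantum backend on tiny inputs.

```python
from itertools import combinations, product

def build_qubo(instances, penalty=2.0):
    """QUBO for choosing the largest set of pairwise-disjoint motif instances.

    instances: list of node sets, one per candidate motif occurrence.
    Minimizing x^T Q x rewards each chosen instance (-1 on the diagonal) and
    penalizes choosing two instances that share a node (+penalty, > 1).
    """
    n = len(instances)
    Q = {(i, i): -1.0 for i in range(n)}
    for i, j in combinations(range(n), 2):
        if instances[i] & instances[j]:  # overlapping occurrences conflict
            Q[(i, j)] = penalty
    return Q

def energy(Q, x):
    return sum(c * x[i] * x[j] for (i, j), c in Q.items())

def brute_force(Q, n):
    """Exhaustive classical solver for tiny inputs; a quantum or annealing
    backend would instead sample low-energy assignments of x."""
    best = min(product((0, 1), repeat=n), key=lambda x: energy(Q, x))
    return best, energy(Q, best)
```

Any penalty above 1 guarantees that dropping one member of an overlapping pair never increases the energy, so minima correspond to disjoint selections.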
Results: Our experiments demonstrate that QOMIC outperforms existing solutions developed for classical computers in terms of motif counts. We also observe that QOMIC efficiently finds motifs in human regulatory networks associated with five neurodegenerative diseases: Alzheimer's, Parkinson's, Huntington's, Amyotrophic Lateral Sclerosis, and Motor Neurone Disease.
Availability and implementation: Our implementation is available at https://github.com/ngominhhoang/Quantum-Motif-Identification.git.
Title: QOMIC: quantum optimization for motif identification. | Journal: Bioinformatics Advances 5(1): vbae208 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11725347/pdf/
Pub Date: 2024-12-19 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbae205
Oludare M Ogunyemi, Gideon A Gyebi, Femi Olawale, Ibrahim M Ibrahim, Opeyemi Iwaloye, Modupe M Fabusiwa, Stephen Omowaye, Omotade I Oloyede, Charles O Olaiya
Motivation: Investigating novel drug-target interactions is crucial for expanding the chemical space of emerging therapeutic targets in human diseases. Herein, we explored the interactions of dipeptidyl peptidase-4 and protein tyrosine phosphatase 1B with selected terpenoids from African antidiabetic plants.
Results: Using molecular docking, molecular dynamics simulations, molecular mechanics with generalized Born and surface area solvation (MM/GBSA) free energy calculations, and density functional theory analyses, the study identified dipeptidyl peptidase-4 as a promising target. Cucurbitacin B, 6-oxoisoiguesterin, and 20-epi-isoiguesterinol were identified as potential dipeptidyl peptidase-4 inhibitors with strong binding affinities. These triterpenoids interacted with key catalytic and hydrophobic pockets of dipeptidyl peptidase-4 and showed structural stability and flexibility under dynamic conditions, as indicated by the molecular dynamics simulation parameters. The free energy analysis further supported the binding affinities in dynamic environments. Quantum mechanical calculations revealed favorable highest occupied molecular orbital and lowest unoccupied molecular orbital energy profiles, indicating the suitability of the hits as proton donors and acceptors, which likely enhances their molecular interactions with the targets. Moreover, the terpenoids showed desirable drug-like properties, suggesting their potential as safe and effective dipeptidyl peptidase-4 inhibitors. These findings may pave the way for the development of novel antidiabetic agents and nutraceuticals based on these promising in silico hits.
Availability and implementation: Not applicable.
Title: Identification of promising dipeptidyl peptidase-4 and protein tyrosine phosphatase 1B inhibitors from selected terpenoids through molecular modeling. | Journal: Bioinformatics Advances 5(1): vbae205 | PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11751579/pdf/
Pub Date: 2024-12-18 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbae199
Myriam Bontonou, Anaïs Haget, Maria Boulougouri, Benjamin Audit, Pierre Borgnat, Jean-Michel Arbona
Motivation: Many machine learning (ML) models developed to classify phenotypes from gene expression data provide interpretations of their decisions, with the aim of understanding biological processes. For many models, including neural networks, these interpretations are lists of genes ranked by their importance for the predictions, with top-ranked genes likely linked to the phenotype. In this article, we discuss the limitations of such approaches, using integrated gradients, an explainability method developed for neural networks, as an example.
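To make "genes ranked by their importance" concrete, here is a minimal sketch of integrated gradients. It assumes a toy logistic model in place of a trained neural network, with an analytic gradient and an all-zero expression baseline; none of these choices come from the paper. Each gene's attribution is its difference from the baseline, multiplied by the gradient averaged along the straight path from baseline to input. The integral is approximated here with a midpoint Riemann sum. The helper names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def logistic_model(weights: Vector, bias: float) -> Callable[[Vector], float]:
    """Toy classifier standing in for a trained network: p = sigmoid(w . x + b)."""
    def f(x: Vector) -> float:
        z = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return f


def logistic_grad(weights: Vector, bias: float) -> Callable[[Vector], List[float]]:
    """Analytic gradient of the toy model: dp/dx_i = p * (1 - p) * w_i."""
    f = logistic_model(weights, bias)

    def grad(x: Vector) -> List[float]:
        p = f(x)
        return [p * (1.0 - p) * w for w in weights]
    return grad


def integrated_gradients(grad: Callable[[Vector], List[float]], x: Vector,
                         baseline: Vector, steps: int = 200) -> List[float]:
    """Midpoint Riemann approximation of
    IG_i(x) = (x_i - x'_i) * integral_0^1 df/dx_i(x' + a * (x - x')) da."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad(point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


def rank_by_attribution(attributions: Sequence[float]) -> List[int]:
    """Gene indices sorted from largest to smallest |attribution|."""
    return sorted(range(len(attributions)), key=lambda i: -abs(attributions[i]))


if __name__ == "__main__":
    # Hypothetical 4-gene example; weights and expression values are made up.
    w, b = [2.0, -1.0, 0.0, 0.5], -0.5
    x, baseline = [1.0, 3.0, 2.0, 0.5], [0.0, 0.0, 0.0, 0.0]
    f = logistic_model(w, b)
    attr = integrated_gradients(logistic_grad(w, b), x, baseline)
    print("ranking:", rank_by_attribution(attr))
    # Completeness property: attributions sum to f(x) - f(baseline).
    print("sum:", round(sum(attr), 4), "vs", round(f(x) - f(baseline), 4))
```

The completeness check at the end is a cheap sanity test for any IG implementation. The choice of baseline, however, changes which genes rank highest, and that sensitivity is one source of the instability the article discusses.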
Results: Experiments are performed on RNA sequencing data from public cancer databases. A collection of ML models, including multilayer perceptrons and graph neural networks, is trained to classify samples by cancer type. Gene rankings from integrated gradients are compared with genes highlighted by statistical feature selection methods, such as DESeq2, and by other learning methods that measure global feature contribution. Experiments show that a small set of top-ranked genes is sufficient to achieve good classification. However, similar performance can be achieved with lower-ranked genes, although larger sets are required. Moreover, substantial differences in top-ranked genes, especially between statistical and learning methods, prevent a comprehensive biological understanding. In conclusion, while these methods identify pathology-specific biomarkers, it remains uncertain whether the gene sets selected by explainability techniques are complete enough to explain the underlying biological processes.
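One simple way to quantify how much two gene rankings disagree is to compare their top-k sets with the Jaccard index across several values of k. The sketch below is illustrative and is not necessarily the comparison used in the paper. It assumes each method produces a score per gene, such as an IG attribution or a DESeq2-style statistic, and that larger absolute scores mean more important genes. Gene names and scores in the usage example are hypothetical.

```python
from typing import Dict, Iterable, Set


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """The k genes with the largest |score|; ties broken alphabetically for determinism."""
    ranked = sorted(scores, key=lambda gene: (-abs(scores[gene]), gene))
    return set(ranked[:k])


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A & B| / |A | B|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def overlap_curve(scores_a: Dict[str, float], scores_b: Dict[str, float],
                  ks: Iterable[int]) -> Dict[int, float]:
    """Jaccard overlap of the top-k gene sets of two rankings, for each k."""
    return {k: jaccard(top_k(scores_a, k), top_k(scores_b, k)) for k in ks}


if __name__ == "__main__":
    # Hypothetical scores from a learning method and a statistical method.
    ig_scores = {"G1": 0.9, "G2": -0.8, "G3": 0.1, "G4": 0.05}
    stat_scores = {"G1": 5.0, "G3": 4.0, "G2": 0.5, "G4": 0.2}
    print(overlap_curve(ig_scores, stat_scores, [1, 2, 4]))
```

A curve that stays low for small k but rises as k grows matches the pattern described above. The methods agree on the broad gene pool but not on which genes are "top-ranked".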
Availability and implementation: Python code and datasets are available at https://github.com/mbonto/XAI_in_genomics.
Title: A comparative analysis of gene expression profiling by statistical and machine learning approaches. | Journal: Bioinformatics Advances, 5(1), vbae199
Pub Date: 2024-12-14 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbae200
Unmani Jaygude, Graham M Hughes, Jeremy C Simpson
Motivation: Rab GTPases (Rabs) are crucial for membrane trafficking within mammalian cells, and their dysfunction is implicated in many diseases. This gene family participates in several essential cellular processes. Network analyses can uncover the complete repertoire of interaction patterns across the Rab network, informing disease research and opening new opportunities for therapeutic intervention.
Results: We examined Rabs and their interactors in the context of epithelial-to-mesenchymal transition (EMT), an indicator of cancer metastasis to distant organs. A Rab network was first established from the literature and then gradually expanded. Our Python module, resnet, assessed the resilience of this network and selected an optimally sized, resilient Rab network for further analyses. Pathway enrichment confirmed the network's involvement in EMT. We then identified 73 candidate genes showing strong up- or down-regulation across 10 cancer types in patients with metastasized tumours compared with patients with primary-site tumours only. We suggest that the proteins encoded by these genes might play a critical role in EMT. Further in vitro studies are needed to confirm their value as predictive markers of cancer metastasis. The resnet module and the systematic analysis approach described here can readily be applied to other gene families and their roles in biological events of interest.
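The abstract does not describe how resnet measures resilience. A common way to probe network resilience is a targeted attack. The node with the highest current degree is removed repeatedly, and the largest connected component is tracked as a fraction of the original network size. The sketch below uses this technique purely as an illustrative assumption, not as resnet's implementation. The function names and the toy graphs are hypothetical and do not represent real Rab interactions.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected adjacency sets from an edge list (e.g. protein-protein interactions)."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def largest_component_size(adj: Graph, removed: Set[str]) -> int:
    """Size of the largest connected component after deleting `removed` nodes (BFS)."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best


def targeted_attack_curve(adj: Graph, steps: int) -> List[float]:
    """Repeatedly remove the highest-degree remaining node and record the largest
    component as a fraction of the original node count (first entry: intact network)."""
    n = len(adj)
    removed: Set[str] = set()
    curve = [largest_component_size(adj, removed) / n]
    for _ in range(steps):
        remaining = [v for v in adj if v not in removed]
        if not remaining:
            break
        # Degree is recomputed among surviving nodes; ties go to the larger node name.
        hub = max(remaining, key=lambda v: (sum(nb not in removed for nb in adj[v]), v))
        removed.add(hub)
        curve.append(largest_component_size(adj, removed) / n)
    return curve


if __name__ == "__main__":
    # Hypothetical toy networks: a hub-dominated star versus a simple chain.
    star = build_graph([("H", "A"), ("H", "B"), ("H", "C"), ("H", "D")])
    chain = build_graph([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")])
    print("star:", targeted_attack_curve(star, 1))
    print("chain:", targeted_attack_curve(chain, 2))
```

A curve that collapses after one or two removals marks a fragile, hub-dependent network. The mean of the curve is sometimes used as a single robustness score when comparing candidate networks of different sizes.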
Availability and implementation: Source code for resnet is freely available at https://github.com/Unmani199/resnet.
Title: Exploring the role of the Rab network in epithelial-to-mesenchymal transition. | Journal: Bioinformatics Advances, 5(1), vbae200