Pub Date: 2025-03-04 DOI: 10.1093/bioinformatics/btaf053
Keisuke Hirota, Felix Salim, Takuji Yamada
Motivation: Progress in sequencing technology has led to the determination of large numbers of protein sequences, and large enzyme databases are now available. Although many computational tools for enzyme annotation have been developed, sequence information is unavailable for many enzymes, known as orphan enzymes. These orphan enzymes hinder sequence similarity-based functional annotation, leaving gaps in understanding the association between sequences and enzymatic reactions.
Results: We developed DeepES, a deep learning-based tool for enzyme screening to identify orphan enzyme genes, focusing on biosynthetic gene clusters and reaction class. DeepES uses protein sequences as inputs and evaluates whether the input genes contain biosynthetic gene clusters of interest by integrating the outputs of the binary classifier for each reaction class. The validation results suggested that DeepES can capture functional similarity between protein sequences and can be used to explore orphan enzyme genes. By applying DeepES to 4744 metagenome-assembled genomes, we identified candidate genes for 236 orphan enzymes, including those involved in short-chain fatty acid production as a characteristic pathway in human gut bacteria.
Availability and implementation: DeepES is available at https://github.com/yamada-lab/DeepES. Model weights and the candidate genes are available at Zenodo (https://doi.org/10.5281/zenodo.11123900).
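The abstract says DeepES integrates the outputs of one binary classifier per reaction class to judge whether a set of input genes contains a cluster of interest, but it does not state the integration rule. The sketch below is only one plausible reading, under loudly stated assumptions: the function name `window_scores`, the reaction class names, the sliding window over consecutive genes, and the "best gene per class, then average" rule are all hypothetical and are not taken from the DeepES code.

```python
from typing import Dict, List


def window_scores(
    gene_probs: List[Dict[str, float]],
    classes: List[str],
    window: int,
) -> List[float]:
    """Score every run of `window` consecutive genes.

    Assumed rule (not from the paper): for each reaction class, take the
    highest per-gene classifier probability inside the window, then
    average those maxima over all requested classes.
    """
    scores = []
    for start in range(len(gene_probs) - window + 1):
        block = gene_probs[start:start + window]
        best_per_class = [max(g.get(c, 0.0) for g in block) for c in classes]
        scores.append(sum(best_per_class) / len(classes))
    return scores


# Hypothetical per-gene outputs of two binary classifiers.
genes = [
    {"oxidoreductase": 0.9, "transferase": 0.1},
    {"oxidoreductase": 0.2, "transferase": 0.8},
    {"oxidoreductase": 0.1, "transferase": 0.1},
]
scores = window_scores(genes, ["oxidoreductase", "transferase"], window=2)
print(scores)
```

A window scores highly only when every reaction class of the target pathway has at least one confident gene nearby, which is the intuition behind screening at the gene cluster level rather than gene by gene.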
{"title":"DeepES: deep learning-based enzyme screening to identify orphan enzyme genes.","authors":"Keisuke Hirota, Felix Salim, Takuji Yamada","doi":"10.1093/bioinformatics/btaf053","journal":{"name":"Bioinformatics (Oxford, England)"},"publicationDate":"2025-03-04","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11881691/pdf/"}
Pub Date: 2025-03-04 DOI: 10.1093/bioinformatics/btae711
Wei Zhang, Tiantian Liu, Han Zhang, Yuanyuan Li
Motivation: Single-cell RNA sequencing (scRNA-seq) provides a powerful tool for studying cellular heterogeneity and complexity. However, dropout events in single-cell RNA-seq data severely hinder the effectiveness and accuracy of downstream analysis. Therefore, data preprocessing with imputation methods is crucial to scRNA-seq analysis.
Results: To address the issue of oversmoothing in smoothing-based imputation methods, we present AcImpute, an unsupervised method that enhances imputation accuracy by constraining the smoothing weights among cells for genes with different expression levels. In comparisons with nine other imputation methods on cluster analysis and trajectory inference, the experimental results demonstrate that AcImpute effectively restores gene expression and preserves inter-cell variability, preventing oversmoothing and improving clustering and trajectory inference performance.
Availability and implementation: The code is available at https://github.com/Liutto/AcImpute.
{"title":"AcImpute: a constraint-enhancing smooth-based approach for imputing single-cell RNA sequencing data.","authors":"Wei Zhang, Tiantian Liu, Han Zhang, Yuanyuan Li","doi":"10.1093/bioinformatics/btae711","journal":{"name":"Bioinformatics (Oxford, England)"},"publicationDate":"2025-03-04","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11890269/pdf/"}
Pub Date: 2025-03-04 DOI: 10.1093/bioinformatics/btae754
Mina Namazi, Mohammadali Farahpoor, Erman Ayday, Fernando Pérez-González
Motivation: The affordability of genome sequencing and the widespread availability of genomic data have opened up new medical possibilities. Nevertheless, they also raise significant concerns regarding privacy due to the sensitive information they encompass. These privacy implications act as barriers to medical research and data availability. Researchers have proposed privacy-preserving techniques to address this, with cryptography-based methods showing the most promise. However, existing cryptography-based designs lack (i) interoperability, (ii) scalability, (iii) a high degree of privacy (i.e. compromise one to have the other), or (iv) multiparty analyses support (as most existing schemes process genomic information of each party individually). Overcoming these limitations is essential to unlocking the full potential of genomic data while ensuring privacy and data utility. Further research and development are needed to advance privacy-preserving techniques in genomics, focusing on achieving interoperability and scalability, preserving data utility, and enabling secure multiparty computation.
Results: This study aims to overcome the limitations of current cryptography-based techniques by employing a multi-key homomorphic encryption scheme. By utilizing this scheme, we have developed a comprehensive protocol capable of conducting diverse genomic analyses. Our protocol facilitates interoperability among individual genome processing and enables multiparty tests, analyses of genomic databases, and operations involving multiple databases. Consequently, our approach represents an innovative advancement in secure genomic data processing, offering enhanced protection and privacy measures.
Availability and implementation: All associated code and documentation are available at https://github.com/farahpoor/smkhe.
{"title":"Privacy-preserving framework for genomic computations via multi-key homomorphic encryption.","authors":"Mina Namazi, Mohammadali Farahpoor, Erman Ayday, Fernando Pérez-González","doi":"10.1093/bioinformatics/btae754","journal":{"name":"Bioinformatics (Oxford, England)"},"publicationDate":"2025-03-04","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11890293/pdf/"}
Pub Date: 2025-03-04 DOI: 10.1093/bioinformatics/btaf087
Ali Hamraoui, Laurent Jourdren, Morgane Thomas-Chollier
Motivation: The combination of long-read sequencing technologies like Oxford Nanopore with single-cell RNA sequencing (scRNAseq) assays enables the detailed exploration of transcriptomic complexity, including isoform detection and quantification, by capturing full-length cDNAs. However, challenges remain, including the lack of advanced simulation tools that can effectively mimic the unique complexities of scRNAseq long-read datasets. Such tools are essential for the evaluation and optimization of isoform detection methods dedicated to single-cell long-read studies.
Results: We developed AsaruSim, a workflow that simulates synthetic single-cell long-read Nanopore datasets, closely mimicking real experimental data. AsaruSim employs a multi-step process that includes the creation of a synthetic count matrix, generation of perfect reads, optional PCR amplification, introduction of sequencing errors, and comprehensive quality control reporting. Applied to a dataset of human peripheral blood mononuclear cells, AsaruSim accurately reproduced experimental read characteristics.
Availability and implementation: The source code and full documentation are available at https://github.com/GenomiqueENS/AsaruSim.
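The multi-step process described in the Results can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not AsaruSim's implementation: the function names, the hand-written count table, the per-cycle duplication model for PCR, and the substitution-only error model are all hypothetical simplifications (real Nanopore errors also include insertions and deletions, and the quality control report is omitted).

```python
import random
from typing import Dict, List, Tuple

Read = Tuple[str, str]  # (cell barcode, sequence)


def perfect_reads(counts: Dict[Tuple[str, str], int],
                  transcripts: Dict[str, str]) -> List[Read]:
    """Emit one error-free read per counted molecule."""
    reads: List[Read] = []
    for (cell, tx), n in counts.items():
        reads.extend((cell, transcripts[tx]) for _ in range(n))
    return reads


def pcr(reads: List[Read], cycles: int, efficiency: float,
        rng: random.Random) -> List[Read]:
    """Optional amplification: each cycle copies a read with probability `efficiency`."""
    for _ in range(cycles):
        reads = reads + [r for r in reads if rng.random() < efficiency]
    return reads


def add_errors(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each base with probability `rate` (assumed error model)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


rng = random.Random(0)
counts = {("cell1", "tx1"): 2, ("cell2", "tx2"): 1}  # toy count matrix
transcripts = {"tx1": "ACGTACGT", "tx2": "TTGGCCAA"}

reads = perfect_reads(counts, transcripts)
amplified = pcr(reads, cycles=1, efficiency=1.0, rng=rng)
noisy = [(cell, add_errors(seq, 0.05, rng)) for cell, seq in amplified]
print(len(reads), len(amplified), len(noisy))
```

Keeping each stage as a separate function mirrors the workflow structure in the abstract, where amplification is optional and the error step is applied last.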
{"title":"AsaruSim: a single-cell and spatial RNA-Seq Nanopore long-reads simulation workflow.","authors":"Ali Hamraoui, Laurent Jourdren, Morgane Thomas-Chollier","doi":"10.1093/bioinformatics/btaf087","journal":{"name":"Bioinformatics (Oxford, England)"},"publicationDate":"2025-03-04","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11897429/pdf/"}
Pub Date: 2025-03-04 DOI: 10.1093/bioinformatics/btaf078
Xirun Wei, Qiao Ning, Kuiyang Che, Zhaowei Liu, Hui Li, Shikai Guo
Motivation: S-sulfhydration, a crucial post-translational protein modification, is pivotal in cellular recognition, signaling processes, and the development and progression of cardiovascular and neurological disorders, so identifying S-sulfhydration sites is crucial for studies in cell biology. Deep learning shows high efficiency and accuracy in identifying protein sites compared to traditional methods that often lack sensitivity and specificity in accurately locating nonsulfhydration sites. Therefore, we employ deep learning methods to tackle the challenge of pinpointing S-sulfhydration sites.
Results: In this work, we introduce a deep learning approach called Sul-BertGRU, designed specifically for predicting S-sulfhydration sites in proteins, which integrates multi-directional gated recurrent unit (GRU) and BERT. First, Sul-BertGRU proposes an information entropy-enhanced BERT (IE-BERT) to preprocess protein sequences and extract initial features. Subsequently, confidence learning is employed to eliminate potential S-sulfhydration samples from the nonsulfhydration samples and select reliable negative samples. Then, considering the directional nature of the modification process, protein sequences are categorized into left, right, and full sequences centered on cysteines. We build a multi-directional GRU to enhance the extraction of directional sequence features and model the details of the enzymatic reaction involved in S-sulfhydration. Ultimately, we apply a parallel multi-head self-attention mechanism alongside a convolutional neural network to deeply analyze sequence features that might be missed at a local level. Sul-BertGRU achieves sensitivity, specificity, precision, accuracy, Matthews correlation coefficient, and area under the curve scores of 85.82%, 68.24%, 74.80%, 77.44%, 55.13%, and 77.03%, respectively. Sul-BertGRU demonstrates exceptional performance and proves to be a reliable method for predicting protein S-sulfhydration sites.
Availability and implementation: The source code and data are available at https://github.com/Severus0902/Sul-BertGRU/.
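The step in which sequences are split into left, right, and full sequences centered on cysteines can be sketched as follows. This is an illustration under assumptions: the flank length, the padding character `X`, and the choice to include the central cysteine in both the left and right halves are hypothetical and are not confirmed by the abstract.

```python
from typing import List, Tuple


def cysteine_windows(seq: str, flank: int,
                     pad: str = "X") -> List[Tuple[int, str, str, str]]:
    """Return (position, left, right, full) for every cysteine in `seq`.

    left  = `flank` residues upstream plus the cysteine,
    right = the cysteine plus `flank` residues downstream,
    full  = both flanks around the cysteine.
    Windows that run past the sequence ends are padded with `pad`.
    """
    padded = pad * flank + seq + pad * flank
    out = []
    for i, aa in enumerate(seq):
        if aa != "C":
            continue
        center = i + flank  # index of this cysteine in the padded string
        left = padded[center - flank:center + 1]
        right = padded[center:center + flank + 1]
        full = padded[center - flank:center + flank + 1]
        out.append((i, left, right, full))
    return out


windows = cysteine_windows("MKCAD", flank=3)
print(windows)  # [(2, 'XMKC', 'CADX', 'XMKCADX')]
```

Each of the three strings could then be fed to its own recurrent branch, which is one way to read the "multi-directional GRU" described above.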
{"title":"Sul-BertGRU: an ensemble deep learning method integrating information entropy-enhanced BERT and directional multi-GRU for S-sulfhydration sites prediction.","authors":"Xirun Wei, Qiao Ning, Kuiyang Che, Zhaowei Liu, Hui Li, Shikai Guo","doi":"10.1093/bioinformatics/btaf078","journal":{"name":"Bioinformatics (Oxford, England)"},"publicationDate":"2025-03-04","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11908646/pdf/"}
Pub Date: 2025-03-04 DOI: 10.1093/bioinformatics/btaf097
Brent S Pedersen, Aaron R Quinlan
Motivation: Variant call format (VCF) files are the standard output format for various software tools that identify genetic variation from DNA sequencing experiments. Downstream analyses require the ability to query, filter, and modify them simply and efficiently. Several tools are available to perform these operations from the command line, including BCFTools, vembrane, slivar, and others.
Results: Here, we introduce vcfexpress, a new, high-performance toolset for the analysis of VCF files, written in the Rust programming language. It is nearly as fast as BCFTools, but adds functionality to execute user expressions in the Lua programming language for precise filtering and reporting of variants from a VCF or BCF file. We demonstrate performance and flexibility by comparing vcfexpress to other tools using the vembrane benchmark.
Availability and implementation: vcfexpress is available under the MIT license at https://github.com/brentp/vcfexpress with code used for the manuscript deposited in https://doi.org/10.5281/zenodo.14756838.
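The idea of filtering variants with a user-supplied expression can be shown with a small Python analogue. This is not vcfexpress and does not reproduce its Lua interface or command line; the helper names `parse_record` and `filter_vcf`, the record dictionary layout, and the example thresholds are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator


def parse_record(line: str) -> Dict[str, object]:
    """Parse the eight fixed VCF columns of one data line into a dict."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_map: Dict[str, object] = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, _, value = item.partition("=")
        info_map[key] = value if value else True  # flag entries carry no value
    return {
        "chrom": chrom, "pos": int(pos), "id": vid, "ref": ref,
        "alt": alt.split(","),
        "qual": None if qual == "." else float(qual),
        "filter": flt, "info": info_map,
    }


def filter_vcf(lines: Iterable[str],
               keep: Callable[[Dict[str, object]], bool]) -> Iterator[str]:
    """Pass header lines through; keep data lines for which `keep` is true."""
    for line in lines:
        if line.startswith("#"):
            yield line
        elif keep(parse_record(line)):
            yield line


vcf = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\trs1\tA\tG\t50\tPASS\tDP=30;AF=0.5\n",
    "chr1\t200\t.\tC\tT\t10\tPASS\tDP=5\n",
]
kept = list(filter_vcf(
    vcf,
    lambda v: v["qual"] is not None and v["qual"] >= 20
    and int(v["info"].get("DP", 0)) >= 10,
))
print("".join(kept), end="")
```

The user expression is an ordinary callable here; in the tool described above that role is played by a Lua expression evaluated per variant.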
{"title":"Vcfexpress: flexible, rapid user-expressions to filter and format VCFs.","authors":"Brent S Pedersen, Aaron R Quinlan","doi":"10.1093/bioinformatics/btaf097","journal":{"name":"Bioinformatics (Oxford, England)"},"publicationDate":"2025-03-04","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11904302/pdf/"}
Pub Date: 2025-03-04 DOI: 10.1093/bioinformatics/btaf080
Jaemin Jeon, Suwan Yu, Sangam Lee, Sang Cheol Kim, Hye-Yeong Jo, Inuk Jung, Kwangsoo Kim
Motivation: Correctly identifying epitope-binding T-cell receptors (TCRs) is important both for understanding their underlying biological mechanism in association with a phenotype and for developing T-cell mediated immunotherapy treatments accordingly. Although the importance of the CDR3 region in TCRs for epitope recognition is well recognized, methods for profiling their interactions in association with a certain disease or phenotype remain less studied. We developed EpicPred to identify phenotype-specific TCR-epitope interactions. EpicPred first predicts and removes unlikely TCR-epitope interactions to reduce false positives using open-set recognition (OSR). Subsequently, multiple instance learning is used to identify TCR-epitope interactions specific to a cancer type or to severity levels of COVID-19 infected patients.
Results: From six public TCR databases, 244 552 TCR sequences and 105 unique epitopes were used to predict epitope-binding TCRs and to filter out non-epitope-binding TCRs using the OSR method. The predicted interactions were used to further predict the phenotype groups in two cancer and four COVID-19 TCR-seq datasets of both bulk and single-cell resolution. EpicPred outperformed the competing methods in predicting the phenotypes, achieving an average AUROC of 0.80 ± 0.07.
Availability and implementation: The EpicPred Software is available at https://github.com/jaeminjj/EpicPred.
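Multiple instance learning treats each patient's repertoire as a bag of TCR instances and learns which instances matter for the bag-level phenotype. The sketch below shows a simplified attention pooling step. It is an illustration under assumptions, not EpicPred's architecture: the embeddings are toy vectors, the scoring function tanh(w · h) is a reduced form of the usual attention layer, and the vector `w` would be learned during training rather than fixed by hand.

```python
import math
from typing import List, Tuple


def attention_pool(bag: List[List[float]],
                   w: List[float]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, pooled bag embedding).

    Each instance h gets the score tanh(w . h); a softmax over the
    scores yields the weights, and the bag embedding is the weighted
    sum of the instance vectors.
    """
    scores = [math.tanh(sum(wi * xi for wi, xi in zip(w, x))) for x in bag]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(bag[0])
    pooled = [sum(a * x[d] for a, x in zip(weights, bag)) for d in range(dim)]
    return weights, pooled


# Three hypothetical TCR embeddings from one sample.
bag = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
weights, pooled = attention_pool(bag, w=[1.0, -1.0])
print(weights, pooled)
```

The attention weights double as an interpretability signal: instances with large weights are the TCRs the model considers most informative for the phenotype.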
{"title":"EpicPred: predicting phenotypes driven by epitope-binding TCRs using attention-based multiple instance learning.","authors":"Jaemin Jeon, Suwan Yu, Sangam Lee, Sang Cheol Kim, Hye-Yeong Jo, Inuk Jung, Kwangsoo Kim","doi":"10.1093/bioinformatics/btaf080","journal":{"name":"Bioinformatics (Oxford, England)"},"publicationDate":"2025-03-04","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11879650/pdf/"}
Pub Date: 2025-03-04 DOI: 10.1093/bioinformatics/btaf083
Xindian Wei, Tianyi Chen, Xibiao Wang, Wenjun Shen, Cheng Liu, Si Wu, Hau-San Wong
Motivation: Single-cell RNA sequencing (scRNA-seq) enables high-throughput transcriptomic profiling at single-cell resolution. The inherent spatial location is crucial for understanding how single cells orchestrate multicellular functions and drive diseases. However, spatial information is often lost during tissue dissociation. Spatial transcriptomic (ST) technologies can provide precise spatial gene expression atlases, but their practicality is constrained by the number of genes they can assay, by the associated costs at larger scale, and by the difficulty of fine-grained cell-type annotation. By transferring knowledge between scRNA-seq and ST data through cell correspondence learning, it is possible to recover the spatial properties inherent in scRNA-seq datasets.
Results: In this study, we introduce COME, a COntrastive Mapping lEarning approach that learns mapping between ST and scRNA-seq data to recover the spatial information of scRNA-seq data. Extensive experiments demonstrate that the proposed COME method effectively captures precise cell-spot relationships and outperforms previous methods in recovering spatial location for scRNA-seq data. More importantly, our method is capable of precisely identifying biologically meaningful information within the data, such as the spatial structure of missing genes, spatial hierarchical patterns, and the cell-type compositions for each spot. These results indicate that the proposed COME method can help to understand the heterogeneity and activities among cells within tissue environments.
Availability and implementation: COME is freely available on GitHub (https://github.com/cindyway/COME).
{"title":"COME: contrastive mapping learning for spatial reconstruction of single-cell RNA sequencing data.","authors":"Xindian Wei, Tianyi Chen, Xibiao Wang, Wenjun Shen, Cheng Liu, Si Wu, Hau-San Wong","doi":"10.1093/bioinformatics/btaf083","journal":{"name":"Bioinformatics (Oxford, England)"},"publicationDate":"2025-03-04","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11897431/pdf/"}
Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf091
Kari Lavikka, Altti Ilari Maarala, Jaana Oikkonen, Sampsa Hautaniemi
Summary: Spatial and temporal intra-tumor heterogeneity drives tumor evolution and therapy resistance. Existing visualization tools often fail to capture both dimensions simultaneously. To address this, we developed Jellyfish, a tool that integrates phylogenetic and sample trees into a single plot, providing a holistic view of tumor evolution and capturing both spatial and temporal evolution. Available as a JavaScript library and R package, Jellyfish generates interactive visualizations from tumor phylogeny and clonal composition data. We demonstrate its ability to visualize complex subclonal dynamics using data from ovarian high-grade serous carcinoma.
Availability and implementation: Jellyfish is freely available under the MIT license at https://github.com/HautaniemiLab/jellyfish (JavaScript library) and https://github.com/HautaniemiLab/jellyfisher (R package).
{"title":"Jellyfish: integrative visualization of spatio-temporal tumor evolution and clonal dynamics.","authors":"Kari Lavikka, Altti Ilari Maarala, Jaana Oikkonen, Sampsa Hautaniemi","doi":"10.1093/bioinformatics/btaf091","DOIUrl":"10.1093/bioinformatics/btaf091","url":null,"PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11897425/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143506725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf077
Vikash Pandey
Motivation: Modeling genome-scale metabolic networks (GEMs) helps understand metabolic fluxes in cells at a specific state under defined environmental conditions or perturbations. Elementary flux modes (EFMs) are powerful tools for simplifying complex metabolic networks into smaller, more manageable pathways. However, the enumeration of all EFMs, especially within GEMs, poses significant challenges due to computational complexity. Additionally, traditional EFM approaches often fail to capture essential aspects of metabolism, such as co-factor balancing and by-product generation. The previously developed Minimum Network Enrichment Analysis (MiNEA) method addresses these limitations by enumerating alternative minimal networks for given biomass building blocks and metabolic tasks. MiNEA facilitates a deeper understanding of metabolic task flexibility and context-specific metabolic routes by integrating condition-specific transcriptomics, proteomics, and metabolomics data. This approach offers significant improvements in the analysis of metabolic pathways, providing more comprehensive insights into cellular metabolism.
Results: Here, I present MiNEApy, a Python package reimplementation of MiNEA, which computes minimal networks and performs enrichment analysis. I demonstrate the application of MiNEApy on both a small-scale and a genome-scale model of the bacterium Escherichia coli, showcasing its ability to conduct minimal network enrichment analysis using minimal networks and context-specific data.
Availability and implementation: MiNEApy can be accessed at: https://github.com/vpandey-om/mineapy.
{"title":"MiNEApy: enhancing enrichment network analysis in metabolic networks.","authors":"Vikash Pandey","doi":"10.1093/bioinformatics/btaf077","DOIUrl":"10.1093/bioinformatics/btaf077","url":null,"PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11889450/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143476937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}