Bioinformatics (Oxford, England): Latest Articles

DeepES: deep learning-based enzyme screening to identify orphan enzyme genes.
Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf053
Keisuke Hirota, Felix Salim, Takuji Yamada

Motivation: Progress in sequencing technology has led to the determination of large numbers of protein sequences, and large enzyme databases are now available. Although many computational tools for enzyme annotation have been developed, sequence information is unavailable for many enzymes, known as orphan enzymes. These orphan enzymes hinder sequence similarity-based functional annotation, leading to gaps in understanding the association between sequences and enzymatic reactions.

Results: Therefore, we developed DeepES, a deep learning-based tool for enzyme screening to identify orphan enzyme genes, focusing on biosynthetic gene clusters and reaction class. DeepES uses protein sequences as inputs and evaluates whether the input genes contain biosynthetic gene clusters of interest by integrating the outputs of the binary classifier for each reaction class. The validation results suggested that DeepES can capture functional similarity between protein sequences, and it can be implemented to explore orphan enzyme genes. By applying DeepES to 4744 metagenome-assembled genomes, we identified candidate genes for 236 orphan enzymes, including those involved in short-chain fatty acid production as a characteristic pathway in human gut bacteria.

Availability and implementation: DeepES is available at https://github.com/yamada-lab/DeepES. Model weights and the candidate genes are available at Zenodo (https://doi.org/10.5281/zenodo.11123900).
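The idea of integrating per-reaction-class classifier outputs over neighbouring genes can be illustrated with a minimal sketch. This is not the DeepES implementation: the function name `score_windows`, the sliding-window size, the class labels, and the mean-of-best aggregation are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

def score_windows(gene_probs: List[Dict[str, float]],
                  required: List[str],
                  window: int) -> List[Tuple[int, float]]:
    """Score each run of `window` consecutive genes.

    gene_probs[i][c] is a (hypothetical) binary-classifier probability that
    gene i catalyses reaction class c. A window's score is the mean, over the
    required classes, of the best probability any gene in the window reaches.
    """
    scores = []
    # Yields no windows when there are fewer genes than `window`.
    for start in range(len(gene_probs) - window + 1):
        genes = gene_probs[start:start + window]
        best = [max(g.get(c, 0.0) for g in genes) for c in required]
        scores.append((start, sum(best) / len(best)))
    return scores

# Toy input: four consecutive genes, two reaction classes of interest.
probs = [
    {"classA": 0.1, "classB": 0.2},
    {"classA": 0.9, "classB": 0.1},
    {"classA": 0.2, "classB": 0.8},
    {"classA": 0.1, "classB": 0.1},
]
windows = score_windows(probs, ["classA", "classB"], window=2)
best_start, best_score = max(windows, key=lambda w: w[1])
```

In this toy input the window starting at the second gene scores highest, because it is the only one in which both classes have a confident gene.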

Citations: 0
AcImpute: a constraint-enhancing smooth-based approach for imputing single-cell RNA sequencing data.
Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btae711
Wei Zhang, Tiantian Liu, Han Zhang, Yuanyuan Li

Motivation: Single-cell RNA sequencing (scRNA-seq) provides a powerful tool for studying cellular heterogeneity and complexity. However, dropout events in single-cell RNA-seq data severely hinder the effectiveness and accuracy of downstream analysis. Therefore, data preprocessing with imputation methods is crucial to scRNA-seq analysis.

Results: To address the issue of oversmoothing in smoothing-based imputation methods, we present AcImpute, an unsupervised method that enhances imputation accuracy by constraining the smoothing weights among cells for genes with different expression levels. Compared with nine other imputation methods in cluster analysis and trajectory inference, the experimental results demonstrate that AcImpute effectively restores gene expression and preserves inter-cell variability, preventing oversmoothing and improving clustering and trajectory inference performance.

Availability and implementation: The code is available at https://github.com/Liutto/AcImpute.
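As a rough illustration of smoothing-based imputation with a constraint on how strongly neighbours may overwrite a cell, consider the sketch below. It is a generic stand-in, not AcImpute's algorithm: the Gaussian kernel, the cap `alpha`, and the rule of only filling zero entries are assumptions made for the example.

```python
import math
from typing import List

def smooth_impute(X: List[List[float]], alpha: float = 0.5) -> List[List[float]]:
    """Fill zero entries of a cells x genes matrix from similar cells.

    Weights come from a Gaussian kernel on the Euclidean distance between
    cells. `alpha` (an assumed constraint) scales the neighbours' weighted
    mean before it is written into a zero entry; non-zero entries are kept.
    """
    n = len(X)
    out = [row[:] for row in X]
    for i in range(n):
        # Similarity of cell i to every other cell (0 for itself).
        w = [0.0 if j == i else math.exp(-math.dist(X[i], X[j]) ** 2)
             for j in range(n)]
        total = sum(w)
        if total == 0.0:
            continue  # no usable neighbours
        for g, value in enumerate(X[i]):
            if value == 0.0:
                neighbour_mean = sum(w[j] * X[j][g] for j in range(n)) / total
                out[i][g] = alpha * neighbour_mean
    return out

X = [[1.0, 0.0], [1.0, 2.0], [5.0, 5.0]]
imputed = smooth_impute(X, alpha=0.5)
```

Leaving observed values untouched and damping the imputed ones is one simple way to limit oversmoothing; the published method may constrain the weights quite differently.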

Citations: 0
Privacy-preserving framework for genomic computations via multi-key homomorphic encryption.
Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btae754
Mina Namazi, Mohammadali Farahpoor, Erman Ayday, Fernando Pérez-González

Motivation: The affordability of genome sequencing and the widespread availability of genomic data have opened up new medical possibilities. Nevertheless, they also raise significant concerns regarding privacy due to the sensitive information they encompass. These privacy implications act as barriers to medical research and data availability. Researchers have proposed privacy-preserving techniques to address this, with cryptography-based methods showing the most promise. However, existing cryptography-based designs lack (i) interoperability, (ii) scalability, (iii) a high degree of privacy (i.e. they compromise one to have the other), or (iv) support for multiparty analyses (as most existing schemes process the genomic information of each party individually). Overcoming these limitations is essential to unlocking the full potential of genomic data while ensuring privacy and data utility. Further research and development are needed to advance privacy-preserving techniques in genomics, focusing on achieving interoperability and scalability, preserving data utility, and enabling secure multiparty computation.

Results: This study aims to overcome the limitations of current cryptography-based techniques by employing a multi-key homomorphic encryption scheme. By utilizing this scheme, we have developed a comprehensive protocol capable of conducting diverse genomic analyses. Our protocol facilitates interoperability among individual genome processing and enables multiparty tests, analyses of genomic databases, and operations involving multiple databases. Consequently, our approach represents an innovative advancement in secure genomic data processing, offering enhanced protection and privacy measures.

Availability and implementation: All associated code and documentation are available at https://github.com/farahpoor/smkhe.

Citations: 0
AsaruSim: a single-cell and spatial RNA-Seq Nanopore long-reads simulation workflow.
Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf087
Ali Hamraoui, Laurent Jourdren, Morgane Thomas-Chollier

Motivation: The combination of long-read sequencing technologies like Oxford Nanopore with single-cell RNA sequencing (scRNAseq) assays enables the detailed exploration of transcriptomic complexity, including isoform detection and quantification, by capturing full-length cDNAs. However, challenges remain, including the lack of advanced simulation tools that can effectively mimic the unique complexities of scRNAseq long-read datasets. Such tools are essential for the evaluation and optimization of isoform detection methods dedicated to single-cell long-read studies.

Results: We developed AsaruSim, a workflow that simulates synthetic single-cell long-read Nanopore datasets, closely mimicking real experimental data. AsaruSim employs a multi-step process that includes the creation of a synthetic count matrix, generation of perfect reads, optional PCR amplification, introduction of sequencing errors, and comprehensive quality control reporting. Applied to a dataset of human peripheral blood mononuclear cells, AsaruSim accurately reproduced experimental read characteristics.

Availability and implementation: The source code and full documentation are available at https://github.com/GenomiqueENS/AsaruSim.
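The multi-step process described in the Results (count matrix, perfect reads, sequencing errors) can be sketched in miniature as follows. This toy does not use AsaruSim's code or read layout: the barcode + UMI + cDNA structure, the 8-base UMI, and the substitution-only error model are assumptions, and the PCR and quality-control steps are omitted.

```python
import random

def perfect_reads(counts, transcripts, rng):
    """Yield one error-free read per counted molecule: barcode + UMI + cDNA."""
    for barcode, per_gene in counts.items():
        for gene, n in per_gene.items():
            for _ in range(n):
                umi = "".join(rng.choice("ACGT") for _ in range(8))
                yield barcode + umi + transcripts[gene]

def add_errors(read, rate, rng):
    """Replace each base by a different base with probability `rate`."""
    bases = []
    for b in read:
        if rng.random() < rate:
            b = rng.choice([x for x in "ACGT" if x != b])
        bases.append(b)
    return "".join(bases)

rng = random.Random(0)  # fixed seed so the simulation is repeatable
transcripts = {"geneA": "ATGGCCAAGT", "geneB": "ATGTTTCCCGGG"}
# Synthetic count matrix: cell barcode -> gene -> number of molecules.
counts = {"AAAC": {"geneA": 2, "geneB": 1}, "TTTG": {"geneB": 3}}
clean = list(perfect_reads(counts, transcripts, rng))
noisy = [add_errors(r, 0.05, rng) for r in clean]
```

Real Nanopore errors include insertions and deletions as well as substitutions, so a realistic simulator needs a richer error model than this one.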

Citations: 0
Sul-BertGRU: an ensemble deep learning method integrating information entropy-enhanced BERT and directional multi-GRU for S-sulfhydration sites prediction.
Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf078
Xirun Wei, Qiao Ning, Kuiyang Che, Zhaowei Liu, Hui Li, Shikai Guo

Motivation: S-sulfhydration, a crucial post-translational protein modification, is pivotal in cellular recognition, signaling processes, and the development and progression of cardiovascular and neurological disorders, so identifying S-sulfhydration sites is crucial for studies in cell biology. Deep learning shows high efficiency and accuracy in identifying protein sites compared to traditional methods that often lack sensitivity and specificity in accurately locating nonsulfhydration sites. Therefore, we employ deep learning methods to tackle the challenge of pinpointing S-sulfhydration sites.

Results: In this work, we introduce a deep learning approach called Sul-BertGRU, designed specifically for predicting S-sulfhydration sites in proteins, which integrates multi-directional gated recurrent unit (GRU) and BERT. First, Sul-BertGRU proposes an information entropy-enhanced BERT (IE-BERT) to preprocess protein sequences and extract initial features. Subsequently, confidence learning is employed to eliminate potential S-sulfhydration samples from the nonsulfhydration samples and select reliable negative samples. Then, considering the directional nature of the modification process, protein sequences are categorized into left, right, and full sequences centered on cysteines. We build a multi-directional GRU to enhance the extraction of directional sequence features and model the details of the enzymatic reaction involved in S-sulfhydration. Ultimately, we apply a parallel multi-head self-attention mechanism alongside a convolutional neural network to deeply analyze sequence features that might be missed at a local level. Sul-BertGRU achieves sensitivity, specificity, precision, accuracy, Matthews correlation coefficient, and area under the curve scores of 85.82%, 68.24%, 74.80%, 77.44%, 55.13%, and 77.03%, respectively. Sul-BertGRU demonstrates exceptional performance and proves to be a reliable method for predicting protein S-sulfhydration sites.

Availability and implementation: The source code and data are available at https://github.com/Severus0902/Sul-BertGRU/.
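The step that splits sequences into left, right, and full windows centered on cysteines can be sketched as below. The function name `cysteine_windows`, the flank length, the padding character, and the choice to include the central cysteine in both half-windows are assumptions for illustration, not details taken from Sul-BertGRU.

```python
def cysteine_windows(seq, flank=5, pad="X"):
    """Return (position, left, right, full) for every cysteine in `seq`.

    `left` and `right` both include the central cysteine; sequence ends are
    padded with `pad` so that every window has the same length.
    """
    padded = pad * flank + seq + pad * flank
    out = []
    for i, aa in enumerate(seq):
        if aa == "C":
            c = i + flank  # index of this cysteine in the padded string
            left = padded[c - flank:c + 1]
            right = padded[c:c + flank + 1]
            full = padded[c - flank:c + flank + 1]
            out.append((i, left, right, full))
    return out

windows = cysteine_windows("MKCAAGC", flank=3)
```

Each of the three views could then be fed to its own recurrent branch, which is the general idea behind a multi-directional design.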

Supplementary information: Supplementary data are available at Bioinformatics online.
Citations: 0
Vcfexpress: flexible, rapid user-expressions to filter and format VCFs.
Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf097
Brent S Pedersen, Aaron R Quinlan

Motivation: Variant call format (VCF) files are the standard output format for various software tools that identify genetic variation from DNA sequencing experiments. Downstream analyses require the ability to query, filter, and modify them simply and efficiently. Several tools are available to perform these operations from the command line, including BCFTools, vembrane, slivar, and others.

Results: Here, we introduce vcfexpress, a new, high-performance toolset for the analysis of VCF files, written in the Rust programming language. It is nearly as fast as BCFTools, but adds functionality to execute user expressions in the Lua programming language for precise filtering and reporting of variants from a VCF or BCF file. We demonstrate performance and flexibility by comparing vcfexpress to other tools using the vembrane benchmark.

Availability and implementation: vcfexpress is available under the MIT license at https://github.com/brentp/vcfexpress with code used for the manuscript deposited in https://doi.org/10.5281/zenodo.14756838.
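The general pattern of filtering variants with a user-supplied expression can be illustrated with a small sketch. This is not vcfexpress and does not reproduce its Lua interface or command line: a Python callable stands in for the user expression, the record dictionary layout is an assumption, and only the fixed VCF columns are parsed.

```python
def parse_info(info):
    """Turn 'DP=35;AF=0.5;DB' into {'DP': '35', 'AF': '0.5', 'DB': True}."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flags carry no value
    return fields

def filter_vcf(lines, keep):
    """Yield header lines unchanged, and records for which keep(record) is true."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        record = {"chrom": cols[0], "pos": int(cols[1]), "ref": cols[3],
                  "alt": cols[4].split(","), "qual": cols[5],
                  "info": parse_info(cols[7])}
        if keep(record):
            yield line

vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=35;AF=0.5",
    "chr1\t200\t.\tC\tT\t10\tPASS\tDP=8;DB",
]
# The "user expression": keep variants with read depth of at least 20.
kept = list(filter_vcf(vcf, lambda r: int(r["info"].get("DP", 0)) >= 20))
```

Embedding a scripting language, as the abstract describes, serves the same purpose as the callable here while letting users write the expression on the command line.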

Citations: 0
EpicPred: predicting phenotypes driven by epitope-binding TCRs using attention-based multiple instance learning.
Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf080
Jaemin Jeon, Suwan Yu, Sangam Lee, Sang Cheol Kim, Hye-Yeong Jo, Inuk Jung, Kwangsoo Kim

Motivation: Correctly identifying epitope-binding T-cell receptors (TCRs) is important both for understanding their underlying biological mechanism in association with a phenotype and for developing T-cell mediated immunotherapy treatments accordingly. Although the importance of the CDR3 region in TCRs for epitope recognition is well recognized, methods for profiling their interactions in association with a certain disease or phenotype remain less studied. We developed EpicPred to identify phenotype-specific TCR-epitope interactions. EpicPred first predicts and removes unlikely TCR-epitope interactions to reduce false positives using Open-set Recognition (OSR). Subsequently, multiple instance learning is used to identify TCR-epitope interactions specific to a cancer type or to severity levels of COVID-19 infected patients.

Results: From six public TCR databases, 244 552 TCR sequences and 105 unique epitopes were used to predict epitope-binding TCRs and to filter out non-epitope-binding TCRs using the OSR method. The predicted interactions were used to further predict the phenotype groups in two cancer and four COVID-19 TCR-seq datasets of both bulk and single-cell resolution. EpicPred outperformed the competing methods in predicting the phenotypes, achieving an average AUROC of 0.80 ± 0.07.

Availability and implementation: The EpicPred Software is available at https://github.com/jaeminjj/EpicPred.
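Attention-based multiple instance learning pools a bag of instances (here, a patient's TCRs) into one vector using learned attention weights. The sketch below shows only that pooling step in its common form, a_k = softmax_k(w · tanh(V h_k)); it is not EpicPred's model, and the parameters `V` and `w` are fixed by hand here purely for illustration rather than learned.

```python
import math

def attention_pool(bag, V, w):
    """Attention-based MIL pooling over a bag of instance feature vectors.

    Returns the attention weights (one per instance, summing to 1) and the
    attention-weighted average of the instances.
    """
    scores = []
    for h in bag:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * zi for wi, zi in zip(w, hidden)))
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(bag[0])
    pooled = [sum(a * h[d] for a, h in zip(attn, bag)) for d in range(dim)]
    return attn, pooled

bag = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]  # three instances, two features
V = [[1.0, 0.0], [0.0, 1.0]]                # hypothetical projection matrix
w = [2.0, -1.0]                             # hypothetical attention vector
attn, pooled = attention_pool(bag, V, w)
```

The attention weights double as an interpretability signal: instances with high weight are the ones the bag-level prediction relies on most.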

Citations: 0
COME: contrastive mapping learning for spatial reconstruction of single-cell RNA sequencing data.
Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf083
Xindian Wei, Tianyi Chen, Xibiao Wang, Wenjun Shen, Cheng Liu, Si Wu, Hau-San Wong

Motivation: Single-cell RNA sequencing (scRNA-seq) enables high-throughput transcriptomic profiling at single-cell resolution. The inherent spatial location is crucial for understanding how single cells orchestrate multicellular functions and drive diseases. However, spatial information is often lost during tissue dissociation. Spatial transcriptomic (ST) technologies can provide a precise spatial gene expression atlas, but their practicality is constrained by the number of genes they can assay or the associated costs at a larger scale and the fine-grained cell-type annotation. By transferring knowledge between scRNA-seq and ST data through cell correspondence learning, it is possible to recover the spatial properties inherent in scRNA-seq datasets.

Results: In this study, we introduce COME, a COntrastive Mapping lEarning approach that learns mapping between ST and scRNA-seq data to recover the spatial information of scRNA-seq data. Extensive experiments demonstrate that the proposed COME method effectively captures precise cell-spot relationships and outperforms previous methods in recovering spatial location for scRNA-seq data. More importantly, our method is capable of precisely identifying biologically meaningful information within the data, such as the spatial structure of missing genes, spatial hierarchical patterns, and the cell-type compositions for each spot. These results indicate that the proposed COME method can help to understand the heterogeneity and activities among cells within tissue environments.

Availability and implementation: COME is freely available on GitHub (https://github.com/cindyway/COME).
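Contrastive learning of a mapping between two modalities is often trained with an InfoNCE-style loss, which pulls matched pairs together and pushes mismatched pairs apart. The sketch below computes that loss for paired cell and spot embeddings. It is a generic example, not COME's objective: the pairing of `cells[i]` with `spots[i]`, the cosine similarity, and the temperature value are assumptions, and the embeddings must be non-zero vectors.

```python
import math

def info_nce(cells, spots, temperature=0.1):
    """Mean InfoNCE loss where cells[i] and spots[i] form a positive pair."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    cells = [unit(c) for c in cells]
    spots = [unit(s) for s in spots]
    loss = 0.0
    for i, c in enumerate(cells):
        # Cosine similarity of this cell to every spot, scaled by temperature.
        logits = [sum(a * b for a, b in zip(c, s)) / temperature for s in spots]
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_sum - logits[i]  # negative log-softmax of the true pair
    return loss / len(cells)

aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is close to zero when each cell is most similar to its own spot and grows when the pairing is wrong, which is what drives the learned mapping.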

Citations: 0
Jellyfish: integrative visualization of spatio-temporal tumor evolution and clonal dynamics.
Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf091
Kari Lavikka, Altti Ilari Maarala, Jaana Oikkonen, Sampsa Hautaniemi

Summary: Spatial and temporal intra-tumor heterogeneity drives tumor evolution and therapy resistance. Existing visualization tools often fail to capture both dimensions simultaneously. To address this, we developed Jellyfish, a tool that integrates phylogenetic and sample trees into a single plot, providing a holistic view of tumor evolution and capturing both spatial and temporal evolution. Available as a JavaScript library and R package, Jellyfish generates interactive visualizations from tumor phylogeny and clonal composition data. We demonstrate its ability to visualize complex subclonal dynamics using data from ovarian high-grade serous carcinoma.

Availability and implementation: Jellyfish is freely available under the MIT license at https://github.com/HautaniemiLab/jellyfish (JavaScript library) and https://github.com/HautaniemiLab/jellyfisher (R package).

Citations: 0
MiNEApy: enhancing enrichment network analysis in metabolic networks.
Pub Date : 2025-03-04 DOI: 10.1093/bioinformatics/btaf077
Vikash Pandey

Motivation: Modeling genome-scale metabolic networks (GEMs) helps understand metabolic fluxes in cells at a specific state under defined environmental conditions or perturbations. Elementary flux modes (EFMs) are powerful tools for simplifying complex metabolic networks into smaller, more manageable pathways. However, the enumeration of all EFMs, especially within GEMs, poses significant challenges due to computational complexity. Additionally, traditional EFM approaches often fail to capture essential aspects of metabolism, such as co-factor balancing and by-product generation. The previously developed Minimum Network Enrichment Analysis (MiNEA) method addresses these limitations by enumerating alternative minimal networks for given biomass building blocks and metabolic tasks. MiNEA facilitates a deeper understanding of metabolic task flexibility and context-specific metabolic routes by integrating condition-specific transcriptomics, proteomics, and metabolomics data. This approach offers significant improvements in the analysis of metabolic pathways, providing more comprehensive insights into cellular metabolism.

Results: Here, I present MiNEApy, a Python package reimplementation of MiNEA, which computes minimal networks and performs enrichment analysis. I demonstrate the application of MiNEApy on both a small-scale and a genome-scale model of the bacterium Escherichia coli, showcasing its ability to conduct minimal network enrichment analysis using minimal networks and context-specific data.
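
To make the enrichment step concrete, the sketch below tests whether a minimal network contains more deregulated reactions than expected by chance, using a one-sided hypergeometric test. This is a generic over-representation test and not MiNEApy's API: the abstract does not state which statistic the package uses, so the function name `hypergeom_enrichment_p`, the reaction identifiers, and the choice of test are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(universe, regulated, network):
    """One-sided hypergeometric test (assumed statistic).

    universe:  all reactions in the model
    regulated: reactions flagged by context-specific data
    network:   reactions in one minimal network
    Returns (overlap size k, P[X >= k]).
    """
    universe = set(universe)
    regulated = set(regulated) & universe
    network = set(network) & universe

    N = len(universe)    # population size
    K = len(regulated)   # successes in the population
    n = len(network)     # sample size
    k = len(regulated & network)  # observed successes

    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(K, n) + 1)
    )
    return k, tail / comb(N, n)


# Toy example with hypothetical reaction IDs r0..r9.
universe = [f"r{i}" for i in range(10)]
regulated = ["r0", "r1", "r2", "r3"]
minimal_network = ["r0", "r1", "r2", "r8"]

overlap, p_value = hypergeom_enrichment_p(universe, regulated, minimal_network)
```

With several alternative minimal networks per metabolic task, the same test would be run once per network, and the resulting p-values would need multiple-testing correction before interpretation.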

Availability and implementation: MiNEApy can be accessed at: https://github.com/vpandey-om/mineapy.

Citations: 0