Journal of Computational Biology: Latest Articles, Page 4

Microbe Drug Association Prediction with Bernoulli Random Forests.

IF 1.6, Zone 4 (Biology), Q4 BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology

Pub Date : 2025-11-01 Epub Date: 2025-09-11 DOI: 10.1177/15578666251372198

Jia Qu, Qing-Nuo Li, Zi-Hao Song, Jin-Cheng Zhao, Qing-Gang Bu, Ze-Kang Bian, Wan-Ling Xie

Due to the widespread use of antibiotics, many microbes have become drug-resistant, and new antibiotics that can effectively combat them are urgently needed. Exploiting microbe-drug associations can help researchers make progress in drug development. In this paper, we develop, for the first time, a computational model based on the Bernoulli random forest (BRF) for microbe-drug association prediction (BRFMDA). First, we introduced integrated drug similarity and integrated microbe similarity to construct the features of each microbe-drug pair. Second, based on known microbe-drug associations, we obtained the features of all positive samples. Then, an equal number of negative samples was chosen from the unknown microbe-drug pairs. Next, we used a filter-based approach to reduce the feature dimension of the positive and negative samples. Lastly, a BRF was trained on the features of the positive and negative samples to predict microbe-drug associations. To validate the prediction performance of BRFMDA, we used leave-one-out cross-validation (LOOCV), fivefold cross-validation, and two types of case studies. The results suggested that BRFMDA is a dependable model for predicting potential microbe-drug associations. Specifically, on the Microbe-Drug Association Database (MDAD), BRFMDA obtained an area under the curve (AUC) of 0.9134 in global LOOCV, 0.8958 in local LOOCV, and 0.8657 ± 0.0112 in fivefold cross-validation. On the aBiofilm dataset, BRFMDA achieved an AUC of 0.9130 in global LOOCV, 0.8927 in local LOOCV, and 0.8844 ± 0.0137 in fivefold cross-validation.
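The sample-construction steps described in this abstract (pair features built from two similarity matrices, then as many negatives as positives drawn from the unknown pairs) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy matrices, and the choice to concatenate similarity rows are assumptions, and the filter-based feature reduction and the BRF classifier itself are not shown.

```python
import random

def pair_feature(drug_sim, microbe_sim, d, m):
    """Concatenate drug d's similarity row with microbe m's similarity row (assumed layout)."""
    return drug_sim[d] + microbe_sim[m]

def balanced_samples(assoc, drug_sim, microbe_sim, seed=0):
    """Return (features, labels) with as many sampled negatives as known positives."""
    n_drugs, n_microbes = len(assoc), len(assoc[0])
    all_pairs = [(d, m) for d in range(n_drugs) for m in range(n_microbes)]
    positives = [(d, m) for d, m in all_pairs if assoc[d][m] == 1]
    unknown = [(d, m) for d, m in all_pairs if assoc[d][m] == 0]
    # Unknown pairs are treated as candidate negatives; draw the same number as positives.
    negatives = random.Random(seed).sample(unknown, len(positives))
    pairs = positives + negatives
    features = [pair_feature(drug_sim, microbe_sim, d, m) for d, m in pairs]
    labels = [1] * len(positives) + [0] * len(negatives)
    return features, labels

# Hypothetical toy data: 3 drugs x 2 microbes.
assoc = [[1, 0], [0, 1], [0, 0]]
drug_sim = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.4], [0.1, 0.4, 1.0]]
microbe_sim = [[1.0, 0.3], [0.3, 1.0]]
features, labels = balanced_samples(assoc, drug_sim, microbe_sim)
```

The resulting `features` and `labels` could then be passed to any binary classifier; balancing the classes this way keeps the many unknown pairs from swamping the few known associations during training.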

Citations: 0

RNAS-sgRNA: Recurrent Neural Architecture Search for Detection of On-Target Effects in Single Guide RNA.

Pub Date : 2025-11-01 Epub Date: 2025-06-12 DOI: 10.1089/cmb.2025.0031

Shehla Rafiq, Assif Assad

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas9 is a leading genomic editing tool, but its effectiveness is limited by considerable heterogeneity in target efficiency among different single guide RNAs (sgRNA). This study presents RNAS-sgRNA, a hybrid model that integrates neural architecture search (NAS) with recurrent neural networks (RNN) to evaluate the on-target efficacy of CRISPR/Cas9 sgRNA. The RNAS-sgRNA model automates architectural discovery, improving sgRNA sequence categorization without considerable manual adjustment. The NAS component improves the RNN architecture, which analyzes sgRNA sequences represented as binary matrices and produces a classification score. Evaluated across several datasets and cell lines, RNAS-sgRNA was compared with the baseline CRISPRpred(SEQ) and DeepCRISPR models in terms of area under the receiver operating characteristic curve (AUROC) and showed substantial improvements in several cell lines. Notable improvements include enhancements of 8.62% for HCT116, 121.57% for HEK293T, 13.40% for HeLa, and 20.78% for HL60 cell lines, resulting in an overall improvement of 13.46%. Compared with DeepCRISPR, the model achieved additional AUROC gains in all cell lines tested, with an average improvement of 14.74%. The study also highlighted the ability of the model to deliver superior performance on smaller datasets through transfer learning, underscoring its potential applications in personalized medicine and genetic research. RNAS-sgRNA introduces a novel integration of NAS with RNN: unlike traditional methods that require significant manual adjustments, the model automates architectural discovery, optimizing the RNN structure for sgRNA sequence analysis. Furthermore, the application of transfer learning to fine-tune the pretrained model on small cell-line datasets represents a pioneering approach in the domain. The model's demonstrated ability to significantly outperform existing algorithms, including CRISPRpred(SEQ) and DeepCRISPR, across multiple cell lines highlights its innovative contribution to genomic editing research and personalized medicine.
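The abstract says sgRNA sequences are "represented as binary matrices" before being fed to the RNN. One common way to do this is one-hot encoding, sketched below. The abstract does not specify the exact layout, so the A, C, G, T column order and the function name are assumptions made for illustration.

```python
NUCLEOTIDES = "ACGT"  # assumed column order

def one_hot_encode(sgrna):
    """Return a len(sgrna) x 4 binary matrix; each row marks exactly one nucleotide."""
    matrix = []
    for base in sgrna.upper():
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        matrix.append([1 if base == n else 0 for n in NUCLEOTIDES])
    return matrix

encoded = one_hot_encode("GACT")
# encoded is [[0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
```

Each row can be read as one time step of input for a recurrent network, which is why this representation pairs naturally with an RNN.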

Citations: 0

Corrigendum to: CerviNet: A Novel Approach for Cervical Cancer Classification Using Pap-Smear Images.

Pub Date : 2025-10-01 DOI: 10.1177/15578666251387096

Citations: 0

Enhanced Interpretable Neural Network Approach for Unified Batch Effect Mitigation and Disease Classification Using Cross-Cohort Microbiome Profiles.

Pub Date : 2025-10-01 Epub Date: 2025-08-08 DOI: 10.1177/15578666251364292

Daryl L X Fung, Mohd Wasif Khan, Carson Kai-Sang Leung, Pingzhao Hu

The oral microbiome is a complex environment that consists of diverse microorganisms inhabiting the oral cavity. More than 700 species of bacteria live in the oral cavity, which provides them with nutrition. Because samples tend to be collected under varying non-biological conditions, batch effects occur. Batch effects are variations among measurements of the same samples that arise from differences in the equipment used, the time of sample collection, the laboratory conditions, and so on. Batch effects can be difficult to address because the variation might not be apparent in individual samples but only across groups of samples. Several methods have been proposed to resolve batch effects, but they tend to require a two-step approach (batch effect removal, then classification) or suffer from dropout events in gene expression data. In this study, we propose a one-step approach that combines batch effect removal and disease classification, eliminating the need for a two-step process. LassoNet was used with a batch loss to mitigate batch effects and classify disease outcomes from the oral microbiome simultaneously. The model achieved better performance than our baseline models, reaching an average area under the curve of 0.8 across five oral microbiome studies. In addition, LassoNet supports feature importance analysis, which can reveal key oral microbes associated with disease outcomes.

Citations: 0

Network-Guided Sparse Subspace Clustering on Single-Cell Data.

Pub Date : 2025-10-01 Epub Date: 2025-07-15 DOI: 10.1177/15578666251359688

Chenyang Yuan, Shunzhou Jiang, Songyun Li, Jicong Fan, Tianwei Yu

With the rapid development of single-cell RNA sequencing (scRNA-seq) technology, researchers can now investigate gene expression at the individual cell level. Identifying cell types via unsupervised clustering is a fundamental challenge in analyzing single-cell data. However, due to the high dimensionality of expression profiles, traditional clustering methods often fail to produce satisfactory results. To address this problem, we developed NetworkSSC, a network-guided sparse subspace clustering (SSC) approach. NetworkSSC operates on the same assumption as SSC that cells of the same type have gene expressions lying within the same subspace. In addition, it integrates a regularization term incorporating the gene network's Laplacian matrix, which captures functional associations between genes. Comparative analysis on nine scRNA-seq datasets shows that NetworkSSC outperforms traditional SSC and other unsupervised methods in most cases.
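To give a feel for why a gene network's Laplacian matrix works as a regularizer, the sketch below builds L = D - A for a toy network and evaluates the quadratic form x^T L x, which equals the sum of squared differences across connected genes. How NetworkSSC applies this term inside its objective is not stated in the abstract, so the toy network, the vectors, and the function names here are purely illustrative assumptions.

```python
def graph_laplacian(adjacency):
    """L = D - A for an undirected, unweighted network with no self-loops."""
    n = len(adjacency)
    return [
        [(sum(adjacency[i]) if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]

def laplacian_penalty(laplacian, x):
    """Quadratic form x^T L x; small when connected nodes carry similar values."""
    n = len(x)
    return sum(x[i] * laplacian[i][j] * x[j] for i in range(n) for j in range(n))

# Hypothetical toy network: gene 0 - gene 1 - gene 2 (a path).
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
L = graph_laplacian(adjacency)
smooth = laplacian_penalty(L, [1.0, 1.0, 1.0])   # connected genes agree
rough = laplacian_penalty(L, [1.0, -1.0, 1.0])   # connected genes disagree
```

Here `smooth` is zero while `rough` is large, so adding such a term to a clustering objective favors solutions that treat functionally linked genes alike.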

Citations: 0

A Biologically Informed and Efficient DNA Sequence Learner for Predicting Functional Genomics Events.

Pub Date : 2025-10-01 Epub Date: 2025-09-24 DOI: 10.1177/15578666251382249

Mohammad Shiri, Jiangwen Sun

Elucidating the functional mechanisms underlying most associations between phenomes and genomes uncovered by genome-wide association studies remains a challenging problem. Deep neural networks that excel in feature learning from sequential data have recently emerged as promising approaches to addressing this challenge by mapping sequence patterns in DNA to functional genomic events. Despite the impressive progress made in this regard, existing studies are largely limited to one type of network architecture, which primarily consists of simple stacked convolutional layers with filters of a uniform size. These networks do not account for the specifics of mapping DNA sequences to functional genomic events, which impairs their learning efficiency. To address this problem, we propose an efficient DNA sequence learner (EDSL), a novel biologically informed architecture that (1) introduces filters of varying sizes in the first convolutional layer to enhance the learning of sequence patterns of diverse sizes and (2) utilizes dense connections to facilitate the participation of sequence patterns at varying levels in prediction. Our results on both synthetic data and a dataset consisting of 367 experimentally derived functional genomic profiles demonstrate the effectiveness of the proposed design choices and the superiority of the EDSL over existing networks in terms of both prediction performance and sequence pattern learning. Moreover, our ablation study indicates that both proposed design choices enhance learning and, importantly, do so in a differential and complementary manner.
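The first design choice, filters of varying sizes in the first convolutional layer, can be illustrated with a small pure-Python sketch. It covers only that idea and not the dense connections or any training. The hand-built motif filters, the zero padding, and all names are assumptions chosen for illustration; a real network would learn its filter weights.

```python
NUCLEOTIDES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq]

def conv1d_same(x, kernel):
    """1D cross-correlation with zero padding so the output length equals len(x).

    Assumes an odd kernel width so that each output is centered on its position.
    """
    width, channels = len(kernel), len(x[0])
    pad = width // 2
    zero = [0.0] * channels
    padded = [zero] * pad + x + [zero] * pad
    return [
        sum(padded[pos + k][c] * kernel[k][c] for k in range(width) for c in range(channels))
        for pos in range(len(x))
    ]

def multi_width_layer(seq, kernels):
    """Apply filters of different widths and stack their outputs per position."""
    x = one_hot(seq)
    outputs = [conv1d_same(x, kernel) for kernel in kernels]
    return [list(column) for column in zip(*outputs)]

# Illustrative filters: the one-hot encodings of a 3-mer and a 5-mer motif.
kernels = [one_hot("TAT"), one_hot("GATAA")]
features = multi_width_layer("CCTATCGATAAC", kernels)
```

Because both filters produce one value per sequence position, their outputs line up and can be stacked, so short and long patterns are detected side by side in the same layer.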

Citations: 0

MFF-HPO: Protein-Phenotype Associations Prediction Based on Sequence Using Multi-Feature Fusion.

Pub Date : 2025-10-01 Epub Date: 2025-06-30 DOI: 10.1089/cmb.2024.0883

Xuehua Bi, Zhuocheng Ji, Linlin Zhang, Guanglei Yu, Zhipeng Gao, Kai Zhao

Protein abnormalities disrupt various cellular processes and contribute to disease development. Identifying disease-associated proteins is crucial for precision medicine, but traditional methods are time-consuming and costly, necessitating computational approaches. Existing computational methods rely on manual feature engineering and fail to leverage deep features from amino acid sequences and protein structures. In this article, we propose MFF-HPO (Model for predicting protein-phenotype associations by Fusing multi-view Features), which predicts protein-phenotype associations by fusing multi-view features derived from amino acid sequences. First, we generate a three-dimensional protein structure from the amino acid sequence to derive contact graphs and secondary structures, then integrate these with direct sequence encoding and physicochemical properties. Using a Graph Attention Network, we extract structural features from contact graphs, while deep neural networks capture global and local features from secondary structures, physicochemical properties, and sequence encoding. Finally, the concatenated features are used to predict phenotype annotations. MFF-HPO outperforms state-of-the-art methods with a mean area under the precision-recall curve of 0.314 and a mean F_max of 0.371. Ablation studies confirm that multi-view feature fusion enhances predictions, and case studies validate its practicality.
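The step of deriving a contact graph from a three-dimensional structure can be sketched as follows: two residues are connected when their coordinates lie within a distance cutoff. The abstract does not give the cutoff or the atoms used, so the 8.0 threshold, the one-point-per-residue coordinates, and the function name are assumptions for illustration only.

```python
import math

def contact_graph(coords, threshold=8.0):
    """Return edges (i, j), i < j, for residue pairs closer than `threshold`.

    `coords` holds one 3D point per residue; `threshold` uses the same units.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < threshold:
                edges.append((i, j))
    return edges

# Hypothetical coordinates for four residues along a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
edges = contact_graph(coords)
```

An edge list of this kind is the usual input format for graph neural networks such as a Graph Attention Network, with residues as nodes.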

Citations: 0

Leveraging an Image-Enhanced Cross-Modal Fusion Network for Radiology Report Generation.

Pub Date : 2025-10-01 Epub Date: 2025-08-11 DOI: 10.1177/15578666251365959

Yi Guo, Xiaodi Hou, Zhi Liu, Yijia Zhang

Radiology report generation (RRG) tasks leverage computer-aided technology to automatically produce descriptive text reports for medical images, aiming to ease radiologists' workload, reduce misdiagnosis rates, and lessen the pressure on medical resources. However, previous works have yet to focus on enhancing feature extraction of low-quality images, incorporating cross-modal interaction information, and mitigating latency in report generation. We propose an Image-Enhanced Cross-Modal Fusion Network (IFNet) for automatic RRG to tackle these challenges. IFNet includes three key components. First, the image enhancement module enhances the detailed representation of typical and atypical structures in X-ray images, thereby boosting detection success rates. Second, the cross-modal fusion networks efficiently and comprehensively capture the interactions of cross-modal features. Finally, a more efficient transformer report generation module is designed to optimize report generation efficiency while being suitable for low-resource devices. Experimental results on public datasets IU X-ray and MIMIC-CXR demonstrate that IFNet significantly outperforms the current state-of-the-art methods.

Citations: 0

Special Issue, Part II: 20th International Symposium on Bioinformatics Research and Applications (ISBRA 2024).

IF 1.6 Zone 4 (Biology) Q4 BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology

Pub Date : 2025-10-01 Epub Date: 2025-08-20 DOI: 10.1177/15578666251371230

Zhipeng Cai, Wei Peng, Murray Patterson

Citations: 0

CerviNet: A Novel Approach for Cervical Cancer Classification Using Pap-Smear Images.

IF 1.6 Zone 4 (Biology) Q4 BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology

Pub Date : 2025-10-01 Epub Date: 2025-09-24 DOI: 10.1177/15578666251379909

Ashfaque Khowaja, Zou Beiji, Xiaoyan Kui

Cervical cancer is the fourth most common cancer among women worldwide, and Pap smear images are a primary diagnostic tool for detecting precancerous and cancerous abnormalities in the cervix, vagina, and vulva. Deep learning algorithms have gained popularity in the development of automated computer-aided diagnostic systems that address the difficulties of manual assessment. This article introduces a hybrid approach to categorizing cervical cells effectively and accurately. The proposed model employs data enhancement techniques, including resampling to address class imbalance and augmentation (e.g., random horizontal flips and rotations) to increase dataset diversity and improve generalization. These strategies help the model handle varied data, making it more adaptable and reliable in real-world scenarios. We use the Vision Transformer's (ViT) linear projection and position embedding to convert input images into patch embeddings that are passed to a transformer encoder. A fusion architecture is built by adding supplementary convolutional layers, followed by a fully connected layer, to refine the features extracted by the model. The ViT-based model starts from pretrained weights and can be fine-tuned to address cervical cancer classification efficiently. To enhance the quality of the cell images, we apply median smoothing and Gaussian filtering as preprocessing steps. The experimental results demonstrate the method's potential for improving the precision of cervical cancer classification. Notably, our model achieved accuracies of 98.07% on the 2-state classification of the Herlev dataset and 98.08% on the 3-state classification of the SIPaKMeD dataset. These per-dataset accuracy rates show that the model categorizes cervical cancer images effectively across datasets, which indicates robustness and promise for practical clinical use.
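The patch-embedding step named in the abstract (linear projection plus position embedding) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `patchify`, `embed_patches`, and `horizontal_flip`, the 8 x 8 toy image, the patch size, and the embedding width are all assumptions chosen for illustration, the weights are random stand-ins for parameters a real ViT would learn, and the class token is omitted. The standard library alone is used so the sketch runs anywhere.

```python
import random


def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch x patch tiles."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tiles = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tiles.append([image[r][c]
                          for r in range(top, top + patch)
                          for c in range(left, left + patch)])
    return tiles


def embed_patches(image, patch, dim, seed=0):
    """Project each flattened patch to `dim` values and add a position embedding.

    Both the projection matrix and the position embeddings are random here
    (an assumption for illustration); in a trained ViT they are learned.
    """
    rng = random.Random(seed)
    tiles = patchify(image, patch)
    in_dim = patch * patch
    weight = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(in_dim)]
    position = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in tiles]
    tokens = []
    for tile, pos in zip(tiles, position):
        # Linear projection: one dot product per output dimension.
        projected = [sum(tile[i] * weight[i][j] for i in range(in_dim))
                     for j in range(dim)]
        # Position embedding: added element-wise, one vector per patch index.
        tokens.append([p + q for p, q in zip(projected, pos)])
    return tokens


def horizontal_flip(image):
    """Augmentation mentioned in the abstract: mirror every row."""
    return [row[::-1] for row in image]


# Toy 8 x 8 "image"; 4 x 4 patches give 4 tokens of width 6.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
tokens = embed_patches(image, patch=4, dim=6)
print(len(tokens), len(tokens[0]))  # 4 6
```

The resulting token list is what a transformer encoder would consume; flipping the image with `horizontal_flip` before embedding yields a different token sequence for the same cell, which is how augmentation increases dataset diversity.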


Pages : 974-985

Citations: 0