Journal of Computational Biology: Latest Articles

Biogeography-Based Multi-Objective Discrete Optimization with Constraints.

IF 1.6, Zone 4 (Biology), Q4 BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology

Pub Date : 2025-09-01 Epub Date: 2025-06-16 DOI: 10.1089/cmb.2024.0931

Leyi Hu, Xuan Liu, Xiangyu Qu, Chenyan Wang, Bingmeng Hu, Jieyao Wei

Biogeography-based optimization (BBO) is an evolutionary algorithm inspired by biological populations that strengthens its search ability through an adaptive migration operation. However, the original BBO is feasible only for continuous, single-objective optimization and does not handle more complex cases such as discrete and multi-objective optimization problems. Therefore, in this article, we propose an improved BBO algorithm to solve multi-objective discrete optimization problems with multiple constraints. We define a decision matrix and an objective vector to represent the variables and objective functions of the multi-objective discrete optimization problem, and define an ideal point and a utility function so that different candidate solutions can be judged according to a common metric. We propose a similarity threshold, a repeatability threshold, a cost threshold, and a stagnation threshold so that the proposed algorithm improves the diversity of the solutions it searches while still converging. Moreover, we conduct a case study on the NP-hard problem of composite functions, and the experimental results verify the effectiveness and efficiency of our approach.
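
The abstract does not say how the ideal point or the utility function is defined, so the following is only a minimal sketch under stated assumptions: every objective is minimized, the ideal point is the component-wise best value seen across candidates, and utility is the Euclidean distance to that point. The function names `ideal_point`, `utility`, and `rank_candidates` are hypothetical and not taken from the paper.

```python
from math import sqrt


def ideal_point(objective_vectors):
    """Component-wise minimum of each objective over all candidates (assumed definition)."""
    return [min(column) for column in zip(*objective_vectors)]


def utility(objective_vector, ideal):
    """Euclidean distance to the ideal point; smaller is better (assumed metric)."""
    return sqrt(sum((f - z) ** 2 for f, z in zip(objective_vector, ideal)))


def rank_candidates(objective_vectors):
    """Return candidate indices ordered from best to worst utility."""
    ideal = ideal_point(objective_vectors)
    return sorted(
        range(len(objective_vectors)),
        key=lambda i: utility(objective_vectors[i], ideal),
    )


# Three hypothetical candidates, each scored on two objectives to minimize.
candidates = [[4.0, 1.0], [2.0, 2.0], [1.0, 5.0]]
print(ideal_point(candidates))
print(rank_candidates(candidates))
```

A single scalar utility like this gives a total order over candidates, which is one way a migration step could decide which solutions share features; the paper may well use a different metric.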

Pages: 896-910

Citations: 0

Counterfactual Debiased Co-Embedding Model for Enhanced Drug-Drug Interaction Prediction.

IF 1.6, Zone 4 (Biology), Q4 BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology

Pub Date : 2025-09-01 Epub Date: 2025-06-13 DOI: 10.1089/cmb.2024.0882

Xue Pan, Chunping Ouyang, Linlin Zhang, Yongbin Liu, Ying Yu

Predicting drug-drug interactions (DDIs) is critical to drug discovery and development because adverse interactions pose serious health risks. Most existing studies use either the properties of drugs or the network topology of DDIs to predict unknown interactions between drugs. However, DDI networks are usually sparse with insufficient interaction information, and these approaches do not integrate the two types of information deeply enough to exploit potential associations between DDI network nodes and properties. In this work, we present a novel co-embedding model, counterfactual debiased co-embedding (CDCE), for counterfactual-based analyses. The model mitigates the effects of sparse networks and information embedding loss through counterfactual debiasing without losing the original information. In addition, we fuse two types of attribute information from different perspectives: the Anatomical Therapeutic Chemical (ATC) code and the Simplified Molecular Input Line Entry System (SMILES). The implicit information obtained from the ATC code is embedded into the DDI network and then fused with SMILES through the variational graph autoencoder model. We validated CDCE on the benchmark dataset BioSNAP, with experimental results showing that it outperforms state-of-the-art methods.

Pages: 838-849

Citations: 0

Special Issue, Part I 20th International Symposium on Bioinformatics Research and Applications (ISBRA 2024).

IF 1.6, Zone 4 (Biology), Q4 BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology

Pub Date : 2025-09-01 Epub Date: 2025-08-06 DOI: 10.1177/15578666251366449

Zhipeng Cai, Wei Peng, Murray Patterson

Citations: 0

A Innovative Strategy for Identifying Subtypes Through the Analysis of Multi-Omics Data with Adversarial Autoencoders.

IF 1.6, Zone 4 (Biology), Q4 BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology

Pub Date : 2025-09-01 Epub Date: 2025-06-26 DOI: 10.1089/cmb.2024.0927

Xia Chen, Hao Nie, Quanwei Chen, Xiang Zhang, Zixing He, Xiuxiu Chao, Weihao Ou, Xiangzheng Fu, Haowen Chen

Cancer is a disease that is both complex and diverse, and effective diagnosis and treatment require an accurate depiction of tumor subtypes. Traditional methods of cancer identification, which rely on clinical and histopathological criteria, have limitations in identifying key molecular subtypes. With the advancement of high-throughput genomics technologies, the field of cancer research has undergone a transformation, enabling detailed analysis of tumor molecular characteristics on a large scale. The integration of multiple types of genomic data is expected to provide a more comprehensive understanding of the molecular mechanisms of cancer and to promote the discovery of new diagnostic and therapeutic targets. However, achieving this requires the development of new computational techniques. In order to facilitate more efficient feature extraction and dimensionality reduction of multi-omics data, we present MultiDAAE (Multi-omics Double Adversarial Autoencoder), a novel technique that combines autoencoders with two discriminators to form two generative adversarial networks. On several cancer datasets, our method shows outstanding clustering performance when compared to state-of-the-art techniques. To sum up, MultiDAAE can help identify possible molecular pathways and provide information for the development of tailored cancer treatments.

Pages: 879-895

Citations: 0

Multichannel Contribution Aware Network for Prostate Cancer Grading in Histopathology Images.

IF 1.6, Zone 4 (Biology), Q4 BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology

Pub Date : 2025-09-01 Epub Date: 2025-03-28 DOI: 10.1089/cmb.2024.0872

Junlai Qiu, Qingfeng Chen, Wei Lan, Junyue Cao

Gleason grading of prostate histopathology images is widely used by pathologists for diagnosis and prognosis. The spatial characteristics of cells and tissues revealed by staining images are essential for accurate grading of prostate cancer. Although considerable efforts have been made to train grading models, they mainly rely on basic preprocessed images and largely overlook the intricate multiple staining aspects of histopathology images that are crucial for spatial information capture. This article proposes a novel deep learning model for automated prostate cancer grading by integrating several staining characteristics. Image deconvolution is applied to separate the multiple staining channels in the histopathology image, thereby enabling the model to identify effective feature information. A channel and pixel attention-based encoder is designed to extract cell and tissue structure information from multiple staining channel images. We propose a dual-branch decoder, where the classical convolutional neural network branch specializes in local feature extraction and the Transformer branch focuses on global feature extraction, to effectively fuse and refine features from different staining channels. Taking full advantage of the complementarity of multiple staining channels makes the features more compact and discriminative, leading to precise grading. Extensive experiments on relevant public datasets demonstrate the effectiveness and scalability of the proposed model.

Pages: 826-837

Citations: 0

Effloc: An Efficient Locating Algorithm for Mass-Occurrence Biological Patterns with FM-Index.

IF 1.6, Zone 4 (Biology), Q4 BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology

Pub Date : 2025-09-01 Epub Date: 2025-05-02 DOI: 10.1089/cmb.2024.0925

Li-Lu Guo

Pattern locating is a crucial step in various biological sequence analysis tasks. As a compressed full-text indexing technology, the FM-index (full-text index in minute space) has been introduced for biological pattern locating over ultra-long genomes, with a low memory footprint and a retrieval time independent of genome size. However, its locating time grows with the number of occurrences of the biological pattern in the genome, and it is not efficient enough when dealing with mass-occurrence biological patterns. To solve this problem, we propose an efficient locating algorithm for mass-occurrence biological patterns in genomic sequence, namely Effloc. It is built on two optimization techniques. First, rankings with the same Burrows-Wheeler Transform character are organized into a group and calculated together, thereby reducing the number of last-to-first column (LF) mapping operations required to jump forward to find suffix array (SA) sampling points. Second, a specific structure records the jump status, thus avoiding the redundant LF mapping operations that arise when finding SA sampling points for adjacent patterns that share the same sampling point. Compared with the existing algorithm, Effloc can significantly reduce the number of time-consuming LF mapping operations in mass-occurrence pattern locating. Ablation experiments verified our algorithm's effectiveness, and Effloc exhibited faster locating speed than five state-of-the-art competing algorithms. The source code and data are released at https://github.com/Lilu-guo/Effloc.
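
For orientation, the baseline locate procedure that the abstract describes (walk LF mappings from each matching row until a sampled suffix-array entry is reached) can be sketched as below. This is a toy, unoptimized illustration and not Effloc itself: it omits the grouping and jump-status structure, uses naive rank queries, and the names `build_fm_index`, `lf`, and `locate` as well as the `sample_rate` parameter are assumptions made for this example.

```python
def build_fm_index(text, sample_rate=4):
    """Build a toy FM-index: BWT string, C table, and a sampled suffix array."""
    text += "$"  # unique terminator, smaller than A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps to the terminator
    # c_table[ch] = number of characters in text strictly smaller than ch
    c_table, total = {}, 0
    for ch in sorted(set(text)):
        c_table[ch] = total
        total += text.count(ch)
    # Keep only rows whose text position is a multiple of sample_rate
    sampled = {row: pos for row, pos in enumerate(sa) if pos % sample_rate == 0}
    return bwt, c_table, sampled


def rank(bwt, ch, i):
    """Occurrences of ch in bwt[:i] (naive; real indexes use rank structures)."""
    return bwt.count(ch, 0, i)


def lf(bwt, c_table, row):
    """LF mapping: row of the suffix that starts one text position earlier."""
    ch = bwt[row]
    return c_table[ch] + rank(bwt, ch, row)


def locate(index, pattern):
    """Return (sorted occurrence positions, total LF steps used to resolve them)."""
    bwt, c_table, sampled = index
    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):  # backward search for the row interval
        if ch not in c_table:
            return [], 0
        lo = c_table[ch] + rank(bwt, ch, lo)
        hi = c_table[ch] + rank(bwt, ch, hi)
        if lo >= hi:
            return [], 0
    positions, lf_steps = [], 0
    for row in range(lo, hi):  # one independent walk per occurrence
        steps = 0
        while row not in sampled:
            row = lf(bwt, c_table, row)
            steps += 1
        positions.append(sampled[row] + steps)
        lf_steps += steps
    return sorted(positions), lf_steps


index = build_fm_index("ACGTACGTAC")
positions, lf_steps = locate(index, "CG")
print(positions, lf_steps)
```

Because every occurrence triggers its own walk, the total number of LF steps scales with the occurrence count, which is the cost the abstract says Effloc reduces for mass-occurrence patterns.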

Pages: 865-878

Citations: 0

VTrans: A VAE-Based Pre-Trained Transformer Method for Microbiome Data Analysis.

IF 1.6, Zone 4 (Biology), Q4 BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology

Pub Date : 2025-09-01 Epub Date: 2025-04-28 DOI: 10.1089/cmb.2024.0884

Xinyuan Shi, Fangfang Zhu, Wenwen Min

Predicting survival outcomes and assessing patient risk play a pivotal role in understanding the microbial composition across various stages of cancer. With ongoing advances in deep learning, it has been shown that deep learning can analyze patient survival risk based on microbial data. However, individual cancer datasets commonly suffer from limited sample size and a high-dimensional feature space. This often leads to overfitting in deep learning models, hindering their ability to extract deep data representations and resulting in suboptimal model performance. To overcome these challenges, we advocate the use of pretraining and fine-tuning strategies, which have proven effective in addressing the small sample sizes of individual cancer datasets. In this study, we propose VTrans, a deep learning model that combines a Transformer encoder and a variational autoencoder (VAE) and employs both pretraining and fine-tuning strategies to predict the survival risk of cancer patients using microbial data. Furthermore, we highlight the potential of extending VTrans to integrate microbial multi-omics data. Our method is assessed on three distinct cancer datasets from The Cancer Genome Atlas Program, and the findings demonstrate that (1) VTrans outperforms conventional machine learning and other deep learning models; (2) pretraining significantly enhances its performance; (3) in contrast to positional encoding, VAE encoding is more effective in enriching data representation; and (4) using the idea of a saliency map, it is possible to observe which microbes contribute strongly to the classification results. These results demonstrate the effectiveness of VTrans in predicting patient survival risk. Source code and all datasets used in this paper are available at https://github.com/wenwenmin/VTrans and https://doi.org/10.5281/zenodo.14166580.

Pages: 850-864

Citations: 0

Rebuttal to Flaws in the Paper 'Nearly Instantaneous Time-Varying Reproduction Number for Contagious Diseases-a Direct Approach Based on Nonlinear Regression'.

IF 1.6, Zone 4 (Biology), Q4 BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology

Pub Date : 2025-08-01 DOI: 10.1089/cmb.2025.0024

Jūratė Šaltytė Benth, Fred Espen Benth, Espen Rostrup Nakstad

Citations: 0

Flaws in the Article "Nearly Instantaneous Time-Varying Reproduction Number for Contagious Diseases-a Direct Approach Based on Nonlinear Regression".

IF 1.6, Zone 4 (Biology), Q4 BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology

Pub Date : 2025-08-01 DOI: 10.1177/15578666251360613

Geir Storvik, Solveig Engebretsen, Birgitte Freiesleben de Blasio, Arnoldo Frigessi

Citations: 0

Filtering for Highly Variable Genes and High-Quality Spots Improves Phylogenetic Analysis of Cancer Spatial Transcriptomics Visium Data.

IF 1.6, Zone 4 (Biology), Q4 BIOCHEMICAL RESEARCH METHODS

Journal of Computational Biology

Pub Date : 2025-08-01 Epub Date: 2025-06-11 DOI: 10.1089/cmb.2024.0614

Alexandra Sasha Gavryushkina, Holly R Pinkney, Sarah D Diermeier, Alex Gavryushkin

The phylogenetic relationships of cells within tumors can help us understand how cancer develops in space and time and identify driver mutations and other evolutionary events that enable cancer growth and spread. Numerous studies have reconstructed phylogenies from single-cell DNA-seq data. Here, we look into the problem of phylogenetic analysis of spatially resolved near single-cell RNA-seq data, which is a cost-efficient alternative (or complementary) data source that integrates multiple sources of evolutionary information, including point mutations, copy number changes, and epimutations. Recent attempts to use such data, although promising, raised many methodological challenges. Here, we explored data preprocessing and modeling approaches for evolutionary analyses of Visium spatial transcriptomics data. We conclude that using only highly variable genes and accounting for heterogeneous RNA capture across tissue-covered spots improves the reconstructed topological relationships and influences estimated branch lengths.
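
The abstract does not give the filtering procedure, so the following is only an illustrative sketch of the general idea, under these assumptions: spots are rows and genes are columns of a raw count matrix, low-quality spots are those below a total-count cutoff, counts are library-size normalized and log-transformed, and "highly variable" means highest variance across the retained spots. The function names, the `min_total` cutoff, and the `scale` factor are hypothetical.

```python
from math import log1p
from statistics import pvariance


def filter_spots(counts, min_total):
    """Keep spots (rows) whose total count reaches min_total (assumed quality rule)."""
    return [row for row in counts if sum(row) >= min_total]


def normalize(counts, scale=1e4):
    """Library-size normalize each spot, then apply log1p.

    Assumes every row has a positive total, which filter_spots guarantees
    whenever min_total > 0.
    """
    return [[log1p(x / sum(row) * scale) for x in row] for row in counts]


def top_variable_genes(matrix, k):
    """Indices of the k genes (columns) with the highest variance across spots."""
    variances = [pvariance(column) for column in zip(*matrix)]
    ranked = sorted(range(len(variances)), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 4 spots x 3 genes; the third spot has very low capture.
raw_counts = [
    [10, 0, 5],
    [0, 10, 5],
    [1, 0, 0],
    [5, 5, 5],
]
kept = filter_spots(raw_counts, min_total=5)
selected = top_variable_genes(normalize(kept), k=2)
print(len(kept), selected)
```

Normalizing by each spot's total before ranking genes is one simple way to account for uneven RNA capture; the paper may model capture heterogeneity differently.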

Citations: 0
