Biogeography-based optimization (BBO) is an evolutionary algorithm modeled on biological populations that strengthens optimization search through an adaptive migration operation. However, the original BBO is only suited to continuous, single-objective optimization, not to more complex problems such as discrete and multi-objective optimization. In this article, we therefore propose an improved BBO algorithm for multi-objective discrete optimization problems with multiple constraints. We define a decision matrix and an objective vector to represent the variables and objective functions of the problem, and define an ideal point and a utility function so that candidate solutions can be compared under a common metric. We introduce a similarity threshold, a repeatability threshold, a cost threshold, and a stagnation threshold to improve the diversity of the search while maintaining convergence. Finally, we conduct a case study on the NP-hard problem of composite functions, and the experimental results verify the effectiveness and efficiency of our approach.
Leyi Hu, Xuan Liu, Xiangyu Qu, Chenyan Wang, Bingmeng Hu, Jieyao Wei. Biogeography-Based Multi-Objective Discrete Optimization with Constraints. Journal of Computational Biology, pp. 896-910. Pub Date: 2025-09-01. DOI: 10.1089/cmb.2024.0931
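The migration operation and the ideal-point utility mentioned in the BBO abstract above can be illustrated with a minimal Python sketch. It uses the textbook linear BBO migration model and a Euclidean distance to the ideal point as the utility; both are assumptions made for illustration, since the abstract does not give the paper's exact definitions, and the function names `migration_rates`, `utility`, and `migrate` are hypothetical. The similarity, repeatability, cost, and stagnation thresholds are not modeled.

```python
import random


def migration_rates(n, max_immigration=1.0, max_emigration=1.0):
    """Linear BBO rates for habitats ranked best (rank 0) to worst (rank n-1)."""
    rates = []
    for rank in range(n):
        quality = (n - rank) / n  # best habitat -> 1.0, worst -> 1/n
        immigration = max_immigration * (1.0 - quality)
        emigration = max_emigration * quality
        rates.append((immigration, emigration))
    return rates


def utility(objectives, ideal):
    """Assumed utility: Euclidean distance to the ideal point (smaller is better)."""
    return sum((o - i) ** 2 for o, i in zip(objectives, ideal)) ** 0.5


def migrate(population, utilities, rng):
    """One migration pass over discrete decision vectors.

    Poor habitats import features often; good habitats are the likely donors.
    """
    order = sorted(range(len(population)), key=lambda i: utilities[i])
    rates = migration_rates(len(population))
    donor_weights = [emigration for _, emigration in rates]
    new_population = [list(habitat) for habitat in population]
    for rank, idx in enumerate(order):
        immigration = rates[rank][0]
        for d in range(len(population[idx])):
            if rng.random() < immigration:
                donor_rank = rng.choices(range(len(order)), weights=donor_weights)[0]
                new_population[idx][d] = population[order[donor_rank]][d]
    return new_population
```

In this sketch the best-ranked habitat has an immigration rate of zero, so it is never overwritten, which acts as a simple form of elitism. A constrained variant would additionally need to repair or reject infeasible vectors after migration.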
Predicting drug-drug interactions (DDIs) is critical to drug discovery and development because adverse interactions pose serious health risks. Most existing studies use either the properties of drugs or the network topology of DDIs to predict unknown interactions between drugs. However, DDI networks are usually sparse, with insufficient interaction information, and these approaches do not integrate the two types of information deeply enough to exploit potential associations between DDI network nodes and drug properties. In this work, we present a novel co-embedding model, counterfactual debiased co-embedding (CDCE), for counterfactual-based analyses. The model mitigates the effects of sparse networks and information embedding loss through counterfactual debiasing without losing the original information. In addition, we fuse two types of attribute information, the Anatomical Therapeutic Chemical (ATC) code and the Simplified Molecular Input Line Entry System (SMILES) representation, from different perspectives. The implicit information obtained from the ATC code is embedded into the DDI network and then fused with SMILES through a variational graph autoencoder model. We validated CDCE on the benchmark dataset BioSNAP, with experimental results showing that it outperforms state-of-the-art methods.
Xue Pan, Chunping Ouyang, Linlin Zhang, Yongbin Liu, Ying Yu. Counterfactual Debiased Co-Embedding Model for Enhanced Drug-Drug Interaction Prediction. Journal of Computational Biology, pp. 838-849. Pub Date: 2025-09-01. DOI: 10.1089/cmb.2024.0882
Pub Date: 2025-09-01 | Epub Date: 2025-08-06 | DOI: 10.1177/15578666251366449
Zhipeng Cai, Wei Peng, Murray Patterson
Special Issue, Part I: 20th International Symposium on Bioinformatics Research and Applications (ISBRA 2024). Journal of Computational Biology, p. 825.
Cancer is a disease that is both complex and diverse, and effective diagnosis and treatment require an accurate depiction of tumor subtypes. Traditional methods of cancer identification, which rely on clinical and histopathological criteria, have limitations in identifying key molecular subtypes. With the advancement of high-throughput genomics technologies, the field of cancer research has undergone a transformation, enabling detailed analysis of tumor molecular characteristics on a large scale. The integration of multiple types of genomic data is expected to provide a more comprehensive understanding of the molecular mechanisms of cancer and to promote the discovery of new diagnostic and therapeutic targets. However, achieving this requires the development of new computational techniques. In order to facilitate more efficient feature extraction and dimensionality reduction of multi-omics data, we present MultiDAAE (Multi-omics Double Adversarial Autoencoder), a novel technique that combines autoencoders with two discriminators to form two generative adversarial networks. On several cancer datasets, our method shows outstanding clustering performance when compared to state-of-the-art techniques. To sum up, MultiDAAE can help identify possible molecular pathways and provide information for the development of tailored cancer treatments.
Xia Chen, Hao Nie, Quanwei Chen, Xiang Zhang, Zixing He, Xiuxiu Chao, Weihao Ou, Xiangzheng Fu, Haowen Chen. A Innovative Strategy for Identifying Subtypes Through the Analysis of Multi-Omics Data with Adversarial Autoencoders. Journal of Computational Biology, pp. 879-895. Pub Date: 2025-09-01. DOI: 10.1089/cmb.2024.0927
Pub Date: 2025-09-01 | Epub Date: 2025-03-28 | DOI: 10.1089/cmb.2024.0872
Junlai Qiu, Qingfeng Chen, Wei Lan, Junyue Cao
Gleason grading of prostate histopathology images is widely used by pathologists for diagnosis and prognosis. The spatial characteristics of cells and tissues revealed by stained images are essential for accurate grading of prostate cancer. Although considerable effort has gone into training grading models, they mainly rely on basic preprocessed images and largely overlook the multiple staining aspects of histopathology images that are crucial for capturing spatial information. This article proposes a novel deep learning model for automated prostate cancer grading that integrates several staining characteristics. Image deconvolution is applied to separate the multiple staining channels in the histopathology image, thereby enabling the model to identify effective feature information. A channel and pixel attention-based encoder is designed to extract cell and tissue structure information from the multiple staining channel images. We propose a dual-branch decoder, in which a classical convolutional neural network branch specializes in local feature extraction and a Transformer branch focuses on global feature extraction, to effectively fuse and refine features from the different staining channels. Taking full advantage of the complementarity of multiple staining channels makes the features more compact and discriminative, leading to precise grading. Extensive experiments on relevant public datasets demonstrate the effectiveness and scalability of the proposed model.
Multichannel Contribution Aware Network for Prostate Cancer Grading in Histopathology Images. Journal of Computational Biology, pp. 826-837.
Pub Date: 2025-09-01 | Epub Date: 2025-05-02 | DOI: 10.1089/cmb.2024.0925
Li-Lu Guo
Pattern locating is a crucial step in various biological sequence analysis tasks. As a compressed full-text indexing technology, the full-text minute-space index (FM-index) has been introduced for biological pattern locating over ultra-long genomes, with a low memory footprint and a retrieval time independent of genome size. However, its locating time is bottlenecked by the number of occurrences of the biological pattern in the genome, so it is not efficient enough for mass-occurrence biological patterns. To solve this problem, we propose Effloc, an efficient locating algorithm for mass-occurrence biological patterns in genomic sequences. It builds on two optimization techniques. First, rankings with the same Burrows-Wheeler Transform character are organized into a group and calculated together, which reduces the number of last-to-first column (LF) mapping operations required to jump forward to suffix array (SA) sampling points. Second, a dedicated structure records the jump status, which avoids the redundant LF mapping operations that arise when adjacent patterns share the same SA sampling point. Compared with the existing algorithm, Effloc can significantly reduce the number of time-consuming LF mapping operations in mass-occurrence pattern locating. Ablation experiments verified the algorithm's effectiveness, and Effloc located patterns faster than five state-of-the-art competing algorithms. The source code and data are released at https://github.com/Lilu-guo/Effloc.
Effloc: An Efficient Locating Algorithm for Mass-Occurrence Biological Patterns with FM-Index. Journal of Computational Biology, pp. 865-878.
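To make the cost that the Effloc abstract above targets concrete, here is a minimal Python sketch of the baseline FM-index locate procedure: backward search finds the range of matching rows, then each row is walked backward with LF mapping until it hits a sampled suffix array entry. This is the standard textbook technique, not Effloc itself; the grouping of rankings and the jump-status structure are not reproduced, and the names `build_index`, `lf`, and `locate` as well as the naive suffix sorting are illustrative assumptions suitable only for short strings.

```python
def build_index(text, sample_rate=4):
    """Build BWT, C table, rank table, and a sampled suffix array for text + '$'."""
    text += "$"
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive construction
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # counts[c] = number of characters in the text strictly smaller than c
    counts, total = {}, 0
    for c in alphabet:
        counts[c] = total
        total += text.count(c)
    # occ[c][i] = occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    # keep only suffix array entries whose text position is a multiple of sample_rate
    samples = {row: pos for row, pos in enumerate(sa) if pos % sample_rate == 0}
    return bwt, counts, occ, samples


def lf(index, row):
    """LF mapping: the row of the suffix that starts one text position earlier."""
    bwt, counts, occ, _ = index
    c = bwt[row]
    return counts[c] + occ[c][row]


def locate(index, pattern):
    """Return sorted text positions of pattern using backward search plus LF walks."""
    bwt, counts, occ, samples = index
    lo, hi = 0, len(bwt)  # half-open row range [lo, hi)
    for c in reversed(pattern):
        if c not in counts:
            return []
        lo = counts[c] + occ[c][lo]
        hi = counts[c] + occ[c][hi]
        if lo >= hi:
            return []
    positions = []
    for row in range(lo, hi):
        steps = 0
        while row not in samples:  # one LF operation per step, per occurrence
            row = lf(index, row)
            steps += 1
        positions.append(samples[row] + steps)
    return sorted(positions)
```

For example, `locate(build_index("ACGTACGTAC"), "AC")` walks three separate rows back to a sample. Because every occurrence triggers its own walk, the number of LF operations grows with the occurrence count, which is the overhead the abstract describes for mass-occurrence patterns.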
Pub Date: 2025-09-01 | Epub Date: 2025-04-28 | DOI: 10.1089/cmb.2024.0884
Xinyuan Shi, Fangfang Zhu, Wenwen Min
Predicting survival outcomes and assessing patient risk play a pivotal role in understanding the microbial composition across various stages of cancer. With ongoing advances in deep learning, it has been shown that deep learning can analyze patient survival risk based on microbial data. However, individual cancer datasets commonly suffer from a limited sample size and a high-dimensional feature space. This often leads to overfitting in deep learning models, hindering their ability to extract informative data representations and resulting in suboptimal performance. To overcome these challenges, we advocate pretraining and fine-tuning strategies, which have proven effective in addressing the small sample size of individual cancer datasets. In this study, we propose VTrans, a deep learning model that combines a Transformer encoder and a variational autoencoder (VAE) and employs both pretraining and fine-tuning to predict the survival risk of cancer patients from microbial data. Furthermore, we highlight the potential of extending VTrans to integrate microbial multi-omics data. Our method is assessed on three distinct cancer datasets from The Cancer Genome Atlas Program, and the findings demonstrate that (1) VTrans outperforms conventional machine learning and other deep learning models; (2) pretraining significantly enhances its performance; (3) compared with positional encoding, VAE encoding is more effective in enriching the data representation; and (4) saliency maps make it possible to observe which microbes contribute strongly to the classification results. These results demonstrate the effectiveness of VTrans in predicting patient survival risk.
Source code and all datasets used in this paper are available at https://github.com/wenwenmin/VTrans and https://doi.org/10.5281/zenodo.14166580.
VTrans: A VAE-Based Pre-Trained Transformer Method for Microbiome Data Analysis. Journal of Computational Biology, pp. 850-864.
Jūratė Šaltytė Benth, Fred Espen Benth, Espen Rostrup Nakstad
Rebuttal to Flaws in the Paper 'Nearly Instantaneous Time-Varying Reproduction Number for Contagious Diseases-a Direct Approach Based on Nonlinear Regression'. Journal of Computational Biology 32(8), pp. 819-823. Pub Date: 2025-08-01. DOI: 10.1089/cmb.2025.0024
Pub Date: 2025-08-01 | DOI: 10.1177/15578666251360613
Geir Storvik, Solveig Engebretsen, Birgitte Freiesleben de Blasio, Arnoldo Frigessi
Flaws in the Article "Nearly Instantaneous Time-Varying Reproduction Number for Contagious Diseases-a Direct Approach Based on Nonlinear Regression". Journal of Computational Biology 32(8), pp. 813-818.
Pub Date: 2025-08-01 | Epub Date: 2025-06-11 | DOI: 10.1089/cmb.2024.0614
Alexandra Sasha Gavryushkina, Holly R Pinkney, Sarah D Diermeier, Alex Gavryushkin
The phylogenetic relationships among cells within a tumor can help us understand how cancer develops in space and time and identify driver mutations and other evolutionary events that enable cancer growth and spread. Numerous studies have reconstructed phylogenies from single-cell DNA-seq data. Here, we examine the problem of phylogenetic analysis of spatially resolved, near single-cell RNA-seq data, a cost-efficient alternative (or complementary) data source that integrates multiple sources of evolutionary information, including point mutations, copy number changes, and epimutations. Recent attempts to use such data, although promising, raised many methodological challenges. We explored data preprocessing and modeling approaches for evolutionary analyses of Visium spatial transcriptomics data. We conclude that using only highly variable genes and accounting for heterogeneous RNA capture across tissue-covered spots improves the reconstructed topological relationships and influences estimated branch lengths.
Filtering for Highly Variable Genes and High-Quality Spots Improves Phylogenetic Analysis of Cancer Spatial Transcriptomics Visium Data. Journal of Computational Biology, pp. 738-752.
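The two preprocessing ideas named in the abstract above, keeping high-quality spots and restricting the analysis to highly variable genes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the total-count threshold, the depth scaling with a log transform, and ranking genes by plain variance are generic choices, not the authors' pipeline, and the function names are hypothetical. Established toolkits typically rank genes with a mean-variance trend rather than raw variance.

```python
import math
from statistics import pvariance


def filter_spots(counts, min_total):
    """Keep spots (rows) whose total count reaches min_total (use min_total > 0)."""
    return [row for row in counts if sum(row) >= min_total]


def normalize(counts, scale=10_000):
    """Scale each spot to a common depth, then apply log1p.

    This is one simple way to reduce the effect of uneven RNA capture per spot.
    Rows must have a nonzero total, so call filter_spots first.
    """
    return [[math.log1p(v * scale / sum(row)) for v in row] for row in counts]


def highly_variable_genes(norm, top_k):
    """Return sorted indices of the top_k genes (columns) by variance across spots."""
    n_genes = len(norm[0])
    variances = [pvariance([row[g] for row in norm]) for g in range(n_genes)]
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:top_k])
```

A typical call chain would be `highly_variable_genes(normalize(filter_spots(counts, min_total)), top_k)`, after which only the selected gene columns are passed on to the phylogenetic model.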