Pub Date : 2025-07-01 DOI: 10.1109/TCBB.2024.3471930
Zhuoping Zhou, Boning Tong, Davoud Ataee Tarzanagh, Bojian Hou, Andrew J Saykin, Qi Long, Li Shen
Tensor Canonical Correlation Analysis (TCCA) is a widely used statistical method for examining linear associations between two tensor datasets. However, existing TCCA models do not adequately address the heterogeneity present in real-world tensor data, such as brain imaging data collected from diverse groups defined by factors like sex and race, and may therefore yield biased results. To overcome this limitation, we propose Multi-Group TCCA (MG-TCCA), which enables the joint analysis of multiple subgroups. By incorporating a dual sparsity structure and a block coordinate ascent algorithm, MG-TCCA addresses heterogeneity and leverages information across groups to identify consistent signals. The approach facilitates the quantification of shared and individual structures, reduces data dimensionality, and enables visual exploration. To validate it empirically, we study correlations between two brain positron emission tomography (PET) modalities (AV-45 and FDG) in an Alzheimer's disease (AD) cohort. Our results demonstrate that MG-TCCA outperforms traditional TCCA and Sparse TCCA (STCCA) in identifying sex-specific cross-modality imaging correlations, providing valuable insights for the characterization of multimodal imaging biomarkers in AD.
{"title":"MG-TCCA: Tensor Canonical Correlation Analysis Across Multiple Groups.","authors":"Zhuoping Zhou, Boning Tong, Davoud Ataee Tarzanagh, Bojian Hou, Andrew J Saykin, Qi Long, Li Shen","doi":"10.1109/TCBB.2024.3471930","DOIUrl":"10.1109/TCBB.2024.3471930","url":null,"PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":"1299-1310"},"PeriodicalIF":3.4,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11954983/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142345929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
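The combination of sparsity and block coordinate ascent described in the MG-TCCA abstract above can be illustrated on ordinary matrices. The following is a minimal sketch, not the MG-TCCA algorithm itself: it handles a single group, uses matrices rather than tensors, and swaps in a common sparse-CCA simplification that treats the within-set covariances as identity. The function names, the penalty `lam`, and the toy data are all assumptions made for illustration.

```python
import math


def soft_threshold(vec, lam):
    """Shrink each entry toward zero by lam (the L1 proximal step)."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in vec]


def normalize(vec):
    """Scale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cross_covariance(X, Y):
    """C = X^T Y for column-centered data given as lists of rows."""
    p, q = len(X[0]), len(Y[0])
    return [[sum(xr[i] * yr[j] for xr, yr in zip(X, Y)) for j in range(q)]
            for i in range(p)]


def sparse_cca_rank1(X, Y, lam=0.5, n_iter=20):
    """Alternate sparse updates of u and v to increase u^T C v (hypothetical helper)."""
    C = cross_covariance(X, Y)
    p, q = len(C), len(C[0])
    v = normalize([1.0] * q)
    u = [0.0] * p
    for _ in range(n_iter):
        # Block 1: update u with v held fixed.
        u = normalize(soft_threshold(
            [sum(C[i][j] * v[j] for j in range(q)) for i in range(p)], lam))
        # Block 2: update v with u held fixed.
        v = normalize(soft_threshold(
            [sum(C[i][j] * u[i] for i in range(p)) for j in range(q)], lam))
    return u, v


# Toy, already centered data: X column 0 and Y column 0 are strongly associated,
# X column 1 and Y column 1 only weakly, and X column 2 carries no signal.
X = [[1.0, 0.1, 0.0], [-1.0, 0.1, 0.0], [1.0, -0.1, 0.0], [-1.0, -0.1, 0.0]]
Y = [[2.0, 1.0], [-2.0, 1.0], [2.0, -1.0], [-2.0, -1.0]]
u, v = sparse_cca_rank1(X, Y)
print(u, v)
```

On this toy input the soft-thresholding step zeroes out the weakly associated features, so both weight vectors concentrate on the first column of each dataset. A multi-group variant would additionally need a penalty that ties the weight vectors of different groups together, which this sketch does not attempt.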
Pub Date : 2025-03-01 DOI: 10.1109/TCBB.2024.3447746
Shaokai Wang, Ming Zhu, Bin Ma
Major Histocompatibility Complex (MHC) molecules play a critical role in the immune system by presenting peptides on the cell surface for recognition by T-cells. Tumor cells often produce MHC peptides with amino acid mutations, known as neoantigens, which evade T-cell recognition, leading to rapid tumor growth. In immunotherapies such as TCR-T and CAR-T, identifying these mutated MHC peptide sequences is crucial. Current mass spectrometry-based peptide identification methods primarily rely on database searching, which fails to detect mutated peptides not present in human databases. In this paper, we propose a novel workflow called NeoMS, designed to efficiently identify both non-mutated and mutated MHC-I peptides from mass spectrometry data. NeoMS utilizes a tagging algorithm to generate an expanded sequence database that includes potential mutated proteins for each sample. Furthermore, it employs a machine learning-based scoring function for each peptide-spectrum match (PSM) to maximize search sensitivity. Finally, a rigorous target-decoy approach is implemented to control the false discovery rates (FDR) of the peptides with and without mutations separately. Experimental results for regular peptides demonstrate that NeoMS outperforms four benchmark methods. For mutated peptides, NeoMS successfully identifies hundreds of high-quality mutated peptides in a melanoma-associated sample, with their validity confirmed by further studies.
{"title":"NeoMS: Mass Spectrometry-Based Method for Uncovering Mutated MHC-I Neoantigens.","authors":"Shaokai Wang, Ming Zhu, Bin Ma","doi":"10.1109/TCBB.2024.3447746","DOIUrl":"10.1109/TCBB.2024.3447746","url":null,"PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":"444-454"},"PeriodicalIF":3.4,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142035750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
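The NeoMS abstract above mentions a target-decoy approach that controls the FDR separately for peptides with and without mutations. A minimal sketch of that general idea follows. It assumes each peptide-spectrum match is a `(score, is_decoy)` pair with higher scores being better, and it estimates FDR as decoys divided by targets above a score cutoff. The function names, the toy scores, and the 0.25 threshold are illustrative assumptions, not NeoMS's actual scoring or interface.

```python
def fdr_filter(psms, threshold=0.01):
    """Return the target PSMs accepted at the given estimated FDR.

    psms: list of (score, is_decoy) tuples; a higher score means a better match.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best_cut = 0  # number of top-ranked PSMs to keep
    for rank, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the deepest cutoff whose estimated FDR is still acceptable.
        if targets and decoys / targets <= threshold:
            best_cut = rank
    return [p for p in ranked[:best_cut] if not p[1]]


def fdr_filter_by_class(psms_by_class, threshold=0.01):
    """Apply the filter independently to each peptide class."""
    return {name: fdr_filter(psms, threshold)
            for name, psms in psms_by_class.items()}


# Toy scores; True marks a decoy hit.
psms_by_class = {
    "regular": [(9.0, False), (8.5, False), (8.0, False), (7.0, True), (6.5, False)],
    "mutated": [(7.5, False), (7.2, True), (6.0, False)],
}
accepted = fdr_filter_by_class(psms_by_class, threshold=0.25)
for name, hits in accepted.items():
    print(name, len(hits))
```

Filtering each class on its own matters because mutated candidates come from a much larger search space, so pooling them with regular peptides would let the abundant regular hits mask a high error rate among the mutated ones.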
Pub Date : 2025-03-01 DOI: 10.1109/TCBB.2024.3422288
Lei Zhang, Junyong Zhu, Sheng Wang, Jie Hou, Dong Si, Renzhi Cao
The goal of protein structure refinement is to enhance the precision of predicted protein models, particularly at the residue level of the local structure. Existing refinement approaches rely primarily on physics, and molecular simulation methods are resource-intensive and time-consuming. In this study, we employ deep learning methods to extract structural constraints from protein structure residues to assist in protein structure refinement. We introduce a novel method, AnglesRefine, which focuses on a protein's secondary structure and employs a transformer to refine various protein structure angles (psi, phi, omega, CA_C_N_angle, C_N_CA_angle, N_CA_C_angle), ultimately generating a superior protein model based on the refined angles. We evaluate our approach against other cutting-edge methods using the CASP11-14 and CASP15 datasets. Experimental outcomes indicate that our method generally surpasses other techniques on the CASP11-14 test dataset, while performing comparably or marginally better on the CASP15 test dataset. Our method consistently shows the lowest likelihood of degrading model quality: its degradation percentage is less than 10%, whereas that of other methods is about 50%. Furthermore, because our approach eliminates the need for conformational search and sampling, it significantly reduces computational time compared with existing refinement methods.
{"title":"AnglesRefine: Refinement of 3D Protein Structures Using Transformer Based on Torsion Angles.","authors":"Lei Zhang, Junyong Zhu, Sheng Wang, Jie Hou, Dong Si, Renzhi Cao","doi":"10.1109/TCBB.2024.3422288","DOIUrl":"10.1109/TCBB.2024.3422288","url":null,"PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":"397-408"},"PeriodicalIF":3.4,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141497925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
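The torsion angles named in the AnglesRefine abstract above (psi, phi, omega) are dihedral angles defined by four consecutive backbone atoms. As a hedged illustration of the quantity being refined, not of the transformer itself, the sketch below computes a signed dihedral from four 3D points using the standard atan2 formulation. The helper names and the toy coordinates are assumptions; real use would read atom positions from a structure file.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n2 = cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(cross(b1, b2), n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not real atoms): same-side, perpendicular, and opposite-side cases.
cis = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
right = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
trans = dihedral_degrees((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
print(cis, right, trans)
```

Using `atan2` rather than `acos` keeps the sign of the angle and stays numerically stable near 0 and 180 degrees, which matters when small angle corrections are the whole point of refinement.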
Pub Date : 2025-03-01 DOI: 10.1109/TCBB.2024.3477909
Ghulam Murtaza, Justin Wagner, Justin M Zook, Ritambhara Singh
Hi-C experiments allow researchers to study and understand the 3D genome organization and its regulatory function. Unfortunately, sequencing costs and technical constraints severely restrict access to high-quality Hi-C data for many cell types. Existing frameworks rely on a sparse Hi-C dataset or cheaper-to-acquire ChIP-seq data to predict Hi-C contact maps with high read coverage. However, these methods fail to generalize to sparse or cross-cell-type inputs because they do not account for the contributions of epigenomic features or the impact of the structural neighborhood in predicting Hi-C reads. We propose GrapHiC, which combines Hi-C and ChIP-seq in a graph representation, allowing more accurate embedding of structural and epigenomic features. Each node represents a binned genomic region, and we assign edge weights using the observed Hi-C reads. Additionally, we embed ChIP-seq and relative positional information as node attributes, allowing our representation to capture structural neighborhoods and the contributions of proteins and their modifications for predicting Hi-C reads. We show that GrapHiC generalizes better than the current state-of-the-art on cross-cell-type settings and sparse Hi-C inputs. Moreover, we can utilize our framework to impute Hi-C reads even when no Hi-C contact map is available, thus making high-quality Hi-C data accessible for many cell types.
{"title":"GrapHiC: An Integrative Graph Based Approach for Imputing Missing Hi-C Reads.","authors":"Ghulam Murtaza, Justin Wagner, Justin M Zook, Ritambhara Singh","doi":"10.1109/TCBB.2024.3477909","DOIUrl":"10.1109/TCBB.2024.3477909","url":null,"PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":"409-419"},"PeriodicalIF":3.4,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12034241/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142406376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
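The graph representation described in the GrapHiC abstract above (one node per genomic bin, edge weights from observed Hi-C reads, ChIP-seq and relative position as node attributes) can be sketched with plain dictionaries. This is only an illustration of the data layout under assumed inputs: `contacts` maps bin pairs to read counts and `chip_seq` holds one signal vector per bin. The names, the position encoding, and the toy values are hypothetical and do not reflect GrapHiC's actual code.

```python
def build_graph(contacts, chip_seq):
    """Return (node_attributes, weighted_edges) for a binned genomic region.

    contacts: dict {(bin_i, bin_j): read_count}
    chip_seq: list with one list of ChIP-seq signals per bin
    """
    n_bins = len(chip_seq)
    nodes = []
    for i, signals in enumerate(chip_seq):
        # Relative position in [0, 1], appended as an extra node attribute.
        position = i / (n_bins - 1) if n_bins > 1 else 0.0
        nodes.append(list(signals) + [position])
    edges = {}
    for (i, j), reads in contacts.items():
        if reads > 0 and i != j:
            key = (min(i, j), max(i, j))  # undirected edge
            edges[key] = edges.get(key, 0.0) + float(reads)
    return nodes, edges


# Toy region with three bins and two ChIP-seq tracks per bin.
chip_seq = [[1.0, 0.0], [0.5, 2.0], [0.0, 3.0]]
contacts = {(0, 1): 5, (1, 2): 2, (2, 0): 1}
nodes, edges = build_graph(contacts, chip_seq)
print(nodes)
print(edges)
```

Keeping the epigenomic signals on the nodes and the contact counts on the edges is what lets a graph model fall back on node attributes alone when the Hi-C input is sparse or absent.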
Pub Date : 2025-03-01 DOI: 10.1109/TCBB.2024.3443854
Chu-Ting Yu, Bo Tian, Qian-Qian Meng, Zhe-Ran Chen, Ya-Nan Pang, Xun Zhang, Yan Bian, Si-Wei Zhou, Mei-Juan Hao, Ye Gao, Lei Xin, Han Lin, Wei Wang, Luo-Wei Wang
Immunotherapy for esophageal squamous cell carcinoma (ESCC) exhibits notable variability in efficacy. Concurrently, recent research emphasizes circRNAs' impact on the ESCC tumor microenvironment. To further explore the relationship, we leveraged circRNA, microRNA, and mRNA sequence datasets to construct a comprehensive immune-related circRNA-microRNA-mRNA network, revealing competing endogenous RNA (ceRNA) roles in ESCC. The network comprises 16 circular RNAs, 13 microRNAs, and 1,560 mRNAs. Weighted gene co-expression analysis identified immune-related modules, notably cancer-associated fibroblast (CAF) and myeloid-derived suppressor cell modules, correlating significantly with immune and stemness scores. Among them, the CAF module plays a crucial role in extracellular matrix function and effectively discriminates ESCC patients. Four hub collagen family genes within CAF correlated robustly with CAF, macrophage infiltration, and T-cell exclusion. In-house sequencing and RT-qPCR validated their elevated expression. We also identified CAF module-targeting drugs as potential ESCC treatments. In summary, we established an immune-related circRNA-miRNA-mRNA network that not only illuminates ceRNA functionality but also highlights circRNAs' involvement in the CAF through collagen gene targeting. These findings hold promise to predict ESCC immune landscapes and therapy responses, ultimately aiding in more personalized and effective clinical decision-making.
{"title":"Development and Validation of a Comprehensive Analysis of the Competing Endogenous circRNA/miRNA/mRNA Network for the Identification of Immune-Related Targets in Esophageal Squamous Cell Carcinoma.","authors":"Chu-Ting Yu, Bo Tian, Qian-Qian Meng, Zhe-Ran Chen, Ya-Nan Pang, Xun Zhang, Yan Bian, Si-Wei Zhou, Mei-Juan Hao, Ye Gao, Lei Xin, Han Lin, Wei Wang, Luo-Wei Wang","doi":"10.1109/TCBB.2024.3443854","DOIUrl":"10.1109/TCBB.2024.3443854","url":null,"PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":"481-492"},"PeriodicalIF":3.4,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142106989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
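A circRNA-miRNA-mRNA network of the kind described in the ESCC abstract above is, structurally, a set of paths that share a miRNA. The sketch below shows one simple way such triples could be assembled from two interaction lists. It is an assumption-laden illustration: the identifiers are placeholders, and a real ceRNA analysis would typically also filter on expression data, which is not modelled here.

```python
from collections import defaultdict


def build_cerna_triples(circ_mirna_pairs, mirna_mrna_pairs):
    """Join circRNA-miRNA and miRNA-mRNA pairs on the shared miRNA."""
    targets = defaultdict(set)
    for mirna, mrna in mirna_mrna_pairs:
        targets[mirna].add(mrna)
    triples = set()
    for circ, mirna in circ_mirna_pairs:
        for mrna in targets.get(mirna, ()):
            triples.add((circ, mirna, mrna))
    return sorted(triples)


# Placeholder identifiers, not real transcripts.
circ_mirna_pairs = [("circ_A", "miR_1"), ("circ_B", "miR_2"), ("circ_C", "miR_9")]
mirna_mrna_pairs = [("miR_1", "GENE_X"), ("miR_1", "GENE_Y"), ("miR_2", "GENE_X")]
triples = build_cerna_triples(circ_mirna_pairs, mirna_mrna_pairs)
for triple in triples:
    print(" -> ".join(triple))
```

Here `circ_C` drops out because its miRNA has no listed mRNA target, which mirrors how the final network ends up much smaller than the raw interaction lists.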
Pub Date : 2025-03-01 DOI: 10.1109/TCBB.2024.3467033
Gabriel Siqueira, Alexsandro Oliveira Alexandrino, Andre Rodrigues Oliveira, Geraldine Jean, Guillaume Fertin, Zanoni Dias
Genome Rearrangement distance problems are used in Computational Biology to estimate the evolutionary distance between genomes. These problems consist of minimizing the number of rearrangement events necessary to transform one genome into another. Two commonly used rearrangement events are reversal and transposition. The first problems studied ignored nucleotides outside genes (called intergenic regions), or assumed that genomes have a single copy of each gene. Recent works made advancements in more general problems considering the number of nucleotides in intergenic regions, and replicated genes. Nevertheless, genomes tend to have wildly different quantities of nucleotides in their intergenic regions, which poses a problem when comparing these regions exactly. To overcome this limitation, our work considers some flexibility when matching intergenic regions that do not have the same number of nucleotides. We propose new problems seeking the minimum number of reversals, or reversals and transpositions, necessary to transform one genome into another, while considering flexible intergenic region information. We show approximations for these problems by exploring their relationship with the Signed Minimum Common Flexible Intergenic String Partition problem. We also present different heuristics for the partition problem, and conduct experimental tests on simulated genomes to assess the performance of our algorithms.
{"title":"Partition Based Algorithms for Rearrangement Distances With Flexible Intergenic Regions.","authors":"Gabriel Siqueira, Alexsandro Oliveira Alexandrino, Andre Rodrigues Oliveira, Geraldine Jean, Guillaume Fertin, Zanoni Dias","doi":"10.1109/TCBB.2024.3467033","DOIUrl":"10.1109/TCBB.2024.3467033","url":null,"PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":"455-468"},"PeriodicalIF":3.4,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142345931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
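To make the notion of a reversal distance in the abstract above concrete, the sketch below applies signed reversals to a genome written as a tuple of signed gene labels and finds the minimum number of reversals by breadth-first search. It is deliberately simplified: it ignores intergenic regions and replicated genes, it is exponential in genome length, and it is unrelated to the partition-based approximations the paper proposes. The function names are illustrative.

```python
from collections import deque


def reversal(perm, i, j):
    """Reverse the segment perm[i..j] (inclusive) and flip the sign of each gene."""
    return perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]


def reversal_distance(source, target):
    """Minimum number of signed reversals turning source into target (brute force)."""
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        for i in range(len(perm)):
            for j in range(i, len(perm)):
                nxt = reversal(perm, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return None  # unreachable, e.g. the gene sets differ


print(reversal((1, 2, 3), 0, 1))                        # (-2, -1, 3)
print(reversal_distance((1, -3, -2, 4), (1, 2, 3, 4)))  # 1
print(reversal_distance((2, 1), (1, 2)))                # 3
```

The last case shows why orientation matters: swapping two adjacent genes while restoring both signs takes three reversals, not one. Exhaustive search of this kind is only feasible for tiny genomes, which is the practical motivation for approximation algorithms and heuristics.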
Pub Date : 2025-03-01 DOI: 10.1109/TCBB.2024.3486911
Xuehua Bi, Chunyang Jiang, Cheng Yan, Kai Zhao, Linlin Zhang, Jianxin Wang
MiRNAs play an important role in the occurrence and development of human disease. Identifying potential miRNA-disease associations is valuable for disease diagnosis and treatment. Efficient computational methods for predicting potential miRNA-disease associations are therefore urgently needed to reduce the cost and time associated with biological wet experiments. In addition, high-quality feature representation remains a challenge for miRNA-disease association prediction using graph neural network methods. In this paper, we propose a method named ESGC-MDA, which employs an enhanced Simple Graph Convolution Network to identify miRNA-disease associations. We first construct a bipartite attributed graph for miRNAs and diseases by computing multi-source similarity. Then, we enhance the feature representations of miRNA and disease nodes by applying two strategies in the simple convolution network: randomly dropping messages during propagation so that the model learns more reliable feature representations, and using adaptive weighting to aggregate features from different layers. Finally, we calculate the prediction scores of miRNA-disease pairs by using a fully connected neural network decoder. We conduct 5-fold cross-validation and 10-fold cross-validation on HMDD v2.0 and HMDD v3.2, respectively, and ESGC-MDA achieves better performance than state-of-the-art baseline methods. Case studies for cardiovascular disease, lung cancer and colon cancer further confirm the effectiveness of ESGC-MDA.
{"title":"ESGC-MDA: Identifying miRNA-Disease Associations Using Enhanced Simple Graph Convolutional Networks.","authors":"Xuehua Bi, Chunyang Jiang, Cheng Yan, Kai Zhao, Linlin Zhang, Jianxin Wang","doi":"10.1109/TCBB.2024.3486911","DOIUrl":"10.1109/TCBB.2024.3486911","url":null,"PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":"422-432"},"PeriodicalIF":3.4,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142521814","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
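The two enhancements named in the ESGC-MDA abstract above, dropping messages during propagation and weighting the outputs of different layers, can be sketched on top of a parameter-free graph propagation step. The code below is a minimal illustration under stated assumptions: the exact dropping scheme and weighting used by ESGC-MDA are not given in the abstract, so per-edge Bernoulli dropping and a softmax over fixed `layer_scores` (standing in for learnable parameters) are used here, and all function names are hypothetical.

```python
import math
import random


def normalized_weights(edges, n):
    """Entries of D^-1/2 (A + I) D^-1/2 as {(receiver, sender): weight}."""
    neighbours = {i: {i} for i in range(n)}  # self-loops included
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    return {(i, j): 1.0 / math.sqrt(len(neighbours[i]) * len(neighbours[j]))
            for i in range(n) for j in neighbours[i]}


def propagate(weights, feats, drop_prob=0.0, rng=random):
    """One propagation step; each message is skipped with probability drop_prob."""
    out = [[0.0] * len(feats[0]) for _ in feats]
    for (i, j), w in weights.items():
        if drop_prob > 0.0 and rng.random() < drop_prob:
            continue  # the message from node j to node i is dropped
        for k, value in enumerate(feats[j]):
            out[i][k] += w * value
    return out


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embed(edges, feats, layer_scores, drop_prob=0.0, rng=random):
    """Weighted sum of the 0..K-hop representations, K = len(layer_scores) - 1."""
    weights = normalized_weights(edges, len(feats))
    alphas = softmax(layer_scores)
    layer = feats
    combined = [[alphas[0] * x for x in row] for row in feats]
    for alpha in alphas[1:]:
        layer = propagate(weights, layer, drop_prob, rng)
        for row_sum, row in zip(combined, layer):
            for k, value in enumerate(row):
                row_sum[k] += alpha * value
    return combined


# Toy graph: node 0 stands for a miRNA, node 1 for a disease, joined by one edge.
emb = embed(edges=[(0, 1)], feats=[[1.0], [0.0]], layer_scores=[0.0, 0.0])
print(emb)
```

During training one would pass a non-zero `drop_prob` (for example with `rng=random.Random(0)` for reproducibility) and leave it at zero for inference, in the same spirit as dropout.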
Pub Date : 2025-01-01 DOI: 10.1109/TCBB.2024.3492708
Flora Rajaei, Cristian Minoccheri, Emily Wittrup, Richard C Wilson, Brian D Athey, Gilbert S Omenn, Kayvan Najarian
Over the past few years, artificial intelligence (AI) has emerged as a transformative force in drug discovery and development (DDD), revolutionizing many aspects of the process. This survey provides a comprehensive review of recent advancements in AI applications within early drug discovery and post-market drug assessment. It addresses the identification and prioritization of new therapeutic targets, prediction of drug-target interaction (DTI), design of novel drug-like molecules, and assessment of the clinical efficacy of new medications. By integrating AI technologies, pharmaceutical companies can accelerate the discovery of new treatments, enhance the precision of drug development, and bring more effective therapies to market. This shift represents a significant move towards more efficient and cost-effective methodologies in the DDD landscape.
{"title":"AI-Based Computational Methods in Early Drug Discovery and Post Market Drug Assessment: A Survey.","authors":"Flora Rajaei, Cristian Minoccheri, Emily Wittrup, Richard C Wilson, Brian D Athey, Gilbert S Omenn, Kayvan Najarian","doi":"10.1109/TCBB.2024.3492708","DOIUrl":"10.1109/TCBB.2024.3492708","url":null,"PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":"97-115"},"PeriodicalIF":3.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395280/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142590779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-01-01 DOI: 10.1109/TCBB.2024.3487434
Haosheng Zhou, Wei Lin, Sergio R Labra, Stuart A Lipton, Jeremy A Elman, Nicholas J Schork, Aaditya V Rangan
Many traditional methods for analyzing gene-gene relationships focus on positive and negative correlations, both of which are a kind of 'symmetric' relationship. Biclustering is one such technique that typically searches for subsets of genes exhibiting correlated expression among a subset of samples. However, genes can also exhibit 'asymmetric' relationships, such as 'if-then' relationships used in boolean circuits. In this paper we develop a very general method that can be used to detect biclusters within gene-expression data that involve subsets of genes which are enriched for these 'boolean-asymmetric' relationships (BARs). These BAR-biclusters can correspond to heterogeneity that is driven by asymmetric gene-gene interactions, e.g., reflecting regulatory effects of one gene on another, rather than more standard symmetric interactions. Unlike typical approaches that search for BARs across the entire population, BAR-biclusters can detect asymmetric interactions that only occur among a subset of samples. We apply our method to a single-cell RNA-sequencing data-set, demonstrating that the statistically-significant BAR-biclusters indeed contain additional information not present within the more traditional 'boolean-symmetric'-biclusters. For example, the BAR-biclusters involve different subsets of cells, and highlight different gene-pathways within the data-set. Moreover, by combining the boolean-asymmetric- and boolean-symmetric-signals, one can build linear classifiers which outperform those built using only traditional boolean-symmetric signals.
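The 'if-then' relationship in the abstract above can be made concrete with binarized expression values: "if gene A is on then gene B is on" is violated only by samples where A is on and B is off, so the relation is asymmetric by construction. The sketch below counts such violations in both directions and shows how a relation can hold within a subset of samples while failing across all of them. It is a plain illustration of the concept, not the paper's loop-counting technique, and the names, the violation tolerance, and the toy data are assumptions.

```python
def implication_violations(a, b):
    """Count samples where a is on but b is off (violations of 'if a then b')."""
    return sum(1 for x, y in zip(a, b) if x and not y)


def is_asymmetric(a, b, max_violations=0):
    """True when 'if a then b' holds but 'if b then a' does not."""
    forward = implication_violations(a, b)
    backward = implication_violations(b, a)
    return forward <= max_violations < backward


def restrict(values, subset):
    """Keep only the samples whose indices are listed in subset."""
    return [values[i] for i in subset]


# Toy binarized expression of two genes across five cells (1 = on, 0 = off).
gene_a = [1, 1, 0, 0, 1]
gene_b = [1, 1, 1, 0, 0]

print(is_asymmetric(gene_a, gene_b))      # all five cells
subset = [0, 1, 2, 3]                     # hypothetical bicluster of cells
print(is_asymmetric(restrict(gene_a, subset), restrict(gene_b, subset)))
```

Across all five cells each direction has one violation, so no asymmetric relation is reported, while within the four-cell subset "if A then B" holds cleanly. This is the sense in which searching over subsets of samples can reveal relationships that a population-wide scan misses.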
{"title":"Detecting Boolean Asymmetric Relationships With a Loop Counting Technique and its Implications for Analyzing Heterogeneity Within Gene Expression Datasets.","authors":"Haosheng Zhou, Wei Lin, Sergio R Labra, Stuart A Lipton, Jeremy A Elman, Nicholas J Schork, Aaditya V Rangan","doi":"10.1109/TCBB.2024.3487434","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP","pages":"27-38"},"PeriodicalIF":3.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12037869/pdf/","platform":"Semanticscholar","paperid":"142545201","RegionNum":3,"RegionCategory":"Biology","PMCID":"OA","Total":0}
Pub Date : 2025-01-01 DOI: 10.1109/TCBB.2024.3493203
C Vishnuppriya, G Tamilpavai
Huntington Disease (HD) is a neurodegenerative disorder that causes psychiatric disturbances, movement problems, weight loss, and sleep problems. It needs to be addressed early in life. Deep Learning (DL) based systems can now help physicians by providing a second opinion when treating a patient's disease. In this work, human deoxyribonucleic acid (DNA) sequences are analyzed using a Deep Neural Network (DNN) algorithm to predict HD. The main objective is to identify whether human DNA is affected by HD. Human DNA sequences are collected from the National Center for Biotechnology Information (NCBI), and synthetic human DNA data are also constructed for processing. The DNA sequence data are then converted to numerical form by the Chaos Game Representation (CGR) method, and the resulting numerical values are used for feature extraction. Mean, median, standard deviation, entropy, contrast, correlation, energy, and homogeneity are extracted. Additionally, counts of adenine, thymine, guanine, and cytosine are extracted from the DNA sequence data itself. The extracted features are used as input to the DNN classifier and to other machine learning classifiers: Neural Network (NN), Support Vector Machine (SVM), Random Forest (RF), and Classification Tree with Forward Pruning (CTWFP). Six performance measures are used: Accuracy, Sensitivity, Specificity, Precision, F1 score, and Matthews Correlation Coefficient (MCC). The study concludes that DNN, NN, SVM, and RF achieve 100% accuracy and CTWFP achieves an accuracy of 87%.
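The numerical conversion and the simplest features in this pipeline can be sketched as follows. This is a minimal illustration of the general Chaos Game Representation idea, not the authors' implementation: the corner layout, the starting point, the choice to summarize only the x coordinates, and the function names `cgr_points` and `simple_features` are all assumptions made here, and the input sequence is a toy string. Texture-style features such as contrast or homogeneity would need a 2D image built from the points and are left out.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Assumed corner layout on the unit square; other layouts are equally valid.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return CGR coordinates: each point lies halfway between the
    previous point and the corner assigned to the current base."""
    x, y = 0.5, 0.5  # assumed starting point: centre of the square
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def simple_features(seq):
    """Summary statistics of the CGR x coordinates plus per-base counts."""
    points = cgr_points(seq)
    if not points:
        raise ValueError("sequence contains no A, C, G, or T")
    xs = [p[0] for p in points]
    counts = Counter(b for b in seq.upper() if b in CORNERS)
    return {
        "mean_x": mean(xs),
        "median_x": median(xs),
        "std_x": pstdev(xs),
        "count_A": counts["A"],
        "count_C": counts["C"],
        "count_G": counts["G"],
        "count_T": counts["T"],
    }


features = simple_features("CAGCAGCAG")  # toy input, not real data
print(features["count_A"], features["count_C"], features["count_G"], features["count_T"])
```

A feature dictionary like this, computed per sequence, is the kind of fixed-length vector that could then be passed to any of the classifiers listed above.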
{"title":"Performance Comparison Between Deep Neural Network and Machine Learning Based Classifiers for Huntington Disease Prediction From Human DNA Sequence.","authors":"C Vishnuppriya, G Tamilpavai","doi":"10.1109/TCBB.2024.3493203","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP","pages":"52-63"},"PeriodicalIF":3.4,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"142604214","RegionNum":3,"RegionCategory":"Biology","Total":0}