
IEEE/ACM Transactions on Computational Biology and Bioinformatics: Latest Articles

MG-TCCA: Tensor Canonical Correlation Analysis Across Multiple Groups.
IF 3.4 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-07-01 DOI: 10.1109/TCBB.2024.3471930
Zhuoping Zhou, Boning Tong, Davoud Ataee Tarzanagh, Bojian Hou, Andrew J Saykin, Qi Long, Li Shen

Tensor Canonical Correlation Analysis (TCCA) is a widely used statistical method for examining linear associations between two tensor datasets. However, existing TCCA models fail to adequately address the heterogeneity present in real-world tensor data, such as brain imaging data collected from diverse groups characterized by factors like sex and race. Consequently, these models may yield biased outcomes. To overcome this limitation, we propose a novel approach called Multi-Group TCCA (MG-TCCA), which enables the joint analysis of multiple subgroups. By incorporating a dual sparsity structure and a block coordinate ascent algorithm, MG-TCCA addresses heterogeneity and leverages information across different groups to identify consistent signals. This approach facilitates the quantification of shared and individual structures, reduces data dimensionality, and enables visual exploration. To empirically validate our approach, we study correlations between two brain positron emission tomography (PET) modalities (AV-45 and FDG) within an Alzheimer's disease (AD) cohort. Our results demonstrate that MG-TCCA surpasses traditional TCCA and Sparse TCCA (STCCA) in identifying sex-specific cross-modality imaging correlations. This improved performance provides valuable insights for the characterization of multimodal imaging biomarkers in AD.
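
For orientation, the classical (matrix) CCA objective that tensor variants build on can be written as below. This is the standard textbook formulation, not the MG-TCCA objective, which the abstract does not state. The symbols are notation assumed here: $u$ and $v$ are the canonical weight vectors for the two data views, $\Sigma_{XX}$ and $\Sigma_{YY}$ are the within-view covariance matrices, and $\Sigma_{XY}$ is the cross-covariance matrix.

```latex
\rho \;=\; \max_{u,\,v}\;
\frac{u^{\top} \Sigma_{XY}\, v}
     {\sqrt{u^{\top} \Sigma_{XX}\, u}\;\sqrt{v^{\top} \Sigma_{YY}\, v}}
```

Because the ratio is invariant to rescaling $u$ and $v$, it is commonly solved under the constraints $u^{\top}\Sigma_{XX}u = v^{\top}\Sigma_{YY}v = 1$.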

Pages: 1299-1310 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11954983/pdf/
Citations: 0
NeoMS: Mass Spectrometry-Based Method for Uncovering Mutated MHC-I Neoantigens.
IF 3.4 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-03-01 DOI: 10.1109/TCBB.2024.3447746
Shaokai Wang, Ming Zhu, Bin Ma

Major Histocompatibility Complex (MHC) molecules play a critical role in the immune system by presenting peptides on the cell surface for recognition by T-cells. Tumor cells often produce MHC peptides with amino acid mutations, known as neoantigens, which evade T-cell recognition, leading to rapid tumor growth. In immunotherapies such as TCR-T and CAR-T, identifying these mutated MHC peptide sequences is crucial. Current mass spectrometry-based peptide identification methods primarily rely on database searching, which fails to detect mutated peptides not present in human databases. In this paper, we propose a novel workflow called NeoMS, designed to efficiently identify both non-mutated and mutated MHC-I peptides from mass spectrometry data. NeoMS utilizes a tagging algorithm to generate an expanded sequence database that includes potential mutated proteins for each sample. Furthermore, it employs a machine learning-based scoring function for each peptide-spectrum match (PSM) to maximize search sensitivity. Finally, a rigorous target-decoy approach is implemented to control the false discovery rates (FDR) of the peptides with and without mutations separately. Experimental results for regular peptides demonstrate that NeoMS outperforms four benchmark methods. For mutated peptides, NeoMS successfully identifies hundreds of high-quality mutated peptides in a melanoma-associated sample, with their validity confirmed by further studies.
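
The separate FDR control mentioned in the abstract can be illustrated with a generic target-decoy calculation. The sketch below is a minimal example under stated assumptions, not NeoMS's implementation: the function names, the `(score, is_decoy)` tuple layout, and the simple `decoys / targets` estimator are all choices made here for illustration.

```python
def target_decoy_qvalues(psms):
    """Rank PSMs by score and attach a q-value to each.

    psms: list of (score, is_decoy) tuples (hypothetical layout).
    FDR at each rank is estimated as decoys / targets seen so far.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: smallest FDR at this rank or at any lower-scoring cutoff.
    q_values, running_min = [], float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_values.append(running_min)
    q_values.reverse()
    return [(s, d, q) for (s, d), q in zip(ranked, q_values)]


def accepted_targets(psms, threshold=0.01):
    """Scores of target PSMs whose q-value is within the threshold."""
    return [s for s, d, q in target_decoy_qvalues(psms) if not d and q <= threshold]


def separate_fdr(psms_by_class, threshold=0.01):
    """Apply the FDR filter independently to each peptide class."""
    return {cls: accepted_targets(p, threshold) for cls, p in psms_by_class.items()}
```

Filtering each class on its own keeps an abundance of confident regular peptides from masking a higher error rate among the much rarer mutated ones.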

Pages: 444-454
Citations: 0
AnglesRefine: Refinement of 3D Protein Structures Using Transformer Based on Torsion Angles.
IF 3.4 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-03-01 DOI: 10.1109/TCBB.2024.3422288
Lei Zhang, Junyong Zhu, Sheng Wang, Jie Hou, Dong Si, Renzhi Cao

The goal of protein structure refinement is to enhance the precision of predicted protein models, particularly at the residue level of the local structure. Existing refinement approaches primarily rely on physics, and molecular simulation methods are resource-intensive and time-consuming. In this study, we employ deep learning methods to extract structural constraints from protein structure residues to assist in protein structure refinement. We introduce a novel method, AnglesRefine, which focuses on a protein's secondary structure and employs a transformer to refine various protein structure angles (psi, phi, omega, CA_C_N_angle, C_N_CA_angle, N_CA_C_angle), ultimately generating a superior protein model based on the refined angles. We evaluate our approach against other cutting-edge methods using the CASP11-14 and CASP15 datasets. Experimental outcomes indicate that our method generally surpasses other techniques on the CASP11-14 test dataset, while performing comparably or marginally better on the CASP15 test dataset. Our method consistently demonstrates the least likelihood of model quality degradation: for example, its degradation percentage is below 10%, whereas that of other methods is about 50%. Furthermore, as our approach eliminates the need for conformational search and sampling, it significantly reduces computational time compared to existing refinement methods.
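
The six angles listed in the abstract are either torsion (dihedral) angles over four consecutive backbone atoms or bond angles over three. The sketch below shows how such angles can be computed from atomic coordinates; it is a generic geometry example, not code from AnglesRefine, and the function names and the sign convention of the returned dihedral are assumptions of this sketch.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 axis."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * x for x in b1])
    w = _sub(b2, [_dot(b2, b1) * x for x in b1])
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def bond_angle(a, b, c):
    """Angle in degrees at vertex b formed by points a, b, c."""
    u, v = _sub(a, b), _sub(c, b)
    cos_t = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
```

With backbone coordinates in hand, phi would be `dihedral(C_prev, N, CA, C)`, psi `dihedral(N, CA, C, N_next)`, and omega `dihedral(CA, C, N_next, CA_next)`, while N_CA_C_angle corresponds to `bond_angle(N, CA, C)`.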

Pages: 397-408
Citations: 0
GrapHiC: An Integrative Graph Based Approach for Imputing Missing Hi-C Reads.
IF 3.4 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-03-01 DOI: 10.1109/TCBB.2024.3477909
Ghulam Murtaza, Justin Wagner, Justin M Zook, Ritambhara Singh

Hi-C experiments allow researchers to study and understand the 3D genome organization and its regulatory function. Unfortunately, sequencing costs and technical constraints severely restrict access to high-quality Hi-C data for many cell types. Existing frameworks rely on a sparse Hi-C dataset or cheaper-to-acquire ChIP-seq data to predict Hi-C contact maps with high read coverage. However, these methods fail to generalize to sparse or cross-cell-type inputs because they do not account for the contributions of epigenomic features or the impact of the structural neighborhood in predicting Hi-C reads. We propose GrapHiC, which combines Hi-C and ChIP-seq in a graph representation, allowing more accurate embedding of structural and epigenomic features. Each node represents a binned genomic region, and we assign edge weights using the observed Hi-C reads. Additionally, we embed ChIP-seq and relative positional information as node attributes, allowing our representation to capture structural neighborhoods and the contributions of proteins and their modifications for predicting Hi-C reads. We show that GrapHiC generalizes better than the current state-of-the-art on cross-cell-type settings and sparse Hi-C inputs. Moreover, we can utilize our framework to impute Hi-C reads even when no Hi-C contact map is available, thus making high-quality Hi-C data accessible for many cell types.
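
The graph representation described above (binned regions as nodes, observed Hi-C reads as edge weights, ChIP-seq and relative position as node attributes) can be sketched as follows. This is an illustrative data-structure example only: the function name, the list-of-lists inputs, and the choice of a normalized bin index as the positional feature are assumptions made here, and the abstract does not specify GrapHiC's actual encodings.

```python
def build_hic_graph(contacts, chip_signals):
    """Build node attributes and weighted edges from binned data.

    contacts: square symmetric matrix of Hi-C read counts between bins.
    chip_signals: one list of ChIP-seq signal values per bin (hypothetical features).
    Returns (nodes, edges): nodes[i] is the feature list of bin i,
    edges maps (i, j) with i < j to the observed read count.
    """
    n = len(contacts)
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            if contacts[i][j] > 0:
                edges[(i, j)] = contacts[i][j]
    nodes = []
    for i in range(n):
        # Relative position in [0, 1]; a stand-in for a positional encoding.
        rel_pos = i / (n - 1) if n > 1 else 0.0
        nodes.append(list(chip_signals[i]) + [rel_pos])
    return nodes, edges
```

Keeping only non-zero contacts as edges means a sparse Hi-C input yields a sparse graph, while the ChIP-seq attributes remain available on every node.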

Availability: https://github.com/rsinghlab/GrapHiC
Pages: 409-419 PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12034241/pdf/
Citations: 0
Development and Validation of a Comprehensive Analysis of the Competing Endogenous circRNA/miRNA/mRNA Network for the Identification of Immune-Related Targets in Esophageal Squamous Cell Carcinoma.
IF 3.4 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-03-01 DOI: 10.1109/TCBB.2024.3443854
Chu-Ting Yu, Bo Tian, Qian-Qian Meng, Zhe-Ran Chen, Ya-Nan Pang, Xun Zhang, Yan Bian, Si-Wei Zhou, Mei-Juan Hao, Ye Gao, Lei Xin, Han Lin, Wei Wang, Luo-Wei Wang

Immunotherapy for esophageal squamous cell carcinoma (ESCC) exhibits notable variability in efficacy. Concurrently, recent research emphasizes circRNAs' impact on the ESCC tumor microenvironment. To further explore the relationship, we leveraged circRNA, microRNA, and mRNA sequence datasets to construct a comprehensive immune-related circRNA-microRNA-mRNA network, revealing competing endogenous RNA (ceRNA) roles in ESCC. The network comprises 16 circular RNAs, 13 microRNAs, and 1,560 mRNAs. Weighted gene co-expression analysis identified immune-related modules, notably cancer-associated fibroblast (CAF) and myeloid-derived suppressor cell modules, correlating significantly with immune and stemness scores. Among them, the CAF module plays a crucial role in extracellular matrix function and effectively discriminates ESCC patients. Four hub collagen family genes within CAF correlated robustly with CAF, macrophage infiltration, and T-cell exclusion. In-house sequencing and RT-qPCR validated their elevated expression. We also identified CAF module-targeting drugs as potential ESCC treatments. In summary, we established an immune-related circRNA-miRNA-mRNA network that not only illuminates ceRNA functionality but also highlights circRNAs' involvement in the CAF through collagen gene targeting. These findings hold promise to predict ESCC immune landscapes and therapy responses, ultimately aiding in more personalized and effective clinical decision-making.

Pages: 481-492
Citations: 0
Partition Based Algorithms for Rearrangement Distances With Flexible Intergenic Regions.
IF 3.4 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-03-01 DOI: 10.1109/TCBB.2024.3467033
Gabriel Siqueira, Alexsandro Oliveira Alexandrino, Andre Rodrigues Oliveira, Geraldine Jean, Guillaume Fertin, Zanoni Dias

Genome Rearrangement distance problems are used in Computational Biology to estimate the evolutionary distance between genomes. These problems consist of minimizing the number of rearrangement events necessary to transform one genome into another. Two commonly used rearrangement events are reversal and transposition. The first studied problems ignored nucleotides outside genes (called intergenic regions), or assumed that genomes have a single copy of each gene. Recent works made advancements in more general problems considering the number of nucleotides in intergenic regions, and replicated genes. Nevertheless, genomes tend to have wildly different quantities of nucleotides on their intergenic regions, which poses a problem when comparing these regions exactly. To overcome this limitation, our work considers some flexibility when matching intergenic regions that do not have the same number of nucleotides. We propose new problems seeking the minimum number of reversals, or reversals and transpositions, necessary to transform one genome into another, while considering flexible intergenic region information. We show approximations for these problems by exploring their relationship with the Signed Minimum Common Flexible Intergenic String Partition problem. We also present different heuristics for the partition problem, and conduct experimental tests on simulated genomes to assess the performance of our algorithms.
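
The two rearrangement events named in the abstract can be made concrete on a genome modeled as a list of signed gene identifiers. The sketch below is a simplified illustration: it ignores intergenic region sizes entirely (the paper's flexible matching is not reproduced here), and the function names and index conventions are assumptions of this example.

```python
def reversal(genome, i, j):
    """Reverse the segment genome[i..j] (inclusive) and flip each gene's sign."""
    return genome[:i] + [-g for g in reversed(genome[i:j + 1])] + genome[j + 1:]


def transposition(genome, i, j, k):
    """Exchange the adjacent blocks genome[i:j] and genome[j:k], with i < j < k."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]
```

For instance, `reversal([1, -3, -2, 4], 1, 2)` yields `[1, 2, 3, 4]`, so these two genomes are at reversal distance one in this simplified model.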

Pages: 455-468
Citations: 0
ESGC-MDA: Identifying miRNA-Disease Associations Using Enhanced Simple Graph Convolutional Networks.
IF 3.4 Zone 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-03-01 DOI: 10.1109/TCBB.2024.3486911
Xuehua Bi, Chunyang Jiang, Cheng Yan, Kai Zhao, Linlin Zhang, Jianxin Wang

MiRNAs play an important role in the occurrence and development of human disease. Identifying potential miRNA-disease associations is valuable for disease diagnosis and treatment. Therefore, it is urgent to develop efficient computational methods for predicting potential miRNA-disease associations to reduce the cost and time associated with biological wet experiments. In addition, high-quality feature representation remains a challenge for miRNA-disease association prediction using graph neural network methods. In this paper, we propose a method named ESGC-MDA, which employs an enhanced Simple Graph Convolution Network to identify miRNA-disease associations. We first construct a bipartite attributed graph for miRNAs and diseases by computing multi-source similarity. Then, we enhance the feature representations of miRNA and disease nodes by applying two strategies in the simple convolution network, which include randomly dropping messages during propagation to ensure the model learns more reliable feature representations, and using adaptive weighting to aggregate features from different layers. Finally, we calculate the prediction scores of miRNA-disease pairs by using a fully connected neural network decoder. We conduct 5-fold cross-validation and 10-fold cross-validation on HMDD v2.0 and HMDD v3.2, respectively, and ESGC-MDA achieves better performance than state-of-the-art baseline methods. The case studies for cardiovascular disease, lung cancer and colon cancer also further confirm the effectiveness of ESGC-MDA.
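
The two enhancement strategies described above, dropping messages during propagation and weighting the outputs of different layers, can be sketched on top of plain simple graph convolution. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, dropped messages are not rescaled, and the layer weights are passed in as fixed numbers rather than learned.

```python
import math
import random


def normalized_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(s, x, drop_rate, rng):
    """One step S @ X; each node-to-node message is dropped with probability drop_rate."""
    n, d = len(x), len(x[0])
    out = [[0.0] * d for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if s[i][j] == 0.0 or rng.random() < drop_rate:
                continue  # no edge, or message randomly dropped
            for k in range(d):
                out[i][k] += s[i][j] * x[j][k]
    return out


def layer_weighted_features(adj, x, num_layers, layer_weights, drop_rate=0.0, rng=None):
    """Weighted sum of X, SX, ..., S^K X (layer_weights has num_layers + 1 entries)."""
    rng = rng or random.Random(0)
    s = normalized_adjacency(adj)
    h = [row[:] for row in x]
    layers = [h]
    for _ in range(num_layers):
        h = propagate(s, h, drop_rate, rng)
        layers.append(h)
    n, d = len(x), len(x[0])
    return [[sum(w * layer[i][k] for w, layer in zip(layer_weights, layers))
             for k in range(d)] for i in range(n)]
```

In a trained model the layer weights would typically come from learnable scores and message dropping would be active only during training; both are left to the caller in this sketch.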

Source code is available at https://github.com/bixuehua/ESGC-MDA.
Pages: 422-432
Citations: 0
AI-Based Computational Methods in Early Drug Discovery and Post Market Drug Assessment: A Survey.
IF 3.4 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-01-01 DOI: 10.1109/TCBB.2024.3492708
Flora Rajaei, Cristian Minoccheri, Emily Wittrup, Richard C Wilson, Brian D Athey, Gilbert S Omenn, Kayvan Najarian

Over the past few years, artificial intelligence (AI) has emerged as a transformative force in drug discovery and development (DDD), revolutionizing many aspects of the process. This survey provides a comprehensive review of recent advancements in AI applications within early drug discovery and post-market drug assessment. It addresses the identification and prioritization of new therapeutic targets, prediction of drug-target interaction (DTI), design of novel drug-like molecules, and assessment of the clinical efficacy of new medications. By integrating AI technologies, pharmaceutical companies can accelerate the discovery of new treatments, enhance the precision of drug development, and bring more effective therapies to market. This shift represents a significant move towards more efficient and cost-effective methodologies in the DDD landscape.

Pages: 97-115. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395280/pdf/
Citations: 0
Detecting Boolean Asymmetric Relationships With a Loop Counting Technique and its Implications for Analyzing Heterogeneity Within Gene Expression Datasets.
IF 3.4 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-01-01 DOI: 10.1109/TCBB.2024.3487434
Haosheng Zhou, Wei Lin, Sergio R Labra, Stuart A Lipton, Jeremy A Elman, Nicholas J Schork, Aaditya V Rangan

Many traditional methods for analyzing gene-gene relationships focus on positive and negative correlations, both of which are a kind of 'symmetric' relationship. Biclustering is one such technique that typically searches for subsets of genes exhibiting correlated expression among a subset of samples. However, genes can also exhibit 'asymmetric' relationships, such as 'if-then' relationships used in boolean circuits. In this paper we develop a very general method that can be used to detect biclusters within gene-expression data that involve subsets of genes which are enriched for these 'boolean-asymmetric' relationships (BARs). These BAR-biclusters can correspond to heterogeneity that is driven by asymmetric gene-gene interactions, e.g., reflecting regulatory effects of one gene on another, rather than more standard symmetric interactions. Unlike typical approaches that search for BARs across the entire population, BAR-biclusters can detect asymmetric interactions that only occur among a subset of samples. We apply our method to a single-cell RNA-sequencing data-set, demonstrating that the statistically-significant BAR-biclusters indeed contain additional information not present within the more traditional 'boolean-symmetric'-biclusters. For example, the BAR-biclusters involve different subsets of cells, and highlight different gene-pathways within the data-set. Moreover, by combining the boolean-asymmetric- and boolean-symmetric-signals, one can build linear classifiers which outperform those built using only traditional boolean-symmetric signals.
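To make the distinction between symmetric and 'if-then' relationships concrete, the sketch below binarizes two genes and inspects the four high/low quadrants across samples. It does not reproduce the paper's loop-counting technique or its bicluster search; the function names and the `max_violation_frac` cut-off are hypothetical, and only positive relationships are considered.

```python
def binarize(values, threshold):
    """Mark each sample as high (True) or low (False) relative to a threshold."""
    return [v > threshold for v in values]


def quadrant_counts(a_high, b_high):
    """Count samples in each of the four (A high?, B high?) quadrants."""
    counts = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for a, b in zip(a_high, b_high):
        counts[(a, b)] += 1
    return counts


def describe_relationship(a_high, b_high, max_violation_frac=0.1):
    """Label a gene pair by which off-diagonal quadrants are (nearly) empty."""
    n = len(a_high)
    if n == 0 or n != len(b_high):
        raise ValueError("need two equally long, non-empty sample lists")
    c = quadrant_counts(a_high, b_high)
    # "A high => B high" is violated only by samples with A high and B low.
    a_implies_b = c[(True, False)] / n <= max_violation_frac
    # "B high => A high" is violated only by samples with B high and A low.
    b_implies_a = c[(False, True)] / n <= max_violation_frac
    if a_implies_b and b_implies_a:
        return "symmetric (A high <=> B high)"
    if a_implies_b:
        return "asymmetric (A high => B high)"
    if b_implies_a:
        return "asymmetric (B high => A high)"
    return "none"


# Gene A is high only in samples where gene B is also high, but not vice versa.
gene_a = binarize([5.0, 6.0, 1.0, 1.5, 0.5, 0.2], threshold=2.0)
gene_b = binarize([7.0, 8.0, 6.5, 9.0, 0.1, 0.3], threshold=2.0)
label = describe_relationship(gene_a, gene_b)
```

A correlation-based score would rate this pair as only moderately related, whereas the quadrant view shows a clean one-directional implication, which is the kind of signal the abstract calls boolean-asymmetric.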

Pages: 27-38. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12037869/pdf/
Citations: 0
Performance Comparison Between Deep Neural Network and Machine Learning Based Classifiers for Huntington Disease Prediction From Human DNA Sequence.
IF 3.4 Zone 3 (Biology) Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2025-01-01 DOI: 10.1109/TCBB.2024.3493203
C Vishnuppriya, G Tamilpavai

Huntington disease (HD) is a neurodegenerative disorder that causes psychiatric disturbances, movement problems, weight loss, and sleep problems. It needs to be addressed at an early stage of life. Deep learning (DL) based systems can now help physicians by providing a second opinion when treating a patient's disease. In this work, human deoxyribonucleic acid (DNA) sequences are analyzed with a deep neural network (DNN) algorithm to predict HD. The main objective is to identify whether human DNA is affected by HD. Human DNA sequences are collected from the National Center for Biotechnology Information (NCBI), and synthetic human DNA data are also constructed for processing. The DNA sequence data are then converted to numerical form with the Chaos Game Representation (CGR) method, and the resulting numerical values are used for feature extraction: mean, median, standard deviation, entropy, contrast, correlation, energy, and homogeneity. In addition, counts of adenine, thymine, guanine, and cytosine are extracted from the DNA sequence data itself. The extracted features are used as input to the DNN classifier and to other machine learning based classifiers: Neural Network (NN), Support Vector Machine (SVM), Random Forest (RF), and Classification Tree with Forward Pruning (CTWFP). Six performance measures are used: accuracy, sensitivity, specificity, precision, F1 score, and Matthews correlation coefficient (MCC). The study concludes that DNN, NN, SVM, and RF achieve 100% accuracy and that CTWFP achieves an accuracy of 87%.
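The numerical conversion step can be sketched as follows. Chaos Game Representation maps a DNA sequence to points in the unit square by repeatedly moving halfway towards the corner assigned to the next nucleotide. The corner assignment below is one common convention and is an assumption here, as are the function names; the abstract does not say which layout or downstream image processing the authors used.

```python
from collections import Counter

# Assumed corner layout; other conventions exist.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the CGR trajectory of a DNA sequence as a list of (x, y) points."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        # Move halfway from the current point towards the base's corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def base_counts(sequence):
    """Count adenine, cytosine, guanine, and thymine in the sequence."""
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


points = cgr_points("AG")
counts = base_counts("acgtta")
```

Summary statistics such as the mean or standard deviation can then be computed over these coordinates or over an image rasterized from them. The sketch stops at the coordinates because the abstract does not describe how the texture features (contrast, correlation, energy, homogeneity) were derived.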

Pages: 52-63.
Citations: 0