Elliptic geometry-based kernel matrix for improved biological sequence classification
Journal: Knowledge-Based Systems (Impact Factor 7.2; JCR Q1, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.knosys.2024.112479
Publication date: 2024-09-07
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S0950705124011134
Citation count: 0
Abstract
Protein sequence classification plays a pivotal role in bioinformatics because it supports the understanding of protein functions and their involvement in diverse biological processes. Although numerous machine learning models have been proposed for this task, traditional approaches struggle to capture the intricate relationships and hierarchical structures inherent in genomic sequences. These limitations stem from operating within high-dimensional non-Euclidean spaces. To address this issue, we apply an elliptic geometry-based approach to protein sequence classification. First, we transform the problem into elliptic geometry and integrate it with the Gaussian kernel to map the problem into a Mercer kernel. The Gaussian-Elliptic approach implicitly maps the data into a higher-dimensional feature space, which makes it possible to capture complex nonlinear relationships. This property is particularly advantageous for the hierarchical or tree-like structures commonly encountered in biological sequences. Experimental results demonstrate the effectiveness of the proposed model in protein sequence classification and the advantages of using elliptic geometry in bioinformatics analyses. The proposed model outperforms state-of-the-art methods, achieving accuracies of 76% and 84% on the DNA and protein datasets, respectively. Furthermore, we provide theoretical justifications for the proposed model. This study contributes to the burgeoning field of geometric deep learning, offering insights into the potential applications of elliptic representations in the analysis of biological data.
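The abstract does not state the kernel formula, so the following is only a minimal sketch of one way a Gaussian kernel could be combined with an elliptic distance. It assumes that sequences have already been embedded as nonzero numeric vectors (the random matrix below is a hypothetical stand-in for such features), that the elliptic distance is taken as arccos(|<x, y>|) on unit-normalised vectors, and that `sigma` is a free bandwidth parameter. The function names are illustrative and do not come from the paper. Because a Gaussian of a geodesic distance is not automatically positive semidefinite, the sketch also computes the smallest eigenvalue of the kernel matrix as a numerical check.

```python
import numpy as np


def elliptic_distance(X, Y):
    """Pairwise elliptic distance arccos(|<x, y>|) between unit-normalised rows.

    Assumption: rows are nonzero feature vectors (e.g. k-mer counts).
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Clip to [0, 1] so rounding error cannot push arccos out of its domain.
    cos = np.clip(np.abs(Xn @ Yn.T), 0.0, 1.0)
    return np.arccos(cos)


def gaussian_elliptic_kernel(X, Y, sigma=1.0):
    """Gaussian kernel applied to the elliptic distance (illustrative only)."""
    d = elliptic_distance(X, Y)
    return np.exp(-(d ** 2) / (2.0 * sigma ** 2))


# Hypothetical feature matrix: 6 sequences, 20 features each.
rng = np.random.default_rng(0)
X = rng.random((6, 20))

K = gaussian_elliptic_kernel(X, X, sigma=1.0)

# Numerical check of the Mercer (positive semidefinite) property on this sample.
min_eig = np.linalg.eigvalsh((K + K.T) / 2.0).min()
print("kernel shape:", K.shape)
print("smallest eigenvalue:", min_eig)
```

A matrix built this way could be passed to any classifier that accepts a precomputed kernel, provided the eigenvalue check confirms it is positive semidefinite for the chosen `sigma`.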
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.