An efficient and accurate approach to identify similarities between biological sequences using pair amino acid composition and physicochemical properties
L. Hooshyar, M. B. Hernández-Jiménez, A. Khastan, M. Vasighi
Journal: Soft Computing (JCR Q2, Computer Science, Artificial Intelligence; impact factor 3.1)
Published: 2024-07-31
DOI: 10.1007/s00500-024-09834-5 (https://doi.org/10.1007/s00500-024-09834-5)
Abstract
We present a method for analyzing biological sequences that builds a feature vector from pairwise amino acid composition and amino acid physicochemical properties. The pairwise analysis takes the order of amino acids into account, capturing subtle features of sequence structure, while the physicochemical properties ensure that information encoded in the amino acids themselves is not overlooked. Because the method considers both the frequency and the order of amino acid pairs, it reduces the risk of erroneously clustering different sequences as similar, a common pitfall of older methods. The resulting vector has only 48 elements and accommodates sequences of arbitrary length efficiently. This compact representation retains essential amino acid-specific features, which improves the accuracy of sequence analysis. Unlike traditional approaches, the algorithm does not introduce sparse vectors, so important information is retained. We also introduce fuzzy equivalence relations to address uncertainty in the clustering process, allowing a more flexible clustering that reflects the inherent ambiguity of biological data. The algorithm is presented in a straightforward manner, making it accessible to researchers with varying levels of computational expertise. Together, these features improve the robustness and interpretability of the method and give researchers a comprehensive, user-friendly tool for biological sequence analysis.
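The abstract does not say how the fuzzy equivalence relations are built, so the following is only a generic sketch of one standard way to cluster with them: start from a reflexive, symmetric fuzzy similarity matrix, take its max-min transitive closure to obtain a fuzzy equivalence relation, then cut it at a threshold. The matrix values, the function names, and the threshold are illustrative assumptions, not the authors' data or implementation. In practice the similarities would be computed from the 48-element feature vectors.

```python
def max_min_compose(r, s):
    """Max-min composition of two square fuzzy relations (lists of lists)."""
    n = len(r)
    return [
        [max(min(r[i][k], s[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(r):
    """Square a reflexive, symmetric fuzzy relation under max-min
    composition until it stops changing. The fixed point is max-min
    transitive, i.e. a fuzzy equivalence relation."""
    while True:
        squared = max_min_compose(r, r)
        if squared == r:
            return r
        r = squared


def lambda_cut_clusters(equiv, lam):
    """Group items whose equivalence degree is at least `lam`.
    `equiv` must already be a fuzzy equivalence relation, so each
    row defines a complete cluster at the chosen threshold."""
    clusters, assigned = [], set()
    for i in range(len(equiv)):
        if i in assigned:
            continue
        group = [j for j in range(len(equiv)) if equiv[i][j] >= lam]
        assigned.update(group)
        clusters.append(group)
    return clusters


# Hypothetical pairwise similarities between four sequences (made-up values).
SIMILARITY = [
    [1.0, 0.8, 0.4, 0.2],
    [0.8, 1.0, 0.5, 0.2],
    [0.4, 0.5, 1.0, 0.9],
    [0.2, 0.2, 0.9, 1.0],
]

closure = transitive_closure(SIMILARITY)
print(lambda_cut_clusters(closure, 0.8))  # [[0, 1], [2, 3]]
print(lambda_cut_clusters(closure, 0.5))  # [[0, 1, 2, 3]]
```

Lowering the threshold merges clusters and raising it splits them, which is one way a single fuzzy equivalence relation can express graded rather than all-or-nothing similarity.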
About the journal:
Soft Computing is dedicated to system solutions based on soft computing techniques. It provides rapid dissemination of important results in soft computing technologies, a fusion of research in evolutionary algorithms and genetic programming, neural science and neural net systems, fuzzy set theory and fuzzy systems, and chaos theory and chaotic systems.
Soft Computing encourages the integration of soft computing techniques and tools into both everyday and advanced applications. By linking the ideas and techniques of soft computing with other disciplines, the journal serves as a unifying platform that fosters comparisons, extensions, and new applications. As a result, the journal is an international forum for all scientists and engineers engaged in research and development in this fast-growing field.