An efficient and accurate approach to identify similarities between biological sequences using pair amino acid composition and physicochemical properties

IF 3.1 | Zone 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Soft Computing | Pub Date: 2024-07-31 | DOI: 10.1007/s00500-024-09834-5
L. Hooshyar, M. B. Hernández-Jiménez, A. Khastan, M. Vasighi
Citations: 0

Abstract

Our study presents a novel method for analyzing biological sequences that uses pairwise amino acid composition and amino acid physicochemical properties to construct a feature vector. This step is pivotal: pairwise analysis takes the order of amino acids into account, thereby capturing subtle nuances in sequence structure, while incorporating physicochemical properties ensures that the information encoded within the amino acids themselves is not overlooked. Because it considers both the frequency and the order of amino acid pairs, our method reduces the risk of erroneously clustering different sequences as similar, a common pitfall of older methods. Our approach generates a concise 48-element vector for sequences of arbitrary length. This compact representation retains essential amino acid-specific features, enhancing the accuracy of sequence analysis. Unlike traditional approaches, our algorithm avoids introducing sparse vectors, ensuring that important information is retained. Additionally, we introduce fuzzy equivalence relations to address uncertainty in the clustering process, enabling a more nuanced and flexible clustering approach that captures the inherent ambiguity in biological data. Despite these advancements, the algorithm is presented in a straightforward manner, making it accessible to researchers with varying levels of computational expertise. These enhancements improve the robustness and interpretability of our method, providing researchers with a comprehensive and user-friendly tool for biological sequence analysis.
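The abstract does not say how the 48-element vector is built or how the fuzzy equivalence relation is derived, so the following Python sketch only illustrates the two general ingredients it names, under explicit assumptions. It uses the standard 400-element ordered-pair (dipeptide) composition instead of the paper's 48-element vector (the 400-element form is exactly the kind of sparse representation the authors say they avoid), cosine similarity as an assumed similarity measure, and the textbook max-min transitive closure followed by a λ-cut to turn a fuzzy similarity relation into crisp clusters. All function names (`pair_composition`, `similarity_matrix`, `transitive_closure`, `lambda_cut_clusters`) are hypothetical and not taken from the paper.

```python
from itertools import product
import math

# Assumption: the 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, so "AG" and "GA" are distinct features.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def pair_composition(seq):
    """Relative frequency of each ordered adjacent residue pair (400 values).

    Standard dipeptide composition, NOT the paper's 48-element vector.
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for a, b in zip(seq, seq[1:]):
        pair = a + b
        if pair in counts:  # ignore pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def cosine(u, v):
    """Cosine similarity; within [0, 1] here because compositions are non-negative."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(vectors):
    """Reflexive, symmetric fuzzy relation over all sequences (assumed measure: cosine)."""
    n = len(vectors)
    return [[1.0 if i == j else cosine(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def transitive_closure(r):
    """Max-min transitive closure: repeat R <- R o R until nothing changes.

    For a reflexive R, R o R >= R elementwise and every entry is drawn from the
    original values, so the loop terminates with a fuzzy equivalence relation.
    """
    n = len(r)
    while True:
        composed = [[max(min(r[i][k], r[k][j]) for k in range(n)) for j in range(n)]
                    for i in range(n)]
        if composed == r:
            return r
        r = composed


def lambda_cut_clusters(r, lam):
    """Label each sequence by its crisp equivalence class at level lam."""
    labels = [-1] * len(r)
    cluster = 0
    for i in range(len(r)):
        if labels[i] == -1:
            for j in range(len(r)):
                if r[i][j] >= lam:
                    labels[j] = cluster
            cluster += 1
    return labels


if __name__ == "__main__":
    # Toy sequences: the first and third have identical single-residue
    # composition (four A, four G) but different residue order.
    seqs = ["AAGGAAGG", "AAGGAAGGAAGG", "AGAGAGAG"]
    vectors = [pair_composition(s) for s in seqs]
    closure = transitive_closure(similarity_matrix(vectors))
    print(lambda_cut_clusters(closure, 0.9))  # [0, 0, 1]
    print(lambda_cut_clusters(closure, 0.5))  # [0, 0, 0]
```

In the toy run, `AAGGAAGG` and `AAGGAAGGAAGG` share the same repeating motif and fall into one class at λ = 0.9, whereas `AGAGAGAG`, built from the same residues in a different order, forms its own class; lowering λ to 0.5 merges all three. The choice of λ therefore controls how coarse the resulting partition is.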

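Source journal: Soft Computing
Subject category: Engineering & Technology - Computer Science: Interdisciplinary Applications
CiteScore: 8.10
Self-citation rate: 9.80%
Articles published: 927
Review time: 7.3 months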
Journal description: Soft Computing is dedicated to system solutions based on soft computing techniques. It provides rapid dissemination of important results in soft computing technologies, a fusion of research in evolutionary algorithms and genetic programming, neural science and neural net systems, fuzzy set theory and fuzzy systems, and chaos theory and chaotic systems. Soft Computing encourages the integration of soft computing techniques and tools into both everyday and advanced applications. By linking the ideas and techniques of soft computing with other disciplines, the journal serves as a unifying platform that fosters comparisons, extensions, and new applications. As a result, the journal is an international forum for all scientists and engineers engaged in research and development in this fast-growing field.
Latest articles in this journal:
Handwritten text recognition and information extraction from ancient manuscripts using deep convolutional and recurrent neural network
Optimizing green solid transportation with carbon cap and trade: a multi-objective two-stage approach in a type-2 Pythagorean fuzzy context
Production chain modeling based on learning flow stochastic petri nets
Multi-population multi-strategy differential evolution algorithm with dynamic population size adjustment
Dynamic parameter identification of modular robot manipulators based on hybrid optimization strategy: genetic algorithm and least squares method