Biological versus Topological Domains in Improving the Reliability of Evolutionary-Based Protein Complex Detection Algorithms
Isra H. Abdulateef, B. Attea, D. Alzubaydi
Iraqi Journal of Science, published 2024-03-29
DOI: https://doi.org/10.24996/ijs.2024.65.3.42
Abstract
Detecting the protein complexes within protein-protein interaction networks (PPINs) is an NP-hard problem. Evolutionary algorithms (EAs), as global search methods, have been shown in the literature to be more successful than greedy methods at detecting protein complexes. However, most of these EA-based approaches are designed around the topological information of the proteins in the PPIN. Biological information, although a key resource for molecular profiles, has received little attention in the design of the components of these EA-based methods. The main aim of this paper is to redesign two EA operators based on the functional domain rather than the graph topological domain. The perturbation mechanisms of both the crossover and mutation operators are designed using the direct gene ontology annotations of the proteins and the Jaccard similarity coefficients between them. Results on the PPIN of the yeast Saccharomyces cerevisiae indicate that the functional domain of the proteins, compared with the topological domain, is more consistent with the true information reported in the Munich Information Center for Protein Sequences (MIPS) catalog. Evaluation at both the complex and protein levels reveals that feeding the components of the EA with biological information yields more accurate complex structures, whereas topological information may mislead the algorithm towards a faulty structure.
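As a rough illustration of the functional-domain idea, the sketch below computes the Jaccard similarity between the gene ontology annotation sets of two proteins and uses it in a toy mutation step. The abstract does not specify the operators, so `functional_mutation`, its inputs, and the rule "adopt the complex label of the most functionally similar neighbor" are assumptions made for illustration only, and the annotation strings are placeholders rather than real GO terms.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; taken as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def functional_mutation(
    protein: str,
    neighbors: Dict[str, List[str]],
    go_terms: Dict[str, Set[str]],
    labels: Dict[str, int],
) -> int:
    """Hypothetical mutation rule (not the paper's operator).

    Returns the complex label of the interaction partner whose annotation set
    is most similar to that of `protein`. If no partner shares any annotation,
    the protein keeps its current label.
    """
    own_terms = go_terms.get(protein, set())
    best_label = labels[protein]
    best_score = 0.0
    for partner in neighbors.get(protein, []):
        score = jaccard(own_terms, go_terms.get(partner, set()))
        if score > best_score:
            best_score, best_label = score, labels[partner]
    return best_label


# Toy data: placeholder annotations, made up for this example.
go_terms = {
    "P1": {"term_a", "term_b"},
    "P2": {"term_a", "term_b", "term_c"},
    "P3": {"term_z"},
}
neighbors = {"P1": ["P2", "P3"]}
labels = {"P1": 0, "P2": 1, "P3": 2}

# P1 shares two of three annotations with P2 and none with P3,
# so this rule moves P1 to P2's complex label.
new_label = functional_mutation("P1", neighbors, go_terms, labels)
print(new_label)
```

A purely topological variant would choose the partner by graph criteria such as shared neighbors. Swapping in annotation overlap is what moves the perturbation from the topological domain to the functional domain.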