Incorporating semantic similarity into clustering process for identifying protein complexes from Affinity Purification/Mass Spectrometry data

2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops. Pub Date: 2012-10-04. DOI: 10.1109/BIBM.2012.6392718

Bingjing Cai, Haiying Wang, Huiru Zheng, Hui Wang


Citations: 1

Abstract

This paper presents a framework for incorporating semantic similarities into the detection of protein complexes from Affinity Purification/Mass Spectrometry (AP-MS) data. AP-MS data is modeled as a bipartite network in which one set of nodes consists of bait proteins and the other of prey proteins. Pair-wise similarities between bait proteins are computed by combining similarities based on topological features with functional semantic similarities. A hierarchical clustering algorithm is then applied to obtain 'seed clusters' consisting of bait proteins. Starting from these seed clusters, an expansion process recruits prey proteins that are significantly associated with the bait proteins, producing the final set of identified protein complexes. In the application to real AP-MS datasets, we validate the biological significance of the predicted protein complexes against curated protein complexes, using six statistical metrics. Results show that integrating semantic similarities into the clustering process greatly improves the accuracy of complex identification. The clustering results obtained by the proposed framework are also better than those from several existing clustering methods.
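The pipeline described in the abstract (combined bait similarity, hierarchical clustering into seed clusters, then prey recruitment) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: Jaccard overlap of prey sets stands in for the topological similarity, the semantic scores are hypothetical values supplied by hand, the two are blended with an arbitrary weight `alpha`, average linkage with a fixed cut-off stands in for the hierarchical clustering, and a simple fraction threshold replaces the paper's significance test for recruiting preys.

```python
from itertools import combinations

def jaccard(a, b):
    """Assumed topological similarity: overlap of two baits' prey sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def combined_similarity(bait_to_preys, semantic, alpha=0.5):
    """Blend topological and semantic similarity for every bait pair.
    alpha is a hypothetical mixing weight, not taken from the paper."""
    sim = {}
    for u, v in combinations(sorted(bait_to_preys), 2):
        topo = jaccard(bait_to_preys[u], bait_to_preys[v])
        sem = semantic.get((u, v), semantic.get((v, u), 0.0))
        sim[(u, v)] = alpha * topo + (1 - alpha) * sem
    return sim

def pair_sim(sim, u, v):
    """Look up a pair regardless of key order."""
    return sim[(u, v)] if (u, v) in sim else sim[(v, u)]

def average_linkage(sim, c1, c2):
    """Mean pairwise similarity between two clusters."""
    total = sum(pair_sim(sim, u, v) for u in c1 for v in c2)
    return total / (len(c1) * len(c2))

def seed_clusters(baits, sim, threshold=0.4):
    """Agglomerative clustering of baits; merging stops once the best
    pair of clusters falls below the (assumed) similarity threshold."""
    clusters = [frozenset([b]) for b in sorted(baits)]
    while len(clusters) > 1:
        best = max(combinations(clusters, 2),
                   key=lambda pair: average_linkage(sim, *pair))
        if average_linkage(sim, *best) < threshold:
            break
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return clusters

def expand(cluster, bait_to_preys, min_fraction=0.6):
    """Recruit preys pulled down by at least min_fraction of the cluster's
    baits (a stand-in for the paper's significance criterion)."""
    counts = {}
    for bait in cluster:
        for prey in bait_to_preys[bait]:
            counts[prey] = counts.get(prey, 0) + 1
    recruited = {p for p, n in counts.items() if n / len(cluster) >= min_fraction}
    return set(cluster) | recruited

# Toy bipartite network: bait -> set of preys (made-up identifiers).
bait_to_preys = {
    "B1": {"P1", "P2", "P3"},
    "B2": {"P2", "P3", "P4"},
    "B3": {"P7", "P8"},
}
# Hypothetical functional semantic similarities between baits.
semantic = {("B1", "B2"): 0.9, ("B1", "B3"): 0.1, ("B2", "B3"): 0.2}

sim = combined_similarity(bait_to_preys, semantic)
seeds = seed_clusters(bait_to_preys, sim)
complexes = [expand(seed, bait_to_preys) for seed in seeds]
for c in complexes:
    print(sorted(c))
```

In this toy network B1 and B2 share two preys and have a high semantic score, so they merge into one seed cluster and recruit only the preys they have in common, while B3 remains a singleton seed. On real data the similarity measures, linkage method, and recruitment test would need to follow the paper's actual definitions.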

