Incorporating semantic similarity into clustering process for identifying protein complexes from Affinity Purification/Mass Spectrometry data

Bingjing Cai, Haiying Wang, Huiru Zheng, Hui Wang
{"title":"Incorporating semantic similarity into clustering process for identifying protein complexes from Affinity Purification/Mass Spectrometry data","authors":"Bingjing Cai, Haiying Wang, Huiru Zheng, Hui Wang","doi":"10.1109/BIBM.2012.6392718","DOIUrl":null,"url":null,"abstract":"This paper presents a framework for incorporating semantic similarities in the detection of protein complexes from Affinity Purification/Mass Spectrometry (AP-MS) data. AP-MS data is modeled as a bipartite network, where one set of nodes consist of bait proteins and the other set are prey proteins. Pair-wise similarities of bait proteins are computed by combining similarities based on topological features and functional semantic similarities. A hierarchical clustering algorithm is then applied to obtain `seed clusters' consisting of bait proteins. Starting from these `seed' clusters, an expansion process is developed to recruit prey proteins which are significantly associated with bait proteins, to produce final sets of identified protein complexes. In the application to real AP-MS datasets, we validate biological significance of predicted protein complexes by using curated protein complexes. Six statistical metrics have been applied. Results show that by integrating semantic similarities into the clustering process, the accuracy of identifying complexes has been greatly improved. Meanwhile, clustering results obtained by the proposed framework are better than those from several existent clustering methods.","PeriodicalId":6392,"journal":{"name":"2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2012-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BIBM.2012.6392718","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

This paper presents a framework for incorporating semantic similarities in the detection of protein complexes from Affinity Purification/Mass Spectrometry (AP-MS) data. AP-MS data is modeled as a bipartite network, where one set of nodes consist of bait proteins and the other set are prey proteins. Pair-wise similarities of bait proteins are computed by combining similarities based on topological features and functional semantic similarities. A hierarchical clustering algorithm is then applied to obtain `seed clusters' consisting of bait proteins. Starting from these `seed' clusters, an expansion process is developed to recruit prey proteins which are significantly associated with bait proteins, to produce final sets of identified protein complexes. In the application to real AP-MS datasets, we validate biological significance of predicted protein complexes by using curated protein complexes. Six statistical metrics have been applied. Results show that by integrating semantic similarities into the clustering process, the accuracy of identifying complexes has been greatly improved. Meanwhile, clustering results obtained by the proposed framework are better than those from several existent clustering methods.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
将语义相似度整合到聚类过程中,从亲和纯化/质谱数据中识别蛋白质复合物
本文提出了一个结合语义相似性的框架,用于从亲和纯化/质谱(AP-MS)数据中检测蛋白质复合物。AP-MS数据建模为一个二部网络,其中一组节点由诱饵蛋白质组成,另一组节点由猎物蛋白质组成。将基于拓扑特征的相似性和功能语义相似性相结合,计算诱饵蛋白的成对相似性。然后应用分层聚类算法获得由诱饵蛋白组成的“种子簇”。从这些“种子”簇开始,开发了一个扩展过程,以招募与诱饵蛋白显著相关的猎物蛋白,以产生最终的鉴定蛋白复合物。在实际AP-MS数据集的应用中,我们通过使用策划的蛋白质复合物来验证预测的蛋白质复合物的生物学意义。应用了六种统计度量。结果表明,将语义相似度集成到聚类过程中,大大提高了识别复合体的准确率。同时,该框架的聚类结果优于现有的几种聚类方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Towards comprehensive longitudinal healthcare data capture On the repetitive collection indexing problem Sampling low-energy protein-protein configurations with basin hopping The effect of measurement approach and noise level on gene selection stability Clinical research progress of treatment over Tourette syndrome with acup-mox therapy
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1