Clustering with a faulty oracle

Kasper Green Larsen, M. Mitzenmacher, Charalampos E. Tsourakakis
Proceedings of The Web Conference 2020. Published 2020-04-19. DOI: 10.1145/3366423.3380045 (https://doi.org/10.1145/3366423.3380045)
Citations: 9

Abstract

Clustering, i.e., finding groups in the data, is a problem that permeates multiple fields of science and engineering. Recently, the problem of clustering with a noisy oracle has drawn attention due to various applications, including crowdsourced entity resolution [33] and predicting signs of interactions in large-scale online social networks [20, 21]. Here, we consider the following fundamental model for two clusters, as proposed by Mitzenmacher and Tsourakakis [28] and by Mazumdar and Saha [25]: there exist n items, belonging to two unknown groups. We are allowed to query, for any pair of nodes, whether they belong to the same cluster or not, but the answer to the query is corrupted with some probability 0 < q < 1/2. Let 1 > δ = 1 − 2q > 0 be the bias. In this work, we provide a polynomial time algorithm that recovers all signs correctly with high probability in the presence of noise with queries. This is the best known result for this problem for all but tiny δ, improving on the current state-of-the-art due to Mazumdar and Saha [25].
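To make the query model concrete, here is a minimal Python sketch of the noisy oracle described in the abstract. It is not the paper's algorithm; it only simulates the setting. The names `FaultyOracle` and `empirical_bias` are hypothetical, and the choice to cache each pair's answer (so that re-asking the same pair returns the same, possibly wrong, answer) is an assumption that the abstract does not state. The sketch also checks numerically that the average agreement between the oracle's answers and the truth is close to the bias δ = 1 − 2q.

```python
import random


class FaultyOracle:
    """Hypothetical simulation of a same-cluster oracle with flip probability q."""

    def __init__(self, labels, q, seed=0):
        if not 0 < q < 0.5:
            raise ValueError("q must satisfy 0 < q < 1/2")
        self.labels = labels          # hidden ground truth: one group id per item
        self.q = q                    # probability that an answer is corrupted
        self.rng = random.Random(seed)
        self.cache = {}               # assumption: answers per pair are persistent

    def query(self, u, v):
        """Return +1 ("same cluster") or -1 ("different"), wrong with probability q."""
        key = (min(u, v), max(u, v))
        if key not in self.cache:
            truth = 1 if self.labels[u] == self.labels[v] else -1
            flipped = self.rng.random() < self.q
            self.cache[key] = -truth if flipped else truth
        return self.cache[key]


def empirical_bias(oracle, labels, pairs):
    """Average of answer * truth over the pairs; its expectation is 1 - 2q."""
    total = 0
    for u, v in pairs:
        truth = 1 if labels[u] == labels[v] else -1
        total += oracle.query(u, v) * truth
    return total / len(pairs)


if __name__ == "__main__":
    n, q = 200, 0.2
    labels = [i % 2 for i in range(n)]            # two groups of equal size
    oracle = FaultyOracle(labels, q, seed=1)
    pairs = [(u, v) for u in range(n) for v in range(u + 1, n)]
    print("delta =", 1 - 2 * q)
    print("empirical bias =", empirical_bias(oracle, labels, pairs))
```

A correct answer contributes +1 and a corrupted one contributes −1, so the expected product is (1 − q) − q = 1 − 2q. This is why δ measures how much signal each query carries: as q approaches 1/2, δ approaches 0 and each answer becomes nearly uninformative.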