{"title":"基于Factoid嵌入的无监督用户身份链接","authors":"Wei Xie, Xin Mu, R. Lee, Feida Zhu, Ee-Peng Lim","doi":"10.1109/ICDM.2018.00182","DOIUrl":null,"url":null,"abstract":"User identity linkage (UIL), the problem of matching user account across multiple online social networks (OSNs), is widely studied and important to many real-world applications. Most existing UIL solutions adopt a supervised or semi-supervised approach which generally suffer from scarcity of labeled data. In this paper, we propose Factoid Embedding, a novel framework that adopts an unsupervised approach. It is designed to cope with different profile attributes, content types and network links of different OSNs. The key idea is that each piece of information about a user identity describes the real identity owner, and thus distinguishes the owner from other users. We represent such a piece of information by a factoid and model it as a triplet consisting of user identity, predicate, and an object or another user identity. By embedding these factoids, we learn the user identity latent representations and link two user identities from different OSNs if they are close to each other in the user embedding space. Our Factoid Embedding algorithm is designed such that as we learn the embedding space, each embedded factoid is \"translated\" into a motion in the user embedding space to bring similar user identities closer, and different user identities further apart. Extensive experiments are conducted to evaluate Factoid Embedding on two real-world OSNs data sets. The experiment results show that Factoid Embedding outperforms the state-of-the-art methods even without training data.","PeriodicalId":286444,"journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":"{\"title\":\"Unsupervised User Identity Linkage via Factoid Embedding\",\"authors\":\"Wei Xie, Xin Mu, R. Lee, Feida Zhu, Ee-Peng Lim\",\"doi\":\"10.1109/ICDM.2018.00182\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"User identity linkage (UIL), the problem of matching user account across multiple online social networks (OSNs), is widely studied and important to many real-world applications. Most existing UIL solutions adopt a supervised or semi-supervised approach which generally suffer from scarcity of labeled data. In this paper, we propose Factoid Embedding, a novel framework that adopts an unsupervised approach. It is designed to cope with different profile attributes, content types and network links of different OSNs. The key idea is that each piece of information about a user identity describes the real identity owner, and thus distinguishes the owner from other users. We represent such a piece of information by a factoid and model it as a triplet consisting of user identity, predicate, and an object or another user identity. By embedding these factoids, we learn the user identity latent representations and link two user identities from different OSNs if they are close to each other in the user embedding space. Our Factoid Embedding algorithm is designed such that as we learn the embedding space, each embedded factoid is \\\"translated\\\" into a motion in the user embedding space to bring similar user identities closer, and different user identities further apart. Extensive experiments are conducted to evaluate Factoid Embedding on two real-world OSNs data sets. The experiment results show that Factoid Embedding outperforms the state-of-the-art methods even without training data.\",\"PeriodicalId\":286444,\"journal\":{\"name\":\"2018 IEEE International Conference on Data Mining (ICDM)\",\"volume\":\"46 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"29\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Conference on Data Mining (ICDM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2018.00182\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Data Mining (ICDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2018.00182","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 29
摘要
用户身份链接(User identity linkage, UIL)是指跨多个在线社交网络(online social network, OSNs)匹配用户账户的问题,它被广泛研究,对许多现实应用都很重要。大多数现有的UIL解决方案采用监督或半监督方法,通常受到标记数据稀缺的影响。在本文中,我们提出了一种采用无监督方法的新框架Factoid Embedding。针对不同osn的不同配置文件属性、不同内容类型、不同网络链路而设计。关键思想是,关于用户身份的每条信息都描述了真正的身份所有者,从而将所有者与其他用户区分开来。我们用factoid表示这样的信息,并将其建模为由用户标识、谓词和对象或另一个用户标识组成的三元组。通过嵌入这些factoid,我们学习用户身份的潜在表征,并在用户嵌入空间中,如果两个用户身份在不同的osn中彼此接近,我们将它们链接起来。我们的Factoid嵌入算法是这样设计的:当我们学习嵌入空间时,每个嵌入的Factoid被“翻译”成用户嵌入空间中的运动,从而使相似的用户身份更接近,而不同的用户身份更远。在两个真实的osn数据集上进行了大量的实验来评估Factoid嵌入。实验结果表明,即使没有训练数据,Factoid嵌入也优于最先进的方法。
Unsupervised User Identity Linkage via Factoid Embedding
User identity linkage (UIL), the problem of matching user account across multiple online social networks (OSNs), is widely studied and important to many real-world applications. Most existing UIL solutions adopt a supervised or semi-supervised approach which generally suffer from scarcity of labeled data. In this paper, we propose Factoid Embedding, a novel framework that adopts an unsupervised approach. It is designed to cope with different profile attributes, content types and network links of different OSNs. The key idea is that each piece of information about a user identity describes the real identity owner, and thus distinguishes the owner from other users. We represent such a piece of information by a factoid and model it as a triplet consisting of user identity, predicate, and an object or another user identity. By embedding these factoids, we learn the user identity latent representations and link two user identities from different OSNs if they are close to each other in the user embedding space. Our Factoid Embedding algorithm is designed such that as we learn the embedding space, each embedded factoid is "translated" into a motion in the user embedding space to bring similar user identities closer, and different user identities further apart. Extensive experiments are conducted to evaluate Factoid Embedding on two real-world OSNs data sets. The experiment results show that Factoid Embedding outperforms the state-of-the-art methods even without training data.