2018 IEEE International Conference on Data Mining (ICDM): Latest Papers, Page 6

Privacy-Preserving Temporal Record Linkage

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00053

Thilina Ranbaduge, P. Christen

Record linkage (RL) is the process of identifying matching records from different databases that refer to the same entity. The attribute values of records that belong to the same entity commonly evolve over time; for example, people can change their surname or address. Therefore, to identify the records that refer to the same entity over time, RL should make use of temporal information such as the time-stamp of when a record was created and/or last updated. However, if RL needs to be conducted on information about people, privacy and confidentiality concerns mean that organizations are often not willing or allowed to share sensitive data in their databases, such as personal medical records or location and financial details, with other organizations. This paper is the first to propose a privacy-preserving temporal record linkage (PPTRL) protocol that can link records across different databases while ensuring the privacy of the sensitive data in these databases. We propose a novel protocol based on Bloom filter encoding which incorporates the temporal information available in records during the linkage process. Our approach uses homomorphic encryption to securely calculate the probabilities of entities changing attribute values in their records over a period of time. Based on these probabilities, we generate a set of masking Bloom filters to adjust the similarities between record pairs. We provide a theoretical analysis of the complexity and privacy of our technique and conduct an empirical study on large real databases containing several million records. The experimental results show that our approach can achieve better linkage quality than non-temporal PPRL while providing privacy to individuals in the databases that are being linked.
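The Bloom filter encoding mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' protocol: it omits the homomorphic encryption and the masking Bloom filters entirely, and the function names, the filter size, the number of hash functions, the use of character bigrams, and the choice of SHA-256 and the Dice coefficient are all assumptions made for illustration.

```python
import hashlib


def qgrams(value, q=2):
    """Split a string into its set of overlapping character q-grams."""
    text = value.lower()
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def bloom_encode(value, size=64, num_hashes=2, q=2):
    """Return the set of bit positions (0..size-1) set for this value.

    Each q-gram is hashed num_hashes times; the seed prefix stands in
    for independent hash functions (an illustrative assumption).
    """
    bits = set()
    for gram in qgrams(value, q):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).hexdigest()
            bits.add(int(digest, 16) % size)
    return bits


def dice_similarity(bits_a, bits_b):
    """Dice coefficient of two Bloom filters given as sets of 1-bit positions."""
    if not bits_a and not bits_b:
        return 0.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


if __name__ == "__main__":
    old_name = bloom_encode("smith")
    new_name = bloom_encode("smyth")
    # Similar strings share q-grams, so their filters share 1-bits.
    print(dice_similarity(old_name, new_name))
```

Comparing the encoded filters rather than the raw strings is what lets two parties estimate similarity without exchanging the underlying attribute values; a temporal method would then adjust such similarities according to how likely an attribute is to have changed.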



Citations: 6

Deep Structure Learning for Fraud Detection

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00072

Haibo Wang, Chuan Zhou, Jia Wu, Weizhen Dang, Xingquan Zhu, Jilong Wang

Fraud detection is of great importance because fraudulent behaviors may mislead consumers or bring huge losses to enterprises. Due to the lockstep feature of fraudulent behaviors, the fraud detection problem can be viewed as finding suspicious dense blocks in an attributed bipartite graph. In reality, existing attribute-based methods are not adversarially robust, because fraudsters can take camouflage actions to make their behavior attributes appear normal. More importantly, existing methods based on structural information only consider shallow topology structure, making their effectiveness sensitive to the density of suspicious blocks. In this paper, we propose a novel deep structure learning model named DeepFD to differentiate normal users from suspicious users. DeepFD can preserve the non-linear graph structure and user behavior information simultaneously. Experimental results on different types of datasets demonstrate that DeepFD outperforms the state-of-the-art baselines.


Citations: 43

Leveraging Hypergraph Random Walk Tag Expansion and User Social Relation for Microblog Recommendation

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00152

Huifang Ma, Di Zhang, Weizhong Zhao, Yanru Wang, Zhongzhi Shi

Recommending valuable content to microblog users is an important way to improve users' experiences. As high-quality descriptors of user semantics, tags have long been used to represent users' interests or attributes. In this work, we propose a microblog recommendation approach via hypergraph random walk tag expansion and user social relation. More specifically, microblogs are considered as hyperedges and terms are taken as hypervertices for each user, and weighting strategies for both hyperedges and hypervertices are established. A random walk is performed on the weighted hypergraph to obtain a number of terms as tags for users. Then the tag similarity matrix and the user-tag matrix can be constructed based on tag probability correlations and the weight of each tag. The significance of user social relation is also considered for recommendation. Moreover, an iterative updating scheme is developed to obtain the final user-tag matrix for computing the similarities between microblogs and users. Experimental results show that the algorithm is effective in microblog recommendation.
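A random walk on a weighted hypergraph can be sketched as follows. This is a generic textbook formulation, not the weighting strategy of the paper: from a vertex, the walker picks an incident hyperedge with probability proportional to the hyperedge weight, then moves to a vertex of that hyperedge chosen uniformly. The function names, the uniform vertex choice, and the toy data are assumptions for illustration.

```python
def transition_matrix(hyperedges, edge_weights, num_vertices):
    """Build P where P[u][v] is the one-step probability of moving u -> v.

    hyperedges: list of sets of vertex indices (e.g. terms in a microblog).
    edge_weights: one non-negative weight per hyperedge.
    """
    P = [[0.0] * num_vertices for _ in range(num_vertices)]
    for u in range(num_vertices):
        incident = [(e, w) for e, w in zip(hyperedges, edge_weights) if u in e]
        total = sum(w for _, w in incident)
        if total == 0:
            P[u][u] = 1.0  # isolated vertex: the walker stays in place
            continue
        for e, w in incident:
            for v in e:
                # choose hyperedge e with prob. w/total, then v uniformly in e
                P[u][v] += (w / total) / len(e)
    return P


def walk(P, start, steps):
    """Propagate a probability distribution over vertices for `steps` steps."""
    n = len(P)
    dist = list(start)
    for _ in range(steps):
        dist = [sum(dist[u] * P[u][v] for u in range(n)) for v in range(n)]
    return dist


if __name__ == "__main__":
    # Two toy "microblogs" over three toy "terms" (hypothetical data).
    P = transition_matrix([{0, 1}, {1, 2}], [1.0, 1.0], 3)
    print(walk(P, [1.0, 0.0, 0.0], steps=5))
```

Vertices that accumulate the most probability mass after several steps are the natural candidates to expand a user's tag set.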



Citations: 1

Exploiting Topic-Based Adversarial Neural Network for Cross-Domain Keyphrase Extraction

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00075

Yanan Wang, Qi Liu, Chuan Qin, Tong Xu, Yijun Wang, Enhong Chen, Hui Xiong

Keyphrases have been widely used in large document collections to provide a concise summary of document content. While significant efforts have been made on the task of automatic keyphrase extraction, existing methods struggle to train a robust supervised model when there is insufficient labeled data in resource-poor domains. To this end, in this paper, we propose a novel Topic-based Adversarial Neural Network (TANN) method, which aims at exploiting the unlabeled data in the target domain and the data in the resource-rich source domain. Specifically, we first explicitly incorporate the global topic information into the document representation using a topic correlation layer. Then, domain-invariant features are learned to allow efficient transfer from the source domain to the target domain by utilizing adversarial training on the topic-based representation. Meanwhile, to balance the adversarial training and preserve the domain-private features in the target domain, we reconstruct the target data from both forward and backward directions. Finally, based on the learned features, keyphrases are extracted using a tagging method. Experiments on two real-world cross-domain scenarios demonstrate that our method can significantly improve the performance of keyphrase extraction on unlabeled or insufficiently labeled target domains.



Citations: 18

Social Recommendation with Missing Not at Random Data

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00018

Jiawei Chen, C. Wang, M. Ester, Qihao Shi, Yan Feng, Chun Chen

With the explosive growth of online social networks, many social recommendation methods have been proposed, demonstrating that social information has the potential to improve recommendation performance. However, existing social recommendation methods always assume that the data is missing at random (MAR), which is rarely the case. In fact, by analysing two real-world social recommendation datasets, we observed the following interesting phenomena: (1) users tend to consume and rate the items that they like and the items that have been consumed by their friends; (2) when items have been consumed by more friends, the average values of the observed ratings become smaller, not larger as assumed by existing models. To model these phenomena, we integrate the missing not at random (MNAR) assumption into social recommendation and propose a new social recommendation method, SPMF-MNAR, which models the observation process of rating data based on users' preferences and social influence. Extensive experiments conducted on large real-world datasets validate that SPMF-MNAR achieves better performance than existing social recommendation methods and non-social methods based on the MNAR assumption.



Citations: 26

Learning Community Structure with Variational Autoencoder

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00022

Jun Jin Choong, Xin Liu, T. Murata

Discovering community structure in networks remains a fundamentally challenging task. From scientific domains such as biology, chemistry and physics to social networks, identifying community structures in different kinds of networks is difficult because there is no universal definition of community structure. Furthermore, with the surge of social networks, content information has played a pivotal role in defining community structure, demanding techniques beyond the traditional approaches. Recently, network representation learning has shown tremendous promise. Leveraging recent advances in deep learning, one can bring deep learning's strengths to bear on network problems. Most prominently, supervised and semi-supervised methods have shown promising results in network representation learning tasks such as link prediction and graph classification. However, much has yet to be explored in the literature on community detection, which is an unsupervised learning task. This paper proposes a deep generative model for community detection and network generation. Empowered with Bayesian deep learning, deep generative models are capable of exploiting non-linearities while giving insights in terms of uncertainty. Hence, this paper proposes the Variational Graph Autoencoder for Community Detection (VGAECD). Extensive experiments show that it is capable of outperforming existing state-of-the-art methods. The generalization of the proposed model also allows the model to be considered as a graph generator. Additionally, unlike traditional methods, the proposed model does not require a predefined community structure definition. Instead, it assumes the existence of latent similarity between nodes and allows the model to find these similarities through an automatic model selection process. Optionally, it is capable of exploiting feature-rich information of a network such as node content, further increasing its performance.



Citations: 32

D-CARS: A Declarative Context-Aware Recommender System

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00151

Rosni Lumbantoruan, Xiangmin Zhou, Yongli Ren, Z. Bao

Context-aware recommendation has emerged as perhaps the most popular service on online sites, and has seen applications in domains as diverse as entertainment, e-business, e-health and government services. There has been significant recent progress on the quality and scalability of recommender systems. However, we believe that different target users care about different contexts when they select an online item, which can greatly affect the quality of recommendation and has not yet been investigated. In this paper, we propose a new type of recommender system, the Declarative Context-Aware Recommender System (D-CARS), which enables the personalization of the contexts exploited for each target user by automatically analysing the viewing history of users. First, we propose a novel User-Window Non-negative Matrix Factorization topic model (UW-NMF) that adaptively identifies the significant contexts of users and constructs user profiles in a personalized manner. Then, we design a novel declarative context-aware recommendation algorithm that exploits the user context preference to identify a group of item candidates and its context distribution, based on a Subspace Ensemble Tree Model (SETM), which is constructed in the identified context subspace for item recommendation. Finally, we propose an algorithm that incrementally maintains our SETM model. Extensive experiments are conducted to demonstrate the high effectiveness and efficiency of our D-CARS system.



Citations: 5

Differentially Private Prescriptive Analytics

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00124

Haripriya Harikumar, Santu Rana, Sunil Gupta, Thin Nguyen, R. Kaimal, S. Venkatesh

Privacy preservation is important. Prescriptive analytics is a method to extract corrective actions to avoid undesirable outcomes. We propose a privacy-preserving prescriptive analytics algorithm to protect the data used during the construction of the prescriptive analytics algorithm. We use a differential privacy mechanism to achieve a strong privacy guarantee. A differential privacy mechanism requires computing the sensitivity: the maximum change in the output between two training datasets that differ in only one instance. The main challenge we address is computing the sensitivity of the prescription vector. In the absence of any analytical form, we construct a nested global optimization problem to compute the sensitivity. We solve the optimization problem using constrained Bayesian optimization, as the nested structure makes the objective function expensive. We demonstrate our algorithm on two real-world datasets and observe that the prescription vectors remain useful even after making them private.
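How a sensitivity value feeds into a differential privacy mechanism can be sketched with the standard Laplace mechanism. The abstract does not say which mechanism the authors use, so this is an assumption; the function name, parameters, and the example sensitivity are hypothetical, and the sensitivity here is simply passed in rather than computed by Bayesian optimization as in the paper.

```python
import random


def privatize(vector, sensitivity, epsilon, rng=None):
    """Add i.i.d. Laplace(0, sensitivity / epsilon) noise to each coordinate.

    `sensitivity` is taken to be the L1 sensitivity of the vector-valued
    output: the largest L1 change when one training instance changes.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return [
        x + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        for x in vector
    ]


if __name__ == "__main__":
    prescription = [0.8, -0.3, 1.5]  # hypothetical prescription vector
    print(privatize(prescription, sensitivity=0.1, epsilon=1.0,
                    rng=random.Random(0)))
```

A smaller epsilon or a larger sensitivity widens the noise, which is why a tight sensitivity estimate matters for keeping the released vector useful.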


Citations: 4

Using Balancing Terms to Avoid Discrimination in Classification

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00116

Simon Enni, I. Assent

From personalized ad delivery and healthcare to criminal sentencing, more decisions are made with help from methods developed in the fields of data mining and machine learning than ever before. However, their widespread use has raised concerns about the discriminatory impact which the methods may have on people subject to these decisions. Recently, imbalance in the misclassification rates between groups has been identified as a source of discrimination. Such discrimination is not handled by most existing work in discrimination-aware data mining, and it can persist even if other types of discrimination are alleviated. In this article, we present the Balancing Terms (BT) method to address this problem. BT balances the error rates of any classifier with a differentiable prediction function, and unlike existing work, it can incorporate a preference for the trade-off between fairness and accuracy. We empirically evaluate BT on real-world data, demonstrating that our method produces trade-offs between error rate balance and total classification error that are superior, and in only a few cases merely comparable, to the state of the art.
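The imbalance in misclassification rates between groups can be measured with a short sketch. This only quantifies the imbalance; it does not implement the Balancing Terms method itself. The function names, the binary 0/1 label encoding, and the toy data are assumptions for illustration.

```python
def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for 0/1 labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def error_rate_gaps(y_true, y_pred, groups):
    """Return the largest between-group gap in FPR and in FNR."""
    by_group = {}
    for t, p, g in zip(y_true, y_pred, groups):
        truths, preds = by_group.setdefault(g, ([], []))
        truths.append(t)
        preds.append(p)
    rates = [error_rates(ts, ps) for ts, ps in by_group.values()]
    fprs = [r[0] for r in rates]
    fnrs = [r[1] for r in rates]
    return max(fprs) - min(fprs), max(fnrs) - min(fnrs)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 0, 0, 1, 1]
    y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
    groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
    print(error_rate_gaps(y_true, y_pred, groups))
```

A classifier can have the same overall accuracy for two groups while making mostly false positives for one and mostly false negatives for the other, which is why the two rates are compared separately.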


Citations: 2

Multi-label Answer Aggregation Based on Joint Matrix Factorization

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00067

Jinzheng Tu, Guoxian Yu, C. Domeniconi, J. Wang, Guoqiang Xiao, Maozu Guo

Crowdsourcing is a useful and economical approach to data annotation. To obtain high-quality annotations, various aggregation approaches have been developed, which take into account different factors that impact the quality of aggregated answers. However, existing methods generally focus on single-label (multi-class and binary) tasks and ignore the inter-correlation between labels, and thus may have compromised quality. In this paper, we introduce a Multi-Label answer aggregation approach based on Joint Matrix Factorization (ML-JMF). ML-JMF selectively and jointly factorizes the sample-label association matrices collected from different annotators into products of individual and shared low-rank matrices. As such, it takes advantage of the robustness of low-rank matrix approximation to noise, and reduces the impact of unreliable annotators by assigning small (zero) weights to their annotation matrices. In addition, it takes advantage of the correlation among labels by leveraging the shared low-rank matrix, and of the similarity between annotators by using the individual low-rank matrices to guide the factorization. ML-JMF pursues the low-rank matrices via a unified objective function, and introduces an iterative technique to optimize it. ML-JMF finally uses the optimized low-rank matrices and weights to infer the ground-truth labels. Our experimental results on multi-label datasets show that ML-JMF outperforms competitive methods in inferring ground-truth labels. Our approach can identify unreliable annotators, and is robust against their misleading answers through the assignment of small (zero) weights to their annotations.
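The interplay of individual factors, a shared factor, and annotator weights can be sketched with a deliberately simplified rank-1 version. This is not the ML-JMF objective: the rank-1 restriction, the alternating least-squares updates, the inverse-residual weighting rule, and the function names are all assumptions made for illustration. Each annotator matrix A_k is approximated by u_k v^T, where v is shared across annotators and u_k is individual.

```python
def joint_rank1_factorization(matrices, iterations=20, eps=1e-6):
    """Fit A_k ~ u_k v^T with a shared v and per-annotator weights.

    `matrices` is a list of equally sized 0/1 sample-by-label matrices
    (lists of lists), one per annotator. Returns (us, v, weights).
    """
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    v = [1.0] * n_cols
    weights = [1.0 / len(matrices)] * len(matrices)
    us = [[0.0] * n_rows for _ in matrices]
    for _ in range(iterations):
        # Individual factors: least-squares fit of each u_k given v.
        vv = max(sum(x * x for x in v), eps)
        us = [[sum(a * b for a, b in zip(row, v)) / vv for row in A]
              for A in matrices]
        # Shared factor: weighted least-squares fit of v given all u_k.
        denom = max(sum(w * sum(x * x for x in u)
                        for w, u in zip(weights, us)), eps)
        v = [sum(w * sum(A[i][j] * u[i] for i in range(n_rows))
                 for w, A, u in zip(weights, matrices, us)) / denom
             for j in range(n_cols)]
        # Annotators whose matrix is poorly reconstructed get small weights.
        residuals = [sum((A[i][j] - u[i] * v[j]) ** 2
                         for i in range(n_rows) for j in range(n_cols))
                     for A, u in zip(matrices, us)]
        inverse = [1.0 / (r + eps) for r in residuals]
        total = sum(inverse)
        weights = [x / total for x in inverse]
    return us, v, weights


def infer_labels(us, v, weights, threshold=0.5):
    """Threshold the weighted average of the reconstructions u_k v^T."""
    return [[1 if sum(w * u[i] * v[j] for w, u in zip(weights, us)) >= threshold
             else 0
             for j in range(len(v))]
            for i in range(len(us[0]))]
```

In a toy run with two annotators who agree on a label pattern and one who answers inconsistently, the inconsistent annotator's reconstruction error stays large, so its weight shrinks and the inferred labels follow the consistent pair.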

Citations: 18