2018 IEEE International Conference on Data Mining (ICDM): Latest Papers

Exploiting the Sentimental Bias between Ratings and Reviews for Enhancing Recommendation
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00185
Yuanbo Xu, Yongjian Yang, Jiayu Han, E. Wang, Fuzhen Zhuang, Hui Xiong
In real-world recommendation scenarios, two phenomena are common: 1) many users provide only ratings without review comments, so the historical transaction data available to a recommender system are usually unbalanced and sparse; 2) users' opinions can be grasped better from their reviews than from their ratings. This indicates that there is always a bias between ratings and reviews. Therefore, ratings and reviews should reinforce each other in order to capture users' true opinions. To this end, we develop an opinion mining model based on convolutional neural networks for enhancing recommendation (NeuO). Specifically, we use a two-step training scheme for the neural network, which utilizes both reviews and ratings to capture users' true opinions in unbalanced data. Moreover, we propose a Sentiment Classification scoring method (SC), which employs dual attention vectors to predict the sentiment scores of users' reviews. A combination function is designed to use the results of SC and the user-item rating matrix to capture the opinion bias. Finally, a Multilayer-perceptron-based Matrix Factorization (MMF) method is proposed to make recommendations with the enhanced user-item matrix. Extensive experiments on real-world datasets demonstrate that our approach achieves superior performance over state-of-the-art baselines.
Citations: 14
Entire Regularization Path for Sparse Nonnegative Interaction Model
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00168
Mirai Takayanagi, Yasuo Tabei, Hiroto Saigo
Building a sparse combinatorial model with non-negative constraints is essential for solving real-world problems, such as those in biology, in which the target response is often formulated as an additive linear combination of feature variables. This paper presents a solution to this problem that combines itemset mining with non-negative least squares. However, once modern regularization is incorporated, a naive solution has to solve an expensive enumeration problem many times, once for every regularization parameter. In this paper, we devise a regularization path tracking algorithm in which combinatorial features are searched for and added to the solution set one by one. Our contribution is a proposal of novel bounds specifically designed for the feature search problem. On a synthetic dataset, the proposed method is demonstrated to run orders of magnitude faster than a naive counterpart that does not employ tree pruning. We also show empirically that non-negativity constraints yield far fewer active features than LASSO, leading to significant speed-ups in pattern search. In experiments using an HIV-1 drug resistance dataset, the proposed method successfully modeled the rapidly increasing drug resistance triggered by the accumulation of mutations in HIV-1 genetic sequences. We also demonstrate the effectiveness of non-negativity constraints in suppressing false positive features, resulting in a model with fewer features and thereby improved interpretability.
Citations: 3
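The non-negative, L1-regularized least-squares problem mentioned in the abstract above can be illustrated with a minimal sketch. This is a plain coordinate-descent solver run at a few regularization values, not the authors' path-tracking algorithm: it performs no itemset mining and uses no pruning bounds. The function name `nonneg_lasso`, the toy matrix `X`, and the chosen values of `lam` are all assumptions made for illustration.

```python
def nonneg_lasso(X, y, lam, n_iter=200):
    """Coordinate descent for min_w 0.5*||y - Xw||^2 + lam*sum(w), subject to w >= 0."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the residual that ignores feature j
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                others = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            # Closed-form one-dimensional update, clipped at zero for non-negativity.
            w[j] = max(0.0, (rho - lam) / norm) if norm > 0 else 0.0
    return w


# Hypothetical toy data: rows are samples, columns are binary pattern indicators.
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1], [0, 0, 1], [1, 0, 0]]
y = [2 * r[0] + r[1] for r in X]  # the response depends on the first two columns only

# Sweep the regularization parameter from large to small and watch features enter.
for lam in (10.0, 8.0, 4.0, 0.0):
    w = nonneg_lasso(X, y, lam)
    print(lam, [round(v, 3) for v in w])
```

Re-solving from scratch at every value of `lam`, as this loop does, is the kind of repeated work that a path-tracking method is meant to avoid.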
Leveraging Hypergraph Random Walk Tag Expansion and User Social Relation for Microblog Recommendation
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00152
Huifang Ma, Di Zhang, Weizhong Zhao, Yanru Wang, Zhongzhi Shi
Recommending valuable content to microblog users is an important way to improve their experience. As high-quality descriptors of user semantics, tags have long been used to represent users' interests or attributes. In this work, we propose a microblog recommendation approach based on hypergraph random walk tag expansion and user social relations. More specifically, for each user, microblogs are treated as hyperedges and terms as hypervertices, and weighting strategies are established for both. A random walk is performed on the weighted hypergraph to obtain a number of terms as tags for users. The tag similarity matrix and the user-tag matrix are then constructed from the tag probability correlations and the weight of each tag. In addition, the significance of user social relations is taken into account for recommendation. Moreover, an iterative updating scheme is developed to obtain the final user-tag matrix, which is used to compute the similarities between microblogs and users. Experimental results show that the algorithm is effective in microblog recommendation.
Citations: 1
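The hypergraph random walk described in the abstract above (microblogs as hyperedges, terms as hypervertices) can be sketched as follows. This assumes the textbook walk in which a walker picks an incident hyperedge in proportion to its weight and then a vertex of that hyperedge uniformly; the paper's own weighting strategies for hyperedges and hypervertices are not given in the abstract, so the weights, the function name `hypergraph_walk`, and the example terms are hypothetical.

```python
def hypergraph_walk(hyperedges, edge_weights, steps=100):
    """Return the term distribution after `steps` rounds of a hypergraph random walk."""
    vertices = sorted({v for e in hyperedges for v in e})
    # Transition probabilities: choose an incident hyperedge by weight,
    # then choose one of its vertices uniformly (the current vertex included).
    P = {u: {v: 0.0 for v in vertices} for u in vertices}
    for u in vertices:
        incident = [i for i, e in enumerate(hyperedges) if u in e]
        total_w = sum(edge_weights[i] for i in incident)
        for i in incident:
            for v in hyperedges[i]:
                P[u][v] += edge_weights[i] / total_w / len(hyperedges[i])
    # Power iteration from the uniform distribution.
    dist = {v: 1.0 / len(vertices) for v in vertices}
    for _ in range(steps):
        dist = {v: sum(dist[u] * P[u][v] for u in vertices) for v in vertices}
    return dist


# Hypothetical user with three microblogs, each reduced to a set of terms.
microblogs = [{"data", "mining"}, {"data", "graph", "walk"}, {"data", "mining", "graph"}]
scores = hypergraph_walk(microblogs, edge_weights=[1.0, 1.0, 1.0])

# Take the highest-scoring terms as candidate tags for this user.
tags = sorted(scores, key=scores.get, reverse=True)[:2]
print(tags)
```

For this particular walk the long-run score of a term is proportional to the total weight of the microblogs containing it, so frequent terms in heavily weighted microblogs surface as tags.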
Exploiting Topic-Based Adversarial Neural Network for Cross-Domain Keyphrase Extraction
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00075
Yanan Wang, Qi Liu, Chuan Qin, Tong Xu, Yijun Wang, Enhong Chen, Hui Xiong
Keyphrases have been widely used in large document collections to provide a concise summary of document content. While significant efforts have been made on automatic keyphrase extraction, existing methods struggle to train a robust supervised model when labeled data are insufficient in resource-poor domains. To this end, we propose a novel Topic-based Adversarial Neural Network (TANN) method, which aims to exploit both the unlabeled data in the target domain and the data in the resource-rich source domain. Specifically, we first explicitly incorporate global topic information into the document representation using a topic correlation layer. Then, domain-invariant features are learned by adversarial training on the topic-based representation, allowing efficient transfer from the source domain to the target domain. Meanwhile, to balance the adversarial training and preserve the domain-private features of the target domain, we reconstruct the target data in both the forward and backward directions. Finally, based on the learned features, keyphrases are extracted using a tagging method. Experiments on two real-world cross-domain scenarios demonstrate that our method significantly improves the performance of keyphrase extraction on unlabeled or insufficiently labeled target domains.
Citations: 18
Social Recommendation with Missing Not at Random Data
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00018
Jiawei Chen, C. Wang, M. Ester, Qihao Shi, Yan Feng, Chun Chen
With the explosive growth of online social networks, many social recommendation methods have been proposed, and they have demonstrated that social information has the potential to improve recommendation performance. However, existing social recommendation methods always assume that the data are missing at random (MAR), which is rarely the case. In fact, by analysing two real-world social recommendation datasets, we observed the following interesting phenomena: (1) users tend to consume and rate the items that they like and the items that have been consumed by their friends; (2) when an item has been consumed by more friends, the average value of the observed ratings becomes smaller, not larger as assumed by existing models. To model these phenomena, we integrate the missing not at random (MNAR) assumption into social recommendation and propose a new social recommendation method, SPMF-MNAR, which models the observation process of rating data based on users' preferences and social influence. Extensive experiments on large real-world datasets validate that SPMF-MNAR achieves better performance than existing social recommendation methods and than non-social methods based on the MNAR assumption.
Citations: 26
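The difference between the MAR and MNAR assumptions in the abstract above can be made concrete with a small expected-value calculation. It only illustrates phenomenon (1), that users tend to rate what they like, and is not the SPMF-MNAR model; the rating distribution and the observation probabilities below are invented for the example.

```python
def observed_mean(rating_probs, observe_probs):
    """Expected rating among observed entries, given P(rating) and P(observed | rating)."""
    num = sum(r * p * observe_probs[r] for r, p in rating_probs.items())
    den = sum(p * observe_probs[r] for r, p in rating_probs.items())
    return num / den


# Hypothetical true preferences: ratings 1..5 are equally likely (true mean 3.0).
rating_probs = {r: 0.2 for r in range(1, 6)}

# MAR: every rating value is equally likely to be observed.
mar = {r: 0.1 for r in range(1, 6)}
# MNAR: the chance of observing a rating grows with the rating itself.
mnar = {1: 0.02, 2: 0.04, 3: 0.08, 4: 0.16, 5: 0.32}

print(observed_mean(rating_probs, mar))
print(observed_mean(rating_probs, mnar))
```

Under MAR the observed mean matches the true mean, while under this MNAR pattern it is pulled upward, which is why a model fitted only to observed ratings can overestimate preferences.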
Learning Community Structure with Variational Autoencoder
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00022
Jun Jin Choong, Xin Liu, T. Murata
Discovering community structure in networks remains a fundamentally challenging task. From scientific domains such as biology, chemistry and physics to social networks, identifying community structures in different kinds of networks is difficult because there is no universal definition of community structure. Furthermore, with the surge of social networks, content information has come to play a pivotal role in defining community structure, demanding techniques beyond the traditional approaches. Recently, network representation learning has shown tremendous promise. By leveraging recent advances in deep learning, one can bring its strengths to bear on network problems. Most notably, supervised and semi-supervised methods have shown promising results in network representation learning tasks such as link prediction and graph classification. However, much remains to be explored in the community detection literature, where the task is unsupervised. This paper proposes a deep generative model for community detection and network generation. Empowered by Bayesian deep learning, deep generative models are capable of exploiting non-linearities while giving insights in terms of uncertainty. Hence, this paper proposes the Variational Graph Autoencoder for Community Detection (VGAECD). Extensive experiments show that it is capable of outperforming existing state-of-the-art methods. The generality of the proposed model also allows it to be used as a graph generator. Additionally, unlike traditional methods, the proposed model does not require a predefined definition of community structure. Instead, it assumes the existence of latent similarity between nodes and allows the model to find these similarities through an automatic model selection process. Optionally, it can exploit feature-rich information of a network, such as node content, to further increase its performance.
Citations: 32
D-CARS: A Declarative Context-Aware Recommender System
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00151
Rosni Lumbantoruan, Xiangmin Zhou, Yongli Ren, Z. Bao
Context-aware recommendation has emerged as perhaps the most popular service on online sites, and has been applied to domains as diverse as entertainment, e-business, e-health and government services. There has been significant recent progress on the quality and scalability of recommender systems. However, we believe that different target users care about different contexts when they select an online item, which can greatly affect the quality of recommendation and has not yet been investigated. In this paper, we propose a new type of recommender system, the Declarative Context-Aware Recommender System (D-CARS), which personalizes the contexts exploited for each target user by automatically analysing users' viewing histories. First, we propose a novel User-Window Non-negative Matrix Factorization topic model (UW-NMF) that adaptively identifies the significant contexts of users and constructs user profiles in a personalized manner. Then, we design a novel declarative context-aware recommendation algorithm that exploits the user's context preference to identify a group of item candidates and its context distribution, based on a Subspace Ensemble Tree Model (SETM) that is constructed in the identified context subspace for item recommendation. Finally, we propose an algorithm that incrementally maintains the SETM model. Extensive experiments demonstrate the high effectiveness and efficiency of our D-CARS system.
Citations: 5
Differentially Private Prescriptive Analytics
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00124
Haripriya Harikumar, Santu Rana, Sunil Gupta, Thin Nguyen, R. Kaimal, S. Venkatesh
Privacy preservation is important. Prescriptive analytics is a method for extracting corrective actions that avoid undesirable outcomes. We propose a privacy-preserving prescriptive analytics algorithm that protects the data used during its construction. We use a differential privacy mechanism to achieve a strong privacy guarantee. A differential privacy mechanism requires computing the sensitivity: the maximum change in the output between two training datasets that differ by only one instance. The main challenge we address is computing the sensitivity of the prescription vector. In the absence of any analytical form, we construct a nested global optimization problem to compute the sensitivity. We solve the optimization problem using constrained Bayesian optimization, as the nested structure makes the objective function expensive. We demonstrate our algorithm on two real-world datasets and observe that the prescription vectors remain useful even after they are made private.
Citations: 4
Using Balancing Terms to Avoid Discrimination in Classification
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00116
Simon Enni, I. Assent
From personalized ad delivery and healthcare to criminal sentencing, more decisions are made with help from methods developed in the fields of data mining and machine learning than ever before. However, their widespread use has raised concerns about the discriminatory impact these methods may have on the people subject to such decisions. Recently, imbalance in the misclassification rates between groups has been identified as a source of discrimination. Such discrimination is not handled by most existing work in discrimination-aware data mining, and it can persist even if other types of discrimination are alleviated. In this article, we present the Balancing Terms (BT) method to address this problem. BT balances the error rates of any classifier with a differentiable prediction function, and unlike existing work, it can incorporate a preference for the trade-off between fairness and accuracy. We empirically evaluate BT on real-world data, demonstrating that our method produces trade-offs between error rate balance and total classification error that are superior to the state of the art and, in only a few cases, merely comparable to it.
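The idea of adding a term that balances error rates across groups, weighted by a trade-off preference, can be sketched as follows. This is an illustration only, not the BT formulation from the paper: it assumes a logistic model, two groups, and a "soft" false-positive rate (the mean predicted probability among true negatives) so that the penalty stays differentiable. The names `balanced_loss` and `lam` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def balanced_loss(w, X, y, group, lam):
    """Log-loss plus lam times the squared gap in soft false-positive rates.

    Assumes both groups contain at least one true negative (y == 0).
    lam expresses the preference between accuracy (lam = 0) and balance.
    """
    p = sigmoid(X @ w)
    eps = 1e-12  # guards the logarithms against log(0)
    log_loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    neg = (y == 0)
    fpr_a = p[neg & (group == 0)].mean()  # soft FPR, group 0
    fpr_b = p[neg & (group == 1)].mean()  # soft FPR, group 1
    return log_loss + lam * (fpr_a - fpr_b) ** 2

# Toy usage: four samples, two groups, a fixed weight vector.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([0, 1, 0, 1])
group = np.array([0, 0, 1, 1])
w = np.array([0.0, 0.5])
l0 = balanced_loss(w, X, y, group, lam=0.0)  # accuracy only
l1 = balanced_loss(w, X, y, group, lam=1.0)  # accuracy plus balance penalty
```

Because the penalty is a smooth function of the predicted probabilities, the combined objective can be minimized with any gradient-based optimizer.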
Citations: 2
Multi-label Answer Aggregation Based on Joint Matrix Factorization
Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00067
Jinzheng Tu, Guoxian Yu, C. Domeniconi, J. Wang, Guoqiang Xiao, Maozu Guo
Crowdsourcing is a useful and economical approach to data annotation. To obtain high-quality annotations, various aggregation approaches have been developed, which take into account different factors that impact the quality of aggregated answers. However, existing methods generally focus on single-label (multi-class and binary) tasks and ignore the inter-correlation between labels, and thus may have compromised quality. In this paper, we introduce a Multi-Label answer aggregation approach based on Joint Matrix Factorization (ML-JMF). ML-JMF selectively and jointly factorizes the sample-label association matrices collected from different annotators into products of individual and shared low-rank matrices. As such, it takes advantage of the robustness of low-rank matrix approximation to noise, and reduces the impact of unreliable annotators by assigning small (zero) weights to their annotation matrices. In addition, it takes advantage of the correlation among labels by leveraging the shared low-rank matrix, and of the similarity between annotators by using the individual low-rank matrices to guide the factorization. ML-JMF pursues the low-rank matrices via a unified objective function, and introduces an iterative technique to optimize it. ML-JMF finally uses the optimized low-rank matrices and weights to infer the ground-truth labels. Our experimental results on multi-label datasets show that ML-JMF outperforms competitive methods in inferring ground-truth labels. Our approach can identify unreliable annotators, and is robust against their misleading answers through the assignment of small (zero) weights to their annotations.
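The joint factorization described in the abstract can be sketched in simplified form. This is not the ML-JMF objective itself: it assumes a plain weighted squared-error loss with a small ridge term, minimized by alternating least squares, and it takes the annotator weights as fixed inputs, whereas the paper learns them. Each annotator matrix `A` (samples by labels) is approximated as `U @ V.T`, with an individual factor `U` per annotator and one shared factor `V`. All function and parameter names are hypothetical.

```python
import numpy as np

def joint_factorize(mats, weights, rank, n_iter=50, reg=1e-3, seed=0):
    """Minimize sum_a w_a * ||A_a - U_a V^T||_F^2 plus a ridge penalty.

    mats: list of (n_samples, n_labels) annotation matrices.
    weights: one non-negative weight per annotator (fixed here by hand).
    """
    rng = np.random.default_rng(seed)
    n_labels = mats[0].shape[1]
    V = rng.normal(size=(n_labels, rank))
    ridge = reg * np.eye(rank)
    for _ in range(n_iter):
        # Individual factors: ridge least squares per annotator with V fixed.
        Us = [w * A @ V @ np.linalg.inv(w * V.T @ V + ridge)
              for A, w in zip(mats, weights)]
        # Shared factor: ridge least squares over all annotators with Us fixed.
        num = sum(w * A.T @ U for A, U, w in zip(mats, Us, weights))
        den = sum(w * U.T @ U for U, w in zip(Us, weights)) + ridge
        V = num @ np.linalg.inv(den)
    return Us, V

def aggregate(Us, V, weights, threshold=0.5):
    """Weighted average of the low-rank reconstructions, then thresholding."""
    score = sum(w * U @ V.T for U, w in zip(Us, weights)) / sum(weights)
    return (score >= threshold).astype(int)

# Toy usage: two reliable annotators and one who flips every label.
truth = np.array([[1, 1, 0, 0], [0, 0, 1, 1]] * 3, dtype=float)
mats = [truth.copy(), truth.copy(), 1.0 - truth]
weights = [1.0, 1.0, 0.0]  # zero weight silences the unreliable annotator
Us, V = joint_factorize(mats, weights, rank=2)
labels = aggregate(Us, V, weights)
```

In this toy setup the zero-weighted annotator contributes nothing to the shared factor, which mirrors the abstract's point about assigning small (zero) weights to unreliable annotators.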
Citations: 18