2018 IEEE International Conference on Data Mining (ICDM): Latest Articles

Matrix Profile XII: MPdist: A Novel Time Series Distance Measure to Allow Data Mining in More Challenging Scenarios

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00119

Shaghayegh Gharghabi, Shima Imani, A. Bagnall, Amirali Darvishzadeh, Eamonn J. Keogh

At their core, many time series data mining algorithms can be reduced to reasoning about the shapes of time series subsequences. This requires a distance measure, and most algorithms use Euclidean Distance or Dynamic Time Warping (DTW) as their core subroutine. We argue that these distance measures are not as robust as the community believes. The undue faith in these measures derives from an overreliance on benchmark datasets and self-selection bias. The community is reluctant to address more difficult domains, for which current distance measures are ill-suited. In this work, we introduce a novel distance measure, MPdist. We show that our proposed distance measure is much more robust than current distance measures. Furthermore, it allows us to successfully mine datasets that would defeat any Euclidean or DTW distance-based algorithm. Additionally, we show that our distance measure can be computed so efficiently that it allows analytics on fast streams.
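For readers unfamiliar with the two baseline measures named in this abstract, the following is a minimal sketch of Euclidean distance and unconstrained DTW in plain Python. It does not implement MPdist, whose definition is not given in the abstract; the function names `znorm`, `euclidean`, and `dtw` are illustrative choices, not taken from the paper.

```python
import math


def znorm(x):
    """Z-normalize a sequence to zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0:
        # A constant sequence has no shape; map it to all zeros.
        return [0.0] * n
    return [(v - mean) / std for v in x]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dtw(a, b):
    """Unconstrained DTW with squared point cost.

    Returns the square root of the minimal cumulative cost, so the
    value is comparable to the Euclidean distance above.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])


# The same bump, shifted by one sample.
a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]
print(euclidean(a, b), dtw(a, b))
```

In this toy case Euclidean distance penalizes the one-sample shift while DTW warps it away, which is the kind of invariance each measure does or does not offer.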

Citations: 33

Collective Human Behavior in Cascading System: Discovery, Modeling and Applications

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00045

Yunfei Lu, Linyun Yu, T. Zhang, Chengxi Zang, Peng Cui, Chaoming Song, Wenwu Zhu

Collective behavior, which describes spontaneously emerging social processes and events, is ubiquitous in both physical society and online social media. Knowledge of collective behavior is critical in understanding and predicting social movements, fads, riots and so on. However, detecting, quantifying and modeling collective behavior in online social media at large scale has seldom been explored. In this paper, we examine a real-world online social media platform with more than 1.7 million information spreading records, which explicitly document the detailed human behavior in this online information cascading system. We observe evident collective behavior in information cascading, and then propose metrics to quantify the collectivity. We find that previous information cascading models cannot capture the collective behavior in the real world and thus never utilize it. Furthermore, we propose a generative framework with a latent user interest layer to capture the collective behavior in cascading systems. Our framework achieves high accuracy in modeling the information cascades with respect to popularity, structure and collectivity. By leveraging the knowledge of collective behavior, our model shows the capability of making predictions without temporal features or early-stage information. Our framework can serve as a more generalized one in modeling cascading systems, and, together with empirical discovery and applications, advance our understanding of human behavior.

Citations: 9

Publisher's Information

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/icdm.2018.00206

Citations: 0

Interactive Unknowns Recommendation in E-Learning Systems

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00065

Shan-Yun Teng, Jundong Li, Lo Pang-Yun Ting, Kun-Ta Chuang, Huan Liu

The rise of E-learning systems has led to an anytime-anywhere learning environment for everyone by providing various online courses and tests. However, due to the lack of teacher-student interaction, such ubiquitous learning is generally not as effective as offline classes. In traditional offline courses, teachers use real-time interaction to teach students in accordance with their personal aptitude, based on students' feedback in class. Without the intervention of instructors, it is difficult for users to be aware of personal unknowns. In this paper, we address the important issue of exploring 'user unknowns' through an interactive question-answering process in E-learning systems. A novel interactive learning system, called CagMab, is devised to interactively recommend questions with a round-by-round strategy, which contributes to applications such as a conversational bot for self-evaluation. The flow enables users to discover their weaknesses and further helps them to progress. In fact, despite its importance, discovering personal unknowns remains a challenging problem in E-learning systems. Even though formulating the problem with the multi-armed bandit framework provides a solution, it often leads to suboptimal results for interactive unknowns recommendation as it simply relies on the contextual features of answered questions. Note that each question is associated with concepts, and similar concepts are likely to be linked manually or systematically, which naturally forms concept graphs. Mining the rich relationships among users, questions and concepts could be potentially helpful in providing better unknowns recommendation. To this end, in this paper, we develop a novel interactive learning framework by borrowing strengths from concept-aware graph embedding for learning user unknowns. Our experimental studies on real data show that the proposed framework can effectively discover user unknowns in an interactive fashion for recommendation in E-learning systems.
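The multi-armed bandit formulation mentioned in this abstract can be illustrated with a minimal sketch of the standard UCB1 policy. This is not CagMab and uses no contextual features or concept graphs; treating each arm as a question category and the reward as 1 when the user answers incorrectly (an unknown is found) is an assumption made only for this illustration, as are the names `UCB1` and `simulate`.

```python
import math


class UCB1:
    """Standard UCB1 bandit over a fixed number of arms."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # pulls per arm
        self.sums = [0.0] * n_arms   # total reward per arm

    def select(self):
        # Pull every arm once before using confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)

        def score(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


def simulate(n_arms, rounds, reward_fn):
    """Run the round-by-round loop; reward_fn(arm) is a hypothetical
    stand-in for asking a question and observing the user's answer."""
    policy = UCB1(n_arms)
    for _ in range(rounds):
        arm = policy.select()
        policy.update(arm, reward_fn(arm))
    return policy.counts


# Hypothetical user who only gets category 2 wrong.
counts = simulate(3, 200, lambda arm: 1.0 if arm == 2 else 0.0)
print(counts)
```

The policy concentrates its questions on the category where unknowns keep turning up while still occasionally probing the others.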

Citations: 9

A Unified Theory of the Mobile Sequential Recommendation Problem

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00189

Zeyang Ye, Keli Xiao, Yuefan Deng

A theory is developed to unify the original form, and its many variations, of the mobile sequential recommendation (MSR) problem. The unified theory, expressing the same MSR problem, is superior to the original form in many aspects including a more standardized form. In addition to a newly proposed expected traveling time (ETT) function to measure the quality of recommended routes, we introduce five additional improvements. Also, three essential mathematical properties of the new objective function enable the development of the methods to solve realistic MSR problems with complex conditions. The MSR solutions also support the discovered properties of the proposed objective function. The unified theory should support the long-term decision making for drivers and the traffic department in general.

Citations: 10

Intelligent Salary Benchmarking for Talent Recruitment: A Holistic Matrix Factorization Approach

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00049

Qingxin Meng, Hengshu Zhu, Keli Xiao, Hui Xiong

As a process vital to the success of an organization, salary benchmarking aims at identifying the right market rate for each job position. Traditional approaches to salary benchmarking rely heavily on the experience of domain experts and on limited market survey data, and they struggle to handle dynamic scenarios that require timely benchmarking. To this end, in this paper, we propose a data-driven approach for intelligent salary benchmarking based on large-scale fine-grained online recruitment data. Specifically, we first construct a salary matrix based on the large-scale recruitment data and creatively formalize the salary benchmarking problem as a matrix completion task. Along this line, we develop a Holistic Salary Benchmarking Matrix Factorization (HSBMF) model for predicting the missing salary information in the salary matrix. Indeed, by integrating multiple confounding factors, such as company similarity, job similarity, and spatial-temporal similarity, HSBMF is able to provide a holistic and dynamic view for fine-grained salary benchmarking. Finally, extensive experiments on large-scale real-world data clearly validate the effectiveness of our approach for job salary benchmarking.
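The idea of treating salary benchmarking as matrix completion can be sketched with a plain regularized matrix factorization trained by stochastic gradient descent on the observed cells. This is a generic baseline, not HSBMF: it omits the company, job, and spatial-temporal similarity terms, and the toy matrix, the salary unit, and all hyperparameters are assumptions chosen for illustration.

```python
import math
import random


def factorize(observed, n_rows, n_cols, rank=2, lr=0.01, reg=0.02,
              epochs=500, seed=0):
    """Fit P (rows x rank) and Q (cols x rank) so that P[i] . Q[j]
    approximates each observed cell (i, j)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_cols)]
    cells = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (i, j), y in cells:
            err = y - predict(P, Q, i, j)
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Gradient step on squared error with L2 regularization.
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, i, j):
    """Predicted value for cell (i, j)."""
    return sum(p * q for p, q in zip(P[i], Q[j]))


def rmse(observed, P, Q):
    """Root mean squared error over the observed cells."""
    sq = [(y - predict(P, Q, i, j)) ** 2 for (i, j), y in observed.items()]
    return math.sqrt(sum(sq) / len(sq))


# Toy salary matrix (rows: companies, columns: job positions), in
# arbitrary units; cell (2, 2) is missing and is to be estimated.
observed = {
    (0, 0): 2.0, (0, 1): 3.0, (0, 2): 4.0,
    (1, 0): 4.0, (1, 1): 6.0, (1, 2): 8.0,
    (2, 0): 6.0, (2, 1): 9.0,
}
P0, Q0 = factorize(observed, 3, 3, epochs=0)   # untrained factors
P, Q = factorize(observed, 3, 3)
print(rmse(observed, P0, Q0), rmse(observed, P, Q))
print(predict(P, Q, 2, 2))                     # estimate for the missing cell
```

The estimate for the missing cell is only as good as the low-rank assumption, which is why the paper adds side information such as company and job similarity.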

Citations: 16

DE-RNN: Forecasting the Probability Density Function of Nonlinear Time Series

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00085

K. Yeo, Igor Melnyk, Nam H. Nguyen, Eun Kyung Lee

Model-free identification of a nonlinear dynamical system from noisy observations is of current interest due to its direct relevance to many applications in Industry 4.0. Making a prediction of such noisy time series constitutes a problem of learning the nonlinear time evolution of a probability distribution. The capability of most conventional time series models is limited when the underlying dynamics are nonlinear or multi-scale, or when there is no prior knowledge at all of the system dynamics. We propose DE-RNN (Density Estimation Recurrent Neural Network) to learn the probability density function (PDF) of a stochastic process with underlying nonlinear dynamics and compute the time evolution of the PDF for a probabilistic forecast. A Recurrent Neural Network (RNN)-based model is employed to learn a nonlinear operator for the temporal evolution of the stochastic process. We use a softmax layer for a numerical discretization of a smooth PDF, which transforms a function approximation problem into a classification task. A regularized cross-entropy method is introduced to impose a smoothness condition on the estimated probability distribution. A Monte Carlo procedure to compute the temporal evolution of the distribution for a multiple-step forecast is presented. It is shown that the proposed algorithm can learn the nonlinear multi-scale dynamics from noisy observations and provides an effective tool to forecast the time evolution of the underlying probability distribution. Evaluation of the algorithm on three synthetic and two real data sets shows an advantage over the compared baselines, and a potential value to a wide range of problems in physics and engineering.
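The step of discretizing a smooth PDF with a softmax layer can be sketched as follows: a continuous target is mapped to a bin index (the class label), the network's logits are turned into bin probabilities by softmax, and dividing by the bin width gives a piecewise-constant density. The sketch covers only this discretization and the plain cross-entropy loss; the RNN, the smoothness regularizer, and the Monte Carlo forecast are not shown, and the uniform binning on a fixed range `[lo, hi]` is an assumption made for illustration.

```python
import math


def bin_index(y, lo, hi, n_bins):
    """Map a continuous value to its class label (bin index),
    clipping values outside [lo, hi] to the first or last bin."""
    width = (hi - lo) / n_bins
    k = int((y - lo) / width)
    return min(max(k, 0), n_bins - 1)


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def density(probs, lo, hi):
    """Convert bin probabilities to a piecewise-constant density."""
    width = (hi - lo) / len(probs)
    return [p / width for p in probs]


def cross_entropy(probs, label):
    """Classification loss for one sample with the given bin label."""
    return -math.log(probs[label])


# Four bins on [0, 2]; uniform logits give a uniform density of 0.5.
probs = softmax([0.0, 0.0, 0.0, 0.0])
pdf = density(probs, 0.0, 2.0)
label = bin_index(0.7, 0.0, 2.0, 4)
print(probs, pdf, label, cross_entropy(probs, label))
```

Because the output is a full histogram rather than a point estimate, the same model can represent multi-modal or skewed predictive distributions.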

Citations: 7

Pseudo-Implicit Feedback for Alleviating Data Sparsity in Top-K Recommendation

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00129

Yun He, Haochen Chen, Ziwei Zhu, James Caverlee

We propose PsiRec, a novel user preference propagation recommender that incorporates pseudo-implicit feedback for enriching the original sparse implicit feedback dataset. Three of the unique characteristics of PsiRec are: (i) it views user-item interactions as a bipartite graph and models pseudo-implicit feedback from this perspective; (ii) its random walks-based approach extracts graph structure information from this bipartite graph, toward estimating pseudo-implicit feedback; and (iii) it adopts a Skip-gram inspired measure of confidence in pseudo-implicit feedback that captures the pointwise mutual information between users and items. This pseudo-implicit feedback is ultimately incorporated into a new latent factor model to estimate user preference in cases of extreme sparsity. PsiRec results in improvements of 21.5% and 22.7% in terms of Precision@10 and Recall@10 over state-of-the-art Collaborative Denoising Auto-Encoders. Our implementation is available at https://github.com/heyunh2015/PsiRecICDM2018.
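The three ingredients listed in this abstract (a user-item bipartite graph, random walks over it, and a pointwise mutual information score) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the PsiRec implementation, which is available at the repository linked above: the uniform random walk, the symmetric co-occurrence window, the unshifted PMI formula, and every function name here are choices made for the example.

```python
import math
import random
from collections import Counter


def random_walks(adj, walks_per_node, walk_len, rng):
    """Uniform random walks started from every node of the graph."""
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_len:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def cooccurrence(walks, users, window):
    """Count (user, item) pairs appearing within `window` steps."""
    pairs = Counter()
    for walk in walks:
        for s, node in enumerate(walk):
            if node not in users:
                continue
            lo, hi = max(0, s - window), min(len(walk), s + window + 1)
            for t in range(lo, hi):
                if walk[t] not in users:
                    pairs[(node, walk[t])] += 1
    return pairs


def pmi(pairs):
    """PMI(u, i) = log( #(u, i) * total / (#(u) * #(i)) )."""
    total = sum(pairs.values())
    user_tot, item_tot = Counter(), Counter()
    for (u, i), c in pairs.items():
        user_tot[u] += c
        item_tot[i] += c
    return {(u, i): math.log(c * total / (user_tot[u] * item_tot[i]))
            for (u, i), c in pairs.items()}


def pseudo_feedback(adj, users, walks_per_node=20, walk_len=8,
                    window=3, seed=0):
    """End-to-end sketch: walks -> co-occurrence counts -> PMI scores."""
    rng = random.Random(seed)
    walks = random_walks(adj, walks_per_node, walk_len, rng)
    return pmi(cooccurrence(walks, users, window))


# Toy graph: u1 interacted with a, b; u2 interacted with b, c.
users = {"u1", "u2"}
adj = {"u1": ["a", "b"], "u2": ["b", "c"],
       "a": ["u1"], "b": ["u1", "u2"], "c": ["u2"]}
scores = pseudo_feedback(adj, users)
print(sorted(scores))
```

Pairs such as `("u1", "c")` that never appear in the observed interactions can still receive a score when walks connect them through a shared item, which is how the extra feedback densifies a sparse dataset.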

Citations: 6

Feature-Induced Partial Multi-label Learning

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00192

Guoxian Yu, Xia Chen, C. Domeniconi, J. Wang, Zhao Li, Z. Zhang, Xindong Wu

Current efforts on multi-label learning generally assume that the given labels of training instances are noise-free. However, obtaining noise-free labels is quite difficult and often impractical, and the presence of noisy labels may compromise the performance of multi-label learning. Partial multi-label learning (PML) addresses the scenario in which each instance is annotated with a set of candidate labels, of which only a subset corresponds to the ground-truth. The PML problem is more challenging than partial-label learning, since the latter assumes that only one label is valid and may ignore the correlation among candidate labels. To tackle the PML challenge, we introduce a feature induced PML approach called fPML, which simultaneously estimates noisy labels and trains multi-label classifiers. In particular, fPML simultaneously factorizes the observed instance-label association matrix and the instance-feature matrix into low-rank matrices to achieve coherent low-rank matrices from the label and the feature spaces, and a low-rank label correlation matrix as well. The low-rank approximation of the instance-label association matrix is leveraged to estimate the association confidence. To predict the labels of unlabeled instances, fPML learns a matrix that maps the instances to labels based on the estimated association confidence. An empirical study on public multi-label datasets with injected noisy labels, and on archived proteomic datasets, shows that fPML can more accurately identify noisy labels than related solutions, and consequently can achieve better performance on predicting labels of instances than competitive methods.

Citations: 59

An Integrated Model for Crime Prediction Using Temporal and Spatial Factors

2018 IEEE International Conference on Data Mining (ICDM)

Pub Date : 2018-11-01 DOI: 10.1109/ICDM.2018.00190

Fei Yi, Zhiwen Yu, Fuzhen Zhuang, X. Zhang, Hui Xiong

Given its importance, crime prediction has attracted a lot of attention in the literature, and several methods have been proposed to discover different aspects of characteristics for crime prediction. In this paper, we propose a Clustered Continuous Conditional Random Field (Clustered-CCRF) model which is able to effectively exploit both spatial and temporal factors for crime prediction in an integrated way. In particular, we observe that the number of crimes in one specific area is not only conditioned on its own historical records but is also highly correlated with crime records from similar areas. Therefore, we propose two factors, an auto-regressed temporal correlation and a feature-based inter-area spatial correlation, to measure such patterns for crime prediction. Further, we present a tree-structured clustering algorithm to discover highly similar areas based on spatial characteristics to improve the performance of our proposed model. Experiments on a real-world crime dataset demonstrate the superiority of our proposed model over the state-of-the-art methods.

Citations: 42
