Shaghayegh Gharghabi, Shima Imani, A. Bagnall, Amirali Darvishzadeh, Eamonn J. Keogh
At their core, many time series data mining algorithms can be reduced to reasoning about the shapes of time series subsequences. This requires a distance measure, and most algorithms use Euclidean Distance or Dynamic Time Warping (DTW) as their core subroutine. We argue that these distance measures are not as robust as the community believes. The undue faith in these measures derives from an overreliance on benchmark datasets and self-selection bias. The community is reluctant to address more difficult domains, for which current distance measures are ill-suited. In this work, we introduce a novel distance measure, MPdist. We show that our proposed distance measure is much more robust than current distance measures. Furthermore, it allows us to successfully mine datasets that would defeat any Euclidean or DTW distance-based algorithm. Additionally, we show that our distance measure can be computed so efficiently that it allows analytics on fast streams.
{"title":"Matrix Profile XII: MPdist: A Novel Time Series Distance Measure to Allow Data Mining in More Challenging Scenarios","authors":"Shaghayegh Gharghabi, Shima Imani, A. Bagnall, Amirali Darvishzadeh, Eamonn J. Keogh","doi":"10.1109/ICDM.2018.00119","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00119","journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)"},"publicationDate":"2018-11-01"}
Collective behavior, which describes spontaneously emerging social processes and events, is ubiquitous in both physical society and online social media. Knowledge of collective behavior is critical to understanding and predicting social movements, fads, riots and so on. However, detecting, quantifying and modeling collective behavior in online social media at large scale have seldom been explored. In this paper, we examine a real-world online social media platform with more than 1.7 million information spreading records, which explicitly document the detailed human behavior in this online information cascading system. We observe evident collective behavior in information cascading, and then propose metrics to quantify the collectivity. We find that previous information cascading models cannot capture the collective behavior in the real world and thus never utilize it. Furthermore, we propose a generative framework with a latent user interest layer to capture the collective behavior in cascading systems. Our framework achieves high accuracy in modeling the information cascades with respect to popularity, structure and collectivity. By leveraging the knowledge of collective behavior, our model is able to make predictions without temporal features or early-stage information. Our framework can serve as a more general one for modeling cascading systems and, together with the empirical discovery and applications, advances our understanding of human behavior.
{"title":"Collective Human Behavior in Cascading System: Discovery, Modeling and Applications","authors":"Yunfei Lu, Linyun Yu, T. Zhang, Chengxi Zang, Peng Cui, Chaoming Song, Wenwu Zhu","doi":"10.1109/ICDM.2018.00045","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00045","journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)"},"publicationDate":"2018-11-01"}
{"title":"Publisher's Information","authors":"","doi":"10.1109/icdm.2018.00206","DOIUrl":"https://doi.org/10.1109/icdm.2018.00206","journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)"},"publicationDate":"2018-11-01"}
Shan-Yun Teng, Jundong Li, Lo Pang-Yun Ting, Kun-Ta Chuang, Huan Liu
The rise of E-learning systems has led to an anytime-anywhere learning environment for everyone by providing various online courses and tests. However, due to the lack of teacher-student interaction, such ubiquitous learning is generally not as effective as offline classes. In traditional offline courses, teachers use real-time interaction to teach students in accordance with their personal aptitude, based on students' feedback in class. Without the intervention of instructors, it is difficult for users to be aware of their personal unknowns. In this paper, we address the important issue of exploring 'user unknowns' through an interactive question-answering process in E-learning systems. A novel interactive learning system, called CagMab, is devised to interactively recommend questions with a round-by-round strategy, which contributes to applications such as a conversational bot for self-evaluation. The flow enables users to discover their weaknesses and further helps them to progress. Despite its importance, discovering personal unknowns remains a challenging problem in E-learning systems. Even though formulating the problem with the multi-armed bandit framework provides a solution, it often leads to suboptimal results for interactive unknowns recommendation, as it relies only on the contextual features of answered questions. Note that each question is associated with concepts, and similar concepts are likely to be linked manually or systematically, which naturally forms concept graphs. Mining the rich relationships among users, questions and concepts could help provide better unknowns recommendation. To this end, we develop a novel interactive learning framework by borrowing strengths from concept-aware graph embedding for learning user unknowns. Our experimental studies on real data show that the proposed framework can effectively discover user unknowns in an interactive fashion for recommendation in E-learning systems.
{"title":"Interactive Unknowns Recommendation in E-Learning Systems","authors":"Shan-Yun Teng, Jundong Li, Lo Pang-Yun Ting, Kun-Ta Chuang, Huan Liu","doi":"10.1109/ICDM.2018.00065","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00065","journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)"},"publicationDate":"2018-11-01"}
A theory is developed to unify the original form, and its many variations, of the mobile sequential recommendation (MSR) problem. The unified theory, expressing the same MSR problem, is superior to the original form in many aspects including a more standardized form. In addition to a newly proposed expected traveling time (ETT) function to measure the quality of recommended routes, we introduce five additional improvements. Also, three essential mathematical properties of the new objective function enable the development of the methods to solve realistic MSR problems with complex conditions. The MSR solutions also support the discovered properties of the proposed objective function. The unified theory should support the long-term decision making for drivers and the traffic department in general.
{"title":"A Unified Theory of the Mobile Sequential Recommendation Problem","authors":"Zeyang Ye, Keli Xiao, Yuefan Deng","doi":"10.1109/ICDM.2018.00189","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00189","journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)"},"publicationDate":"2018-11-01"}
As a process vital to the success of an organization, salary benchmarking aims at identifying the right market rate for each job position. Traditional approaches to salary benchmarking rely heavily on the experience of domain experts and on limited market survey data, and have difficulty handling dynamic scenarios that require timely benchmarking. To this end, in this paper, we propose a data-driven approach for intelligent salary benchmarking based on large-scale fine-grained online recruitment data. Specifically, we first construct a salary matrix based on the large-scale recruitment data and formalize the salary benchmarking problem as a matrix completion task. Along this line, we develop a Holistic Salary Benchmarking Matrix Factorization (HSBMF) model for predicting the missing salary information in the salary matrix. By integrating multiple confounding factors, such as company similarity, job similarity, and spatial-temporal similarity, HSBMF is able to provide a holistic and dynamic view for fine-grained salary benchmarking. Finally, extensive experiments on large-scale real-world data validate the effectiveness of our approach for job salary benchmarking.
{"title":"Intelligent Salary Benchmarking for Talent Recruitment: A Holistic Matrix Factorization Approach","authors":"Qingxin Meng, Hengshu Zhu, Keli Xiao, Hui Xiong","doi":"10.1109/ICDM.2018.00049","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00049","journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)"},"publicationDate":"2018-11-01"}
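The matrix-completion framing in the salary benchmarking abstract above can be illustrated with a plain matrix factorization. The sketch below is not HSBMF: it omits the company, job, and spatial-temporal similarity terms, and the function names, hyperparameters, and toy salary values are all hypothetical. It only shows how a missing cell of a job-by-company matrix might be predicted from latent factors fitted to the observed cells.

```python
import random

def factorize(observed, n_rows, n_cols, rank=1, lr=0.02, reg=0.01,
              epochs=2000, seed=0):
    """Fit row and column factors to the observed cells by SGD on squared error.

    `observed` maps (row, col) -> value; unobserved cells are simply absent.
    All hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    cells = list(observed.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (i, j), value in cells:
            err = value - sum(P[i][k] * Q[j][k] for k in range(rank))
            for k in range(rank):
                p, q = P[i][k], Q[j][k]
                # Gradient step with L2 shrinkage on both factors.
                P[i][k] += lr * (err * q - reg * p)
                Q[j][k] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, i, j):
    """Predicted value of cell (i, j) as the inner product of its factors."""
    return sum(p * q for p, q in zip(P[i], Q[j]))

# Toy job-by-company salary matrix (hypothetical values, arbitrary units).
job_level = [1.0, 2.0, 3.0]
company_scale = [1.0, 1.5, 2.0]
observed = {(i, j): job_level[i] * company_scale[j]
            for i in range(3) for j in range(3) if (i, j) != (2, 2)}

P, Q = factorize(observed, 3, 3)
estimate = predict(P, Q, 2, 2)  # the held-out cell; true value is 3.0 * 2.0
```

Because the toy matrix is exactly rank one, a rank-one factorization fitted to the eight observed cells should land close to the held-out value. Real salary data would need a higher rank and, as the abstract argues, side information beyond the observed cells.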
Model-free identification of a nonlinear dynamical system from noisy observations is of current interest due to its direct relevance to many applications in Industry 4.0. Predicting such a noisy time series constitutes a problem of learning the nonlinear time evolution of a probability distribution. The capability of most conventional time series models is limited when the underlying dynamics are nonlinear or multi-scale, or when there is no prior knowledge of the system dynamics. We propose DE-RNN (Density Estimation Recurrent Neural Network) to learn the probability density function (PDF) of a stochastic process with underlying nonlinear dynamics and to compute the time evolution of the PDF for a probabilistic forecast. A Recurrent Neural Network (RNN)-based model is employed to learn a nonlinear operator for the temporal evolution of the stochastic process. We use a softmax layer for a numerical discretization of a smooth PDF, which transforms a function approximation problem into a classification task. A regularized cross-entropy method is introduced to impose a smoothness condition on the estimated probability distribution. A Monte Carlo procedure to compute the temporal evolution of the distribution for a multiple-step forecast is presented. It is shown that the proposed algorithm can learn the nonlinear multi-scale dynamics from the noisy observations and provides an effective tool to forecast the time evolution of the underlying probability distribution. Evaluation of the algorithm on three synthetic and two real data sets shows an advantage over the compared baselines and potential value for a wide range of problems in physics and engineering.
{"title":"DE-RNN: Forecasting the Probability Density Function of Nonlinear Time Series","authors":"K. Yeo, Igor Melnyk, Nam H. Nguyen, Eun Kyung Lee","doi":"10.1109/ICDM.2018.00085","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00085","journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)"},"publicationDate":"2018-11-01"}
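The DE-RNN abstract above describes turning density estimation into classification by discretizing the PDF with a softmax layer, plus a regularized cross-entropy that encourages smoothness. A minimal sketch of that idea follows. The abstract does not give the form of the regularizer, so the squared second-difference penalty used here is an assumption, as are the function names, the equal-width binning, and the weight `lam`. No recurrent network is included; `logits` stands in for whatever the model would output.

```python
import math

def bin_index(y, lo, hi, n_bins):
    """Map a real value to the index of the equal-width bin that contains it."""
    if not lo <= y <= hi:
        raise ValueError("value outside the discretization range")
    width = (hi - lo) / n_bins
    # The right edge `hi` is folded into the last bin.
    return min(int((y - lo) / width), n_bins - 1)

def softmax(logits):
    """Numerically stable softmax: one probability mass per bin."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def smooth_cross_entropy(logits, target_bin, lam=1.0):
    """Cross-entropy on the target bin plus a roughness penalty.

    The penalty is the sum of squared second differences of the bin
    probabilities (an assumed choice of smoothness term).
    """
    probs = softmax(logits)
    ce = -math.log(probs[target_bin])
    rough = sum((probs[k + 1] - 2.0 * probs[k] + probs[k - 1]) ** 2
                for k in range(1, len(probs) - 1))
    return ce + lam * rough
```

For example, an observation `y = 0.0` on the range `[-1, 1]` with four bins falls into `bin_index(0.0, -1.0, 1.0, 4)`, and that index is the class label fed to `smooth_cross_entropy`. A flat set of logits incurs no roughness penalty, while a sawtooth pattern is penalized even when it assigns the same mass to the target bin.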
We propose PsiRec, a novel user preference propagation recommender that incorporates pseudo-implicit feedback for enriching the original sparse implicit feedback dataset. Three of the unique characteristics of PsiRec are: (i) it views user-item interactions as a bipartite graph and models pseudo-implicit feedback from this perspective; (ii) its random walks-based approach extracts graph structure information from this bipartite graph, toward estimating pseudo-implicit feedback; and (iii) it adopts a Skip-gram inspired measure of confidence in pseudo-implicit feedback that captures the pointwise mutual information between users and items. This pseudo-implicit feedback is ultimately incorporated into a new latent factor model to estimate user preference in cases of extreme sparsity. PsiRec results in improvements of 21.5% and 22.7% in terms of Precision@10 and Recall@10 over state-of-the-art Collaborative Denoising Auto-Encoders. Our implementation is available at https://github.com/heyunh2015/PsiRecICDM2018.
{"title":"Pseudo-Implicit Feedback for Alleviating Data Sparsity in Top-K Recommendation","authors":"Yun He, Haochen Chen, Ziwei Zhu, James Caverlee","doi":"10.1109/ICDM.2018.00129","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00129","journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)"},"publicationDate":"2018-11-01"}
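The PsiRec abstract above combines three ingredients: a user-item bipartite graph, random walks over it, and a pointwise-mutual-information style confidence. The sketch below shows one way those pieces might fit together; it is not the authors' implementation (their code is at the URL given in the abstract). The function names, walk length, window size, and the plain count-based PMI are assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict

def build_bipartite(interactions):
    """Adjacency lists for a user-item graph; nodes are tagged ('u', id) or ('i', id)."""
    adj = defaultdict(list)
    for user, item in interactions:
        adj[("u", user)].append(("i", item))
        adj[("i", item)].append(("u", user))
    return adj

def cooccurrences(adj, walk_len=6, walks_per_node=50, window=3, seed=0):
    """Count user-item pairs that appear within `window` steps on uniform random walks."""
    rng = random.Random(seed)
    counts = Counter()
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_len:
                walk.append(rng.choice(adj[walk[-1]]))
            for a in range(len(walk)):
                for b in range(a + 1, min(a + window, len(walk) - 1) + 1):
                    x, y = walk[a], walk[b]
                    if x[0] != y[0]:  # keep only user-item pairs
                        user, item = (x, y) if x[0] == "u" else (y, x)
                        counts[(user[1], item[1])] += 1
    return counts

def pmi(counts):
    """PMI of each counted pair: log of joint count times total over the marginals."""
    total = sum(counts.values())
    user_tot, item_tot = Counter(), Counter()
    for (u, i), c in counts.items():
        user_tot[u] += c
        item_tot[i] += c
    return {(u, i): math.log(c * total / (user_tot[u] * item_tot[i]))
            for (u, i), c in counts.items()}

# Toy data (hypothetical): u1 and u2 share item i1; u3 is in a separate component.
interactions = [("u1", "i1"), ("u2", "i1"), ("u2", "i2"), ("u3", "i3")]
counts = cooccurrences(build_bipartite(interactions))
scores = pmi(counts)
```

In the toy graph, `u1` never interacted with `i2`, yet the pair can co-occur on a walk through `i1` and `u2`, which is the kind of pseudo-implicit signal the abstract describes. Pairs in different connected components, such as `u3` and `i1`, can never co-occur.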
Guoxian Yu, Xia Chen, C. Domeniconi, J. Wang, Zhao Li, Z. Zhang, Xindong Wu
Current efforts on multi-label learning generally assume that the given labels of training instances are noise-free. However, obtaining noise-free labels is quite difficult and often impractical, and the presence of noisy labels may compromise the performance of multi-label learning. Partial multi-label learning (PML) addresses the scenario in which each instance is annotated with a set of candidate labels, of which only a subset corresponds to the ground-truth. The PML problem is more challenging than partial-label learning, since the latter assumes that only one label is valid and may ignore the correlation among candidate labels. To tackle the PML challenge, we introduce a feature induced PML approach called fPML, which simultaneously estimates noisy labels and trains multi-label classifiers. In particular, fPML simultaneously factorizes the observed instance-label association matrix and the instance-feature matrix into low-rank matrices to achieve coherent low-rank matrices from the label and the feature spaces, and a low-rank label correlation matrix as well. The low-rank approximation of the instance-label association matrix is leveraged to estimate the association confidence. To predict the labels of unlabeled instances, fPML learns a matrix that maps the instances to labels based on the estimated association confidence. An empirical study on public multi-label datasets with injected noisy labels, and on archived proteomic datasets, shows that fPML can more accurately identify noisy labels than related solutions, and consequently can achieve better performance on predicting labels of instances than competitive methods.
{"title":"Feature-Induced Partial Multi-label Learning","authors":"Guoxian Yu, Xia Chen, C. Domeniconi, J. Wang, Zhao Li, Z. Zhang, Xindong Wu","doi":"10.1109/ICDM.2018.00192","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00192","journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)"},"publicationDate":"2018-11-01"}
Given its importance, crime prediction has attracted a lot of attention in the literature, and several methods have been proposed to capture different characteristics relevant to crime prediction. In this paper, we propose a Clustered Continuous Conditional Random Field (Clustered-CCRF) model which is able to effectively exploit both spatial and temporal factors for crime prediction in an integrated way. In particular, we observe that the number of crimes in a specific area is not only conditioned on its own historical records but is also highly correlated with crime records from similar areas. Therefore, we propose two factors, an auto-regressed temporal correlation and a feature-based inter-area spatial correlation, to measure such patterns for crime prediction. Further, we present a tree-structured clustering algorithm that discovers highly similar areas based on spatial characteristics to improve the performance of our proposed model. Experiments on a real-world crime dataset demonstrate the superiority of our proposed model over state-of-the-art methods.
{"title":"An Integrated Model for Crime Prediction Using Temporal and Spatial Factors","authors":"Fei Yi, Zhiwen Yu, Fuzhen Zhuang, X. Zhang, Hui Xiong","doi":"10.1109/ICDM.2018.00190","DOIUrl":"https://doi.org/10.1109/ICDM.2018.00190","journal":{"name":"2018 IEEE International Conference on Data Mining (ICDM)"},"publicationDate":"2018-11-01"}