The goal of crime prediction is to forecast the number of crime incidents in each region of a city based on historical crime data. It has attracted a great deal of attention from both academic and industrial communities due to its considerable significance in improving urban safety and reducing financial losses. Although much progress has been made in this field, most existing approaches assume that the historical crime data are complete, which does not hold in many real-world scenarios. Meanwhile, crime incidents are affected by multiple factors and have intricate spatial, temporal, and categorical correlations, which are not fully utilized by current methods. In this article, we propose a novel tensor-decomposition-based framework, named TD-Crime, to conduct prediction directly on the incomplete crime data. Specifically, we first organize the crime data as a tensor and then apply nonnegative CP decomposition to it, which not only provides a natural solution to the missing data problem but also captures the spatial, temporal, and categorical correlations implicitly. Moreover, we attempt to exploit the spatial and temporal correlations explicitly by directly learning from the crime data to further improve the forecasting performance. Finally, we obtain a joint optimization problem and present an efficient alternating optimization scheme to find a satisfactory solution. Extensive experiments on real-world crime datasets show that TD-Crime can address the crime prediction task effectively under different missing data scenarios.
Crime Prediction With Missing Data Via Spatiotemporal Regularized Tensor Decomposition. Weichao Liang; Jie Cao; Lei Chen; Youquan Wang; Jia Wu; Amin Beheshti; Jiangnan Tang. IEEE Transactions on Big Data, vol. 9, no. 5, pp. 1392-1407. Pub Date: 2023-06-06. DOI: 10.1109/TBDATA.2023.3283098
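As a rough illustration of the first step described in the abstract above (fitting a nonnegative CP model only to the observed entries of a region x time x category tensor), here is a minimal NumPy sketch. It is not the TD-Crime algorithm: the spatial and temporal regularizers are omitted, and the multiplicative update rule, the function name `masked_nonneg_cp`, the rank, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def masked_nonneg_cp(X, mask, rank, n_iter=200, eps=1e-9, seed=0):
    """Nonnegative CP factors fitted to the observed entries of a 3-way tensor.

    Hypothetical helper: uses weighted multiplicative updates, which keep the
    factors nonnegative. Missing entries (mask == 0) never enter the loss.
    """
    rng = np.random.default_rng(seed)
    mask = mask.astype(float)
    I, J, K = X.shape
    A = rng.random((I, rank)) + 0.1   # e.g. region factors
    B = rng.random((J, rank)) + 0.1   # e.g. time-slot factors
    C = rng.random((K, rank)) + 0.1   # e.g. crime-category factors
    Xo = mask * X                      # observed part of the data

    def observed_model():
        # Model values restricted to the observed positions.
        return mask * np.einsum('ir,jr,kr->ijk', A, B, C)

    losses = []
    for _ in range(n_iter):
        losses.append(float(np.sum((Xo - observed_model()) ** 2)))
        A *= np.einsum('ijk,jr,kr->ir', Xo, B, C) / (
            np.einsum('ijk,jr,kr->ir', observed_model(), B, C) + eps)
        B *= np.einsum('ijk,ir,kr->jr', Xo, A, C) / (
            np.einsum('ijk,ir,kr->jr', observed_model(), A, C) + eps)
        C *= np.einsum('ijk,ir,jr->kr', Xo, A, B) / (
            np.einsum('ijk,ir,jr->kr', observed_model(), A, B) + eps)
    losses.append(float(np.sum((Xo - observed_model()) ** 2)))
    return A, B, C, losses

# Synthetic stand-in for crime counts: 8 regions, 10 time slots, 4 categories.
rng = np.random.default_rng(1)
X = np.einsum('ir,jr,kr->ijk',
              rng.random((8, 3)), rng.random((10, 3)), rng.random((4, 3)))
mask = rng.random(X.shape) > 0.3      # roughly 30% of entries treated as missing
A, B, C, losses = masked_nonneg_cp(X, mask, rank=3)
X_hat = np.einsum('ir,jr,kr->ijk', A, B, C)   # fills in the missing entries too
```

Because the fitted factors define a value for every cell, `X_hat` also supplies estimates at the masked positions, which is the sense in which the decomposition offers a natural handle on missing data.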
Pub Date: 2023-04-17, DOI: 10.1109/TBDATA.2023.3265509
Jiachen Zhao;Fang Deng;Jiaqi Zhu;Jie Chen
Unsupervised anomaly detection (AD) is a challenging problem in the data mining community. Clustering-based AD methods aim to group normal data points into clusters and then regard a point belonging to none of the clusters as an anomaly. However, they may suffer from the problems of unknown cluster numbers and arbitrary cluster shapes. This paper presents a novel clustering-based AD method named Density-increasing Path (DIP) to tackle these challenges. DIP searches for a path for each data point. The path starts at the data point itself, passes through several points with monotonically increasing densities, and ends at a density peak. Further, DIP defines the climbing difficulty of each path by combining the distance and density increment of each step along the path, which can be regarded as the anomaly score of the path's starting point. DIP can adaptively decide the number of peaks to address the challenge of unknown cluster numbers. Since DIP requires the path to pass through several points rather than directly reaching the peak, it handles arbitrary cluster shapes. We also propose an ensemble DIP to improve prediction accuracy. The experimental results on four synthetic datasets and eleven real-world benchmarks demonstrate that DIP outperforms existing methods.
Searching Density-Increasing Path to Local Density Peaks for Unsupervised Anomaly Detection. IEEE Transactions on Big Data, vol. 9, no. 4, pp. 1198-1209.
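The path idea in the abstract above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' DIP: the abstract does not give the density estimator or the climbing-difficulty formula, so the k-nearest-neighbour density, the rule "step to the nearest denser neighbour", the per-step cost (distance times density increment), and the name `dip_scores` are all hypothetical choices.

```python
import math

def dip_scores(points, k=5):
    """Score each point by the cost of climbing to a local density peak.

    Assumed ingredients (not taken from the paper):
      * density = inverse of the mean distance to the k nearest neighbours
      * each step goes to the nearest of those neighbours with higher density
      * step cost = step distance * density increment
    """
    n = len(points)
    knn = []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        knn.append(dists[:k])          # (distance, index) pairs
    # Small constant guards against duplicate points with zero distances.
    density = [k / (sum(d for d, _ in nbrs) + 1e-12) for nbrs in knn]

    scores = []
    for start in range(n):
        cur, score = start, 0.0
        while True:
            denser = [(d, j) for d, j in knn[cur] if density[j] > density[cur]]
            if not denser:             # no denser neighbour: a local peak
                break
            d, nxt = min(denser)       # nearest denser neighbour
            score += d * (density[nxt] - density[cur])
            cur = nxt                  # density strictly rises, so this ends
        scores.append(score)
    return scores

# A 5 x 5 grid of "normal" points plus one far-away point at (20, 20).
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
points.append((20.0, 20.0))
scores = dip_scores(points, k=5)
```

In this toy setting the isolated point has to take one very long step with a large density jump before it reaches the grid, so it receives the largest score. Points that are themselves local peaks get a score of zero under this simplified rule, which is one reason it should not be read as the published scoring function.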
Pub Date: 2023-04-12, DOI: 10.1109/TBDATA.2023.3266590
Yanbei Liu;Shichuan Zhao;Xiao Wang;Lei Geng;Zhitao Xiao;Jerry Chun-Wei Lin
Graph Neural Networks (GNNs), a powerful graph representation technique based on deep learning, have attracted great research interest in recent years. Although many GNNs have achieved state-of-the-art accuracy on a set of standard benchmark datasets, they are still limited to the traditional semi-supervised framework and lack sufficient supervision information, especially for the large amount of unlabeled data. To overcome this issue, we propose a novel self-consistent graph neural network (SCGNN) framework to enrich the supervision information from two aspects: the self-consistency of unlabeled data and the label information of labeled data. First, in order to extract the self-supervision information