P. Agarwal, K. Fox, Kamesh Munagala, Abhinandan Nath, Jiangwei Pan, Erin Taylor
{"title":"Subtrajectory Clustering: Models and Algorithms","authors":"P. Agarwal, K. Fox, Kamesh Munagala, Abhinandan Nath, Jiangwei Pan, Erin Taylor","doi":"10.1145/3196959.3196972","DOIUrl":null,"url":null,"abstract":"We propose a model for subtrajectory clustering ---the clustering of subsequences of trajectories; each cluster of subtrajectories is represented as a pathlet, a sequence of points that is not necessarily a subsequence of an input trajectory. Given a set of trajectories, our clustering model attempts to capture the shared portions between them by assuming each trajectory is a concatenation of a small set of pathlets, with possible gaps in between. We present a single objective function for finding the optimal collection of pathlets that best represents the trajectories taking into account noise and other artifacts of the data. We show that the subtrajectory clustering problem is NP-Hard and present fast approximation algorithms for subtrajectory clustering. We further improve the running time of our algorithm if the input trajectories are \"well-behaved.\" Finally, we present experimental results on both real and synthetic data sets. We show via visualization and quantitative analysis that the algorithm indeed handles the desiderata of being robust to variations, being efficient and accurate, and being data-driven.","PeriodicalId":344370,"journal":{"name":"Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"35","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3196959.3196972","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 35
Abstract
We propose a model for subtrajectory clustering ---the clustering of subsequences of trajectories; each cluster of subtrajectories is represented as a pathlet, a sequence of points that is not necessarily a subsequence of an input trajectory. Given a set of trajectories, our clustering model attempts to capture the shared portions between them by assuming each trajectory is a concatenation of a small set of pathlets, with possible gaps in between. We present a single objective function for finding the optimal collection of pathlets that best represents the trajectories taking into account noise and other artifacts of the data. We show that the subtrajectory clustering problem is NP-Hard and present fast approximation algorithms for subtrajectory clustering. We further improve the running time of our algorithm if the input trajectories are "well-behaved." Finally, we present experimental results on both real and synthetic data sets. We show via visualization and quantitative analysis that the algorithm indeed handles the desiderata of being robust to variations, being efficient and accurate, and being data-driven.