Faster and Deterministic Subtrajectory Clustering
Ivor van der Hoog, Thijs van der Horst, Tim Ophelders
arXiv:2402.13117 (arXiv - CS - Computational Geometry), published 2024-02-20
Abstract
Given a trajectory $T$ and a distance $\Delta$, we wish to find a set $C$ of
curves of complexity at most $\ell$, such that we can cover $T$ with subcurves
that are each within Fréchet distance $\Delta$ to at least one curve in $C$.
We call $C$ an $(\ell,\Delta)$-clustering and aim to find an
$(\ell,\Delta)$-clustering of minimum cardinality. This problem was introduced
by Akitaya et al. (2021) and shown to be NP-complete. The main focus has
therefore been on bicriterial approximation algorithms, allowing for the
clustering to be an $(\ell, \Theta(\Delta))$-clustering of roughly optimal
size. We present algorithms that construct $(\ell,4\Delta)$-clusterings of
$\mathcal{O}(k \log n)$ size, where $k$ is the size of the optimal $(\ell,
\Delta)$-clustering. For the discrete Fréchet distance, we use $\mathcal{O}(n
\ell \log n)$ space and $\mathcal{O}(k n^2 \log^3 n)$ deterministic worst-case
time. For the continuous Fréchet distance, we use $\mathcal{O}(n^2 \log n)$
space and $\mathcal{O}(k n^3 \log^3 n)$ time. Our algorithms significantly
improve upon the clustering quality (improving the approximation factor in
$\Delta$) and size (whenever $\ell \in \Omega(\log n)$). We offer deterministic
running times comparable to known expected bounds. Additionally, in the
continuous setting, we give a near-linear improvement upon the space usage.
When compared only to deterministic results, we offer a near-linear speedup and
a near-quadratic improvement in the space usage. When we restrict ourselves
to clusters in which all subtrajectories are vertex-to-vertex subcurves, we
obtain even better results under the continuous Fréchet distance: our
algorithm runs in near-quadratic time and uses space that is near-linear in
$n \ell$.
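To make the definition of an $(\ell,\Delta)$-clustering concrete, the following Python sketch checks coverage under the discrete Fréchet distance and then builds a clustering by greedy set cover. It is a brute-force illustration, not the authors' algorithm, and it rests on assumptions of our own: the trajectory is a list of 2D points, every subcurve is vertex-to-vertex, candidate centers are subcurves of $T$ itself with at most $\ell$ vertices, and $T$ counts as covered once every edge lies inside some chosen subcurve. Greedy set cover is a swapped-in technique, used only because it gives a logarithmic-factor size guarantee relative to the smallest cover built from these candidates (not necessarily relative to $k$). It does not achieve the paper's $(\ell,4\Delta)$ guarantee or running times, and the function names are hypothetical.

```python
import math

Point = tuple[float, float]


def discrete_frechet(p: list[Point], q: list[Point]) -> float:
    """Discrete Fréchet distance via the classic O(|p| * |q|) dynamic program."""
    n, m = len(p), len(q)
    ca = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


def covered_edges(trajectory: list[Point], center: list[Point], delta: float) -> set[int]:
    """Edges t = (t, t+1) lying in some vertex-to-vertex subcurve trajectory[i..j]
    (i < j) whose discrete Fréchet distance to `center` is at most delta."""
    n = len(trajectory)
    edges: set[int] = set()
    for i in range(n):
        for j in range(i + 1, n):
            if discrete_frechet(trajectory[i:j + 1], center) <= delta:
                edges.update(range(i, j))
    return edges


def greedy_clustering(trajectory: list[Point], ell: int, delta: float) -> list[list[Point]]:
    """Hypothetical greedy set-cover sketch. Candidate centers (an assumption) are
    all vertex-to-vertex subcurves of the trajectory with 2..ell vertices."""
    if ell < 2:
        raise ValueError("ell must be at least 2 so that centers have an edge")
    n = len(trajectory)
    candidates = []
    for i in range(n):
        for j in range(i + 1, min(i + ell, n)):
            center = trajectory[i:j + 1]
            candidates.append((center, covered_edges(trajectory, center, delta)))
    uncovered = set(range(n - 1))  # edge t joins vertices t and t+1
    clustering = []
    while uncovered:
        # Pick the candidate covering the most still-uncovered edges.
        center, cov = max(candidates, key=lambda c: len(c[1] & uncovered))
        clustering.append(center)
        uncovered -= cov
    return clustering
```

Every candidate covers its own edges at distance 0, so the loop always terminates when $\ell \ge 2$. For example, `greedy_clustering([(0, 0), (1, 0), (0, 0), (1, 0), (0, 0)], 2, 0.0)` needs two centers (one per direction of travel), while raising `delta` to `1.0` lets a single segment cover the whole trajectory.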