TISIS: Trajectory Indexing for SImilarity Search
Authors: Sara Jarrad, Hubert Naacke, Stephane Gancarski
Venue: arXiv - CS - Information Retrieval
Published: 2024-09-17
DOI: arxiv-2409.11301 (https://doi.org/arxiv-2409.11301)
Citations: 0
Abstract
Social media platforms enable users to share diverse types of information,
including geolocation data that captures their movement patterns. Such
geolocation data can be leveraged to reconstruct the trajectory of a user's
visited Points of Interest (POIs). A key requirement in numerous applications
is the ability to measure the similarity between such trajectories, as this
facilitates the retrieval of trajectories that are similar to a given reference
trajectory. This is the main focus of our work. Existing methods predominantly
rely on applying a similarity function to each candidate trajectory to identify
those that are sufficiently similar. However, this approach becomes
computationally expensive when dealing with large-scale datasets. To mitigate
this challenge, we propose TISIS, an efficient method that uses trajectory
indexing to quickly find similar trajectories that share common POIs in the
same order. Furthermore, to account for scenarios where POIs in trajectories
may not exactly match but are contextually similar, we introduce TISIS*, a
variant of TISIS that incorporates POI embeddings. This extension allows for
more comprehensive retrieval of similar trajectories by considering semantic
similarities between POIs, beyond mere exact matches. Extensive experimental
evaluations demonstrate that the proposed approach significantly outperforms a
baseline method based on the well-known Longest Common SubSequence (LCSS)
algorithm, yielding substantial performance improvements across various
real-world datasets.
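
To make the contrast between the scan-based baseline and an index-based search concrete, here is a minimal Python sketch. It is not the authors' TISIS implementation, whose details the abstract does not give. It assumes trajectories are plain lists of POI identifiers, treats "sufficiently similar" as sharing at least `min_common` POIs in the same order, and uses a simple inverted index from POI to trajectory ids as a stand-in for trajectory indexing. All function names, the `min_common` parameter, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def lcss_length(a, b):
    """Length of the longest common subsequence of two POI sequences
    (exact POI matches only), computed by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def baseline_search(query, trajectories, min_common):
    """Baseline: apply LCSS to every candidate trajectory."""
    return [tid for tid, traj in enumerate(trajectories)
            if lcss_length(query, traj) >= min_common]


def build_poi_index(trajectories):
    """Assumed index layout: POI -> {trajectory id: visits of that POI}."""
    index = defaultdict(dict)
    for tid, traj in enumerate(trajectories):
        for poi, count in Counter(traj).items():
            index[poi][tid] = count
    return index


def indexed_search(query, trajectories, index, min_common):
    """Use the index to discard trajectories that cannot qualify, then
    verify order with LCSS. Matches the baseline for min_common >= 1."""
    # The multiset overlap of POIs is an upper bound on the LCSS length,
    # so pruning on it never drops a trajectory the baseline would return.
    overlap = defaultdict(int)
    for poi, n_query in Counter(query).items():
        for tid, n_traj in index.get(poi, {}).items():
            overlap[tid] += min(n_query, n_traj)
    return sorted(tid for tid, bound in overlap.items()
                  if bound >= min_common
                  and lcss_length(query, trajectories[tid]) >= min_common)


# Hypothetical sample data: each trajectory is an ordered list of POIs.
trajectories = [
    ["museum", "cafe", "park", "bar"],
    ["cafe", "museum", "bar"],
    ["gym", "office"],
    ["museum", "park", "bar", "hotel"],
]
query = ["museum", "park", "bar"]
index = build_poi_index(trajectories)
matches = indexed_search(query, trajectories, index, min_common=3)
```

In this sketch the index only narrows the candidate set; the order constraint is still checked with LCSS on the survivors. It covers exact POI matches only and does not attempt the embedding-based matching of TISIS*.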