Efficient $k$-NN Search in IoT Data: Overlap Optimization in Tree-Based Indexing Structures
Ala-Eddine Benrazek, Zineddine Kouahla, Brahim Farou, Hamid Seridi, Ibtissem Kemouguette
arXiv:2408.16036 (arXiv - CS - Performance), 2024-08-28
The proliferation of interconnected devices in the Internet of Things (IoT)
has produced an exponentially growing body of data, commonly known as Big IoT Data.
Efficient retrieval of this heterogeneous data demands a robust indexing
mechanism for effective organization. However, a significant challenge remains:
the overlap between data space partitions created during index construction. This
overlap increases the number of nodes accessed during search and retrieval, which
raises resource consumption, creates performance bottlenecks, and impedes system
scalability. To
address this issue, we propose three heuristics designed to quantify and
strategically reduce overlap between data space partitions. The volume-based method
(VBM) offers a detailed assessment by computing the volume of the intersection
between partitions, giving a fine-grained view of their spatial relationships.
The distance-based method (DBM) improves efficiency by evaluating overlap from
the distance between partition centers together with the partition radii,
offering a streamlined yet accurate approach. Finally, the object-based method
(OBM) provides a practical solution by counting the objects that fall within
more than one partition, delivering an intuitive understanding of data space
dynamics. Experimental results
demonstrate the effectiveness of these methods in reducing search time,
underscoring their potential to improve data space partitioning and enhance
overall system performance.
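The abstract does not give the exact formulas behind VBM, DBM, and OBM, so the following is only a minimal sketch of the three ideas under stated assumptions, not the authors' implementation. It assumes each partition is a Euclidean ball with a center and a radius (the `Ball` type is hypothetical). DBM is reduced to the overlap depth r_p + r_q - d(c_p, c_q). VBM computes the exact intersection volume of two d-dimensional balls as the sum of one hyperspherical cap from each ball. OBM counts objects that lie inside two or more balls. All function names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Ball:
    """Hypothetical partition model: a Euclidean ball (center, radius)."""
    center: Tuple[float, ...]
    radius: float


def dbm_overlap(p: Ball, q: Ball) -> float:
    """Distance-based sketch: overlap depth r_p + r_q - d(c_p, c_q).

    Returns 0.0 for disjoint or touching balls; larger means deeper overlap.
    """
    d = math.dist(p.center, q.center)
    return max(0.0, p.radius + q.radius - d)


def _ball_volume(r: float, dim: int) -> float:
    # Volume of a dim-dimensional ball: pi^(dim/2) r^dim / Gamma(dim/2 + 1).
    return math.pi ** (dim / 2) * r ** dim / math.gamma(dim / 2 + 1)


def _cap_volume(r: float, phi: float, dim: int, steps: int = 200) -> float:
    # Hyperspherical cap with polar half-angle phi:
    #   V = pi^((dim-1)/2) r^dim / Gamma((dim+1)/2) * integral_0^phi sin^dim(t) dt
    # The integral uses composite Simpson's rule (steps must be even).
    h = phi / steps
    total = math.sin(phi) ** dim  # the f(0) term is sin(0)^dim = 0
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.sin(i * h) ** dim
    integral = total * h / 3
    coef = math.pi ** ((dim - 1) / 2) * r ** dim / math.gamma((dim + 1) / 2)
    return coef * integral


def vbm_overlap(p: Ball, q: Ball) -> float:
    """Volume-based sketch: volume of the intersection of two balls."""
    dim = len(p.center)
    d = math.dist(p.center, q.center)
    r1, r2 = p.radius, q.radius
    if d >= r1 + r2:  # disjoint (or merely touching)
        return 0.0
    if d <= abs(r1 - r2):  # one ball contains the other
        return _ball_volume(min(r1, r2), dim)
    # Signed distance from p's center to the hyperplane holding the lens boundary.
    x1 = (d * d + r1 * r1 - r2 * r2) / (2 * d)
    phi1 = math.acos(max(-1.0, min(1.0, x1 / r1)))
    phi2 = math.acos(max(-1.0, min(1.0, (d - x1) / r2)))
    # The lens is one cap cut from each ball by that hyperplane.
    return _cap_volume(r1, phi1, dim) + _cap_volume(r2, phi2, dim)


def obm_overlap(points: Sequence[Sequence[float]], parts: Sequence[Ball]) -> int:
    """Object-based sketch: number of objects lying in two or more balls."""
    shared = 0
    for x in points:
        hits = sum(1 for b in parts if math.dist(x, b.center) <= b.radius)
        if hits >= 2:
            shared += 1
    return shared
```

For two unit disks whose centers are 1 apart, `dbm_overlap` returns 1.0 and `vbm_overlap` returns about 1.2284, the lens area 2π/3 - √3/2. In this sketch DBM is the cheapest measure (one distance per pair of partitions), while OBM scans every object against every partition. A closed form for the cap volume via the regularized incomplete beta function also exists; numeric integration is used here only to stay within the standard library.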