Abstract Given a user dataset $U$ and an object dataset $I$, a kNN join query in high-dimensional space returns the $k$ nearest neighbors of each object in $U$ from $I$. The kNN join is a basic and necessary operation in many applications, including databases, data mining, computer vision, multimedia, machine learning, and recommendation systems. In the real world, datasets are frequently updated as objects are added or removed. In this paper, we propose novel methods for continuous kNN join over dynamic high-dimensional data. We first propose the HDR$^+$ Tree, which supports more efficient insertion, deletion, and batch update. Observing that existing methods rely on globally correlated datasets for effective dimensionality reduction, we then propose the HDR Forest, which clusters the dataset and constructs multiple HDR Trees to capture local correlations among the data. As a result, the HDR Forest can process non-globally correlated datasets efficiently. Two novel optimisations are applied to the proposed HDR Forest: precomputation of the PCA states of data items, and pruning-based kNN recomputation during item deletion. For completeness, we also present a proof for computing distances in the PCA-reduced dimensions used by the HDR Tree. Extensive experiments on real-world datasets show that the proposed methods and optimisations outperform the baseline algorithms, naive RkNN join and HDR Tree.
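To make the query definition concrete, the following is a minimal brute-force sketch of a kNN join and of its maintenance after an item insertion. It is not the HDR Tree, HDR$^+$ Tree, or HDR Forest from the paper; it only illustrates the kind of exhaustive baseline such index structures aim to beat. The function names `knn_join` and `insert_item`, the dictionary-based data layout, and the use of Euclidean distance are assumptions made for illustration.

```python
import heapq
import math


def knn_join(users, items, k):
    """Brute-force kNN join: map each user id to the ids of its k nearest items."""
    result = {}
    for uid, u in users.items():
        # nsmallest returns the k closest items, ordered from nearest to farthest
        nearest = heapq.nsmallest(k, items.items(), key=lambda kv: math.dist(u, kv[1]))
        result[uid] = [iid for iid, _ in nearest]
    return result


def insert_item(users, items, result, k, new_id, new_vec):
    """Naive update after an insertion: compare the new item with each user's k-th neighbour."""
    items[new_id] = new_vec
    for uid, u in users.items():
        current = result[uid]
        d_new = math.dist(u, new_vec)
        # The new item only matters if the list is short or it beats the farthest neighbour
        if len(current) < k or d_new < math.dist(u, items[current[-1]]):
            current.append(new_id)
            current.sort(key=lambda iid: math.dist(u, items[iid]))
            del current[k:]


# Hypothetical toy data in 2 dimensions (real workloads are high-dimensional)
users = {"u1": (0.0, 0.0), "u2": (5.0, 5.0)}
items = {"a": (1.0, 0.0), "b": (4.0, 4.0), "c": (0.0, 2.0)}

result = knn_join(users, items, k=2)
insert_item(users, items, result, 2, "d", (5.0, 4.5))
print(result)
```

Every insertion here touches every user, and a deletion would force a full recomputation for each user that had the deleted item among its neighbours. That per-update cost is what motivates index-based approaches for dynamic data.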
{"title":"Efficient continuous kNN join over dynamic high-dimensional data","authors":"Nimish Ukey, Guangjian Zhang, Zhengyi Yang, Binghao Li, Wei Li, Wenjie Zhang","doi":"10.1007/s11280-023-01204-9","DOIUrl":"https://doi.org/10.1007/s11280-023-01204-9","url":null,"abstract":"Abstract Given a user dataset $U$ and an object dataset $I$, a kNN join query in high-dimensional space returns the $k$ nearest neighbors of each object in dataset $U$ from the object dataset $I$. The kNN join is a basic and necessary operation in many applications, such as databases, data mining, computer vision, multi-media, machine learning, recommendation systems, and many more. In the real world, datasets frequently update dynamically as objects are added or removed. In this paper, we propose novel methods of continuous kNN join over dynamic high-dimensional data. We firstly propose the HDR$^+$ Tree, which supports more efficient insertion, deletion, and batch update. Further observed that the existing methods rely on globally correlated datasets for effective dimensionality reduction, we then propose the HDR Forest. It clusters the dataset and constructs multiple HDR Trees to capture local correlations among the data. As a result, our HDR Forest is able to process non-globally correlated datasets efficiently. Two novel optimisations are applied to the proposed HDR Forest, including the precomputation of the PCA states of data items and pruning-based kNN recomputation during item deletion. For the completeness of the work, we also present the proof of computing distances in reduced dimensions of PCA in HDR Tree. Extensive experiments on real-world datasets show that the proposed methods and optimisations outperform the baseline algorithms of naive RkNN join and HDR Tree.","PeriodicalId":49356,"journal":{"name":"World Wide Web-Internet and Web Information Systems","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135979768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-09-11 DOI: 10.1007/s11280-023-01202-x
Feng Chen, Jiayuan Xie, Yi Cai, Zehang Lin, Qing Li, Tao Wang
{"title":"Graph convolutional network for difficulty-controllable visual question generation","authors":"Feng Chen, Jiayuan Xie, Yi Cai, Zehang Lin, Qing Li, Tao Wang","doi":"10.1007/s11280-023-01202-x","DOIUrl":"https://doi.org/10.1007/s11280-023-01202-x","url":null,"abstract":"","PeriodicalId":49356,"journal":{"name":"World Wide Web-Internet and Web Information Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135981209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-08-25 DOI: 10.1007/s11280-023-01196-6
Qi Luo, Dongxiao Yu, Z. Cai, Yanwei Zheng, Xiuzhen Cheng, Xuemin Lin
{"title":"Core maintenance for hypergraph streams","authors":"Qi Luo, Dongxiao Yu, Z. Cai, Yanwei Zheng, Xiuzhen Cheng, Xuemin Lin","doi":"10.1007/s11280-023-01196-6","DOIUrl":"https://doi.org/10.1007/s11280-023-01196-6","url":null,"abstract":"","PeriodicalId":49356,"journal":{"name":"World Wide Web-Internet and Web Information Systems","volume":"95 1","pages":"3709-3733"},"PeriodicalIF":3.7,"publicationDate":"2023-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78677064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-08-23 DOI: 10.1007/s11280-023-01203-w
{"title":"Editor’s note: Special issue on Fairness-driven User Behavior Modelling and Analysis for Online Recommendation","authors":"","doi":"10.1007/s11280-023-01203-w","DOIUrl":"https://doi.org/10.1007/s11280-023-01203-w","url":null,"abstract":"","PeriodicalId":49356,"journal":{"name":"World Wide Web-Internet and Web Information Systems","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135520533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-08-16 DOI: 10.1007/s11280-023-01177-9
Minghe Yu, Xu Chen, Xinhao Gu, Hengyu Liu, Lun Du
{"title":"A subspace constraint based approach for fast hierarchical graph embedding","authors":"Minghe Yu, Xu Chen, Xinhao Gu, Hengyu Liu, Lun Du","doi":"10.1007/s11280-023-01177-9","DOIUrl":"https://doi.org/10.1007/s11280-023-01177-9","url":null,"abstract":"","PeriodicalId":49356,"journal":{"name":"World Wide Web-Internet and Web Information Systems","volume":"85 1","pages":"3691 - 3705"},"PeriodicalIF":3.7,"publicationDate":"2023-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83916866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-08-15 DOI: 10.1007/s11280-023-01193-9
Yunliang Chen, G. Huang, Yuewei Wang, Xiaohui Huang, Geyong Min
{"title":"A graph neural network incorporating spatio-temporal information for location recommendation","authors":"Yunliang Chen, G. Huang, Yuewei Wang, Xiaohui Huang, Geyong Min","doi":"10.1007/s11280-023-01193-9","DOIUrl":"https://doi.org/10.1007/s11280-023-01193-9","url":null,"abstract":"","PeriodicalId":49356,"journal":{"name":"World Wide Web-Internet and Web Information Systems","volume":"30 1","pages":"3633 - 3654"},"PeriodicalIF":3.7,"publicationDate":"2023-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81778550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-08-15 DOI: 10.1007/s11280-023-01198-4
Linlin Ding, Haiyou Yu, Chen Zhu, Ji Ma, Yue Zhao
{"title":"Attribute prediction of spatio-temporal graph nodes based on weighted graph diffusion convolution network","authors":"Linlin Ding, Haiyou Yu, Chen Zhu, Ji Ma, Yue Zhao","doi":"10.1007/s11280-023-01198-4","DOIUrl":"https://doi.org/10.1007/s11280-023-01198-4","url":null,"abstract":"","PeriodicalId":49356,"journal":{"name":"World Wide Web-Internet and Web Information Systems","volume":"7 1","pages":"3655 - 3690"},"PeriodicalIF":3.7,"publicationDate":"2023-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72719840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-08-09 DOI: 10.1007/s11280-023-01191-x
Haiyang Wang, Ye Wang, Xin Song, Bin Zhou, Xuechen Zhao, Feng Xie
{"title":"Quantifying controversy from stance, sentiment, offensiveness and sarcasm: a fine-grained controversy intensity measurement framework on a Chinese dataset","authors":"Haiyang Wang, Ye Wang, Xin Song, Bin Zhou, Xuechen Zhao, Feng Xie","doi":"10.1007/s11280-023-01191-x","DOIUrl":"https://doi.org/10.1007/s11280-023-01191-x","url":null,"abstract":"","PeriodicalId":49356,"journal":{"name":"World Wide Web-Internet and Web Information Systems","volume":"154 1","pages":"3607 - 3632"},"PeriodicalIF":3.7,"publicationDate":"2023-08-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86621415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}