IEEE Transactions on Knowledge and Data Engineering: Latest Articles

Online Dynamic Hybrid Broad Learning System for Real-Time Safety Assessment of Dynamic Systems
IF 8.9 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-11-12 DOI: 10.1109/TKDE.2024.3475028
Zeyi Liu;Xiao He
Real-time safety assessment of dynamic systems is of paramount importance in industrial processes since it provides continuous monitoring and evaluation to prevent potential harm to the environment and individuals. However, there are still several challenges to be resolved due to the requirements of time consumption and the non-stationary nature of real-world environments. In this paper, a novel online dynamic hybrid broad learning system, termed ODH-BLS, is proposed to more fully utilize the co-design advantages of active adaptation and passive adaptation. It makes effective use of limited annotations with the proposed sample value function. Simultaneously, anchor points can be dynamically adjusted to accommodate changes of the underlying distribution, thereby leveraging the value of unlabeled samples. An iterative update rule is also derived to ensure adaptation of the assessment model to real-time data at low computational costs. We also provide theoretical analyses to illustrate its practicality. Several experiments regarding the JiaoLong deep-sea manned submersible are carried out. The results demonstrate that the proposed ODH-BLS method achieves a performance improvement of approximately 8% over the baseline method on the benchmark dataset, showing its effectiveness in solving real-time safety assessment tasks for dynamic systems.
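The abstract mentions an iterative update rule that adapts the model to streaming data at low cost, but it does not state the rule. As a rough illustration of the general idea only, the sketch below uses textbook recursive least squares (RLS) for a linear output layer; this is a swapped-in standard technique, not the rule derived for ODH-BLS, and the names `rls_update`, `p_matrix`, and `forgetting` are assumptions made for this example.

```python
def rls_update(weights, p_matrix, features, target, forgetting=1.0):
    """One recursive-least-squares step for a linear output layer.

    Illustrative only: standard RLS, not the ODH-BLS update rule.
    weights:  current output weights (list of n floats)
    p_matrix: symmetric inverse-covariance estimate (n x n list of lists)
    features: feature vector of the newly arrived sample (length n)
    target:   scalar label of the newly arrived sample
    """
    n = len(weights)
    # px = P @ x
    px = [sum(p_matrix[i][j] * features[j] for j in range(n)) for i in range(n)]
    # Gain vector k = P x / (lambda + x^T P x)
    denom = forgetting + sum(features[i] * px[i] for i in range(n))
    gain = [v / denom for v in px]
    # Prediction error on the new sample, measured before the update
    error = target - sum(weights[i] * features[i] for i in range(n))
    new_weights = [weights[i] + gain[i] * error for i in range(n)]
    # P <- (P - k x^T P) / lambda; x^T P equals px^T because P is symmetric
    new_p = [[(p_matrix[i][j] - gain[i] * px[j]) / forgetting for j in range(n)]
             for i in range(n)]
    return new_weights, new_p


# Stream four samples generated by y = 2*x0 - 1*x1 (made-up toy data).
weights = [0.0, 0.0]
p_matrix = [[1000.0, 0.0], [0.0, 1000.0]]  # large initial P = weak regularization
stream = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
for x, y in stream:
    weights, p_matrix = rls_update(weights, p_matrix, x, y)
```

Each step costs O(n^2) in the number of features and never revisits earlier samples, which is the property that makes updates of this kind attractive for real-time settings.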
IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 8928-8938.
Citations: 0
SE Factual Knowledge in Frozen Giant Code Model: A Study on FQN and Its Retrieval
IF 8.9 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-11-12 DOI: 10.1109/TKDE.2024.3436883
Qing Huang;Dianshu Liao;Zhenchang Xing;Zhiqiang Yuan;Qinghua Lu;Xiwei Xu;Jiaxing Lu
Giant pre-trained code models (PCMs) are entering developers’ daily practice. Understanding the type and amount of software knowledge in PCMs is essential for integrating PCMs into software engineering (SE) tasks and unlocking their potential. In this work, we conduct the first systematic study of the SE factual knowledge in the state-of-the-art PCM Copilot, focusing on APIs’ Fully Qualified Names (FQNs), the fundamental knowledge for effective code analysis, search and reuse. Driven by FQNs’ data distribution properties, we design a novel lightweight in-context learning method on Copilot for FQN inference, which requires neither code compilation, as traditional methods do, nor gradient updates, as recent FQN prompt-tuning does. We systematically experiment with five in-context learning design factors to identify the best configuration for practical use. With this best configuration, we investigate the impact of example prompts and FQN data properties on Copilot’s FQN inference capability. Our results confirm that Copilot stores diverse FQN knowledge and can be applied to FQN inference thanks to its high accuracy and non-reliance on code analysis. Additionally, our extended study shows that the in-context learning method can be generalized to retrieve other SE factual knowledge embedded in giant PCMs. Furthermore, we find that the advanced general model GPT-4 also stores substantial SE knowledge. Comparing FQN inference between Copilot and GPT-4, we observe that as model capabilities improve, the same prompts yield better results. Based on our experience interacting with Copilot, we discuss various opportunities to improve human-Copilot interaction in the FQN inference task.
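To make the idea of in-context learning for FQN inference concrete, here is a minimal sketch of how a few-shot prompt might be assembled. The prompt layout, the helper name `build_fqn_prompt`, and the example snippets are all hypothetical; the abstract does not specify the prompt format or the five design factors that were studied.

```python
def build_fqn_prompt(examples, query_code, query_name):
    """Assemble a few-shot prompt asking a code model for an FQN.

    Hypothetical layout for illustration; not the paper's prompt design.
    examples: list of (code_snippet, simple_name, fully_qualified_name)
    """
    parts = []
    for code, simple_name, fqn in examples:
        # Each solved example shows the model the expected input/output shape.
        parts.append(f"Code:\n{code}\nSimple name: {simple_name}\nFQN: {fqn}\n")
    # The query repeats the layout but leaves the answer slot empty.
    parts.append(f"Code:\n{query_code}\nSimple name: {query_name}\nFQN:")
    return "\n".join(parts)


examples = [
    ("List<String> xs = new ArrayList<>();", "ArrayList", "java.util.ArrayList"),
    ("File f = new File(path);", "File", "java.io.File"),
]
prompt = build_fqn_prompt(
    examples, "Map<String, Integer> m = new HashMap<>();", "HashMap"
)
```

The resulting string would be sent to the model as-is; no compilation of the snippet and no gradient update is involved, which matches the lightweight setting the abstract describes.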
IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 9220-9234.
Citations: 0
A Derivative Topic Dissemination Model Based on Representation Learning and Topic Relevance
IF 8.9 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-21 DOI: 10.1109/TKDE.2024.3484496
Qian Li;Yunpeng Xiao;Xinming Zhou;Rong Wang;Sirui Duan;Xiang Yu
In social networks, topics often demonstrate a “fission” trend, where new topics arise from existing ones. Effectively predicting collective behavioral patterns during the dissemination of derivative topics is crucial for public opinion management. Addressing the symbiotic and antagonistic nature of “native-derived” topics, a derivative topic propagation model based on representation learning and topic relevance is proposed herein. First, considering the transition in user interest levels and cognitive accumulation at different evolutionary stages of native-derivative topics, a user content representation method, namely DTR2vec, is introduced, based on topic-related feature associations, for learning user content features. Then, evolutionary game theory is introduced by recognizing the symbiotic and antagonistic nature of “native-derived” topics during their propagation. Moreover, implicit relationships between users are explored, and user influence is quantified for learning user structural features. Finally, considering the graph convolutional network’s ability to process non-Euclidean structured data, the proposed model integrates user content and structural features to predict user forwarding behavior. Experimental results indicate that the proposed model not only effectively predicts the dissemination trends of derivative topics but also more authentically reflects the association and game relationships between native and derivative topics during their dissemination.
IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 7468-7482.
Citations: 0
Iterative Soft Prompt-Tuning for Unsupervised Domain Adaptation
IF 8.9 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-21 DOI: 10.1109/TKDE.2024.3483903
Yi Zhu;Shuqin Wang;Jipeng Qiang;Xindong Wu
Unsupervised domain adaptation aims to facilitate learning tasks in an unlabeled target domain using knowledge from a related source domain, and has achieved strong performance with pre-trained language models (PLMs). Recently, inspired by GPT, prompt-tuning has been widely explored as a way to elicit the rich knowledge in PLMs for language understanding. However, existing prompt-tuning methods still directly apply the model learned in the source domain to the target domain to minimize the discrepancy between domains; e.g., the prompts or the template are trained separately to learn embeddings for transfer to the target domain, which is essentially the intuition behind end-to-end deep learning approaches. In this paper, we propose an Iterative Soft Prompt-Tuning method (ItSPT) for better unsupervised domain adaptation. On the one hand, the prompt-tuning model learned in the source domain is converted into an iterative model to find the true label information in the target domain; domain adaptation is then treated as a few-shot learning task. On the other hand, instead of hand-crafted templates, ItSPT adopts soft prompts to account for both automatic template generation and classification performance. Experiments on both English and Chinese datasets demonstrate that our method surpasses the performance of SOTA methods.
IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 8580-8592.
Citations: 0
Is Sharing Neighbor Generator in Federated Graph Learning Safe?
IF 8.9 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-18 DOI: 10.1109/TKDE.2024.3482448
Liuyi Yao;Zhen Wang;Yuexiang Xie;Yaliang Li;Weirui Kuang;Daoyuan Chen;Bolin Ding
Nowadays, as privacy concerns continue to rise, federated graph learning (FGL) which generalizes the classic federated learning to graph data has attracted increasing attention. However, while the focus has been on designing collaborative learning algorithms, the potential risks of privacy leakage through the sharing of necessary graph-related information in FGL, such as node embeddings and neighbor generators, have been largely neglected. In this paper, we verify the potential risks of privacy leakage in FGL, and provide insights about the cautions in FGL algorithm design. Specifically, we propose a novel privacy attack algorithm named Privacy Attack on federated Graph learning (PAG) towards reconstructing participants’ private node attributes and the linkage relationships. The participant performing the PAG attack is able to reconstruct the node attributes of the victim by matching the received gradients of the generator, and then train a link prediction model based on its local sub-graph to inductively infer the linkages connected to these reconstructed nodes. We theoretically and empirically demonstrate that under PAG attack, directly sharing the neighbor generators makes the FGL vulnerable to the data reconstruction attack. Furthermore, an investigation into the key factors that can hinder the success of the PAG attack provides insights into corresponding defense strategies and inspires future research into privacy-preserving FGL.
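Why can shared gradients reveal private attributes at all? The toy example below is not the PAG attack, which matches gradients of a neighbor generator. It only shows the simplest known case, assuming a single linear layer with a bias and a squared-error loss, where the input can be read off the gradients in closed form. All function names and values are made up for illustration.

```python
def linear_layer_gradients(weights, bias, x, y):
    """Gradients of 0.5 * (w.x + b - y)^2 with respect to w and b."""
    residual = sum(w * xi for w, xi in zip(weights, x)) + bias - y
    grad_w = [residual * xi for xi in x]  # dL/dw = residual * x
    grad_b = residual                     # dL/db = residual
    return grad_w, grad_b


def reconstruct_input(grad_w, grad_b):
    """Recover x from the shared gradients, since grad_w = grad_b * x."""
    if grad_b == 0:
        raise ValueError("zero residual: gradients carry no information about x")
    return [g / grad_b for g in grad_w]


# The "victim" computes gradients on a private attribute vector and shares them.
private_x = [0.5, -1.25, 3.0]
grad_w, grad_b = linear_layer_gradients([0.1, 0.2, -0.3], 0.05, private_x, y=1.0)

# The "attacker" sees only the gradients, yet recovers the attributes.
recovered = reconstruct_input(grad_w, grad_b)
```

For deeper models no closed form exists, and attacks of this family typically optimize a candidate input until its gradients match the observed ones.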
IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 8568-8579.
Citations: 0
L-ASCRA: A Linearithmic Time Approximate Spectral Clustering Algorithm Using Topologically-Preserved Representatives
IF 8.9 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-18 DOI: 10.1109/TKDE.2024.3483572
Abdul Atif Khan;Mohammad Maksood Akhter;Rashmi Maheshwari;Sraban Kumar Mohanty
Approximate spectral clustering (ASC) algorithms work on the representative points of the data for discovering intrinsic groups. The existing ASC methods identify fewer representatives as compared to the number of data points to reduce the cubic computational overhead of the spectral clustering technique. However, identifying such representative points without any domain knowledge to capture the shapes and topology of the clusters remains a challenge. This work proposes an ASC method that suitably computes enough well-scattered representatives to efficiently capture the topology of the data, making the ASC faster without the requirement of tuning any external parameters. The proposed ASC algorithm first applies two-level partitioning using both boundary points and centroids-based partitioning to identify quality representatives in less time. In the next step, we calculate the proximity between the neighboring representatives using $k$ rounds of minimum spanning tree (MST) by considering the distribution of edge weights in each round to find $k$. The proposed method effectively utilizes the number of representatives in a way that the overall computational time is bounded by $O(N \lg N)$. The experimental results suggest that the proposed ASC method outperforms the competing ASC methods in terms of both running time and clustering quality.
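One plausible reading of "k rounds of MST" is to build an MST over the representatives, set its edges aside, build another MST from the remaining edges, and take the union as a proximity graph. The sketch below implements that reading with Kruskal's algorithm. It is an assumption about the construction, not the paper's procedure: `k` is passed in by hand rather than derived from edge-weight distributions, and the toy version enumerates all pairs, so it does not itself achieve the stated time bound.

```python
import math


def kruskal_mst(num_nodes, edges):
    """Return a minimum spanning forest; edges are (weight, u, v) tuples."""
    parent = list(range(num_nodes))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    chosen = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two components, so keep it
            parent[root_u] = root_v
            chosen.append((weight, u, v))
    return chosen


def k_round_mst_graph(points, k):
    """Union of up to k edge-disjoint MST rounds over the complete graph."""
    n = len(points)
    remaining = {(math.dist(points[u], points[v]), u, v)
                 for u in range(n) for v in range(u + 1, n)}
    graph = []
    for _ in range(k):
        round_edges = kruskal_mst(n, remaining)
        if not round_edges:  # no edges left to build another round
            break
        graph.extend(round_edges)
        remaining.difference_update(round_edges)
    return graph


representatives = [(0, 0), (1, 0), (0, 1), (5, 5)]  # made-up toy points
one_round = k_round_mst_graph(representatives, 1)
two_rounds = k_round_mst_graph(representatives, 2)
```

Later rounds add the next-cheapest connections that the first tree left out, so the union is denser than a single MST while still being far sparser than the full similarity matrix.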
IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 8643-8654.
Citations: 0
Human-AI Interaction: Human Behavior Routineness Shapes AI Performance
IF 8.9 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-14 DOI: 10.1109/TKDE.2024.3480317
Tianao Sun;Kai Zhao;Meng Chen
A crucial area of research in Human-AI Interaction focuses on understanding how the integration of AI into social systems influences human behavior, for example, how news-feeding algorithms affect people’s voting decisions. But little attention has been paid to how human behavior shapes AI performance. We fill this research gap by introducing routineness to measure human behavior for the AI system, which assesses the degree of routine in a person’s activity based on their past activities. We apply the proposed routineness metric to two extensive human behavior datasets: the human mobility dataset with over 700 million data samples and the social media dataset with over 3.8 million data samples. Our analysis reveals routineness can effectively detect behavioral changes in human activities. The performance of AI algorithms is profoundly determined by human routineness, which provides valuable guidance for the selection of AI algorithms.
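The abstract does not define the routineness metric, so the following is only a toy stand-in to make the idea tangible: it scores a day by the share of time slots in which the person does what they most often did in that slot before. The function name, the slot/activity encoding, and the scoring rule are all assumptions made for this example and should not be read as the paper's definition.

```python
from collections import Counter


def routineness(history, current):
    """Toy score in [0, 1]: share of slots matching the usual past activity.

    Hypothetical metric for illustration; not the paper's definition.
    history: list of past days, each a dict mapping time slot -> activity
    current: dict mapping time slot -> activity for the day being scored
    """
    if not current:
        return 0.0
    matches = 0
    for slot, activity in current.items():
        # Count what was done in this slot on earlier days.
        past = Counter(day[slot] for day in history if slot in day)
        if past and past.most_common(1)[0][0] == activity:
            matches += 1
    return matches / len(current)


history = [
    {"09": "office", "19": "gym"},
    {"09": "office", "19": "home"},
    {"09": "office", "19": "gym"},
]
typical_day = routineness(history, {"09": "office", "19": "gym"})
unusual_day = routineness(history, {"09": "office", "19": "cinema"})
```

A score near 1 marks a day that follows established habits, while a drop in the score over consecutive days would flag the kind of behavioral change the abstract says the metric can detect.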
IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 8476-8487.
Citations: 0
When Quantum Computing Meets Database: A Hybrid Sampling Framework for Approximate Query Processing
IF 8.9 Zone 2 Computer Science Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-14 DOI: 10.1109/TKDE.2024.3480278
Sai Wu;Meng Shi;Dongxiang Zhang;Junbo Zhao;Gongsheng Yuan;Gang Chen
Quantum computing represents a next-generation technology in data processing, promising to transcend the limitations of traditional computation. In this paper, we undertake an early exploration of the potential integration of quantum computing with database query optimization. We introduce a pioneering hybrid classical-quantum algorithm for sampling-based approximate query processing (AQP). The core concept of the algorithm revolves around identifying rare groups, which often follow a long-tail distribution, and applying distinct sampling methodologies to normal and rare groups. By leveraging the quantum capabilities of the diffusion gate and QRAM, the algorithm defines a novel quantum sampling approach that iteratively amplifies the signals of these infrequent groups. The algorithm operates without the need for preprocessing or prior knowledge of workloads or data. It utilizes the power of quadratic acceleration to achieve well-balanced sampling across various data categories. Experimental results demonstrate that in the context of AQP, the new sampling scheme provides higher accuracy at the same sampling cost. Additionally, the benefits of quantum computing become more pronounced as query selectivity increases.
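The amplification of rare groups can be illustrated with a classical simulation of Grover-style amplitude amplification: an oracle flips the sign of the marked items, and a diffusion step reflects every amplitude about the mean. This is a sketch of the textbook primitive only. It is not the paper's hybrid algorithm and involves no QRAM; the name `amplify` and the toy sizes are assumptions made for the example.

```python
import math


def amplify(num_items, marked, iterations):
    """Simulate amplitude amplification; return the probability of a marked item.

    num_items:  size of the uniform superposition (e.g. number of tuples)
    marked:     set of indices standing in for rare-group tuples
    iterations: number of oracle + diffusion rounds
    """
    amps = [1.0 / math.sqrt(num_items)] * num_items
    for _ in range(iterations):
        # Oracle: flip the sign of the amplitudes of marked items.
        amps = [-a if i in marked else a for i, a in enumerate(amps)]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amps) / num_items
        amps = [2.0 * mean - a for a in amps]
    return sum(amps[i] ** 2 for i in marked)


before = amplify(64, {3}, 0)  # plain uniform sampling
after = amplify(64, {3}, 6)   # about (pi / 4) * sqrt(64) rounds
```

With one marked item among 64, uniform sampling hits it with probability 1/64, whereas about six rounds push the probability close to 1. The number of rounds grows with the square root of the population size, which is the quadratic speedup the abstract refers to.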
IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 9532-9546. Journal Article. DOI: https://doi.org/10.1109/TKDE.2024.3480278
Citations: 0
SUHDSA: Secure, Useful, and High-Performance Data Stream Anonymization
IF 8.9 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-14 DOI: 10.1109/TKDE.2024.3476684
Yongwan Joo;Soonseok Kim
This study addresses privacy concerns in real-time streaming data, including personal biometric signals and private information from sources such as real-time crime reporting, online sales transactions, and hospital patient-monitoring devices. Anonymization is crucial because it hides sensitive personal data. Achieving anonymity in real-time streaming data involves satisfying the unique demands of real-time scenarios, which is distinct from traditional methods. Specifically, security and minimal information loss must be maintained within a specified timeframe (referred to as the average delay time). The most recent solution in this context is the utility-based approach to data stream anonymization (UBDSA) algorithm developed by Sopaoglu and Abul. This study aims to enhance the performance of UBDSA by introducing a secure, useful, and high-performance data stream anonymization (SUHDSA) algorithm. SUHDSA outperforms UBDSA in terms of runtime and information loss while still ensuring privacy protection and an average delay time. The experimental results, using the same dataset and cluster size as in a previous UBDSA study, demonstrate significant performance improvements with the proposed algorithm. It achieves a minimum runtime of 24.05 s and a maximum runtime of 29.88 s, with information loss rates ranging from 14% to 77%. These results surpass the performance of the previous UBDSA algorithm.
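To make the trade-off between privacy and information loss concrete, here is a minimal sketch of stream anonymization by generalization. It is neither SUHDSA nor UBDSA: it assumes a single numeric quasi-identifier (an age), a count-based buffer of size `k`, and a range-width measure of information loss, all of which are illustrative assumptions rather than details taken from the abstract. It also ignores the delay constraint, which a real algorithm would have to enforce.

```python
from typing import Iterable, Iterator, List, Tuple


def anonymize_stream(
    ages: Iterable[int], k: int, domain: Tuple[int, int]
) -> Iterator[Tuple[int, int, int, float]]:
    """Yield (low, high, count, info_loss) for each published cluster.

    Hypothetical sketch: records are buffered until k have arrived,
    then released as one generalized range so that each record is
    indistinguishable from at least k - 1 others.
    """
    lo_dom, hi_dom = domain
    buffer: List[int] = []
    for age in ages:
        buffer.append(age)
        if len(buffer) == k:
            low, high = min(buffer), max(buffer)
            # Information loss: width of the published range relative
            # to the width of the whole attribute domain (0 = exact).
            loss = (high - low) / (hi_dom - lo_dom)
            yield low, high, len(buffer), loss
            buffer = []
    # Records still in the buffer when the stream ends are not released
    # in this sketch; a delay-aware algorithm must handle them.


if __name__ == "__main__":
    stream = [23, 27, 25, 60, 64, 62, 40]
    for cluster in anonymize_stream(stream, k=3, domain=(0, 100)):
        print(cluster)
```

Buffering records in arrival order keeps the sketch short but tends to produce wide ranges; grouping similar records together is what reduces information loss, at the cost of holding records longer.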
IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 9336-9347. Journal Article. Open access. DOI: https://doi.org/10.1109/TKDE.2024.3476684 PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10715680
Citations: 0
Debiased Pairwise Learning for Implicit Collaborative Filtering
IF 8.9 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2024-10-14 DOI: 10.1109/TKDE.2024.3479240
Bin Liu;Qin Luo;Bang Wang
Learning representations from pairwise comparisons has achieved significant success in various fields, including computer vision and information retrieval. In recommendation systems, collaborative filtering algorithms based on pairwise learning are also rooted in this approach. However, a major challenge in collaborative filtering is the lack of labels for negative instances in implicit feedback data, leading to the inclusion of false negatives among randomly selected instances. This issue causes biased optimization objectives and results in biased parameter estimation. In this paper, we propose a novel method to address learning biases arising from implicit feedback data and introduce a modified loss function for pairwise learning, called debiased pairwise loss (DPL). The core idea of DPL is to correct the biased probability estimates caused by false negatives, thereby adjusting the gradients to more closely approximate those of fully supervised data. Implementing DPL requires only a small modification to the existing codebase. Experimental studies on public datasets demonstrate the effectiveness of the proposed method.
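The bias described in the abstract can be sketched numerically. The code below shows the standard BPR-style pairwise loss together with one possible correction based on a mixture identity; it is not the paper's DPL formula, which the abstract does not give. It assumes that a sampled "negative" is really a positive with a known prior probability `tau`, so the expectation over unlabeled items is `tau` times the expectation over positives plus `1 - tau` times the expectation over true negatives. The names `tau`, `debiased_pair_prob`, and the example scores are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bpr_loss(pos_score: float, neg_score: float) -> float:
    """Standard pairwise loss: -log sigmoid(s_pos - s_neg)."""
    return -math.log(sigmoid(pos_score - neg_score))


def debiased_pair_prob(
    pos_score: float,
    unlabeled_scores: Sequence[float],
    other_pos_scores: Sequence[float],
    tau: float,
) -> float:
    """Estimate P(positive beats a TRUE negative) from unlabeled samples.

    Assumed mixture identity (illustrative, not the paper's formula):
        E_unlabeled = tau * E_positive + (1 - tau) * E_negative
    which is solved here for E_negative.
    """
    p_unl = sum(sigmoid(pos_score - s) for s in unlabeled_scores) / len(unlabeled_scores)
    p_pos = sum(sigmoid(pos_score - s) for s in other_pos_scores) / len(other_pos_scores)
    estimate = (p_unl - tau * p_pos) / (1.0 - tau)
    # Clamp so the value stays a valid probability and its log is finite.
    return min(max(estimate, 1e-6), 1.0)


if __name__ == "__main__":
    biased = debiased_pair_prob(2.0, [0.0, 1.0], [2.0, 3.0], tau=0.0)
    corrected = debiased_pair_prob(2.0, [0.0, 1.0], [2.0, 3.0], tau=0.2)
    print(-math.log(biased), -math.log(corrected))
```

With `tau = 0` the estimate reduces to the ordinary average over sampled negatives. With `tau > 0`, the share of the unlabeled average that is attributed to hidden positives is removed, which changes the loss and therefore the gradient that a training loop would compute from it.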
IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 7878-7892. Journal Article. DOI: https://doi.org/10.1109/TKDE.2024.3479240
Citations: 0