Pub Date : 2024-11-12DOI: 10.1109/TKDE.2024.3475028
Zeyi Liu;Xiao He
Real-time safety assessment of dynamic systems is of paramount importance in industrial processes since it provides continuous monitoring and evaluation to prevent potential harm to the environment and individuals. However, there are still several challenges to be resolved due to the requirements of time consumption and the non-stationary nature of real-world environments. In this paper, a novel online dynamic hybrid broad learning system, termed ODH-BLS, is proposed to more fully utilize the co-design advantages of active adaptation and passive adaptation. It makes effective use of limited annotations with the proposed sample value function. Simultaneously, anchor points can be dynamically adjusted to accommodate changes of the underlying distribution, thereby leveraging the value of unlabeled samples. An iterative update rule is also derived to ensure adaptation of the assessment model to real-time data at low computational costs. We also provide theoretical analyses to illustrate its practicality. Several experiments regarding the JiaoLong deep-sea manned submersible are carried out. The results demonstrate that the proposed ODH-BLS method achieves a performance improvement of approximately 8% over the baseline method on the benchmark dataset, showing its effectiveness in solving real-time safety assessment tasks for dynamic systems.
{"title":"Online Dynamic Hybrid Broad Learning System for Real-Time Safety Assessment of Dynamic Systems","authors":"Zeyi Liu;Xiao He","doi":"10.1109/TKDE.2024.3475028","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3475028","url":null,"abstract":"Real-time safety assessment of dynamic systems is of paramount importance in industrial processes since it provides continuous monitoring and evaluation to prevent potential harm to the environment and individuals. However, there are still several challenges to be resolved due to the requirements of time consumption and the non-stationary nature of real-world environments. In this paper, a novel online dynamic hybrid broad learning system, termed ODH-BLS, is proposed to more fully utilize the co-design advantages of active adaptation and passive adaptation. It makes effective use of limited annotations with the proposed sample value function. Simultaneously, anchor points can be dynamically adjusted to accommodate changes of the underlying distribution, thereby leveraging the value of unlabeled samples. An iterative update rule is also derived to ensure adaptation of the assessment model to real-time data at low computational costs. We also provide theoretical analyses to illustrate its practicality. Several experiments regarding the JiaoLong deep-sea manned submersible are carried out. The results demonstrate that the proposed ODH-BLS method achieves a performance improvement of approximately 8% over the baseline method on the benchmark dataset, showing its effectiveness in solving real-time safety assessment tasks for dynamic systems.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8928-8938"},"PeriodicalIF":8.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-11-12DOI: 10.1109/TKDE.2024.3436883
Qing Huang;Dianshu Liao;Zhenchang Xing;Zhiqiang Yuan;Qinghua Lu;Xiwei Xu;Jiaxing Lu
Giant pre-trained code models (PCMs) start coming into the developers’ daily practices. Understanding the type and amount of software knowledge in PCMs is essential for integrating PCMs into software engineering (SE) tasks and unlocking their potential. In this work, we conduct the first systematic study on the SE factual knowledge in the state-of-the-art PCM CoPilot, focusing on APIs’ Fully Qualified Names (FQNs), the fundamental knowledge for effective code analysis, search and reuse. Driven by FQNs’ data distribution properties, we design a novel lightweight in-context learning on Copilot for FQN inference, which does not require code compilation as traditional methods or gradient update by recent FQN prompt-tuning. We systematically experiment with five in-context learning design factors to identify the best configuration for practical use. With this best configuration, we investigate the impact of example prompts and FQN data properties on CoPilot's FQN inference capability. Our results confirm that CoPilot stores diverse FQN knowledge and can be applied for FQN inference due to its high accuracy and non-reliance on code analysis. Additionally, our extended study shows that the in-context learning method can be generalized to retrieve other SE factual knowledge embedded in giant PCMs. Furthermore, we find that the advanced general model GPT-4 also stores substantial SE knowledge. Comparing FQN inference between CoPilot and GPT-4, we observe that as model capabilities improve, the same prompts yield better results. Based on our experience interacting with Copilot, we discuss various opportunities to improve human-CoPilot interaction in the FQN inference task.
{"title":"SE Factual Knowledge in Frozen Giant Code Model: A Study on FQN and Its Retrieval","authors":"Qing Huang;Dianshu Liao;Zhenchang Xing;Zhiqiang Yuan;Qinghua Lu;Xiwei Xu;Jiaxing Lu","doi":"10.1109/TKDE.2024.3436883","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3436883","url":null,"abstract":"Giant pre-trained code models (PCMs) start coming into the developers’ daily practices. Understanding the type and amount of software knowledge in PCMs is essential for integrating PCMs into software engineering (SE) tasks and unlocking their potential. In this work, we conduct the first systematic study on the SE factual knowledge in the state-of-the-art PCM CoPilot, focusing on APIs’ Fully Qualified Names (FQNs), the fundamental knowledge for effective code analysis, search and reuse. Driven by FQNs’ data distribution properties, we design a novel lightweight in-context learning on Copilot for FQN inference, which does not require code compilation as traditional methods or gradient update by recent FQN prompt-tuning. We systematically experiment with five in-context learning design factors to identify the best configuration for practical use. With this best configuration, we investigate the impact of example prompts and FQN data properties on CoPilot's FQN inference capability. Our results confirm that CoPilot stores diverse FQN knowledge and can be applied for FQN inference due to its high accuracy and non-reliance on code analysis. Additionally, our extended study shows that the in-context learning method can be generalized to retrieve other SE factual knowledge embedded in giant PCMs. Furthermore, we find that the advanced general model GPT-4 also stores substantial SE knowledge. Comparing FQN inference between CoPilot and GPT-4, we observe that as model capabilities improve, the same prompts yield better results. Based on our experience interacting with Copilot, we discuss various opportunities to improve human-CoPilot interaction in the FQN inference task.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"9220-9234"},"PeriodicalIF":8.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In social networks, topics often demonstrate a “fission” trend, where new topics arise from existing ones. Effectively predicting collective behavioral patterns during the dissemination of derivative topics is crucial for public opinion management. Addressing the symbiotic, antagonistic nature of “native-derived” topics, a derivative topic propagation model based on representation learning, topic relevance is proposed herein. First, considering the transition in user interest levels, cognitive accumulation at different evolutionary stages of native-derivative topics, a user content representation method, namely DTR2vec, is introduced, based on topic-related feature associations, for learning user content features. Then, evolutionary game theory is introduced by recognizing the symbiotic, antagonistic nature of “native-derived” topics during their propagation. Moreover, implicit relationships between users are explored, user influence is quantified for learning user structural features. Finally, considering the graph convolutional network’s ability to process non-euclidean structured data, the proposed model integrates user content, structural features to predict user forwarding behavior. Experimental results indicate that the proposed model not only effectively predicts the dissemination trends of derivative topics but also more authentically reflects the association, game relationships between native, derivative topics during their dissemination.
{"title":"A Derivative Topic Dissemination Model Based on Representation Learning and Topic Relevance","authors":"Qian Li;Yunpeng Xiao;Xinming Zhou;Rong Wang;Sirui Duan;Xiang Yu","doi":"10.1109/TKDE.2024.3484496","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3484496","url":null,"abstract":"In social networks, topics often demonstrate a “fission” trend, where new topics arise from existing ones. Effectively predicting collective behavioral patterns during the dissemination of derivative topics is crucial for public opinion management. Addressing the symbiotic, antagonistic nature of “native-derived” topics, a derivative topic propagation model based on representation learning, topic relevance is proposed herein. First, considering the transition in user interest levels, cognitive accumulation at different evolutionary stages of native-derivative topics, a user content representation method, namely DTR2vec, is introduced, based on topic-related feature associations, for learning user content features. Then, evolutionary game theory is introduced by recognizing the symbiotic, antagonistic nature of “native-derived” topics during their propagation. Moreover, implicit relationships between users are explored, user influence is quantified for learning user structural features. Finally, considering the graph convolutional network’s ability to process non-euclidean structured data, the proposed model integrates user content, structural features to predict user forwarding behavior. Experimental results indicate that the proposed model not only effectively predicts the dissemination trends of derivative topics but also more authentically reflects the association, game relationships between native, derivative topics during their dissemination.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"7468-7482"},"PeriodicalIF":8.9,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142645548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-21DOI: 10.1109/TKDE.2024.3483903
Yi Zhu;Shuqin Wang;Jipeng Qiang;Xindong Wu
Unsupervised domain adaptation aims to facilitate learning tasks in unlabeled target domain with knowledge in the related source domain, which has achieved awesome performance with the pre-trained language models (PLMs). Recently, inspired by GPT, the prompt-tuning model has been widely explored in stimulating rich knowledge in PLMs for language understanding. However, existing prompt-tuning methods still directly applied the model that was learned in the source domain into the target domain to minimize the discrepancy between different domains, e.g., the prompts or the template are trained separately to learn embeddings for transferring to the target domain, which is actually the intuition of end-to-end deep-based approach. In this paper, we propose an Iterative Soft Prompt-Tuning method (ItSPT) for better unsupervised domain adaptation. On the one hand, the prompt-tuning model learned in the source domain is converted into an iterative model to find the true label information in the target domain, the domain adaptation method is then regarded as a few-shot learning task. On the other hand, instead of hand-crafted templates, ItSPT adopts soft prompts for both considering the automatic template generation and classification performance. Experiments on both English and Chinese datasets demonstrate that our method surpasses the performance of SOTA methods.
无监督领域适应旨在利用相关源领域的知识来促进无标记目标领域的学习任务,这在预训练语言模型(PLMs)方面取得了令人赞叹的成绩。最近,受 GPT 的启发,提示调整模型在激发 PLMs 中丰富的语言理解知识方面得到了广泛的探索。然而,现有的提示调整方法仍然是直接将源领域学习到的模型应用到目标领域,以尽量减少不同领域之间的差异,例如,分别训练提示或模板以学习嵌入,从而转移到目标领域,这实际上是基于端到端深度方法的直观做法。本文提出了一种迭代软提示调整方法(ItSPT),以实现更好的无监督领域适应。一方面,将源域中学习到的提示调谐模型转换为迭代模型,以找到目标域中的真实标签信息,然后将域适应方法视为少数几次学习任务。另一方面,考虑到模板的自动生成和分类性能,ItSPT 采用软提示代替手工模板。在中英文数据集上的实验证明,我们的方法超越了 SOTA 方法的性能。
{"title":"Iterative Soft Prompt-Tuning for Unsupervised Domain Adaptation","authors":"Yi Zhu;Shuqin Wang;Jipeng Qiang;Xindong Wu","doi":"10.1109/TKDE.2024.3483903","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3483903","url":null,"abstract":"Unsupervised domain adaptation aims to facilitate learning tasks in unlabeled target domain with knowledge in the related source domain, which has achieved awesome performance with the pre-trained language models (PLMs). Recently, inspired by GPT, the prompt-tuning model has been widely explored in stimulating rich knowledge in PLMs for language understanding. However, existing prompt-tuning methods still directly applied the model that was learned in the source domain into the target domain to minimize the discrepancy between different domains, e.g., the prompts or the template are trained separately to learn embeddings for transferring to the target domain, which is actually the intuition of end-to-end deep-based approach. In this paper, we propose an Iterative Soft Prompt-Tuning method (ItSPT) for better unsupervised domain adaptation. On the one hand, the prompt-tuning model learned in the source domain is converted into an iterative model to find the true label information in the target domain, the domain adaptation method is then regarded as a few-shot learning task. On the other hand, instead of hand-crafted templates, ItSPT adopts soft prompts for both considering the automatic template generation and classification performance. Experiments on both English and Chinese datasets demonstrate that our method surpasses the performance of SOTA methods.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8580-8592"},"PeriodicalIF":8.9,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142645449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nowadays, as privacy concerns continue to rise, federated graph learning (FGL) which generalizes the classic federated learning to graph data has attracted increasing attention. However, while the focus has been on designing collaborative learning algorithms, the potential risks of privacy leakage through the sharing of necessary graph-related information in FGL, such as node embeddings and neighbor generators, have been largely neglected. In this paper, we verify the potential risks of privacy leakage in FGL, and provide insights about the cautions in FGL algorithm design. Specifically, we propose a novel privacy attack algorithm named Privacy Attack on federated Graph learning (PAG) towards reconstructing participants’ private node attributes and the linkage relationships. The participant performing the PAG attack is able to reconstruct the node attributes of the victim by matching the received gradients of the generator, and then train a link prediction model based on its local sub-graph to inductively infer the linkages connected to these reconstructed nodes. We theoretically and empirically demonstrate that under PAG attack, directly sharing the neighbor generators makes the FGL vulnerable to the data reconstruction attack. Furthermore, an investigation into the key factors that can hinder the success of the PAG attack provides insights into corresponding defense strategies and inspires future research into privacy-preserving FGL.
{"title":"Is Sharing Neighbor Generator in Federated Graph Learning Safe?","authors":"Liuyi Yao;Zhen Wang;Yuexiang Xie;Yaliang Li;Weirui Kuang;Daoyuan Chen;Bolin Ding","doi":"10.1109/TKDE.2024.3482448","DOIUrl":"https://doi.org/10.1109/TKDE.2024.3482448","url":null,"abstract":"Nowadays, as privacy concerns continue to rise, federated graph learning (FGL) which generalizes the classic federated learning to graph data has attracted increasing attention. However, while the focus has been on designing collaborative learning algorithms, the potential risks of privacy leakage through the sharing of necessary graph-related information in FGL, such as node embeddings and neighbor generators, have been largely neglected. In this paper, we verify the potential risks of privacy leakage in FGL, and provide insights about the cautions in FGL algorithm design. Specifically, we propose a novel privacy attack algorithm named Privacy Attack on federated Graph learning (PAG) towards reconstructing participants’ private node attributes and the linkage relationships. The participant performing the PAG attack is able to reconstruct the node attributes of the victim by matching the received gradients of the generator, and then train a link prediction model based on its local sub-graph to inductively infer the linkages connected to these reconstructed nodes. We theoretically and empirically demonstrate that under PAG attack, directly sharing the neighbor generators makes the FGL vulnerable to the data reconstruction attack. Furthermore, an investigation into the key factors that can hinder the success of the PAG attack provides insights into corresponding defense strategies and inspires future research into privacy-preserving FGL.","PeriodicalId":13496,"journal":{"name":"IEEE Transactions on Knowledge and Data Engineering","volume":"36 12","pages":"8568-8579"},"PeriodicalIF":8.9,"publicationDate":"2024-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142645572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-18DOI: 10.1109/TKDE.2024.3483572
Abdul Atif Khan;Mohammad Maksood Akhter;Rashmi Maheshwari;Sraban Kumar Mohanty
Approximate spectral clustering (ASC) algorithms work on the representative points of the data for discovering intrinsic groups. The existing ASC methods identify fewer representatives as compared to the number of data points to reduce the cubic computational overhead of the spectral clustering technique. However, identifying such representative points without any domain knowledge to capture the shapes and topology of the clusters remains a challenge. This work proposes an ASC method that suitably computes enough well-scattered representatives to efficiently capture the topology of the data, making the ASC faster without the requirement of tuning any external parameters. The proposed ASC algorithm first applies two-level partitioning using both boundary points and centroids-based partitioning to identify quality representatives in less time. In the next step, we calculate the proximity between the neighboring representatives using $k$