Dual-Modality Integration Attention with Graph-Based Feature Extraction for Visual Question and Answering
Pub Date : 2025-04-29 DOI: 10.26599/TST.2024.9010093
Jing Lu;Chunlei Wu;Leiquan Wang;Ran Li;Xiuxuan Shen
Visual Question and Answering (VQA) has garnered significant attention as a domain that requires the synthesis of visual and textual information to produce accurate responses. While existing methods often rely on Convolutional Neural Networks (CNNs) for feature extraction and attention mechanisms for embedding learning, they frequently fail to capture the nuanced interactions between entities within images, leading to potential ambiguities in answer generation. In this paper, we introduce a novel network architecture, Dual-modality Integration Attention with Graph-based Feature Extraction (DIAGFE), which addresses these limitations by incorporating two key innovations: a Graph-based Feature Extraction (GFE) module that enhances the precision of visual semantics extraction, and a Dual-modality Integration Attention (DIA) mechanism that efficiently fuses visual and question features to guide the model towards more accurate answer generation. Our model is trained with a composite loss function to refine its predictive accuracy. Rigorous experiments on the VQA2.0 dataset demonstrate that DIAGFE outperforms existing methods, underscoring the effectiveness of our approach in advancing VQA research and its potential for cross-modal understanding.
Tsinghua Science and Technology, vol. 30, no. 5, pp. 2133-2145
Objective Class-Based Micro-Expression Recognition Through Simultaneous Action Unit Detection and Feature Aggregation
Pub Date : 2025-04-29 DOI: 10.26599/TST.2024.9010095
Ling Zhou;Qirong Mao;Ming Dong
Micro-Expression Recognition (MER) is a challenging task because the subtle changes occur over different action regions of a face. Changes in facial action regions form Action Units (AUs), and AUs in micro-expressions can be seen as the actors in cooperative group activities. In this paper, we propose a novel deep neural network model for objective class-based MER, which simultaneously detects AUs and aggregates AU-level features into a micro-expression-level representation through Graph Convolutional Networks (GCNs). Specifically, we propose two new strategies in our AU detection module for more effective AU feature learning: an attention mechanism and a balanced detection loss function. With these two strategies, features are learned for all AUs in a unified model, eliminating the error-prone landmark detection process and the tedious separate training for each AU. Moreover, our model incorporates a tailored objective class-based AU knowledge-graph, which facilitates the GCN in aggregating the AU-level features into a micro-expression-level feature representation. Extensive experiments on two tasks in MEGC 2018 show that our approach outperforms current state-of-the-art methods in MER. Additionally, we report our single-model-based micro-expression AU detection results.
{"title":"Objective Class-Based Micro-Expression Recognition Through Simultaneous Action Unit Detection and Feature Aggregation","authors":"Ling Zhou;Qirong Mao;Ming Dong","doi":"10.26599/TST.2024.9010095","DOIUrl":"https://doi.org/10.26599/TST.2024.9010095","url":null,"abstract":"Micro-Expression Recognition (MER) is a challenging task as the subtle changes occur over different action regions of a face. Changes in facial action regions are formed as Action Units (AUs), and AUs in micro-expressions can be seen as the actors in cooperative group activities. In this paper, we propose a novel deep neural network model for objective class-based MER, which simultaneously detects AUs and aggregates AU-level features into micro-expression-level representation through Graph Convolutional Networks (GCN). Specifically, we propose two new strategies in our AU detection module for more effective AU feature learning: the attention mechanism and the balanced detection loss function. With these two strategies, features are learned for all the AUs in a unified model, eliminating the error-prune landmark detection process and tedious separate training for each AU. Moreover, our model incorporates a tailored objective class-based AU knowledge-graph, which facilitates the GCN to aggregate the AU-level features into a micro-expression-level feature representation. Extensive experiments on two tasks in MEGC 2018 show that our approach outperforms the current state-of-the-art methods in MER. Additionally, we also report our single model-based micro-expression AU detection results.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"2114-2132"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979653","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Envisioning a Future Beyond Tomorrow with Script Event Stream Prediction
Pub Date : 2025-04-29 DOI: 10.26599/TST.2024.9010158
Zhiyi Fang;Zhuofeng Li;Qingyong Zhang;Changhua Xu;Pinzhuo Tian;Shaorong Xie
Script event stream prediction is a task that predicts events based on a given context or script. Most existing methods predict only one subsequent event, limiting the ability to make longer inferences about the future. Moreover, external knowledge has been proven to be beneficial for event prediction and is used in many methods in the form of relations between events. However, these methods focus mainly on the continuity of actions while ignoring the other components of events. To tackle these issues, we propose a Multi-step Script Event Prediction (MuSEP) method that can make longer inferences from the given events. We adopt reinforcement learning to implement the multi-step prediction by treating the process as a Markov chain and designing the reward to account for both chain-level and event-level quality, thus ensuring the overall quality of the prediction results. Additionally, we learn event representations with external knowledge, which yields a better understanding of events and their components. Experimental results on four datasets demonstrate that our method not only outperforms state-of-the-art methods on one-step prediction but is also capable of making multi-step predictions.
Tsinghua Science and Technology, vol. 30, no. 5, pp. 2048-2059
Grayscale-Assisted RGB Image Conversion from Near-Infrared Images
Pub Date : 2025-04-29 DOI: 10.26599/TST.2024.9010115
Yunyi Gao;Qiankun Liu;Lin Gu;Ying Fu
Near-InfraRed (NIR) imaging technology plays a pivotal role in assisted driving and safety surveillance systems, yet its monochromatic nature and deficiency in detail limit its further application. Recent methods aim to recover the corresponding RGB image directly from the NIR image using Convolutional Neural Networks (CNNs). However, these methods struggle to accurately recover both luminance and chrominance information and to compensate for the inherent deficiencies in NIR image details. In this paper, we propose grayscale-assisted RGB image restoration from NIR images, which recovers luminance and chrominance information in two stages. We address the complex NIR-to-RGB conversion challenge by decoupling it into two separate stages: first converting NIR to grayscale images, focusing on luminance learning, and then transforming grayscale to RGB images, concentrating on chrominance information. In addition, we incorporate frequency-domain learning to shift image processing from the spatial domain to the frequency domain, facilitating the restoration of the detailed textures often lost in NIR images. Empirical evaluations of our grayscale-assisted framework against existing state-of-the-art methods demonstrate its superior performance, yielding more visually appealing results. Code is accessible at: https://github.com/Yiiclass/RING
{"title":"Grayscale-Assisted RGB Image Conversion from Near-Infrared Images","authors":"Yunyi Gao;Qiankun Liu;Lin Gu;Ying Fu","doi":"10.26599/TST.2024.9010115","DOIUrl":"https://doi.org/10.26599/TST.2024.9010115","url":null,"abstract":"Near-InfraRed (NIR) imaging technology plays a pivotal role in assisted driving and safety surveillance systems, yet its monochromatic nature and deficiency in detail limit its further application. Recent methods aim to recover the corresponding RGB image directly from the NIR image using Convolutional Neural Networks (CNN). However, these methods struggle with accurately recovering both luminance and chrominance information and the inherent deficiencies in NIR image details. In this paper, we propose grayscale-assisted RGB image restoration from NIR images to recover luminance and chrominance information in two stages. We address the complex NIR-to-RGB conversion challenge by decoupling it into two separate stages. First, it converts NIR to grayscale images, focusing on luminance learning. Then, it transforms grayscale to RGB images, concentrating on chrominance information. In addition, we incorporate frequency domain learning to shift the image processing from the spatial domain to the frequency domain, facilitating the restoration of the detailed textures often lost in NIR images. Empirical evaluations of our grayscale-assisted framework and existing state-of-the-art methods demonstrate its superior performance and yield more visually appealing results. Code is accessible at: https://github.com/Yiiclass/RING","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"2215-2226"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979784","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Nonconvex Activated Fuzzy RNN with Noise-Immune for Time-Varying Quadratic Programming Problems: Application to Plant Leaf Disease Identification
Pub Date : 2025-04-29 DOI: 10.26599/TST.2024.9010127
Yating Hu;Qingwen Du;Jun Luo;Changlin Yu;Bo Zhao;Yingyi Sun
Nonconvex Activated Fuzzy Zeroing Neural Network-based (NAFZNN) and Nonconvex Activated Fuzzy Noise-Tolerant Zeroing Neural Network-based (NAFNTZNN) models are devised and analyzed, drawing inspiration from the classical ZNN/NTZNN-based models, for addressing Time-Varying Quadratic Programming Problems (TVQPPs) with Equality and Inequality Constraints (EICs) online in noisy circumstances. Furthermore, the proposed NAFZNN and NAFNTZNN models can be regarded as a general proportion-differentiation controller and a general proportion-integration-differentiation controller, respectively. Theoretical results demonstrate the global convergence of both the NAFZNN and NAFNTZNN models for TVQPPs with EICs under noisy conditions. Moreover, numerical results illustrate the efficiency, robustness, and superiority of the NAFZNN and NAFNTZNN models in addressing TVQPPs online with inherent noise tolerance. Finally, an application to plant leaf disease identification supports the feasibility and efficacy of the designed NAFNTZNN model, showing its practical potential in the field of image recognition.
{"title":"A Nonconvex Activated Fuzzy RNN with Noise-Immune for Time-Varying Quadratic Programming Problems: Application to Plant Leaf Disease Identification","authors":"Yating Hu;Qingwen Du;Jun Luo;Changlin Yu;Bo Zhao;Yingyi Sun","doi":"10.26599/TST.2024.9010127","DOIUrl":"https://doi.org/10.26599/TST.2024.9010127","url":null,"abstract":"Nonconvex Activated Fuzzy Zeroing Neural Network-based (NAFZNN) and Nonconvex Activated Fuzzy Noise-Tolerant Zeroing Neural Network-based (NAFNTZNN) models are devised and analyzed, drawing inspiration from the classical ZNN/NTZNN-based model for online addressing Time-Varying Quadratic Programming Problems (TVQPPs) with Equality and Inequality Constraints (EICs) in noisy circumstances, respectively. Furthermore, the proposed NAFZNN model and NAFNTZNN model are considered as general proportion-differentiation controller, along with general proportion-integration-differentiation controller. Besides, theoretical results demonstrate the global convergence of both the NAFZNN and NAFNTZNN models for TVQPPs with EIC under noisy conditions. Moreover, numerical results illustrate the efficiency, robustness, and ascendancy of the NAFZNN and NAFZNN models in addressing TVQPPs online, exhibiting inherent noise tolerance. Ultimately, an application example to plant leaf disease identification is conducted to support the feasibility and efficacy of the designed NAFNTZNN model, which shows its potential practical value in the field of image recognition.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"1994-2013"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979779","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gesture Recognition with Focuses Using Hierarchical Body Part Combination
Pub Date : 2025-03-03 DOI: 10.26599/TST.2024.9010059
Cheng Zhang;Yibin Hou;Jian He;Xiaoyang Xie
Human gesture recognition is an important research field in human-computer interaction due to its potential applications in various fields, but existing methods still face challenges in achieving high levels of accuracy. To address this issue, some existing studies propose fusing global features with cropped features, called focuses, taken from vital body parts such as the hands. However, most methods rely on experience when choosing the focuses, and the scheme of focus selection is not discussed in detail. In this paper, a hierarchical body part combination method is proposed that takes into account the number, combinations, and logical relationships of body parts. The proposed method generates multiple focuses and employs a chart-based surface modality alongside red-green-blue and optical flow modalities to enhance each focus. A feature-level fusion scheme based on the residual connection structure is proposed to fuse different modalities at the convolution stages, and a focus fusion scheme is proposed to learn the relevancy of focus channels for each gesture class individually. Experiments conducted on the ChaLearn isolated gesture dataset show that the use of multiple focuses in conjunction with multi-modal features and fusion strategies leads to better gesture recognition accuracy.
{"title":"Gesture Recognition with Focuses Using Hierarchical Body Part Combination","authors":"Cheng Zhang;Yibin Hou;Jian He;Xiaoyang Xie","doi":"10.26599/TST.2024.9010059","DOIUrl":"https://doi.org/10.26599/TST.2024.9010059","url":null,"abstract":"Human gesture recognition is an important research field of human-computer interaction due to its potential applications in various fields, but existing methods still face challenges in achieving high levels of accuracy. To address this issue, some existing researches propose to fuse the global features with the cropped features called focuses on vital body parts like hands. However, most methods rely on experience when choosing the focus, the scheme of focus selection is not discussed in detail. In this paper, a hierarchical body part combination method is proposed to take into account the number, combinations, and logical relationships between body parts. The proposed method generates multiple focuses using this method and employs chart-based surface modality alongside red-green-blue and optical flow modalities to enhance each focus. A feature-level fusion scheme based on the residual connection structure is proposed to fuse different modalities at convolution stages, and a focus fusion scheme is proposed to learn the relevancy of focus channels for each gesture class individually. Experiments conducted on ChaLearn isolated gesture dataset show that the use of multiple focuses in conjunction with multi-modal features and fusion strategies leads to better gesture recognition accuracy.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 4","pages":"1583-1599"},"PeriodicalIF":6.6,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10908593","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Social Media-Driven User Community Finding with Privacy Protection
Pub Date : 2025-03-03 DOI: 10.26599/TST.2024.9010065
Jianye Xie;Xudong Wang;Yuwen Liu;Wenwen Gong;Chao Yan;Wajid Rafique;Maqbool Khan;Arif Ali Khan
In the digital era, social media platforms play a crucial role in forming user communities, yet the challenge of protecting user privacy remains paramount. This paper proposes a novel framework for identifying and analyzing user communities within social media networks, with an emphasis on privacy protection. Specifically, we implement a social media-driven user community finding approach with hashing, named MCF, which ensures that the extracted information cannot be traced back to specific users, thereby maintaining confidentiality. Finally, we design a set of experiments to verify the effectiveness and efficiency of the proposed MCF approach by comparing it with existing approaches, demonstrating its effectiveness in community detection while upholding stringent privacy standards. This research contributes to the growing field of social network analysis by providing a balanced solution that respects user privacy while uncovering valuable insights into community dynamics on social media platforms.
{"title":"Social Media-Driven User Community Finding with Privacy Protection","authors":"Jianye Xie;Xudong Wang;Yuwen Liu;Wenwen Gong;Chao Yan;Wajid Rafique;Maqbool Khan;Arif Ali Khan","doi":"10.26599/TST.2024.9010065","DOIUrl":"https://doi.org/10.26599/TST.2024.9010065","url":null,"abstract":"In the digital era, social media platforms play a crucial role in forming user communities, yet the challenge of protecting user privacy remains paramount. This paper proposes a novel framework for identifying and analyzing user communities within social media networks, emphasizing privacy protection. In detail, we implement a social media-driven user community finding approach with hashing named MCF to ensure that the extracted information cannot be traced back to specific users, thereby maintaining confidentiality. Finally, we design a set of experiments to verify the effectiveness and efficiency of our proposed MCF approach by comparing it with other existing approaches, demonstrating its effectiveness in community detection while upholding stringent privacy standards. This research contributes to the growing field of social network analysis by providing a balanced solution that respects user privacy while uncovering valuable insights into community dynamics on social media platforms.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 4","pages":"1782-1792"},"PeriodicalIF":6.6,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10908665","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-Label Prototype-Aware Structured Contrastive Distillation
Pub Date : 2025-03-03 DOI: 10.26599/TST.2024.9010182
Yuelong Xia;Yihang Tong;Jing Yang;Xiaodi Sun;Yungang Zhang;Huihua Wang;Lijun Yun
Knowledge distillation has demonstrated considerable success in scenarios involving multi-class single-label learning. However, its direct application to multi-label learning proves challenging due to the complex correlations in multi-label structures, which cause student models to overlook the more finely structured semantic relations present in the teacher model. In this paper, we present a solution called multi-label prototype-aware structured contrastive distillation, comprising two modules: Prototype-aware Contrastive Representation Distillation (PCRD) and Prototype-aware Cross-image Structure Distillation (PCSD). The PCRD module maximizes the mutual information of the prototype-aware representations between the student and teacher, ensuring semantic representation structure consistency to improve intra-class compactness and inter-class dispersion of the representations. In the PCSD module, we introduce sample-to-sample and sample-to-prototype structured contrastive distillation to model prototype-aware cross-image structure consistency, guiding the student model to maintain a label semantic structure coherent with the teacher's across multiple instances. To enhance the stability of prototype guidance, we introduce batch-wise dynamic prototype correction for updating class prototypes. Experimental results on three public benchmark datasets validate the effectiveness of the proposed method, demonstrating its superiority over state-of-the-art methods.
Tsinghua Science and Technology, vol. 30, no. 4, pp. 1808-1830
FSRPCL: Privacy-Preserve Federated Social Relationship Prediction with Contrastive Learning
Pub Date : 2025-03-03 DOI: 10.26599/TST.2024.9010077
Hanwen Liu;Nianzhe Li;Huaizhen Kou;Shunmei Meng;Qianmu Li
Cross-Platform Social Relationship Prediction (CPSRP) aims to utilize users' data on multiple platforms to enhance the performance of social relationship prediction, thereby promoting socio-economic development. Because users' data are highly privacy-sensitive, CPSRP typically introduces various privacy-preserving mechanisms to safeguard users' confidential information. Although such mechanisms guarantee the security of users' private information, they tend to degrade the performance of social relationship prediction. Additionally, existing social relationship prediction schemes overlook the interdependencies among items invoked in a user behavior sequence. To this end, we propose a novel privacy-preserving Federated Social Relationship Prediction with Contrastive Learning framework called FSRPCL, a multi-task learning framework based on vertical federated learning. Specifically, the users' rating information is perturbed with a bounded differential privacy technique, and the users' sequential representations acquired through a Transformer are applied to social relationship prediction and contrastive learning. Furthermore, each client uploads its weight information to the server, and the server aggregates the weight information and distributes it to each client for updating. Extensive experiments on real-world datasets demonstrate that FSRPCL delivers exceptional performance in social relationship prediction and privacy preservation, and effectively minimizes the impact of the privacy-preserving technology on prediction accuracy.
{"title":"FSRPCL: Privacy-Preserve Federated Social Relationship Prediction with Contrastive Learning","authors":"Hanwen Liu;Nianzhe Li;Huaizhen Kou;Shunmei Meng;Qianmu Li","doi":"10.26599/TST.2024.9010077","DOIUrl":"https://doi.org/10.26599/TST.2024.9010077","url":null,"abstract":"Cross-Platform Social Relationship Prediction (CPSRP) aims to utilize users' data information on multiple platforms to enhance the performance of social relationship prediction, thereby promoting socio-economic development. Due to the highly sensitive nature of users' data in terms of privacy, CPSRP typically introduces various privacy-preserving mechanisms to safeguard users' confidential information. Although the introduction mechanism guarantees the security of the users' private information, it tends to degrade the performance of the social relationship prediction. Additionally, existing social relationship prediction schemes overlook the interdependencies among items invoked in a user behavior sequence. For this purpose, we propose a novel privacy-preserve Federated Social Relationship Prediction with Contrastive Learning framework called FSRPCL, which is a multi-task learning framework based on vertical federated learning. Specifically, the users' rating information is perturbed with a bounded differential privacy technology, and then the users' sequential representation information acquired through Transformer is applied for social relationship prediction and contrastive learning. Furthermore, each client uploads their respective weight information to the server, and the server aggregates the weight information and distributes it purposes to each client for updating. Numerous experiments on real-world datasets prove that FSRPCL delivers exceptional performance in social relationship prediction and privacy preservation, and effectively minimizes the impact of privacy-preserving technology on social relationship prediction accuracy.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 4","pages":"1762-1781"},"PeriodicalIF":6.6,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10908667","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}