
Latest Publications in Tsinghua Science and Technology

Dual-Modality Integration Attention with Graph-Based Feature Extraction for Visual Question and Answering
IF 6.6 | CAS Tier 1 (Computer Science) | Q1 Multidisciplinary | Pub Date: 2025-04-29 | DOI: 10.26599/TST.2024.9010093
Jing Lu;Chunlei Wu;Leiquan Wang;Ran Li;Xiuxuan Shen
Visual Question and Answering (VQA) has garnered significant attention as a domain that requires the synthesis of visual and textual information to produce accurate responses. While existing methods often rely on Convolutional Neural Networks (CNNs) for feature extraction and attention mechanisms for embedding learning, they frequently fail to capture the nuanced interactions between entities within images, leading to potential ambiguities in answer generation. In this paper, we introduce a novel network architecture, Dual-modality Integration Attention with Graph-based Feature Extraction (DIAGFE), which addresses these limitations by incorporating two key innovations: a Graph-based Feature Extraction (GFE) module that enhances the precision of visual semantics extraction, and a Dual-modality Integration Attention (DIA) mechanism that efficiently fuses visual and question features to guide the model towards more accurate answer generation. Our model is trained with a composite loss function to refine its predictive accuracy. Rigorous experiments on the VQA2.0 dataset demonstrate that DIAGFE outperforms existing methods, underscoring the effectiveness of our approach in advancing VQA research and its potential for cross-modal understanding.
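No implementation accompanies this listing, but the two ideas the abstract names (graph-based refinement of region features, then question-guided attention over them) can be sketched in a few lines of PyTorch. Everything below is a hypothetical illustration: the class names, the similarity-based adjacency, and the dimensions are stand-ins, not the paper's actual GFE/DIA design.

```python
import torch
import torch.nn as nn

class GraphFeatureExtractor(nn.Module):
    """Toy graph refinement: propagate region features over a similarity graph."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, regions):                            # regions: (B, N, D)
        adj = torch.softmax(regions @ regions.transpose(1, 2), dim=-1)  # (B, N, N)
        return torch.relu(self.proj(adj @ regions))        # neighbourhood-aware features

class DualModalityAttention(nn.Module):
    """Question-guided attention pooling over the refined region features."""
    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, vis, question):                      # vis: (B, N, D), question: (B, D)
        fused = vis * self.q_proj(question).unsqueeze(1)   # element-wise integration
        attn = torch.softmax(self.score(fused), dim=1)     # (B, N, 1) weights over regions
        return (attn * vis).sum(dim=1)                     # (B, D) joint embedding

B, N, D = 2, 36, 512                                       # 36 regions, as in common bottom-up VQA features
vis, question = torch.randn(B, N, D), torch.randn(B, D)
joint = DualModalityAttention(D)(GraphFeatureExtractor(D)(vis), question)
print(joint.shape)                                         # torch.Size([2, 512])
```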
{"title":"Dual-Modality Integration Attention with Graph-Based Feature Extraction for Visual Question and Answering","authors":"Jing Lu;Chunlei Wu;Leiquan Wang;Ran Li;Xiuxuan Shen","doi":"10.26599/TST.2024.9010093","DOIUrl":"https://doi.org/10.26599/TST.2024.9010093","url":null,"abstract":"Visual Question and Answering (VQA) has garnered significant attention as a domain that requires the synthesis of visual and textual information to produce accurate responses. While existing methods often rely on Convolutional Neural Networks (CNNs) for feature extraction and attention mechanisms for embedding learning, they frequently fail to capture the nuanced interactions between entities within images, leading to potential ambiguities in answer generation. In this paper, we introduce a novel network architecture, Dual-modality Integration Attention with Graph-based Feature Extraction (DIAGFE), which addresses these limitations by incorporating two key innovations: a Graph-based Feature Extraction (GFE) module that enhances the precision of visual semantics extraction, and a Dual-modality Integration Attention (DIA) mechanism that efficiently fuses visual and question features to guide the model towards more accurate answer generation. Our model is trained with a composite loss function to refine its predictive accuracy. Rigorous experiments on the VQA2.0 dataset demonstrate that DIAGFE outperforms existing methods, underscoring the effectiveness of our approach in advancing VQA research and its potential for cross-modal understanding.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"2133-2145"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979795","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Objective Class-Based Micro-Expression Recognition Through Simultaneous Action Unit Detection and Feature Aggregation
IF 6.6 | CAS Tier 1 (Computer Science) | Q1 Multidisciplinary | Pub Date: 2025-04-29 | DOI: 10.26599/TST.2024.9010095
Ling Zhou;Qirong Mao;Ming Dong
Micro-Expression Recognition (MER) is a challenging task, as subtle changes occur over different action regions of a face. Changes in facial action regions are formed as Action Units (AUs), and AUs in micro-expressions can be seen as the actors in cooperative group activities. In this paper, we propose a novel deep neural network model for objective class-based MER, which simultaneously detects AUs and aggregates AU-level features into a micro-expression-level representation through Graph Convolutional Networks (GCN). Specifically, we propose two new strategies in our AU detection module for more effective AU feature learning: the attention mechanism and the balanced detection loss function. With these two strategies, features are learned for all the AUs in a unified model, eliminating the error-prone landmark detection process and tedious separate training for each AU. Moreover, our model incorporates a tailored objective class-based AU knowledge-graph, which facilitates the GCN to aggregate the AU-level features into a micro-expression-level feature representation. Extensive experiments on two tasks in MEGC 2018 show that our approach outperforms the current state-of-the-art methods in MER. Additionally, we also report our single model-based micro-expression AU detection results.
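A minimal sketch of the aggregation step described above: GCN-style message passing over AU nodes, followed by pooling into an expression-level vector. The random prior adjacency and all names are placeholders; the paper's tailored AU knowledge-graph is not reproduced here.

```python
import torch
import torch.nn as nn

class AUGraphAggregator(nn.Module):
    """One GCN-style layer that aggregates per-AU features into a single
    micro-expression-level representation over a fixed AU relation graph."""
    def __init__(self, num_aus, dim):
        super().__init__()
        self.weight = nn.Linear(dim, dim)
        adj = torch.rand(num_aus, num_aus)                # placeholder prior, e.g., AU co-occurrence
        adj = adj + adj.t() + torch.eye(num_aus)          # symmetric, with self-loops
        self.register_buffer("norm_adj", adj / adj.sum(dim=1, keepdim=True))

    def forward(self, au_feats):                          # (B, num_aus, D)
        hidden = torch.relu(self.weight(self.norm_adj @ au_feats))
        return hidden.mean(dim=1)                         # (B, D) expression-level vector

au_feats = torch.randn(4, 12, 128)                        # 12 AUs, 128-d features each
print(AUGraphAggregator(12, 128)(au_feats).shape)         # torch.Size([4, 128])
```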
{"title":"Objective Class-Based Micro-Expression Recognition Through Simultaneous Action Unit Detection and Feature Aggregation","authors":"Ling Zhou;Qirong Mao;Ming Dong","doi":"10.26599/TST.2024.9010095","DOIUrl":"https://doi.org/10.26599/TST.2024.9010095","url":null,"abstract":"Micro-Expression Recognition (MER) is a challenging task as the subtle changes occur over different action regions of a face. Changes in facial action regions are formed as Action Units (AUs), and AUs in micro-expressions can be seen as the actors in cooperative group activities. In this paper, we propose a novel deep neural network model for objective class-based MER, which simultaneously detects AUs and aggregates AU-level features into micro-expression-level representation through Graph Convolutional Networks (GCN). Specifically, we propose two new strategies in our AU detection module for more effective AU feature learning: the attention mechanism and the balanced detection loss function. With these two strategies, features are learned for all the AUs in a unified model, eliminating the error-prune landmark detection process and tedious separate training for each AU. Moreover, our model incorporates a tailored objective class-based AU knowledge-graph, which facilitates the GCN to aggregate the AU-level features into a micro-expression-level feature representation. Extensive experiments on two tasks in MEGC 2018 show that our approach outperforms the current state-of-the-art methods in MER. Additionally, we also report our single model-based micro-expression AU detection results.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"2114-2132"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979653","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Envisioning a Future Beyond Tomorrow with Script Event Stream Prediction
IF 6.6 | CAS Tier 1 (Computer Science) | Q1 Multidisciplinary | Pub Date: 2025-04-29 | DOI: 10.26599/TST.2024.9010158
Zhiyi Fang;Zhuofeng Li;Qingyong Zhang;Changhua Xu;Pinzhuo Tian;Shaorong Xie
Script event stream prediction is a task that predicts events based on a given context or script. Most existing methods predict only one subsequent event, limiting the ability to make longer inferences about the future. Moreover, external knowledge has been proven to be beneficial for event prediction and is used in many methods in the form of relations between events. However, these methods focus mainly on the continuity of actions while ignoring the other components of events. To tackle these issues, we propose a Multi-step Script Event Prediction (MuSEP) method that can make longer inferences from the given events. We adopt reinforcement learning to implement the multi-step prediction by treating the process as a Markov chain and setting the reward to account for both chain-level and event-level quality, thus ensuring the overall quality of prediction results. Additionally, we learn the representations of events with external knowledge, which leads to a better understanding of events and their components. Experimental results on four datasets demonstrate that our method not only outperforms state-of-the-art methods on one-step prediction but is also capable of making multi-step predictions.
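The abstract frames multi-step prediction as a Markov chain. A greedy rollout version of that loop is sketched below; the toy scorer and GRU state update are assumptions for illustration, and the paper's reinforcement-learning training and reward design are omitted.

```python
import torch
import torch.nn as nn

class ToyEventScorer(nn.Module):
    """Scores candidate next events against the current context state."""
    def __init__(self, dim, vocab):
        super().__init__()
        self.event_emb = nn.Embedding(vocab, dim)
        self.cell = nn.GRUCell(dim, dim)                  # Markov-style state update

    def step(self, state, candidates):                    # state: (dim,), candidates: (V,)
        return self.event_emb(candidates) @ state         # (V,) event-level scores

def rollout(scorer, state, candidates, steps=3):
    """Greedy multi-step inference: pick an event, update the state, repeat."""
    chain = []
    for _ in range(steps):
        best = candidates[scorer.step(state, candidates).argmax()]
        chain.append(best.item())
        state = scorer.cell(scorer.event_emb(best).unsqueeze(0),
                            state.unsqueeze(0)).squeeze(0)
    return chain

torch.manual_seed(0)
scorer = ToyEventScorer(dim=64, vocab=100)
print(rollout(scorer, torch.randn(64), torch.arange(100)))  # e.g., [17, 42, 5]
```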
{"title":"Envisioning a Future Beyond Tomorrow with Script Event Stream Prediction","authors":"Zhiyi Fang;Zhuofeng Li;Qingyong Zhang;Changhua Xu;Pinzhuo Tian;Shaorong Xie","doi":"10.26599/TST.2024.9010158","DOIUrl":"https://doi.org/10.26599/TST.2024.9010158","url":null,"abstract":"Script event stream prediction is a task that predicts events based on a given context or script. Most existing methods predict one subsequent event, limiting the ability to make a longer inference about the future. Moreover, external knowledge has been proven to be beneficial for event prediction and used in many methods in the form of relations between events. However, these methods focus mainly on the continuity of actions while ignoring the other components of events. To tackle these issues, we propose a Multi-step Script Event Prediction (MuSEP) method that can make a longer inference according to the given events. We adopt reinforcement learning to implement the multi-step prediction by treating the process as a Markov chain and setting the reward considering both chain-level and event-level thus ensuring the overall quality of prediction results. Additionally, we learn the representations of events with external knowledge which could better understand events and their components. Experimental results on four datasets demonstrate that our method not only outperforms state-of-the-art methods on one-step prediction but is also capable of making multi-step prediction.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"2048-2059"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979651","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Grayscale-Assisted RGB Image Conversion from Near-Infrared Images
IF 6.6 | CAS Tier 1 (Computer Science) | Q1 Multidisciplinary | Pub Date: 2025-04-29 | DOI: 10.26599/TST.2024.9010115
Yunyi Gao;Qiankun Liu;Lin Gu;Ying Fu
Near-InfraRed (NIR) imaging technology plays a pivotal role in assisted driving and safety surveillance systems, yet its monochromatic nature and deficiency in detail limit its further application. Recent methods aim to recover the corresponding RGB image directly from the NIR image using Convolutional Neural Networks (CNN). However, these methods struggle to accurately recover both luminance and chrominance information and to compensate for the inherent lack of detail in NIR images. In this paper, we propose grayscale-assisted RGB image restoration from NIR images that recovers luminance and chrominance information in two stages. We address the complex NIR-to-RGB conversion challenge by decoupling it into two separate stages. First, our method converts NIR to grayscale images, focusing on luminance learning. Then, it transforms grayscale to RGB images, concentrating on chrominance information. In addition, we incorporate frequency domain learning to shift the image processing from the spatial domain to the frequency domain, facilitating the restoration of the detailed textures often lost in NIR images. Empirical comparisons of our grayscale-assisted framework against existing state-of-the-art methods demonstrate its superior performance and more visually appealing results. Code is accessible at: https://github.com/Yiiclass/RING
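The decoupled pipeline is straightforward to picture in code: one network maps NIR to grayscale (luminance), a second maps grayscale to RGB (chrominance), and a frequency-domain loss term targets fine textures. The tiny convolutional blocks below are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1))

class TwoStageColorizer(nn.Module):
    """Stage 1: 1-channel NIR -> grayscale (luminance).
    Stage 2: predicted grayscale -> 3-channel RGB (chrominance)."""
    def __init__(self):
        super().__init__()
        self.nir_to_gray = conv_block(1, 1)
        self.gray_to_rgb = conv_block(1, 3)

    def forward(self, nir):
        gray = self.nir_to_gray(nir)
        return gray, self.gray_to_rgb(gray)

def frequency_loss(pred, target):
    """L1 distance between 2-D FFT amplitudes: a frequency-domain term
    that penalises missing fine textures."""
    return (torch.fft.fft2(pred).abs() - torch.fft.fft2(target).abs()).abs().mean()

nir = torch.randn(2, 1, 64, 64)
gray, rgb = TwoStageColorizer()(nir)
print(gray.shape, rgb.shape, frequency_loss(rgb, torch.randn_like(rgb)).item())
```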
{"title":"Grayscale-Assisted RGB Image Conversion from Near-Infrared Images","authors":"Yunyi Gao;Qiankun Liu;Lin Gu;Ying Fu","doi":"10.26599/TST.2024.9010115","DOIUrl":"https://doi.org/10.26599/TST.2024.9010115","url":null,"abstract":"Near-InfraRed (NIR) imaging technology plays a pivotal role in assisted driving and safety surveillance systems, yet its monochromatic nature and deficiency in detail limit its further application. Recent methods aim to recover the corresponding RGB image directly from the NIR image using Convolutional Neural Networks (CNN). However, these methods struggle with accurately recovering both luminance and chrominance information and the inherent deficiencies in NIR image details. In this paper, we propose grayscale-assisted RGB image restoration from NIR images to recover luminance and chrominance information in two stages. We address the complex NIR-to-RGB conversion challenge by decoupling it into two separate stages. First, it converts NIR to grayscale images, focusing on luminance learning. Then, it transforms grayscale to RGB images, concentrating on chrominance information. In addition, we incorporate frequency domain learning to shift the image processing from the spatial domain to the frequency domain, facilitating the restoration of the detailed textures often lost in NIR images. Empirical evaluations of our grayscale-assisted framework and existing state-of-the-art methods demonstrate its superior performance and yield more visually appealing results. Code is accessible at: https://github.com/Yiiclass/RING","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"2215-2226"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979784","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
A Nonconvex Activated Fuzzy RNN with Noise-Immune for Time-Varying Quadratic Programming Problems: Application to Plant Leaf Disease Identification
IF 6.6 | CAS Tier 1 (Computer Science) | Q1 Multidisciplinary | Pub Date: 2025-04-29 | DOI: 10.26599/TST.2024.9010127
Yating Hu;Qingwen Du;Jun Luo;Changlin Yu;Bo Zhao;Yingyi Sun
Nonconvex Activated Fuzzy Zeroing Neural Network-based (NAFZNN) and Nonconvex Activated Fuzzy Noise-Tolerant Zeroing Neural Network-based (NAFNTZNN) models are devised and analyzed for online solution of Time-Varying Quadratic Programming Problems (TVQPPs) with Equality and Inequality Constraints (EICs) in noisy circumstances, drawing inspiration from the classical ZNN- and NTZNN-based models, respectively. Furthermore, the proposed NAFZNN and NAFNTZNN models can be viewed as a general proportional-derivative controller and a general proportional-integral-derivative controller, respectively. Besides, theoretical results demonstrate the global convergence of both the NAFZNN and NAFNTZNN models for TVQPPs with EICs under noisy conditions. Moreover, numerical results illustrate the efficiency, robustness, and superiority of the NAFZNN and NAFNTZNN models in addressing TVQPPs online, exhibiting inherent noise tolerance. Ultimately, an application example of plant leaf disease identification supports the feasibility and efficacy of the designed NAFNTZNN model, showing its potential practical value in the field of image recognition.
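For context, the classical design formulas that ZNN/NTZNN-based models build on can be stated compactly: E(t) is a time-varying error to be driven to zero, Φ is an activation function, and γ and λ are positive gains. The paper's specific nonconvex fuzzy activation is not reproduced here.

```latex
% Classical ZNN evolution: force the time-varying error to decay.
\dot{E}(t) = -\gamma\,\Phi\bigl(E(t)\bigr)

% Noise-tolerant ZNN (NTZNN): an added integral term rejects constant noise.
\dot{E}(t) = -\gamma\,\Phi\bigl(E(t)\bigr)
             - \lambda \int_{0}^{t} \Phi\bigl(E(\tau)\bigr)\,\mathrm{d}\tau
```

Read as controllers, the first formula acts proportionally on the error (PD-like once the time-derivative structure of E is expanded), while the integral term in the second supplies the integral action that absorbs constant disturbances, which matches the abstract's controller interpretation.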
{"title":"A Nonconvex Activated Fuzzy RNN with Noise-Immune for Time-Varying Quadratic Programming Problems: Application to Plant Leaf Disease Identification","authors":"Yating Hu;Qingwen Du;Jun Luo;Changlin Yu;Bo Zhao;Yingyi Sun","doi":"10.26599/TST.2024.9010127","DOIUrl":"https://doi.org/10.26599/TST.2024.9010127","url":null,"abstract":"Nonconvex Activated Fuzzy Zeroing Neural Network-based (NAFZNN) and Nonconvex Activated Fuzzy Noise-Tolerant Zeroing Neural Network-based (NAFNTZNN) models are devised and analyzed, drawing inspiration from the classical ZNN/NTZNN-based model for online addressing Time-Varying Quadratic Programming Problems (TVQPPs) with Equality and Inequality Constraints (EICs) in noisy circumstances, respectively. Furthermore, the proposed NAFZNN model and NAFNTZNN model are considered as general proportion-differentiation controller, along with general proportion-integration-differentiation controller. Besides, theoretical results demonstrate the global convergence of both the NAFZNN and NAFNTZNN models for TVQPPs with EIC under noisy conditions. Moreover, numerical results illustrate the efficiency, robustness, and ascendancy of the NAFZNN and NAFZNN models in addressing TVQPPs online, exhibiting inherent noise tolerance. Ultimately, an application example to plant leaf disease identification is conducted to support the feasibility and efficacy of the designed NAFNTZNN model, which shows its potential practical value in the field of image recognition.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"1994-2013"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979779","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Hybrid Operator and Strengthened Diversity Improving for Multimodal Multi-Objective Optimization: Electronic Supplementary Material
IF 6.6 | CAS Tier 1 (Computer Science) | Q1 Multidisciplinary | Pub Date: 2025-04-29
{"title":"Hybrid Operator and Strengthened Diversity Improving for Multimodal Multi-Objective Optimization: Electronic Supplementary Material","authors":"","doi":"","DOIUrl":"https://doi.org/","url":null,"abstract":"","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 5","pages":"1-39"},"PeriodicalIF":6.6,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10979791","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143888436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Gesture Recognition with Focuses Using Hierarchical Body Part Combination
IF 6.6 | CAS Tier 1 (Computer Science) | Q1 Multidisciplinary | Pub Date: 2025-03-03 | DOI: 10.26599/TST.2024.9010059
Cheng Zhang;Yibin Hou;Jian He;Xiaoyang Xie
Human gesture recognition is an important research field of human-computer interaction due to its potential applications in various fields, but existing methods still face challenges in achieving high levels of accuracy. To address this issue, some existing studies propose to fuse global features with cropped features, called focuses, of vital body parts such as the hands. However, most methods rely on experience when choosing the focus, and the scheme of focus selection is rarely discussed in detail. In this paper, a hierarchical body part combination method is proposed that takes into account the number, combinations, and logical relationships of body parts. The proposed method generates multiple focuses and employs a chart-based surface modality alongside red-green-blue and optical-flow modalities to enhance each focus. A feature-level fusion scheme based on the residual connection structure is proposed to fuse different modalities at convolution stages, and a focus fusion scheme is proposed to learn the relevancy of focus channels for each gesture class individually. Experiments conducted on the ChaLearn isolated gesture dataset show that the use of multiple focuses in conjunction with multi-modal features and fusion strategies leads to better gesture recognition accuracy.
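The phrase "number, combinations, and logical relationships" of body parts suggests enumerating candidate focuses systematically rather than picking them by experience. A minimal sketch, assuming a flat part list (a real hierarchy and logical constraints would prune this set):

```python
from itertools import combinations

# Hypothetical flat list of body parts; the paper organises these hierarchically.
PARTS = ["left_hand", "right_hand", "left_arm", "right_arm", "head", "torso"]

def focus_candidates(parts, max_size=3):
    """Enumerate candidate focuses: every combination of parts up to max_size,
    so the recognizer can attend to single parts, pairs, and triples."""
    return [c for k in range(1, max_size + 1) for c in combinations(parts, k)]

for focus in focus_candidates(PARTS)[:8]:
    print(focus)        # ('left_hand',), ('right_hand',), ..., ('left_hand', 'right_hand'), ...
```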
{"title":"Gesture Recognition with Focuses Using Hierarchical Body Part Combination","authors":"Cheng Zhang;Yibin Hou;Jian He;Xiaoyang Xie","doi":"10.26599/TST.2024.9010059","DOIUrl":"https://doi.org/10.26599/TST.2024.9010059","url":null,"abstract":"Human gesture recognition is an important research field of human-computer interaction due to its potential applications in various fields, but existing methods still face challenges in achieving high levels of accuracy. To address this issue, some existing researches propose to fuse the global features with the cropped features called focuses on vital body parts like hands. However, most methods rely on experience when choosing the focus, the scheme of focus selection is not discussed in detail. In this paper, a hierarchical body part combination method is proposed to take into account the number, combinations, and logical relationships between body parts. The proposed method generates multiple focuses using this method and employs chart-based surface modality alongside red-green-blue and optical flow modalities to enhance each focus. A feature-level fusion scheme based on the residual connection structure is proposed to fuse different modalities at convolution stages, and a focus fusion scheme is proposed to learn the relevancy of focus channels for each gesture class individually. Experiments conducted on ChaLearn isolated gesture dataset show that the use of multiple focuses in conjunction with multi-modal features and fusion strategies leads to better gesture recognition accuracy.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 4","pages":"1583-1599"},"PeriodicalIF":6.6,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10908593","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Social Media-Driven User Community Finding with Privacy Protection
IF 6.6 | CAS Tier 1 (Computer Science) | Q1 Multidisciplinary | Pub Date: 2025-03-03 | DOI: 10.26599/TST.2024.9010065
Jianye Xie;Xudong Wang;Yuwen Liu;Wenwen Gong;Chao Yan;Wajid Rafique;Maqbool Khan;Arif Ali Khan
In the digital era, social media platforms play a crucial role in forming user communities, yet the challenge of protecting user privacy remains paramount. This paper proposes a novel framework for identifying and analyzing user communities within social media networks, with an emphasis on privacy protection. Specifically, we implement a hashing-based, social media-driven user community finding approach named MCF to ensure that the extracted information cannot be traced back to specific users, thereby maintaining confidentiality. Finally, we design a set of experiments comparing the proposed MCF approach with existing approaches, demonstrating its effectiveness and efficiency in community detection while upholding stringent privacy standards. This research contributes to the growing field of social network analysis by providing a balanced solution that respects user privacy while uncovering valuable insights into community dynamics on social media platforms.
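As a rough illustration of the hashing idea: pseudonymize identities one-way before building the social graph, then run community detection on the pseudonyms. Salted SHA-256 and label propagation are stand-ins here, since the abstract does not specify MCF's actual hashing scheme or detection algorithm.

```python
import hashlib
import networkx as nx
from networkx.algorithms.community import label_propagation_communities

SALT = b"example-salt"          # hypothetical; a real deployment keeps this secret

def pseudonymize(user_id: str) -> str:
    """One-way salted hash, so the graph is analysable without raw identities."""
    return hashlib.sha256(SALT + user_id.encode()).hexdigest()[:12]

edges = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]
graph = nx.Graph((pseudonymize(u), pseudonymize(v)) for u, v in edges)
for community in label_propagation_communities(graph):
    print(sorted(community))    # communities over pseudonyms, not raw user IDs
```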
{"title":"Social Media-Driven User Community Finding with Privacy Protection","authors":"Jianye Xie;Xudong Wang;Yuwen Liu;Wenwen Gong;Chao Yan;Wajid Rafique;Maqbool Khan;Arif Ali Khan","doi":"10.26599/TST.2024.9010065","DOIUrl":"https://doi.org/10.26599/TST.2024.9010065","url":null,"abstract":"In the digital era, social media platforms play a crucial role in forming user communities, yet the challenge of protecting user privacy remains paramount. This paper proposes a novel framework for identifying and analyzing user communities within social media networks, emphasizing privacy protection. In detail, we implement a social media-driven user community finding approach with hashing named MCF to ensure that the extracted information cannot be traced back to specific users, thereby maintaining confidentiality. Finally, we design a set of experiments to verify the effectiveness and efficiency of our proposed MCF approach by comparing it with other existing approaches, demonstrating its effectiveness in community detection while upholding stringent privacy standards. This research contributes to the growing field of social network analysis by providing a balanced solution that respects user privacy while uncovering valuable insights into community dynamics on social media platforms.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 4","pages":"1782-1792"},"PeriodicalIF":6.6,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10908665","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multi-Label Prototype-Aware Structured Contrastive Distillation
IF 6.6 | CAS Tier 1 (Computer Science) | Q1 Multidisciplinary | Pub Date: 2025-03-03 | DOI: 10.26599/TST.2024.9010182
Yuelong Xia;Yihang Tong;Jing Yang;Xiaodi Sun;Yungang Zhang;Huihua Wang;Lijun Yun
Knowledge distillation has demonstrated considerable success in scenarios involving multi-class single-label learning. However, its direct application to multi-label learning proves challenging due to complex correlations in multi-label structures, causing student models to overlook the more finely structured semantic relations present in the teacher model. In this paper, we present a solution called multi-label prototype-aware structured contrastive distillation, comprising two modules: Prototype-aware Contrastive Representation Distillation (PCRD) and Prototype-aware Cross-image Structure Distillation (PCSD). The PCRD module maximizes the mutual information of prototype-aware representations between the student and teacher, ensuring semantic representation structure consistency to improve intra-class compactness and inter-class dispersion of representations. In the PCSD module, we introduce sample-to-sample and sample-to-prototype structured contrastive distillation to model prototype-aware cross-image structure consistency, guiding the student model to maintain a coherent label semantic structure with the teacher across multiple instances. To enhance prototype guidance stability, we introduce batch-wise dynamic prototype correction for updating class prototypes. Experimental results on three public benchmark datasets validate the effectiveness of our proposed method, demonstrating its superiority over state-of-the-art methods.
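A minimal sketch of a sample-to-prototype contrastive term of the kind described above: each sample is pulled toward the prototypes of its positive labels (multi-label, so possibly several) and pushed away from the rest. The temperature, shapes, and averaging are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sample_to_prototype_loss(feats, labels, prototypes, tau=0.1):
    """InfoNCE-style term: pull each sample toward the prototypes of its
    positive labels and away from the remaining class prototypes."""
    feats = F.normalize(feats, dim=-1)                    # (B, D)
    prototypes = F.normalize(prototypes, dim=-1)          # (C, D)
    log_prob = F.log_softmax(feats @ prototypes.t() / tau, dim=-1)  # (B, C)
    pos = labels.float()                                  # (B, C) multi-hot labels
    # Mean log-probability over each sample's positive prototypes.
    return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()

B, C, D = 8, 20, 256
loss = sample_to_prototype_loss(torch.randn(B, D),
                                (torch.rand(B, C) > 0.8).long(),
                                torch.randn(C, D))
print(loss.item())
```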
{"title":"Multi-Label Prototype-Aware Structured Contrastive Distillation","authors":"Yuelong Xia;Yihang Tong;Jing Yang;Xiaodi Sun;Yungang Zhang;Huihua Wang;Lijun Yun","doi":"10.26599/TST.2024.9010182","DOIUrl":"https://doi.org/10.26599/TST.2024.9010182","url":null,"abstract":"Knowledge distillation has demonstrated considerable success in scenarios involving multi-class single-label learning. However, its direct application to multi-label learning proves challenging due to complex correlations in multi-label structures, causing student models to overlook more finely structured semantic relations present in the teacher model. In this paper, we present a solution called multi-label prototype-aware structured contrastive distillation, comprising two modules: Prototype-aware Contrastive Representation Distillation (PCRD) and prototype-aware cross-image structure distillation. The PCRD module maximizes the mutual information of prototype-aware representation between the student and teacher, ensuring semantic representation structure consistency to improve the compactness of intra-class and dispersion of inter-class representations. In the PCSD module, we introduce sample-to-sample and sample-to-prototype structured contrastive distillation to model prototype-aware cross-image structure consistency, guiding the student model to maintain a coherent label semantic structure with the teacher across multiple instances. To enhance prototype guidance stability, we introduce batch-wise dynamic prototype correction for updating class prototypes. Experimental results on three public benchmark datasets validate the effectiveness of our proposed method, demonstrating its superiority over state-of-the-art methods.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 4","pages":"1808-1830"},"PeriodicalIF":6.6,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10908678","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143535439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
FSRPCL: Privacy-Preserve Federated Social Relationship Prediction with Contrastive Learning
IF 6.6 | CAS Tier 1 (Computer Science) | Q1 Multidisciplinary | Pub Date: 2025-03-03 | DOI: 10.26599/TST.2024.9010077
Hanwen Liu;Nianzhe Li;Huaizhen Kou;Shunmei Meng;Qianmu Li
Cross-Platform Social Relationship Prediction (CPSRP) aims to utilize users' data on multiple platforms to enhance the performance of social relationship prediction, thereby promoting socio-economic development. Because users' data are highly privacy-sensitive, CPSRP typically introduces various privacy-preserving mechanisms to safeguard users' confidential information. Although such mechanisms guarantee the security of users' private information, they tend to degrade the performance of social relationship prediction. Additionally, existing social relationship prediction schemes overlook the interdependencies among items invoked in a user behavior sequence. For this purpose, we propose a novel privacy-preserving Federated Social Relationship Prediction with Contrastive Learning framework called FSRPCL, which is a multi-task learning framework based on vertical federated learning. Specifically, the users' rating information is perturbed with a bounded differential privacy technology, and the users' sequential representation information acquired through a Transformer is then applied to social relationship prediction and contrastive learning. Furthermore, each client uploads its weight information to the server, and the server aggregates the weight information and distributes it to each client for updating. Numerous experiments on real-world datasets prove that FSRPCL delivers exceptional performance in social relationship prediction and privacy preservation, and effectively minimizes the impact of privacy-preserving technology on social relationship prediction accuracy.
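The abstract mentions perturbing ratings with a bounded differential privacy technology. One simple stand-in is Laplace noise followed by clipping back to the valid rating range; clipping is post-processing, so the epsilon-DP guarantee is preserved. The paper's exact mechanism is not specified, so the sketch below is only indicative.

```python
import numpy as np

def perturb_ratings(ratings, epsilon=1.0, lo=1.0, hi=5.0):
    """Laplace noise scaled to the rating range, with outputs clipped back to
    [lo, hi] so perturbed values stay plausible (bounded)."""
    sensitivity = hi - lo                      # a rating can change by at most this
    noise = np.random.laplace(0.0, sensitivity / epsilon, size=len(ratings))
    return np.clip(np.asarray(ratings, dtype=float) + noise, lo, hi)

print(perturb_ratings([4.0, 5.0, 2.5], epsilon=2.0))   # e.g., [3.1 5.  1.8]
```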
{"title":"FSRPCL: Privacy-Preserve Federated Social Relationship Prediction with Contrastive Learning","authors":"Hanwen Liu;Nianzhe Li;Huaizhen Kou;Shunmei Meng;Qianmu Li","doi":"10.26599/TST.2024.9010077","DOIUrl":"https://doi.org/10.26599/TST.2024.9010077","url":null,"abstract":"Cross-Platform Social Relationship Prediction (CPSRP) aims to utilize users' data information on multiple platforms to enhance the performance of social relationship prediction, thereby promoting socio-economic development. Due to the highly sensitive nature of users' data in terms of privacy, CPSRP typically introduces various privacy-preserving mechanisms to safeguard users' confidential information. Although the introduction mechanism guarantees the security of the users' private information, it tends to degrade the performance of the social relationship prediction. Additionally, existing social relationship prediction schemes overlook the interdependencies among items invoked in a user behavior sequence. For this purpose, we propose a novel privacy-preserve Federated Social Relationship Prediction with Contrastive Learning framework called FSRPCL, which is a multi-task learning framework based on vertical federated learning. Specifically, the users' rating information is perturbed with a bounded differential privacy technology, and then the users' sequential representation information acquired through Transformer is applied for social relationship prediction and contrastive learning. Furthermore, each client uploads their respective weight information to the server, and the server aggregates the weight information and distributes it purposes to each client for updating. Numerous experiments on real-world datasets prove that FSRPCL delivers exceptional performance in social relationship prediction and privacy preservation, and effectively minimizes the impact of privacy-preserving technology on social relationship prediction accuracy.","PeriodicalId":48690,"journal":{"name":"Tsinghua Science and Technology","volume":"30 4","pages":"1762-1781"},"PeriodicalIF":6.6,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10908667","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143553139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0