
Latest publications: IEEE Transactions on Artificial Intelligence

Correlation-Guided Information Deep Fusion for Multimodal Recommendation
Pub Date: 2025-08-27, DOI: 10.1109/TAI.2025.3602935
Gang-Feng Ma;Xu-Hua Yang;Peng Jiang
Multimodal recommendation plays a crucial role on online platforms by integrating modality information such as visual, textual, and audio, which significantly mitigates the sparsity of user–item interaction networks. However, current multimodal recommendation methods primarily enrich item-side representations while neglecting user-side learning, and their fusion of structure and information is insufficient. To address these issues, we propose correlation-guided information deep fusion for multimodal recommendation (CIDF). First, we employ graph neural networks to capture collaborative signals based on ID embeddings and multimodal features separately, thereby capturing the independent information of each node’s different representations. Next, we construct a user–user similarity ID graph and an item–item correlation modality graph to capture connection information on the user and item sides, respectively. Finally, we propose an information deep fusion method that integrates the aforementioned two graphs with the user–item interaction graph, obtaining fused representations for both users and items through information propagation and aggregation on the graphs. The fused representations are further updated on the user–item interaction graph to obtain node representations that better align with user interaction behaviors. We conducted experiments on real-world datasets, and the results demonstrate that CIDF outperforms state-of-the-art methods in multimodal recommendation.
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1584–1595.
Cited by: 0
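The abstract above describes information propagation and aggregation on graphs without implementation detail. As a minimal sketch (not the paper's actual CIDF method; the toy graph, two-layer depth, and one-hot features are all illustrative assumptions), degree-normalized message passing on a user–item bipartite graph looks like:

```python
import numpy as np

def propagate(adj, features, layers=2):
    """Degree-normalized message passing: each layer replaces a node's
    feature with the mean of its neighbors' features."""
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0           # guard against isolated nodes
    norm_adj = adj / deg
    out = features
    for _ in range(layers):
        out = norm_adj @ out
    return out

# Toy bipartite user–item graph: 2 users, 3 items, stacked into one
# (2+3) x (2+3) symmetric adjacency matrix.
ui = np.array([[1., 1., 0.],
               [0., 1., 1.]])
adj = np.block([[np.zeros((2, 2)), ui],
                [ui.T, np.zeros((3, 3))]])
feats = np.eye(5)                 # one-hot stand-ins for ID embeddings
fused = propagate(adj, feats)
print(fused.shape)                # (5, 5)
```

Because every row of the normalized adjacency sums to one, each fused row remains a convex combination of the original embeddings.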
Certified Local Transferability for Evaluating Adversarial Attacks
Pub Date: 2025-08-27, DOI: 10.1109/TAI.2025.3602931
Minyu Chen;Jingyang Li;Ling-I Wu;Guoqiang Li
Deep neural networks (DNNs) are known to be vulnerable to adversarial examples. Though adversarial attacks show effectiveness in misleading models, most attack methods are designed to poison a specific image. To investigate the actual effect on the feature space, we introduce the concept of the certified local transferable region. This is a connected area of inputs where we can mathematically guarantee that a single adversarial perturbation will successfully fool the model. The size of this region is a metric to evaluate the local transferability of perturbations. We present a novel method, reverse attack oracle-based search (RAOS), to estimate the maximum size of this region. Our approach efficiently searches for the largest possible vulnerable area around an original input by iteratively refining its boundaries. Each step is guided with a minimal distance attack algorithm and proven with state-of-the-art verifiers. We conduct empirical experiments to evaluate various attacks on different model structures and adversarial training scenarios. We show the advantage of our proposed metric over existing ones and demonstrate its utility in exploring the robustness of neural networks.
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1574–1583.
Cited by: 0
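The paper estimates its certified region by iteratively refining boundaries with verified attack steps. As a heavily simplified, hypothetical analogue (not RAOS itself), the sketch below bisects for the largest 1-D interval around an input on which one fixed perturbation provably flips a linear scorer; the exact "verifier" is only possible here because the model is linear.

```python
def fooled_everywhere(w, b, x0, delta, r):
    """Exact 1-D verifier for a linear scorer: is w*(x + delta) + b < 0
    for every x in [x0 - r, x0 + r]?  The worst case is whichever
    endpoint maximizes the score."""
    worst = max(w * (x0 - r + delta) + b, w * (x0 + r + delta) + b)
    return worst < 0

def max_radius(w, b, x0, delta, hi=10.0, iters=50):
    """Bisection for the largest certified radius (0 if even r=0 fails)."""
    if not fooled_everywhere(w, b, x0, delta, 0.0):
        return 0.0
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if fooled_everywhere(w, b, x0, delta, mid):
            lo = mid
        else:
            hi = mid
    return lo

# Boundary at x = 5 (w=1, b=-5); x0=7 is originally positive, and the
# fixed shift delta=-4 misclassifies every x < 9, so the radius is 2.
r = max_radius(w=1.0, b=-5.0, x0=7.0, delta=-4.0)
print(round(r, 3))  # 2.0
```

The certified radius found this way is the metric the abstract calls local transferability: how far the same perturbation keeps working.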
Robust Unknown Object Detection in Dynamic Environments Through Dual-Granularity Reconstruction Error Modeling
Pub Date: 2025-08-27, DOI: 10.1109/TAI.2025.3602943
Linhua Ye;Yangyang Huang;Ronghua Luo
Open world object detection (OWOD) aims to detect both known and unknown objects in dynamic environments, where unknown instances lack ground-truth supervision during training. Existing methods typically rely on supervision from known categories, leading models to overconfidently classify visually similar unknowns as known classes, and dissimilar ones as background. This known-class prior bias severely hinders the detection of truly novel objects. To address this challenge, we propose a robust unknown object detection method based on dual-granularity reconstruction error modeling. At the fine-grained level, we propose fine-grained masked reconstruction (FMR), which randomly masks feature regions to guide reconstruction toward semantic structures, thereby improving foreground–background discrimination. At the coarse-grained level, we propose adaptive region-based error aggregation (AREA), which aggregates reconstruction errors over object proposals to enhance the model’s sensitivity to ambiguous semantic boundaries while suppressing local outliers. Furthermore, we perform decoupled probabilistic modeling of foreground and background reconstruction errors, enabling soft estimation of unknown object likelihoods without supervision. Extensive experiments on standard OWOD benchmarks demonstrate that our method consistently outperforms state-of-the-art (SOTA) approaches, achieving a +20.6 improvement in unknown object recall (U-Recall) while maintaining strong performance on known classes.
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1596–1609.
Cited by: 0
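The dual-granularity idea above (per-location reconstruction errors at the fine level, proposal-aggregated errors at the coarse level) can be illustrated with a toy example; the names and numbers below are made up and do not reproduce the paper's FMR/AREA modules.

```python
import numpy as np

def fine_errors(feat, recon):
    """Fine-grained level: per-location squared reconstruction error."""
    return (feat - recon) ** 2

def region_error(err_map, box):
    """Coarse-grained level: mean-pool fine errors over a proposal box
    (y0, y1, x0, x1).  Averaging dilutes isolated outliers while keeping
    coherent region-level discrepancies strong."""
    y0, y1, x0, x1 = box
    return err_map[y0:y1, x0:x1].mean()

feat = np.zeros((4, 4))
recon = np.zeros((4, 4))
recon[1, 1] = 3.0          # a single local outlier (error 9 at one cell)
recon[2:4, 2:4] = 1.0      # a coherent region-level discrepancy
err = fine_errors(feat, recon)
print(region_error(err, (0, 2, 0, 2)))  # outlier diluted: 9/4 = 2.25
print(region_error(err, (2, 4, 2, 4)))  # coherent region: 1.0
```

A thresholding or probabilistic step on these region scores would then play the role of the unsupervised unknown-object likelihood.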
Annealing Genetic Slicing Adversarial Networks Based Feedback for Imbalanced Visual Classification
Pub Date: 2025-08-26, DOI: 10.1109/TAI.2025.3602750
Yongting Zhao;Zhifan Gao;Jingyu Hao;Yiwen Wang;Heye Zhang
Data imbalance is a common challenge in real-world applications, often addressed by augmenting minority-class data to obtain a balanced dataset. Generative adversarial networks (GANs) can generate realistic minority-class samples through dynamic adversarial learning but face limitations because the optimization can fall into local minima. To address this, incorporating simulated annealing into GANs offers a potential remedy by accepting worse solutions to expand solution exploration. However, it remains underexplored whether every accepted worse solution in the learning dynamics of a GAN benefits the learning of minority or majority classes. Therefore, we propose an annealing genetic slicing adversarial network (AGSAN) learning method for imbalanced visual classification. It treats adversarial learning as an evolutionary process in which the generator undergoes multiple-offspring generation, best-offspring selection, and individual updating. AGSAN builds a gradient-informed selection mechanism to facilitate individual updating and best-offspring selection, leveraging gradient consistency (the similarity between minority-class gradients and overall gradients) to guide optimization. Furthermore, AGSAN expands the optimization range to facilitate multiple-offspring generation through a mixture of multiple adversarial objectives. Additionally, AGSAN ensures that the minimization objective of the GAN equals the distance between the generated and target distributions while relaxing the assumption of an optimal discriminator. Compared with 21 existing methods, our AGSAN achieves state-of-the-art performance on imbalanced classification.
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1546–1561.
Cited by: 0
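The abstract describes gradient consistency as a similarity between minority-class gradients and overall gradients. Reading that similarity as cosine similarity (my assumption; the function and variable names are made up), the offspring-selection step might be sketched as:

```python
import numpy as np

def gradient_consistency(minority_grad, overall_grad, eps=1e-12):
    """Cosine similarity between the minority-class gradient and the
    overall gradient; higher values suggest an update direction that
    also helps the minority classes."""
    num = float(np.dot(minority_grad, overall_grad))
    den = np.linalg.norm(minority_grad) * np.linalg.norm(overall_grad) + eps
    return num / den

def select_offspring(offspring_grads, overall_grad):
    """Pick the candidate update whose gradient agrees most with the
    overall gradient (a stand-in for AGSAN's selection mechanism)."""
    scores = [gradient_consistency(g, overall_grad) for g in offspring_grads]
    return int(np.argmax(scores)), scores

grads = [np.array([1.0, 0.0]), np.array([0.7, 0.7]), np.array([-1.0, 0.0])]
best, scores = select_offspring(grads, overall_grad=np.array([1.0, 1.0]))
print(best)  # 1: the candidate aligned with the overall direction wins
```

In the evolutionary framing above, each "offspring" would be one candidate generator update, and this score decides which one survives.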
Steganography in Large Language Models
Pub Date: 2025-08-26, DOI: 10.1109/TAI.2025.3602763
Xinxin Li;Zichi Wang;Xinpeng Zhang
The development of deep learning has provided new momentum for steganography. However, existing model steganography methods are generally designed for convolutional neural network models and suffer from low embedding capacity and poor robustness. To this end, we propose a stego scheme designed for large language models based on the transformer architecture. Using the powerful feature-representation ability and multilayer self-attention mechanism of the transformer, a large amount of secret data can be embedded without significantly affecting the performance of the model. In our scheme, the sender uses matrix multiplication to encode the coverage parameters of a specific transformer block, embedding secret data during training of the large language model. Ordinary users can use the stego model for text classification, text generation, and other routine tasks, while receivers can use a secret key to extract the secret data from the stego model, enabling covert communication. Experimental results affirm the efficacy of our scheme in terms of embedding capacity, undetectability, and robustness.
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1562–1573.
Cited by: 0
Adaptive Fuzzy Distributed Optimal Fault Tolerant Control of Nonlinear Multi-Agent Systems Under Weight-Unbalanced Directed Graphs
Pub Date: 2025-08-25, DOI: 10.1109/TAI.2025.3602015
Mengyuan Cui;Yi Zuo;Shaocheng Tong
The adaptive fuzzy distributed optimal fault tolerant control (FTC) problem is investigated for high-order nonlinear multi-agent (NMA) systems under weight-unbalanced directed graphs. Since the optimization point of the high-order NMA systems considered in this study is unknown, an optimal signal generator is formulated to obtain it. Then, a fuzzy state observer is established to estimate unmeasurable states. Based on the designed optimal signal generator and fuzzy state observer, an adaptive fuzzy distributed optimal output-feedback FTC scheme is proposed using backstepping control technology. It is proved that the NMA system is asymptotically stable and that the global cost function is minimized. Finally, we apply the proposed adaptive fuzzy distributed optimal output-feedback FTC approach to nonholonomic mobile robots with two actuated wheels; the simulation and comparison results verify its effectiveness.
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1512–1521.
Cited by: 0
MSAF: Multimodal Sentiment Detection via Multiscale Adaptive Fusion
Pub Date: 2025-08-25, DOI: 10.1109/TAI.2025.3602409
Jihong Guan;Yulou Shu;Wuchao Liu;Wengen Li;Shuigeng Zhou;Yichao Zhang
With the rapid increase of multimodal comments on social media, multimodal sentiment detection has become increasingly important. However, most existing methods overlook the difference in information density between text and images, and fall short in fully utilizing multiscale information in images. To address this issue, we propose a multiscale adaptive fusion model termed MSAF for multimodal sentiment detection. MSAF first extracts fine- and coarse-scale features of images through a multiscale visual encoder and uses a multiscale adaptive pooling module to adaptively adjust the weights of different regional features. Then, MSAF incorporates multiscale contrastive learning and multiscale rivalry tasks to ensure that the model retains associations between features at different scales while maintaining their diversity. These features are sequentially fused with text through a hierarchical fusion encoder guided by textual information, enabling MSAF to focus on sentiment-salient regions in the image. Finally, the multimodal fusion embeddings are fed into a classifier to predict the sentiment. Extensive experiments on multiple public datasets demonstrate the effectiveness and superiority of MSAF.
IEEE Transactions on Artificial Intelligence, vol. 7, no. 3, pp. 1533–1545.
Cited by: 0
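The adaptive pooling step above ("adaptively adjust the weights of different regional features") can be illustrated as softmax-weighted pooling; the linear scorer and all names below are hypothetical, not MSAF's actual module.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())   # shift for numerical stability
    return e / e.sum()

def adaptive_pool(region_feats, score_w):
    """Score each regional feature with a (hypothetical) linear scorer,
    turn the scores into weights via softmax, and return the weighted
    sum, so salient regions dominate the pooled representation."""
    scores = region_feats @ score_w          # one scalar score per region
    weights = softmax(scores)
    return weights @ region_feats, weights

regions = np.array([[1.0, 0.0],   # region 0
                    [0.0, 1.0],   # region 1
                    [5.0, 0.0]])  # region 2: strongest on the scored axis
pooled, w = adaptive_pool(regions, score_w=np.array([1.0, 0.0]))
print(w.argmax())  # 2: the salient region receives the largest weight
```

In MSAF this weighting would be applied per scale, letting the textual guidance decide which image regions matter.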
P-Mix: A Data Augmentation Method for Contrastive Learning Based Human Activity Recognition
Pub Date: 2025-08-25, DOI: 10.1109/TAI.2025.3601599
Yingjie Chen;Qi Xie;Wenxuan Cui;Liming Chen;Houbing Herbert Song;Tao Zhu
Supervised human activity recognition (HAR) with sensor data typically demands substantial labeled datasets to train robust models. Contrastive learning offers a self-supervised alternative by leveraging data augmentation to improve representation learning. However, most existing augmentation methods operate independently on either the time or channel dimension and often introduce unstructured noise, which can distort meaningful temporal and spectral patterns. To address these limitations, we present a novel P-Mix data augmentation method for contrastive learning in HAR tasks, specifically designed to be compatible with the SimCLR framework. P-Mix is a customized data augmentation method tailored to sensor data for HAR, which slices and recombines both the time and channel dimensions, merging multiple temporal segments to encourage the model to explore the underlying relationships and variations in the data in an unsupervised setting. To capture motion cycles and long-term dependencies, we employ shorter temporal segments as fundamental processing units along the time dimension. By incorporating structured noise patterns based on motion cycle characteristics within these segments, we effectively enhance the model’s robustness and generalization capabilities. Extensive evaluations across five HAR benchmarks demonstrate that P-Mix achieves consistent improvements over the strongest baseline (resample), delivering relative F1-score gains ranging from 1.87% (USC-HAD: 85.63% versus 83.93%) to 6.53% (DSADS: 97.24% versus 91.28%) through controlled multidimensional fusion. These results demonstrate the effectiveness of our approach in optimizing data generation and augmentation strategies for HAR tasks.
{"title":"P-Mix: A Data Augmentation Method for Contrastive Learning Based Human Activity Recognition","authors":"Yingjie Chen;Qi Xie;Wenxuan Cui;Liming Chen;Houbing Herbert Song;Tao Zhu","doi":"10.1109/TAI.2025.3601599","DOIUrl":"https://doi.org/10.1109/TAI.2025.3601599","url":null,"abstract":"Supervised human activity recognition (HAR) with sensor data typically demands substantial labeled datasets to train robust models. Contrastive learning offers a self-supervised alternative by leveraging data augmentation to improve representation learning. However, most existing augmentation methods operate independently on either the time or channel dimension and often introduce unstructured noise, which can distort meaningful temporal and spectral patterns. To address these limitations, we present a novel P-Mix data augmentation method for contrastive learning in HAR tasks, specifically designed to be compatible with the SimCLR framework. P-Mix is a customized data augmentation method tailored to sensor data for HAR, which slices and recombines both the time and channel dimensions, merging multiple temporal segments to encourage the model to explore the underlying relationships and variations in the data in an unsupervised setting. To capture motion cycles and long-term dependencies, we employ shorter temporal segments as fundamental processing units along the time dimension. By incorporating structured noise patterns based on motion cycle characteristics within these segments, we effectively enhance the model’s robustness and generalization capabilities. Extensive evaluations across five HAR benchmarks demonstrate that P-Mix achieves consistent improvements over the strongest baseline (resample), delivering relative F1-score gains ranging from 1.87% (USC-HAD: 85.63% versus 83.93%) to 6.53% (DSADS: 97.24% versus 91.28%) through controlled multidimensional fusion. 
These results demonstrate the effectiveness of our approach in optimizing data generation and augmentation strategies for HAR tasks.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 3","pages":"1500-1511"},"PeriodicalIF":0.0,"publicationDate":"2025-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147299727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
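The core P-Mix idea described above — cutting the time axis into short segments and recombining time/channel slices drawn from different samples — can be sketched in a few lines. The following is a hypothetical toy illustration of segment-level mixing under those assumptions, not the paper's actual algorithm; the function name `p_mix` and its parameters are made up for illustration, and the real method additionally uses motion-cycle-based structured noise.

```python
import random

def p_mix(sample_a, sample_b, num_segments=4, seed=0):
    """Toy sketch of segment-level time/channel mixing for HAR sensor data.

    sample_a, sample_b: lists of channels, each channel a list of sensor
    readings (both samples must share the same shape). The time axis is cut
    into `num_segments` slices; for every (channel, segment) pair the slice
    is copied wholesale from either sample_a or sample_b, so within-segment
    temporal structure is preserved while the combination varies.
    """
    rng = random.Random(seed)
    num_channels = len(sample_a)
    length = len(sample_a[0])
    seg_len = length // num_segments
    mixed = [[0.0] * length for _ in range(num_channels)]
    for c in range(num_channels):
        for s in range(num_segments):
            start = s * seg_len
            # the last segment absorbs any remainder so the series is covered
            end = length if s == num_segments - 1 else start + seg_len
            src = sample_a if rng.random() < 0.5 else sample_b
            mixed[c][start:end] = src[c][start:end]
    return mixed
```

A contrastive pipeline in the SimCLR style would treat two independently mixed views of the same window as a positive pair.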
A Novel Ensemble Method Based on Support Vector Dynamic Learning Neural Network for Breast Cancer Diagnosis
Pub Date : 2025-08-25 DOI: 10.1109/TAI.2025.3602017
Zhijun Zhang;Yong Ding;Jian Zhang
To improve the accuracy of breast cancer diagnosis and reduce examination costs, a novel ensemble learning method called support vector dynamic learning neural network (SVDL) is proposed in this article. The method first formulates the breast cancer classification problem as a standard quadratic programming (QP) problem based on the support vector machine (SVM). Then, a novel dynamic learning neural network (DLNN) solver is designed to solve this problem and obtain the optimal diagnosis model. Experimental results on the Wisconsin diagnostic breast cancer dataset show that the proposed method outperforms traditional and state-of-the-art machine learning methods, achieving the best accuracy (98.59%) and area-under-curve value (0.9956), as well as high specificity (98.85%) and sensitivity (98.18%). This demonstrates that the proposed method has good classification performance. Furthermore, the model performance may be further enhanced by introducing a swarm intelligence algorithm to search for the optimal values of the model parameters, which will also contribute to the diagnosis of breast cancer and other diseases.
IEEE transactions on artificial intelligence, vol. 7, no. 3, pp. 1522-1532.
Citations: 0
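The SVDL abstract rests on a standard fact: the soft-margin SVM can be written as a QP over dual variables, which an iterative solver can then optimize. As a minimal stand-in for the paper's dynamic-learning-neural-network solver (whose details are not in the abstract), the sketch below solves the bias-free dual by projected gradient ascent — dropping the bias removes the equality constraint, leaving only the box constraint 0 ≤ αᵢ ≤ C. All names and hyperparameters here are illustrative assumptions.

```python
def svm_dual_pgd(X, y, C=1.0, lr=0.01, iters=2000):
    """Sketch: maximize the bias-free soft-margin SVM dual
        sum_i a_i - 0.5 * sum_ij a_i a_j y_i y_j <x_i, x_j>
    subject to 0 <= a_i <= C, via coordinate-wise projected gradient
    ascent with a linear kernel. Illustrative stand-in only, not the
    paper's DLNN solver.
    """
    n = len(X)
    # precompute the linear-kernel Gram matrix K[i][j] = <x_i, x_j>
    K = [[sum(p * q for p, q in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    a = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            # gradient of the dual objective w.r.t. a_i
            grad = 1.0 - y[i] * sum(a[j] * y[j] * K[i][j] for j in range(n))
            # ascent step, then project back into the box [0, C]
            a[i] = min(C, max(0.0, a[i] + lr * grad))
    # recover the primal weight vector w = sum_i a_i y_i x_i
    return [sum(a[i] * y[i] * X[i][d] for i in range(n)) for d in range(len(X[0]))]

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
```

On a linearly separable toy set the recovered hyperplane classifies all training points correctly; a production system would of course use a mature QP or SMO solver instead.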
MFENet: Multiinformation Feature Enhancement Network for Vehicle Reidentification
Pub Date : 2025-08-25 DOI: 10.1109/TAI.2025.3601594
Zhangwei Li;Yuhui Deng;Ke Wang;Junhao Huang;Zhimin Tang;Weiping Ding
Vehicle reidentification (Re-ID) targets cross-camera image retrieval and is a widely used technology in intelligent transportation systems. Current Re-ID methods primarily enhance feature extraction by focusing on either global or local features, but they often fail to effectively leverage diverse information. To address these limitations, we propose a multiinformation feature enhancement network (MFENet) that integrates diverse information types to enhance feature representation and boost model accuracy. Specifically, 1) a coarse-grained feature enhancement (CFE) module is employed to remove background influence on image features. This module filters the background, enabling the network model to extract more accurate vehicle features, such as color and model. 2) A fine-grained feature enhancement (FFE) module collects detailed information about vehicles by extracting features from subtle areas (e.g., vehicle lights and rearview mirrors) of an image, providing more unique clues about the vehicle. 3) A latent feature enhancement (LFE) module is designed to mine latent features and enrich vehicle features using nonvisual cues, such as the vehicle’s camera and orientation, without relying on image information. Extensive experiments on vehicle Re-ID datasets demonstrate that MFENet outperforms most existing methods.
IEEE transactions on artificial intelligence, vol. 7, no. 3, pp. 1487-1499.
Citations: 0
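The coarse-grained idea in the MFENet abstract — filtering out the background so that only vehicle pixels shape the global descriptor — can be illustrated with masked global pooling: average a feature map only over foreground positions. This is a hypothetical toy, not the paper's CFE module (which is a learned network component); the function name and data layout are assumptions.

```python
def masked_global_pool(feature_map, mask):
    """Average an H x W x C feature map over foreground positions only.

    `mask` is an H x W grid with truthy entries marking the vehicle
    (foreground); background positions contribute nothing, so the
    pooled C-dimensional descriptor reflects the vehicle alone.
    """
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    total = [0.0] * c
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j]:
                count += 1
                for k in range(c):
                    total[k] += feature_map[i][j][k]
    # guard against an empty mask to avoid division by zero
    return [t / max(count, 1) for t in total]
```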