IEEE transactions on artificial intelligence最新文献_第9页

Weighted Concept Factorization Based Incomplete Multi-view Clustering 基于加权概念因式分解的不完整多视角聚类

IEEE transactions on artificial intelligence

Pub Date : 2024-07-25 DOI: 10.1109/TAI.2024.3433379

Ghufran Ahmad Khan;Jalaluddin Khan;Taushif Anwar;Zubair Ashraf;Mohammad Hafeez Javed;Bassoma Diallo

The primary objective of classical multiview clustering (MVC) is to categorize data into separate clusters under the assumption that all perspectives are completely available. However, in practical situations, it is common to encounter cases where not all viewpoints of the data are accessible. This limitation can impede the effectiveness of traditional MVC methods. The incompleteness of the clustering of multiview data has witnessed substantial progress in recent years due to its promising applications. In response to the aforementioned issue, we have tackled it by introducing an inventive MVC algorithm that is tailored to handle incomplete data from various views. Additionally, we have proposed a distinct objective function that leverages a weighted concept factorization technique to address the absence of data instances within each incomplete perspective. To address inconsistencies between different views, we introduced a coregularization factor, which operates in conjunction with a shared consensus matrix. It is important to highlight that the proposed objective function is intrinsically nonconvex, presenting challenges in terms of optimization. To secure the optimal solution for this objective function, we have implemented an iterative optimization approach to reach the local minima for our method. To underscore the efficacy and validation of our approach, we experimented with real-world datasets and used state-of-the-art methods to perform comparative assessments.

经典多视角聚类（MVC）的主要目的是在假设所有视角都完全可用的情况下，将数据归类到不同的聚类中。然而，在实际情况中，经常会遇到并非数据的所有视角都可访问的情况。这种限制会妨碍传统 MVC 方法的有效性。近年来，多视角数据聚类的不完整性因其广阔的应用前景而取得了长足的进步。针对上述问题，我们引入了一种创造性的 MVC 算法，专门用于处理来自不同视图的不完整数据。此外，我们还提出了一个独特的目标函数，利用加权概念因式分解技术来解决每个不完整视角中缺乏数据实例的问题。为了解决不同观点之间的不一致性，我们引入了一个核心模块化因子，该因子与共享共识矩阵共同发挥作用。需要强调的是，所提出的目标函数本质上是非凸的，这给优化带来了挑战。为了确保该目标函数的最优解，我们采用了迭代优化方法，以达到我们方法的局部最小值。为了强调我们方法的有效性和验证，我们使用真实世界的数据集进行了实验，并使用最先进的方法进行了比较评估。

{"title":"Weighted Concept Factorization Based Incomplete Multi-view Clustering","authors":"Ghufran Ahmad Khan;Jalaluddin Khan;Taushif Anwar;Zubair Ashraf;Mohammad Hafeez Javed;Bassoma Diallo","doi":"10.1109/TAI.2024.3433379","DOIUrl":"https://doi.org/10.1109/TAI.2024.3433379","url":null,"abstract":"The primary objective of classical multiview clustering (MVC) is to categorize data into separate clusters under the assumption that all perspectives are completely available. However, in practical situations, it is common to encounter cases where not all viewpoints of the data are accessible. This limitation can impede the effectiveness of traditional MVC methods. The incompleteness of the clustering of multiview data has witnessed substantial progress in recent years due to its promising applications. In response to the aforementioned issue, we have tackled it by introducing an inventive MVC algorithm that is tailored to handle incomplete data from various views. Additionally, we have proposed a distinct objective function that leverages a weighted concept factorization technique to address the absence of data instances within each incomplete perspective. To address inconsistencies between different views, we introduced a coregularization factor, which operates in conjunction with a shared consensus matrix. It is important to highlight that the proposed objective function is intrinsically nonconvex, presenting challenges in terms of optimization. To secure the optimal solution for this objective function, we have implemented an iterative optimization approach to reach the local minima for our method. To underscore the efficacy and validation of our approach, we experimented with real-world datasets and used state-of-the-art methods to perform comparative assessments.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5699-5708"},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Broad Siamese Network for Facial Beauty Prediction 用于面部美感预测的广义连体网络

IEEE transactions on artificial intelligence

Pub Date : 2024-07-24 DOI: 10.1109/TAI.2024.3429293

Yikai Li;Tong Zhang;C. L. Philip Chen

Facial beauty prediction (FBP) aims to automatically predict beauty scores of facial images according to human perception. Usually, facial images contain lots of information irrelevant to facial beauty, such as information about pose, emotion, and illumination, which interferes with the prediction of facial beauty. To overcome interferences, we develop a broad Siamese network (BSN) to focus more on the task of beauty prediction. Specifically, BSN consists mainly of three components: a multitask Siamese network (MTSN), a multilayer attention (MLA) module, and a broad representation learning (BRL) module. First, MTSN is proposed with different tasks about facial beauty to fully mine knowledge about attractiveness and guide the network to neglect interference information. In the subnetwork of MTSN, the MLA module is proposed to focus more on salient features about facial beauty and reduce the impact of interference information. Then, the BRL module based on broad learning system (BLS) is developed to learn discriminative features with the guidance of beauty scores. It further releases facial features from the impact of interference information. Comparisons with state-of-the-art methods demonstrate the effectiveness of BSN.

面部美感预测（FBP）旨在根据人的感知自动预测面部图像的美感分数。通常，面部图像包含大量与面部美感无关的信息，如姿势、情感和光照等信息，这些信息会干扰面部美感预测。为了克服干扰，我们开发了广义连体网络（BSN），使其更专注于美感预测任务。具体来说，BSN 主要由三部分组成：多任务连体网络（MTSN）、多层注意（MLA）模块和广义表征学习（BRL）模块。首先，MTSN 提出了不同的面部美感任务，以充分挖掘有关吸引力的知识，并引导网络忽略干扰信息。在 MTSN 的子网络中，提出了 MLA 模块，以更加关注面部美的突出特征，减少干扰信息的影响。然后，开发了基于广泛学习系统（BLS）的 BRL 模块，在美貌评分的指导下学习辨别特征。它进一步使面部特征不受干扰信息的影响。与最先进方法的比较证明了 BSN 的有效性。

{"title":"Broad Siamese Network for Facial Beauty Prediction","authors":"Yikai Li;Tong Zhang;C. L. Philip Chen","doi":"10.1109/TAI.2024.3429293","DOIUrl":"https://doi.org/10.1109/TAI.2024.3429293","url":null,"abstract":"Facial beauty prediction (FBP) aims to automatically predict beauty scores of facial images according to human perception. Usually, facial images contain lots of information irrelevant to facial beauty, such as information about pose, emotion, and illumination, which interferes with the prediction of facial beauty. To overcome interferences, we develop a broad Siamese network (BSN) to focus more on the task of beauty prediction. Specifically, BSN consists mainly of three components: a multitask Siamese network (MTSN), a multilayer attention (MLA) module, and a broad representation learning (BRL) module. First, MTSN is proposed with different tasks about facial beauty to fully mine knowledge about attractiveness and guide the network to neglect interference information. In the subnetwork of MTSN, the MLA module is proposed to focus more on salient features about facial beauty and reduce the impact of interference information. Then, the BRL module based on broad learning system (BLS) is developed to learn discriminative features with the guidance of beauty scores. It further releases facial features from the impact of interference information. Comparisons with state-of-the-art methods demonstrate the effectiveness of BSN.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5786-5800"},"PeriodicalIF":0.0,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

CycleGAN*: Collaborative AI Learning With Improved Adversarial Neural Networks for Multimodalities Data CycleGAN*：利用改进的对抗神经网络进行多模态数据的人工智能协作学习

IEEE transactions on artificial intelligence

Pub Date : 2024-07-23 DOI: 10.1109/TAI.2024.3432856

Yibo He;Kah Phooi Seng;Li Minn Ang

With the widespread adoption of generative adversarial networks (GANs) for sample generation, this article aims to enhance adversarial neural networks to facilitate collaborative artificial intelligence (AI) learning which has been specifically tailored to handle datasets containing multimodalities. Currently, a significant portion of the literature is dedicated to sample generation using GANs, with the objective of enhancing the detection performance of machine learning (ML) classifiers through the incorporation of these generated data into the original training set via adversarial training. The quality of the generated adversarial samples is contingent upon the sufficiency of training data samples. However, in the multimodal domain, the scarcity of multimodal data poses a challenge due to resource constraints. In this article, we address this challenge by proposing a new multimodal dataset generation approach based on the classical audio–visual speech recognition (AVSR) task, utilizing CycleGAN, DiscoGAN, and StyleGAN2 for exploration and performance comparison. AVSR experiments are conducted using the LRS2 and LRS3 corpora. Our experiments reveal that CycleGAN, DiscoGAN, and StyleGAN2 do not effectively address the low-data state problem in AVSR classification. Consequently, we introduce an enhanced model, CycleGAN*, based on the original CycleGAN, which efficiently learns the original dataset features and generates high-quality multimodal data. Experimental results demonstrate that the multimodal datasets generated by our proposed CycleGAN* exhibit significant improvement in word error rate (WER), indicating reduced errors. Notably, the images produced by CycleGAN* exhibit a marked enhancement in overall visual clarity, indicative of its superior generative capabilities. Furthermore, in contrast to traditional approaches, we underscore the significance of collaborative learning. We implement co-training with diverse multimodal data to facilitate information sharing and complementary learning across modalities. This collaborative approach enhances the model’s capability to integrate heterogeneous information, thereby boosting its performance in multimodal environments.

随着生成式对抗网络（GAN）在样本生成方面的广泛应用，本文旨在增强对抗神经网络，以促进协作式人工智能（AI）学习，这种学习是专门为处理包含多模态的数据集而量身定制的。目前，有相当一部分文献致力于使用 GAN 生成样本，目的是通过对抗训练将这些生成的数据纳入原始训练集，从而提高机器学习（ML）分类器的检测性能。生成的对抗样本的质量取决于训练数据样本是否充足。然而，在多模态领域，由于资源限制，多模态数据的稀缺性带来了挑战。在本文中，我们提出了一种基于经典视听语音识别（AVSR）任务的新的多模态数据集生成方法，利用 CycleGAN、DiscoGAN 和 StyleGAN2 进行探索和性能比较，从而应对这一挑战。AVSR 实验使用 LRS2 和 LRS3 语料库进行。实验结果表明，CycleGAN、DiscoGAN 和 StyleGAN2 无法有效解决 AVSR 分类中的低数据状态问题。因此，我们在原始 CycleGAN 的基础上引入了一个增强模型 CycleGAN*，它能有效地学习原始数据集特征并生成高质量的多模态数据。实验结果表明，由我们提出的 CycleGAN* 生成的多模态数据集在字错误率（WER）方面有显著改善，表明错误减少。值得注意的是，CycleGAN* 生成的图像在整体视觉清晰度上有明显提高，这表明它具有卓越的生成能力。此外，与传统方法相比，我们强调了协作学习的重要性。我们利用多样化的多模态数据实施协同训练，以促进信息共享和跨模态互补学习。这种协作方法增强了模型整合异构信息的能力，从而提高了模型在多模态环境中的性能。

{"title":"CycleGAN*: Collaborative AI Learning With Improved Adversarial Neural Networks for Multimodalities Data","authors":"Yibo He;Kah Phooi Seng;Li Minn Ang","doi":"10.1109/TAI.2024.3432856","DOIUrl":"https://doi.org/10.1109/TAI.2024.3432856","url":null,"abstract":"With the widespread adoption of generative adversarial networks (GANs) for sample generation, this article aims to enhance adversarial neural networks to facilitate collaborative artificial intelligence (AI) learning which has been specifically tailored to handle datasets containing multimodalities. Currently, a significant portion of the literature is dedicated to sample generation using GANs, with the objective of enhancing the detection performance of machine learning (ML) classifiers through the incorporation of these generated data into the original training set via adversarial training. The quality of the generated adversarial samples is contingent upon the sufficiency of training data samples. However, in the multimodal domain, the scarcity of multimodal data poses a challenge due to resource constraints. In this article, we address this challenge by proposing a new multimodal dataset generation approach based on the classical audio–visual speech recognition (AVSR) task, utilizing CycleGAN, DiscoGAN, and StyleGAN2 for exploration and performance comparison. AVSR experiments are conducted using the LRS2 and LRS3 corpora. Our experiments reveal that CycleGAN, DiscoGAN, and StyleGAN2 do not effectively address the low-data state problem in AVSR classification. Consequently, we introduce an enhanced model, CycleGAN*, based on the original CycleGAN, which efficiently learns the original dataset features and generates high-quality multimodal data. Experimental results demonstrate that the multimodal datasets generated by our proposed CycleGAN* exhibit significant improvement in word error rate (WER), indicating reduced errors. Notably, the images produced by CycleGAN* exhibit a marked enhancement in overall visual clarity, indicative of its superior generative capabilities. Furthermore, in contrast to traditional approaches, we underscore the significance of collaborative learning. We implement co-training with diverse multimodal data to facilitate information sharing and complementary learning across modalities. This collaborative approach enhances the model’s capability to integrate heterogeneous information, thereby boosting its performance in multimodal environments.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5616-5629"},"PeriodicalIF":0.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Cooperative Advantage Actor–Critic Reinforcement Learning for Multiagent Pursuit-Evasion Games on Communication Graphs 通信图上多智能体追逃博弈的合作优势行为-批判强化学习

IEEE transactions on artificial intelligence

Pub Date : 2024-07-23 DOI: 10.1109/TAI.2024.3432511

Yizhen Meng;Chun Liu;Qiang Wang;Longyu Tan

This article investigates the distributed optimal strategy problem in multiagent pursuit-evasion (MPE) games, striving for Nash equilibrium through the optimization of individual benefit matrices based on observations. To this end, a novel collaborative control scheme for MPE games using communication graphs is proposed. This scheme employs cooperative advantage actor–critic (A2C) reinforcement learning to facilitate collaborative capture by pursuers in a distributed manner while maintaining bounded system signals. The strategy orchestrates the actions of pursuers through adaptive neural network learning, ensuring proximity-based collaboration for effective captures. Meanwhile, evaders aim to evade collectively by converging toward each other. Through extensive simulations involving five pursuers and two evaders, the efficacy of the proposed approach is demonstrated, and pursuers seamlessly organize into pursuit units and capture evaders, validating the collaborative capture objective. This article represents a promising step toward effective and cooperative control strategies in MPE game scenarios.

本文研究了多智能体追逐-逃避博弈中的分布式最优策略问题，在观察的基础上，通过对个体利益矩阵的优化，力求达到纳什均衡。为此，提出了一种基于通信图的MPE游戏协同控制方案。该方案采用合作优势行为-批评（A2C）强化学习，在保持有界系统信号的同时，促进追踪者以分布式方式进行协作捕获。该策略通过自适应神经网络学习来协调追捕者的行动，确保基于邻近度的协作以实现有效捕获。同时，逃避者以相互趋同的方式进行集体逃避。通过涉及5个追踪者和2个逃避者的大量模拟，证明了所提出方法的有效性，并且追踪者无缝地组织成追捕单位和捕获逃避者，验证了协同捕获目标。这篇文章代表了在MPE游戏场景中朝着有效和合作控制策略迈出的有希望的一步。

{"title":"Cooperative Advantage Actor–Critic Reinforcement Learning for Multiagent Pursuit-Evasion Games on Communication Graphs","authors":"Yizhen Meng;Chun Liu;Qiang Wang;Longyu Tan","doi":"10.1109/TAI.2024.3432511","DOIUrl":"https://doi.org/10.1109/TAI.2024.3432511","url":null,"abstract":"This article investigates the distributed optimal strategy problem in multiagent pursuit-evasion (MPE) games, striving for Nash equilibrium through the optimization of individual benefit matrices based on observations. To this end, a novel collaborative control scheme for MPE games using communication graphs is proposed. This scheme employs cooperative advantage actor–critic (A2C) reinforcement learning to facilitate collaborative capture by pursuers in a distributed manner while maintaining bounded system signals. The strategy orchestrates the actions of pursuers through adaptive neural network learning, ensuring proximity-based collaboration for effective captures. Meanwhile, evaders aim to evade collectively by converging toward each other. Through extensive simulations involving five pursuers and two evaders, the efficacy of the proposed approach is demonstrated, and pursuers seamlessly organize into pursuit units and capture evaders, validating the collaborative capture objective. This article represents a promising step toward effective and cooperative control strategies in MPE game scenarios.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 12","pages":"6509-6523"},"PeriodicalIF":0.0,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Toward Correlated Sequential Rules 实现相关的序列规则

IEEE transactions on artificial intelligence

Pub Date : 2024-07-22 DOI: 10.1109/TAI.2024.3429306

Lili Chen;Wensheng Gan;Chien-Ming Chen

The goal of high-utility sequential pattern mining (HUSPM) is to efficiently discover profitable or useful sequential patterns in a large number of sequences. However, simply being aware of utility-eligible patterns is insufficient for making predictions. To compensate for this deficiency, high-utility sequential rule mining (HUSRM) is designed to explore the confidence or probability of predicting the occurrence of consequence sequential patterns based on the appearance of premise sequential patterns. It has numerous applications, such as product recommendation and weather prediction. However, the existing algorithm, known as HUSRM, is limited to extracting all eligible rules while neglecting the correlation between the generated sequential rules. To address this issue, we propose a novel algorithm called correlated high-utility sequential rule miner (CoUSR) to integrate the concept of correlation into HUSRM. The proposed algorithm requires not only that each rule be correlated but also that the patterns in the antecedent and consequent of the high-utility sequential rule be correlated. The algorithm adopts a utility-list structure to avoid multiple database scans. Additionally, several pruning strategies are used to improve the algorithm's efficiency and performance. Based on several real-world datasets, subsequent experiments demonstrated that CoUSR is effective and efficient in terms of operation time and memory consumption. All codes are accessible on GitHub: https://github.com/DSI-Lab1/CoUSR.

高效用序列模式挖掘（HUSPM）的目标是在大量序列中有效地发现有利可图或有用的序列模式。然而，仅仅意识到有用模式还不足以进行预测。为了弥补这一不足，高效用序列规则挖掘（HUSRM）旨在根据前提序列模式的出现情况，探索预测后果序列模式出现的置信度或概率。它有许多应用，如产品推荐和天气预测。然而，现有的算法（即 HUSRM）仅限于提取所有符合条件的规则，而忽略了生成的序列规则之间的相关性。为了解决这个问题，我们提出了一种名为 "相关高效用序列规则挖掘器"（CoUSR）的新算法，将相关性概念融入 HUSRM。所提出的算法不仅要求每条规则都是相关的，还要求高效用序列规则的前因和后果中的模式是相关的。该算法采用效用列表结构，以避免多次数据库扫描。此外，还采用了多种剪枝策略来提高算法的效率和性能。基于多个真实数据集的后续实验证明，CoUSR 在运行时间和内存消耗方面都是有效和高效的。所有代码均可在 GitHub 上访问：https://github.com/DSI-Lab1/CoUSR。

{"title":"Toward Correlated Sequential Rules","authors":"Lili Chen;Wensheng Gan;Chien-Ming Chen","doi":"10.1109/TAI.2024.3429306","DOIUrl":"https://doi.org/10.1109/TAI.2024.3429306","url":null,"abstract":"The goal of high-utility sequential pattern mining (HUSPM) is to efficiently discover profitable or useful sequential patterns in a large number of sequences. However, simply being aware of utility-eligible patterns is insufficient for making predictions. To compensate for this deficiency, high-utility sequential rule mining (HUSRM) is designed to explore the confidence or probability of predicting the occurrence of consequence sequential patterns based on the appearance of premise sequential patterns. It has numerous applications, such as product recommendation and weather prediction. However, the existing algorithm, known as HUSRM, is limited to extracting all eligible rules while neglecting the correlation between the generated sequential rules. To address this issue, we propose a novel algorithm called correlated high-utility sequential rule miner (CoUSR) to integrate the concept of correlation into HUSRM. The proposed algorithm requires not only that each rule be correlated but also that the patterns in the antecedent and consequent of the high-utility sequential rule be correlated. The algorithm adopts a utility-list structure to avoid multiple database scans. Additionally, several pruning strategies are used to improve the algorithm's efficiency and performance. Based on several real-world datasets, subsequent experiments demonstrated that CoUSR is effective and efficient in terms of operation time and memory consumption. All codes are accessible on GitHub: \u0000<uri>https://github.com/DSI-Lab1/CoUSR</uri>\u0000.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 10","pages":"5340-5351"},"PeriodicalIF":0.0,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142442960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Two-Stage Representation Refinement Based on Convex Combination for 3-D Human Poses Estimation 基于凸组合的两阶段表示改进三维人体姿态估计

IEEE transactions on artificial intelligence

Pub Date : 2024-07-22 DOI: 10.1109/TAI.2024.3432028

Luefeng Chen;Wei Cao;Biao Zheng;Min Wu;Witold Pedrycz;Kaoru Hirota

In the human pose estimation task, on the one hand, 3-D pose always has difficulty in dividing different 2-D poses if the view is limited; on the other hand, it is hard to reduce the lifting ambiguity because of the lack of depth information, it is an important and challenging problem. Therefore, two-stage representation refinement based on the convex combination for 3-D human pose estimation is proposed, in which the two-stage method includes a dense-spatial-temporal convolutional network and a local-to-refine network. The former is applied to determine the features between each video frame; the latter is used to get the different scales of pose details. It aims to address the difficulty of estimating 3-D human pose from 2-D image sequences. In such a way, it can better use the relations between every frame in the sequence of the pose video to produce more accurate results. Finally, we combine the above network with a block called convex combination to help refine the 3-D pose location. We test the proposed approach on both Human3.6m and MPII datasets. The result confirms that our method can achieve better performance than improved CNN supervision, a simple yet effective baseline, and coarse-to-fine volumetric prediction. Besides, a robustness test experiment is carried out for the proposed method while the input is interrupted. The result verifies that our method shows better robustness.

在人体姿态估计任务中，一方面，在视野有限的情况下，三维姿态难以分割不同的二维姿态；另一方面，由于缺乏深度信息，难以降低提升模糊度，这是一个重要而具有挑战性的问题。为此，提出了一种基于凸组合的两阶段人体姿态估计表示改进方法，其中两阶段方法包括一个密集时空卷积网络和一个局部-细化网络。前者用于确定每个视频帧之间的特征；后者用于获取姿态细节的不同尺度。它旨在解决从二维图像序列中估计三维人体姿态的困难。这样可以更好地利用姿态视频序列中每一帧之间的关系，产生更准确的结果。最后，我们将上述网络与一个称为凸组合的块结合起来，以帮助改进三维姿态位置。我们在Human3.6m和MPII数据集上测试了所提出的方法。结果证实，我们的方法可以获得比改进的CNN监督、简单有效的基线和粗到细的体积预测更好的性能。此外，还对该方法进行了输入中断情况下的鲁棒性检验实验。结果表明，该方法具有较好的鲁棒性。

{"title":"Two-Stage Representation Refinement Based on Convex Combination for 3-D Human Poses Estimation","authors":"Luefeng Chen;Wei Cao;Biao Zheng;Min Wu;Witold Pedrycz;Kaoru Hirota","doi":"10.1109/TAI.2024.3432028","DOIUrl":"https://doi.org/10.1109/TAI.2024.3432028","url":null,"abstract":"In the human pose estimation task, on the one hand, 3-D pose always has difficulty in dividing different 2-D poses if the view is limited; on the other hand, it is hard to reduce the lifting ambiguity because of the lack of depth information, it is an important and challenging problem. Therefore, two-stage representation refinement based on the convex combination for 3-D human pose estimation is proposed, in which the two-stage method includes a dense-spatial-temporal convolutional network and a local-to-refine network. The former is applied to determine the features between each video frame; the latter is used to get the different scales of pose details. It aims to address the difficulty of estimating 3-D human pose from 2-D image sequences. In such a way, it can better use the relations between every frame in the sequence of the pose video to produce more accurate results. Finally, we combine the above network with a block called convex combination to help refine the 3-D pose location. We test the proposed approach on both Human3.6m and MPII datasets. The result confirms that our method can achieve better performance than improved CNN supervision, a simple yet effective baseline, and coarse-to-fine volumetric prediction. Besides, a robustness test experiment is carried out for the proposed method while the input is interrupted. The result verifies that our method shows better robustness.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 12","pages":"6500-6508"},"PeriodicalIF":0.0,"publicationDate":"2024-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Deep Learning Security Breach by Evolutionary Universal Perturbation Attack (EUPA) 进化通用扰动攻击（EUPA）造成的深度学习安全漏洞

IEEE transactions on artificial intelligence

Pub Date : 2024-07-19 DOI: 10.1109/TAI.2024.3429473

Neeraj Gupta;Mahdi Khosravy;Antoine Pasquali;Olaf Witkowski

The potential for sabotaging deep convolutions neural networks classifiers by universal perturbation attack (UPA) has proved itself as an effective threat to fool deep learning models in sensitive applications such as autonomous vehicles, clinical diagnosis, face recognition, and so on. The prospective application of UPA is for adversarial training of deep convolutional networks against the attacks. Although evolutionary algorithms have already shown their tremendous ability in solving nonconvex complex problems, the literature has limited exploration of evolutionary techniques and strategies for UPA, thus, it needs to be explored on evolutionary algorithms to minimize the magnitude and number of perturbation pixels while maximizing the misclassification of maximum data samples. In this research. This work focuses on utilizing an integer coded genetic algorithm within an evolutionary framework to evolve the UPA. The evolutionary UPA has been structured, analyzed, and compared for two evolutionary optimization structures: 1) constrained single-objective evolutionary UPA; and 2) Pareto double-objective evolutionary UPA. The efficiency of the methodology is analyzed on GoogleNet convolution neural network for its effectiveness on the Imagenet dataset. The results show that under the same experimental conditions, the constrained single objective technique outperforms the Pareto double objective one, and manages a successful breach on a deep network wherein the average detection score falls to

$0.446429$

. It is observed that besides the minimization of the detection rate score, the constraint of invisibility of noise is much more effective rather than having a conflicting objective of noise power minimization.

在自动驾驶汽车、临床诊断、人脸识别等敏感应用中，普遍扰动攻击（UPA）破坏深度卷积神经网络分类器的可能性已被证明是愚弄深度学习模型的有效威胁。UPA 的前瞻性应用是针对攻击对深度卷积网络进行对抗性训练。虽然进化算法在解决非凸复杂问题方面已经展现出了巨大的能力，但文献中对 UPA 的进化技术和策略的探索还很有限，因此需要探索进化算法，在最大化数据样本误分类的同时，最小化扰动像素的大小和数量。在这项研究中。这项工作的重点是在进化框架内利用整数编码遗传算法来进化 UPA。针对两种进化优化结构，对进化 UPA 进行了构建、分析和比较：1) 受限单目标进化 UPA；和 2) 帕累托双目标进化 UPA。在 GoogleNet 卷积神经网络上分析了该方法在 Imagenet 数据集上的效率。结果表明，在相同的实验条件下，受限单目标技术优于帕累托双目标技术，并成功攻破了深度网络，其平均检测得分降至 0.446429 美元。据观察，除了检测率得分最小化外，噪声不可见的约束比噪声功率最小化这一相互冲突的目标更有效。

{"title":"Deep Learning Security Breach by Evolutionary Universal Perturbation Attack (EUPA)","authors":"Neeraj Gupta;Mahdi Khosravy;Antoine Pasquali;Olaf Witkowski","doi":"10.1109/TAI.2024.3429473","DOIUrl":"https://doi.org/10.1109/TAI.2024.3429473","url":null,"abstract":"The potential for sabotaging deep convolutions neural networks classifiers by universal perturbation attack (UPA) has proved itself as an effective threat to fool deep learning models in sensitive applications such as autonomous vehicles, clinical diagnosis, face recognition, and so on. The prospective application of UPA is for adversarial training of deep convolutional networks against the attacks. Although evolutionary algorithms have already shown their tremendous ability in solving nonconvex complex problems, the literature has limited exploration of evolutionary techniques and strategies for UPA, thus, it needs to be explored on evolutionary algorithms to minimize the magnitude and number of perturbation pixels while maximizing the misclassification of maximum data samples. In this research. This work focuses on utilizing an integer coded genetic algorithm within an evolutionary framework to evolve the UPA. The evolutionary UPA has been structured, analyzed, and compared for two evolutionary optimization structures: 1) constrained single-objective evolutionary UPA; and 2) Pareto double-objective evolutionary UPA. The efficiency of the methodology is analyzed on GoogleNet convolution neural network for its effectiveness on the Imagenet dataset. The results show that under the same experimental conditions, the constrained single objective technique outperforms the Pareto double objective one, and manages a successful breach on a deep network wherein the average detection score falls to \u0000<inline-formula><tex-math>$0.446429$</tex-math></inline-formula>\u0000. It is observed that besides the minimization of the detection rate score, the constraint of invisibility of noise is much more effective rather than having a conflicting objective of noise power minimization.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5655-5665"},"PeriodicalIF":0.0,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Hierarchical Spatial-Temporal Masked Contrast for Skeleton Action Recognition 用于骨骼动作识别的分层时空掩蔽对比技术

IEEE transactions on artificial intelligence

Pub Date : 2024-07-17 DOI: 10.1109/TAI.2024.3430260

Wenming Cao;Aoyu Zhang;Zhihai He;Yicha Zhang;Xinpeng Yin

In the field of 3-D action recognition, self-supervised learning has shown promising results but remains a challenging task. Previous approaches to motion modeling often relied on selecting features solely from the temporal or spatial domain, which limited the extraction of higher-level semantic information. Additionally, traditional one-to-one approaches in multilevel comparative learning overlooked the relationships between different levels, hindering the learning representation of the model. To address these issues, we propose the hierarchical spatial-temporal masked network (HSTM) for learning 3-D action representations. HSTM introduces a novel masking method that operates simultaneously in both the temporal and spatial dimensions. This approach leverages semantic relevance to identify meaningful regions in time and space, guiding the masking process based on semantic richness. This guidance is crucial for learning useful feature representations effectively. Furthermore, to enhance the learning of potential features, we introduce cross-level distillation (CLD) to extend the comparative learning approach. By training the model with two types of losses simultaneously, each level of the multilevel comparative learning process can be guided by levels rich in semantic information. This allows for more effective supervision of comparative learning, leading to improved performance. Extensive experiments conducted on the NTU-60, NTU-120, and PKU-MMD datasets demonstrate the effectiveness of our proposed framework. The learned action representations exhibit strong transferability and achieve state-of-the-art results.

在三维动作识别领域，自监督学习已经取得了可喜的成果，但仍然是一项具有挑战性的任务。以往的运动建模方法通常只依赖于从时间或空间域中选择特征，这限制了对更高层次语义信息的提取。此外，多层次比较学习中传统的一对一方法忽略了不同层次之间的关系，阻碍了模型的学习表示。为了解决这些问题，我们提出了用于学习三维动作表征的分层时空遮蔽网络（HSTM）。HSTM 引入了一种在时间和空间维度上同时运行的新型遮蔽方法。这种方法利用语义相关性来识别时间和空间中的有意义区域，并根据语义丰富程度来指导屏蔽过程。这种指导对于有效学习有用的特征表征至关重要。此外，为了加强对潜在特征的学习，我们引入了跨层次蒸馏（CLD）来扩展比较学习方法。通过同时用两类损失对模型进行训练，多层次比较学习过程中的每个层次都能得到语义信息丰富的层次的指导。这样就能更有效地监督比较学习，从而提高性能。在 NTU-60、NTU-120 和 PKU-MMD 数据集上进行的广泛实验证明了我们提出的框架的有效性。学习到的动作表征具有很强的可移植性，并取得了最先进的结果。

{"title":"Hierarchical Spatial-Temporal Masked Contrast for Skeleton Action Recognition","authors":"Wenming Cao;Aoyu Zhang;Zhihai He;Yicha Zhang;Xinpeng Yin","doi":"10.1109/TAI.2024.3430260","DOIUrl":"https://doi.org/10.1109/TAI.2024.3430260","url":null,"abstract":"In the field of 3-D action recognition, self-supervised learning has shown promising results but remains a challenging task. Previous approaches to motion modeling often relied on selecting features solely from the temporal or spatial domain, which limited the extraction of higher-level semantic information. Additionally, traditional one-to-one approaches in multilevel comparative learning overlooked the relationships between different levels, hindering the learning representation of the model. To address these issues, we propose the hierarchical spatial-temporal masked network (HSTM) for learning 3-D action representations. HSTM introduces a novel masking method that operates simultaneously in both the temporal and spatial dimensions. This approach leverages semantic relevance to identify meaningful regions in time and space, guiding the masking process based on semantic richness. This guidance is crucial for learning useful feature representations effectively. Furthermore, to enhance the learning of potential features, we introduce cross-level distillation (CLD) to extend the comparative learning approach. By training the model with two types of losses simultaneously, each level of the multilevel comparative learning process can be guided by levels rich in semantic information. This allows for more effective supervision of comparative learning, leading to improved performance. Extensive experiments conducted on the NTU-60, NTU-120, and PKU-MMD datasets demonstrate the effectiveness of our proposed framework. The learned action representations exhibit strong transferability and achieve state-of-the-art results.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"5 11","pages":"5801-5814"},"PeriodicalIF":0.0,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Label-Efficient Time Series Representation Learning: A Review 标签有效时间序列表示学习：综述

IEEE transactions on artificial intelligence

Pub Date : 2024-07-17 DOI: 10.1109/TAI.2024.3430236

Emadeldeen Eldele;Mohamed Ragab;Zhenghua Chen;Min Wu;Chee-Keong Kwoh;Xiaoli Li

Label-efficient time series representation learning, which aims to learn effective representations with limited labeled data, is crucial for deploying deep learning models in real-world applications. To address the scarcity of labeled time series data, various strategies, e.g., transfer learning, self-supervised learning, and semisupervised learning, have been developed. In this survey, we introduce a novel taxonomy for the first time, categorizing existing approaches as in-domain or cross domain based on their reliance on external data sources or not. Furthermore, we present a review of the recent advances in each strategy, conclude the limitations of current methodologies, and suggest future research directions that promise further improvements in the field.

标签高效时间序列表示学习旨在用有限的标记数据学习有效的表示，这对于在实际应用中部署深度学习模型至关重要。为了解决标记时间序列数据的稀缺性，已经开发了各种策略，例如迁移学习，自监督学习和半监督学习。在这项调查中，我们首次引入了一种新的分类法，根据现有方法对外部数据源的依赖程度将其分为域内方法和跨域方法。此外，我们对每种策略的最新进展进行了回顾，总结了当前方法的局限性，并提出了未来研究方向，有望在该领域进一步改进。

引用次数: 0

A Study of Enhancing Federated Learning on Non-IID Data With Server Learning 通过服务器学习加强非 IID 数据上的联合学习的研究。

IEEE transactions on artificial intelligence

Pub Date : 2024-07-17 DOI: 10.1109/TAI.2024.3430250

Van Sy Mai;Richard J. La;Tao Zhang

Federated learning (FL) has emerged as a means of distributed learning using local data stored at clients with a coordinating server. Recent studies showed that FL can suffer from poor performance and slower convergence when training data at the clients are not independent and identically distributed (IID). Here, we consider auxiliary server learning (SL) as a complementary approach to improving the performance of FL on non-IID data. Our analysis and experiments show that this approach can achieve significant improvements in both model accuracy and convergence time even when the dataset utilized by the server is small and its distribution differs from that of the clients’ aggregate data. Moreover, experimental results suggest that auxiliary SL delivers benefits when employed together with other techniques proposed to mitigate the performance degradation of FL on non-IID data.

联合学习（FL）是一种利用存储在客户端的本地数据与协调服务器进行分布式学习的方法。最近的研究表明，当客户端的训练数据不是独立且同分布的（IID）时，FL 的性能会变差，收敛速度也会变慢。在此，我们考虑将辅助服务器学习作为一种补充方法，以提高 FL 在非独立同分布数据上的性能。我们的分析和实验表明，即使服务器使用的数据集很小，而且其分布与客户端的总数据分布不同，这种方法也能显著提高模型的准确性和收敛时间。此外，实验结果表明，当辅助服务器学习与其他技术一起使用时，能有效缓解 FL 在非 IID 数据上的性能下降问题。

引用次数: 0