Large Linguistic Models: Investigating LLMs’ Metalinguistic Abilities
Pub Date : 2025-06-03 | DOI: 10.1109/TAI.2025.3575745 | IEEE Transactions on Artificial Intelligence, vol. 6, no. 12, pp. 3453–3467
Gašper Beguš;Maksymilian Dąbkowski;Ryan Rhodes
The performance of large language models (LLMs) has recently improved to the point where models can perform well on many language tasks. We show here that—for the first time—the models can also generate valid metalinguistic analyses of language data. We outline a research program where the behavioral interpretability of LLMs on these tasks is tested via prompting. LLMs are trained primarily on text—as such, evaluating their metalinguistic abilities improves our understanding of their general capabilities and sheds new light on theoretical models in linguistics. We show that OpenAI’s o1 [56] vastly outperforms other models on tasks involving drawing syntactic trees and phonological generalization. We speculate that OpenAI o1’s unique advantage over other models may result from the model’s chain-of-thought mechanism, which mimics the structure of human reasoning used in complex cognitive tasks, such as linguistic analysis.
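The tree-drawing results invite an automated check of whether a model's bracketing matches a linguist's analysis. The sketch below is a minimal, hypothetical scorer for such prompting experiments: it extracts labeled constituent spans from two bracketings and computes span F1. The example strings and the span-F1 criterion are illustrative assumptions, not the authors' evaluation protocol.

```python
# Minimal sketch: scoring a model-produced labeled bracketing against a gold
# parse, as one way to quantify performance on tree-drawing prompts.
# The example strings and the span-F1 criterion are illustrative assumptions.

def constituents(bracketing: str):
    """Return the set of (label, start, end) spans in a labeled bracketing."""
    tokens = bracketing.replace("(", " ( ").replace(")", " ) ").split()
    spans, stack, pos = set(), [], 0
    for tok in tokens:
        if tok == "(":
            stack.append(None)          # placeholder until the label is seen
        elif tok == ")":
            label, start = stack.pop()
            spans.add((label, start, pos))
        else:
            if stack and stack[-1] is None:
                stack[-1] = (tok, pos)  # tok is a constituent label
            else:
                pos += 1                # tok is a terminal word
    return spans

def span_f1(gold: str, predicted: str) -> float:
    g, p = constituents(gold), constituents(predicted)
    if not g or not p:
        return 0.0
    precision, recall = len(g & p) / len(p), len(g & p) / len(g)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

gold = "(S (NP the cat) (VP sat (PP on (NP the mat))))"
pred = "(S (NP the cat) (VP sat (PP on (NP the mat))))"
print(span_f1(gold, pred))   # 1.0 for an exact match
```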
{"title":"Large Linguistic Models: Investigating LLMs’ Metalinguistic Abilities","authors":"Gašper Beguš;Maksymilian Dąbkowski;Ryan Rhodes","doi":"10.1109/TAI.2025.3575745","DOIUrl":"https://doi.org/10.1109/TAI.2025.3575745","url":null,"abstract":"The performance of large language models (LLMs) has recently improved to the point where models can perform well on many language tasks. We show here that—for the first time—the models can also generate valid metalinguistic analyses of language data. We outline a research program where the <italic>behavioral interpretability</i> of LLMs on these tasks is tested via prompting. LLMs are trained primarily on text—as such, evaluating their metalinguistic abilities improves our understanding of their general capabilities and sheds new light on theoretical models in linguistics. We show that OpenAI’s <xref>[56]</xref> o1 vastly outperforms other models on tasks involving drawing syntactic trees and phonological generalization. We speculate that OpenAI o1’s unique advantage over other models may result from the model’s <italic>chain-of-thought</i> mechanism, which mimics the structure of human reasoning used in complex cognitive tasks, such as linguistic analysis.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3453-3467"},"PeriodicalIF":0.0,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11022724","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145674827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ICQ-TransE: LLM-Enhanced Image-Caption-Question Translating Embeddings for Knowledge-Based Visual Question Answering
Pub Date : 2025-06-03 | DOI: 10.1109/TAI.2025.3575553 | IEEE Transactions on Artificial Intelligence, vol. 7, no. 1, pp. 412–425
Heng Liu;Boyue Wang;Xiaoyan Li;Yanfeng Sun;Yongli Hu;Baocai Yin
In knowledge-based visual question answering (KB-VQA), the answer can be naturally represented by translating the visual object embedding referred to by the question according to the cross-modality relation embedding related to both the question and the image. Though the triplet representation of cross-modality knowledge is plausible and proven effective, such methods often encounter two challenges: 1) the semantic gap between the image and the question makes it difficult to accurately embed the cross-modality relation; and 2) the visual objects mentioned in the question often have ambiguous references in the input image. To address these challenges, we propose image-caption-question translating embeddings (ICQ-TransE), which more effectively model both the cross-modality relation and the head entity of visual objects. Specifically, for the cross-modality relation embedding, the designed image-caption-question information transmission mechanism passes information from the image to the question through the caption bridge, where the caption carries the visual content in textual form. With this bridge, cross-modality information can be fused more effectively, resulting in more precisely encoded relation embeddings. For the visual object embedding, instead of using a fixed number of visual regions as in previous methods, the visual regions most relevant to the question are selected dynamically. Experimental results on the challenging OK-VQA and KRVQA datasets verify the effectiveness of ICQ-TransE compared with multiple state-of-the-art methods for knowledge-based visual question answering. Our code will be available at https://github.com/cmcv2022/ICQ-TransE.
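The translating-embedding view underlying ICQ-TransE follows the TransE intuition that an answer (tail) embedding should lie near the sum of the head (visual object) embedding and the relation embedding. The sketch below shows that scoring rule only; random tensors stand in for the learned image, caption, and question encoders, and all dimensions and names are assumptions.

```python
import torch

# Minimal sketch of the TransE-style scoring behind ICQ-TransE:
# an answer (tail) embedding should be close to head + relation, where the head
# encodes the visual object referred to by the question and the relation encodes
# the cross-modality (image-caption-question) context.
# Random tensors stand in for the learned encoders; dimensions are assumptions.

dim, num_answers = 256, 1000
head = torch.randn(dim)                       # visual-object (head entity) embedding
relation = torch.randn(dim)                   # cross-modality relation embedding
answer_table = torch.randn(num_answers, dim)  # candidate answer (tail) embeddings

def transE_scores(head, relation, answers):
    """Negative L2 distance between head + relation and each candidate answer."""
    return -torch.norm(head + relation - answers, dim=-1)

scores = transE_scores(head, relation, answer_table)
predicted_answer = scores.argmax().item()
print("top-scoring answer index:", predicted_answer)
```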
{"title":"ICQ-TransE: LLM-Enhanced Image-Caption-Question Translating Embeddings for Knowledge-Based Visual Question Answering","authors":"Heng Liu;Boyue Wang;Xiaoyan Li;Yanfeng Sun;Yongli Hu;Baocai Yin","doi":"10.1109/TAI.2025.3575553","DOIUrl":"https://doi.org/10.1109/TAI.2025.3575553","url":null,"abstract":"In knowledge-based visual question answering (KB-VQA), the answer can be naturally represented by translating visual object embedding referred by the question according to the cross-modality relation embedding related to both the question and the image. Though the triplet representation of cross-modality knowledge is plausible and proven effective, these methods often encounter two challenges: 1) The semantic gap between the image and the question makes it difficult to accurately embed the cross-modality relation; and 2) the visual objects in the question often have ambiguous references in the input image. To solve the above challenges, we propose the image-caption-question translating embeddings (ICQ-TransE), which more effectively models both the cross-modality relation and the head entity of visual objects. Specifically, for cross-modality relation embedding, the designed image-caption-question information transmission mechanism transmits the information flow from image to question through the caption bridge, where the caption simultaneously has the visual content and the textual form. With this powerful bridge, cross-modality information can be more effectively fused, resulting in more precisely encoded relation embeddings. For the visual object embedding, instead of using a fixed number of visual regions as the previous methods, the most relevant visual regions to the question are dynamically selected. Experimental results on OK-VQA and KRVQA challenging datasets verify the effectiveness of ICQ-TransE compared with multiple state-of-the-art methods for visual question answering with knowledge. Our code will be available at <uri>https://github.com/cmcv2022/ICQ-TransE</uri>.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"412-425"},"PeriodicalIF":0.0,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mitigating Bias in Opportunistic Screening for MACE with Causal Reasoning
Pub Date : 2025-05-08 | DOI: 10.1109/tai.2025.3567961
Jialu Pi, Juan Maria Farina, Chieh-Ju Chao, Chadi Ayoub, Reza Arsanjani, Imon Banerjee
Mitigating population drift is vital for developing robust AI models for clinical use. While current methodologies focus on reducing demographic bias in disease predictions, they overlook the significant impact of chronic comorbidities. Addressing these complexities is essential to enhance predictive accuracy and reliability across diverse patient demographics, ultimately improving healthcare outcomes. We propose a causal reasoning framework to address selection bias in opportunistic screening for 1-year composite MACE risk using chest X-ray images. Trained on a high-risk, primarily Caucasian patient population (43% MACE event rate), the model was evaluated in a lower-risk emergency department setting (12.8% MACE event rate) and in a relatively lower-risk external Asian patient population (23.81% MACE event rate) to assess the effects of selection bias. We benchmarked our approach against a high-performance disease classification model, a propensity score matching strategy, and a debiasing model for unknown biases. The causal+confounder framework achieved AUCs of 0.75 and 0.70 on the shift and external shift data, respectively, outperforming the baselines, and a comparable AUC of 0.70 on the internal data despite penalties for confounders. It minimized disparities in confounding factors and surpassed traditional and state-of-the-art debiasing methods. The experiments show that integrating causal reasoning and confounder adjustments into AI models enhances their effectiveness. This approach shows promise for creating fair and robust clinical decision support systems that account for population shifts, ultimately improving the reliability and ethical integrity of AI-driven clinical decision-making.
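Since the study's central measurement is how AUC degrades when a model trained on a high-prevalence cohort is applied to lower-prevalence cohorts, the sketch below shows a minimal version of that shift evaluation with scikit-learn. The cohorts, scores, and labels are synthetic stand-ins, not the study's data or model.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Minimal sketch of the shift evaluation described above: the same risk model is
# scored on an internal cohort and on lower-prevalence "shift" cohorts, and the
# AUC gap is used as a rough measure of robustness to selection bias.
# The scores and labels below are synthetic stand-ins, not study data.

rng = np.random.default_rng(0)

def synthetic_cohort(n, event_rate, signal=1.0):
    y = (rng.random(n) < event_rate).astype(int)   # 1 = MACE within 1 year
    scores = signal * y + rng.normal(size=n)       # toy model risk scores
    return y, scores

cohorts = {
    "internal (43% MACE)": synthetic_cohort(2000, 0.43),
    "shift ED (12.8% MACE)": synthetic_cohort(2000, 0.128),
    "shift external (23.8% MACE)": synthetic_cohort(2000, 0.238),
}

for name, (y, s) in cohorts.items():
    print(f"{name}: AUC = {roc_auc_score(y, s):.3f}")
```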
{"title":"Mitigating Bias in Opportunistic Screening for MACE with Causal Reasoning.","authors":"Jialu Pi, Juan Maria Farina, Chieh-Ju Chao, Chadi Ayoub, Reza Arsanjani, Imon Banerjee","doi":"10.1109/tai.2025.3567961","DOIUrl":"10.1109/tai.2025.3567961","url":null,"abstract":"<p><p>Mitigating population drift is vital for developing robust AI models for clinical use. While current methodologies focus on reducing demographic bias in disease predictions, they overlook the significant impact of chronic comorbidities. Addressing these complexities is essential to enhance predictive accuracy and reliability across diverse patient demographics, ultimately improving healthcare outcomes. We propose a causal reasoning framework to address selection bias in opportunistic screening for 1-year composite MACE risk using chest X-ray images. Training in high-risk primarily Caucasian patients (43% MACE event), the model was evaluated in a lower-risk emergency department setting (12.8% MACE event) and a relatively lower-risk external Asian patient population (23.81% MACE event) to assess selection bias effects. We benchmarked our approach against a high-performance disease classification model, a propensity score matching strategy, and a debiasing model for unknown biases. The causal+confounder framework achieved an AUC of 0.75 and 0.7 on Shift data and Shift external, outperforming baselines, and a comparable AUC of 0.7 on internal data despite penalties for confounders. It minimized disparities in confounding factors and surpassed traditional and state-of-the-art debiasing methods. Experimental data show that integrating causal reasoning and confounder adjustments in AI models enhances their effectiveness. This approach shows promise for creating fair and robust clinical decision support systems that account for population shifts, ultimately improving the reliability and ethical integrity of AI-driven clinical decision-making.</p>","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12768338/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145914250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DrugMAP: Deep Multimodal Transformers for Drug-Target Mechanism of Action Prediction
Pub Date : 2025-04-30 | DOI: 10.1109/TAI.2025.3565671 | IEEE Transactions on Artificial Intelligence, vol. 6, no. 11, pp. 3087–3099
Rangan Das;Swadesh Jana;Anannyo Dey;Pascal Le Corre;Marc Cuggia;Ujjwal Maulik;Sanghamitra Bandyopadhyay
The development of new drugs is an expensive and time-consuming process, often hindered by the lack of reliable models to predict drug-target interactions (DTIs) and their mechanisms of action (MoA). Existing deep learning-based methods for DTI prediction typically focus only on binary classification of interactions, overlooking the complex mechanisms underlying these interactions. Moreover, the absence of comprehensive datasets for modeling MoA further complicates this task. To address these limitations, we introduce DrugMAP, a novel multimodal deep learning model that integrates graph neural networks and transformer-based architectures to predict both DTIs and their MoA. We construct a large-scale dataset from multiple public sources, adding a new level of complexity by including detailed MoA annotations for thousands of drug-target pairs. DrugMAP simultaneously leverages the molecular and atomic-level structures of drugs and target proteins, utilizing multirepresentational encoders for enhanced feature extraction. Experimental results show that DrugMAP outperforms state-of-the-art models for both DTI and MoA prediction across multiple benchmark datasets. Our model achieves a 3.5% improvement in AUC for MoA prediction, demonstrating its potential for guiding drug discovery and understanding adverse drug events.
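A defining feature of DrugMAP is joint prediction of a binary DTI label and a multiclass MoA label from fused drug and protein representations. The sketch below shows that two-head layout in PyTorch, with plain MLP encoders standing in for the paper's graph neural network and transformer encoders; all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

# Minimal sketch of a multimodal drug-target model with two prediction heads
# (binary DTI and multiclass MoA), in the spirit of DrugMAP. The encoders here
# are plain MLPs over precomputed drug/protein feature vectors; the real model
# uses graph neural networks and transformer encoders. All sizes are assumptions.

class DTIMoAModel(nn.Module):
    def __init__(self, drug_dim=300, protein_dim=1024, hidden=256, num_moa=10):
        super().__init__()
        self.drug_enc = nn.Sequential(nn.Linear(drug_dim, hidden), nn.ReLU())
        self.prot_enc = nn.Sequential(nn.Linear(protein_dim, hidden), nn.ReLU())
        self.dti_head = nn.Linear(2 * hidden, 1)        # interacts? (binary)
        self.moa_head = nn.Linear(2 * hidden, num_moa)  # mechanism of action (multiclass)

    def forward(self, drug_feats, protein_feats):
        z = torch.cat([self.drug_enc(drug_feats), self.prot_enc(protein_feats)], dim=-1)
        return self.dti_head(z).squeeze(-1), self.moa_head(z)

model = DTIMoAModel()
drug, protein = torch.randn(8, 300), torch.randn(8, 1024)
dti_logits, moa_logits = model(drug, protein)
print(dti_logits.shape, moa_logits.shape)  # torch.Size([8]) torch.Size([8, 10])
```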
{"title":"DrugMAP: Deep Multimodal Transformers for Drug-Target Mechanism of Action Prediction","authors":"Rangan Das;Swadesh Jana;Anannyo Dey;Pascal Le Corre;Marc Cuggia;Ujjwal Maulik;Sanghamitra Bandyopadhyay","doi":"10.1109/TAI.2025.3565671","DOIUrl":"https://doi.org/10.1109/TAI.2025.3565671","url":null,"abstract":"The development of new drugs is an expensive and time-consuming process, often hindered by the lack of reliable models to predict drug-target interactions (DTIs) and their mechanisms of action (MoA). Existing deep learning-based methods for DTI prediction typically focus only on binary classification of interactions, overlooking the complex mechanisms underlying these interactions. Moreover, the absence of comprehensive datasets for modeling MoA further complicates this task. To address these limitations, we introduce DrugMAP, a novel multimodal deep learning model that integrates graph neural networks and transformer-based architectures to predict both DTIs and their MoA. We construct a large-scale dataset from multiple public sources, adding a new level of complexity by including detailed MoA annotations for thousands of drug-target pairs. DrugMAP simultaneously leverages the molecular and atomic-level structures of drugs and target proteins, utilizing multirepresentational encoders for enhanced feature extraction. Experimental results show that DrugMAP outperforms state-of-the-art models for both DTI and MoA prediction across multiple benchmark datasets. Our model achieves a 3.5% improvement in AUC for MoA prediction, demonstrating its potential for guiding drug discovery and understanding adverse drug events.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 11","pages":"3087-3099"},"PeriodicalIF":0.0,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
EncryptFlow: Efficient and Lossless Image Encryption Network Based on Normalizing Flows
Pub Date : 2025-04-29 | DOI: 10.1109/TAI.2025.3565483 | IEEE Transactions on Artificial Intelligence, vol. 6, no. 12, pp. 3377–3390
Menglin Yang;Dong Xie;Guiting Zhang;Fulong Chen;Taochun Wang;Peng Hu
Compared with cryptographic image encryption schemes, neural network (NN)-based image encryption schemes exhibit a significantly larger key space and offer enhanced capabilities for parallel processing of image data. However, most existing NN-based image encryption schemes suffer from high time complexity in generating random keys, and their decryption processes often fail to fully recover the plaintext images without loss. In this article, we first propose a normalizing-flows-based encryption network, called EncryptFlow, designed to achieve efficient and lossless image encryption. Normalizing flows employ a special coupling structure to couple the partitioned data, thereby establishing interdependence among them. Specifically, we utilize coupling structures (e.g., additive coupling) that allow the image blocks to alternately encrypt each other during forward propagation. Additionally, we devise a key generation algorithm that produces sub-keys tailored for each layer of the encryption network. The proposed EncryptFlow network seamlessly integrates both encryption and decryption functionalities, leveraging the XOR operation as the encryption function within each layer. The experimental results and comparative analyses indicate that EncryptFlow can encrypt $256\times 256$ grayscale images in an average time of merely 0.047 s and requires only 0.188 s to encrypt color images of the same dimensions.
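The losslessness of EncryptFlow rests on the invertibility of coupling layers: one block passes through unchanged and masks the other via XOR, so decryption simply reapplies the same mask. The sketch below illustrates a single such coupling step on two image halves; the mixing function F and the sub-key handling are toy stand-ins for the paper's learned coupling function and key generation algorithm.

```python
import numpy as np

# Minimal sketch of an invertible XOR-based coupling step on image halves,
# illustrating why a coupling structure gives lossless encryption/decryption.
# F and the per-layer sub-key below are toy stand-ins for EncryptFlow's
# learned coupling function and key-generation algorithm.

def F(block, subkey):
    """Toy keyed mixing function applied to one half (any byte-valued map works)."""
    rng = np.random.default_rng(subkey)
    return rng.integers(0, 256, size=block.shape, dtype=np.uint8) ^ block

def couple(x1, x2, subkey):
    # x1 passes through; x2 is masked by F(x1, key). Swapping halves between
    # layers lets the blocks alternately encrypt each other.
    return x2 ^ F(x1, subkey), x1

def uncouple(y1, y2, subkey):
    # Exact inverse: recover x2 by XORing with the same mask.
    return y2, y1 ^ F(y2, subkey)

img = np.random.randint(0, 256, size=(256, 256), dtype=np.uint8)
x1, x2 = img[:, :128], img[:, 128:]
y1, y2 = couple(x1, x2, subkey=42)
r1, r2 = uncouple(y1, y2, subkey=42)
print(np.array_equal(np.hstack([r1, r2]), img))  # True: decryption is lossless
```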
{"title":"EncryptFlow: Efficient and Lossless Image Encryption Network Based on Normalizing Flows","authors":"Menglin Yang;Dong Xie;Guiting Zhang;Fulong Chen;Taochun Wang;Peng Hu","doi":"10.1109/TAI.2025.3565483","DOIUrl":"https://doi.org/10.1109/TAI.2025.3565483","url":null,"abstract":"Compared with the cryptographic image encryption schemes, neural networks (NN) based image encryption schemes exhibit a significantly larger key space and offer enhanced capabilities for parallel processing of image data. However, most existing NN-based image encryption schemes suffer from high time complexity in generating random keys, and their decryption processes often fail to fully recover the plaintext images without loss. In this article, we first propose a normalizing flows based encryption network, called <italic>EncryptFlow</i>, designed to achieve efficient and lossless image encryption. Normalizing flows employ a special coupling structure to couple the partitioned data, thereby establishing interdependence among them. Specifically, we utilize coupling structures (e.g., additive coupling) that allows the image blocks to alternately encrypt each other during forward propagation. Additionally, we devise a key generation algorithm that produces sub-keys tailored for each layer of the encryption network. The proposed EncryptFlow network seamlessly integrates both encryption and decryption functionalities, leveraging the XOR operation as the encryption function within each layer. The experimental results and comparative analyses indicate that EncryptFlow can encrypt <inline-formula> <tex-math>$256times 256$</tex-math></inline-formula> grayscale images with an average time of merely <inline-formula> <tex-math>$0.047s$</tex-math></inline-formula>, and similarly, it requires only <inline-formula> <tex-math>$0.188s$</tex-math></inline-formula> to encrypt color images of the same dimensions.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3377-3390"},"PeriodicalIF":0.0,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing Facial Expression Recognition With AI Agents: A Semisupervised Guided Adaptive $\beta$-VAE Coupled With Interval Type-2 Fuzzy Classifier
Pub Date : 2025-04-29 | DOI: 10.1109/TAI.2025.3565225 | IEEE Transactions on Artificial Intelligence, vol. 6, no. 11, pp. 3070–3086
Mohd Aquib;Nishchal K. Verma;M. Jaleel Akhtar
Facial expression recognition (FER) is a complex task, hindered by subtle distinctions between expression classes, significant variability within each class, and external influences such as identity, pose, age, and ethnicity. As a result, achieving pure expression encodings that are resilient to exogenous factors proves elusive, thereby compromising the downstream classification tasks. This study presents a novel intelligent FER scheme that mitigates the impact of external confounders by integrating disentangled representation learning with fuzzy logic. Building on the adaptive $\beta$-variational autoencoder (VAE) [1] as a backbone, we develop a semisupervised guided adaptive $\beta$-variational autoencoder (GA-$\beta$-VAE) capable of isolating expression features from exogenous factors. Specifically, the adaptive $\beta$-VAE is augmented with two additional branches: a deformable PCA-based secondary decoder that disentangles expression-irrelevant transformations from the core expression content, and an adversarial excitation–inhibition branch that forces the “target” (expression) latent variables to be informative only of expressions. This yields well-separated, expression-centric embeddings that are subsequently processed by an interval type-2 (IT2) fuzzy classification unit to predict the corresponding expression classes. By avoiding reliance on paired data or explicit annotations, this approach offers a scalable and flexible solution for FER. Experimental evaluations on benchmark datasets [extended Cohn–Kanade (CK+), facial expression recognition plus (FER+), and real-world affective faces database (RAF-DB)] demonstrate the framework’s effectiveness in addressing the challenges posed by exogenous factors, achieving superior accuracy and interpretability compared to state-of-the-art methods.
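The backbone objective that GA-$\beta$-VAE extends is the $\beta$-VAE loss, in which a weight $\beta > 1$ on the KL term encourages disentangled latents. The sketch below shows that base objective only; the adversarial excitation-inhibition branch, the PCA-based secondary decoder, and the IT2 fuzzy classifier are not modeled, and the tensors and $\beta$ value are illustrative.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of the beta-VAE objective that the GA-beta-VAE builds on:
# reconstruction loss plus a beta-weighted KL term, where a larger beta pushes
# the latent code toward a disentangled representation. The paper's additional
# branches are not shown; tensors and the beta value here are illustrative.

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # KL divergence between q(z|x) = N(mu, sigma^2) and the unit Gaussian prior
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

x = torch.rand(16, 3, 64, 64)           # batch of face crops
x_recon = torch.rand(16, 3, 64, 64)     # decoder output (stand-in)
mu, logvar = torch.zeros(16, 32), torch.zeros(16, 32)
print(beta_vae_loss(x, x_recon, mu, logvar).item())
```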
{"title":"Enhancing Facial Expression Recognition With AI Agents: A Semisupervised Guided Adaptive $beta$-VAE Coupled With Interval Type-2 Fuzzy Classifier","authors":"Mohd Aquib;Nishchal K. Verma;M. Jaleel Akhtar","doi":"10.1109/TAI.2025.3565225","DOIUrl":"https://doi.org/10.1109/TAI.2025.3565225","url":null,"abstract":"Facial expression recognition (FER) is a complex task, hindered by subtle distinctions between expression classes, significant variability within each class, and external influences such as identity, pose, age, and ethnicity. As a result, achieving pure expression encodings that are resilient to exogenous factors proves elusive, thereby compromising the downstream classification tasks. This study presents a novel intelligent FER scheme that mitigates the impact of external confounders by integrating disentangled representation learning with fuzzy logic. Building on Adaptive <inline-formula><tex-math>$beta$</tex-math></inline-formula>-variational autoencoder (VAE) <xref>[1]</xref> as a backbone, we develop a semisupervised guided adaptive <inline-formula><tex-math>$beta$</tex-math></inline-formula> variational autoencoder (GA-<inline-formula><tex-math>$beta$</tex-math></inline-formula>-VAE) capable of isolating expression features from exogenous factors. Specifically, the adaptive <inline-formula><tex-math>$beta$</tex-math></inline-formula>-VAE is augmented with two additional branches: a deformable PCA-based secondary decoder that disentangles expression-irrelevant transformations from the core expression content, and an adversarial excitation–inhibition branch that forces the “target” (expression) latent variables to be informative only of expressions. This yields well separated, expression-centric embeddings that are subsequently processed by an interval type-2 (IT2) fuzzy classification unit to predict the corresponding expression classes. By avoiding reliance on paired data or explicit annotations, this approach offers a scalable and flexible solution for FER. Experimental evaluations on benchmark datasets [extended Cohn–Kanade (CK+), facial expression recognition plus (FER+), and real-world affective faces database (RAF-DB)] demonstrate the framework’s effectiveness in addressing the challenges posed by exogenous factors, achieving superior accuracy and interpretability compared to state-of-the-art methods.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 11","pages":"3070-3086"},"PeriodicalIF":0.0,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145428953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive Head Pruning for Attention Mechanism in the Maritime Domain
Pub Date : 2025-04-28 | DOI: 10.1109/TAI.2025.3558724 | IEEE Transactions on Artificial Intelligence, vol. 6, no. 11, pp. 2966–2976
Walid Messaoud;Rim Trabelsi;Adnane Cabani;Fatma Abdelkefi
In this article, we introduce a novel and synergistic approach that combines attention mechanisms, a low-visibility enhancement network (LVENet) for image visibility enhancement, and a tailored head pruning method for multihead self-attention (MHSA) models, specifically engineered for the attention augmented convolutional network (AACN) and bottleneck transformers (BoTNets). The integration of these techniques aims to comprehensively address the challenges associated with object detection in the maritime domain. The attention mechanism selectively emphasizes critical areas of the image, LVENet enhances visibility under challenging conditions, and the head pruning method optimizes model efficiency and simplicity. Employing meticulous selection and evaluation, our approach achieves precise head pruning without compromising detection performance. Validation on common and maritime datasets underscores the effectiveness of our approach. The results show a substantial reduction in epoch time of over 30%, while enhancing accuracy, improving computational efficiency, and streamlining model complexity. This innovation facilitates deployment in challenging maritime scenarios.
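Head pruning for MHSA amounts to scoring each attention head and masking out the least useful ones. The sketch below shows one generic way to do this, scoring heads by the mean magnitude of their outputs on a probe batch; the scoring rule, keep ratio, and tensor shapes are assumptions and not the paper's selection procedure for AACN/BoTNet layers.

```python
import torch

# Minimal sketch of attention-head pruning: score each head (here by the mean
# absolute value of its output on a probe batch), then zero out the lowest-
# scoring heads with a mask. The paper's selection criterion is more elaborate;
# the scoring rule and keep ratio below are assumptions.

num_heads, head_dim, seq_len = 8, 64, 49
attn_out = torch.randn(4, num_heads, seq_len, head_dim)  # per-head outputs on a probe batch

head_scores = attn_out.abs().mean(dim=(0, 2, 3))         # one importance score per head
keep = int(0.75 * num_heads)                              # prune the weakest 25% of heads
kept_idx = head_scores.topk(keep).indices

mask = torch.zeros(num_heads)
mask[kept_idx] = 1.0
pruned_out = attn_out * mask.view(1, num_heads, 1, 1)     # pruned heads contribute nothing

print("kept heads:", sorted(kept_idx.tolist()))
```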
{"title":"Adaptive Head Pruning for Attention Mechanism in the Maritime Domain","authors":"Walid Messaoud;Rim Trabelsi;Adnane Cabani;Fatma Abdelkefi","doi":"10.1109/TAI.2025.3558724","DOIUrl":"https://doi.org/10.1109/TAI.2025.3558724","url":null,"abstract":"In this article, we introduce a novel and synergistic approach that combines attention mechanisms, low-visibility enhancement network (LVENet) for image visibility enhancement, and a tailored head pruning method for multihead self attention (MHSA) models, specifically engineered for attention augmented convolutional network (AACN) and bottleneck transformers (BoTNets). The integration of these techniques aims to comprehensively address the challenges associated with object detection in the maritime domain. The attention mechanism selectively emphasizes critical areas of the image, LVENet enhances visibility under challenging conditions, and the head pruning method optimizes model efficiency and simplicity. Employing meticulous selection and evaluation, our approach achieves precise head pruning without compromising detection performance. Validation using common and maritime datasets underscores the effectiveness of our approach. The results showcase a substantial reduction in epoch time by over 30%, while enhancing accuracy, improving computational efficiency, and streamlining model complexity. This innovation facilitates deployment in challenging maritime scenarios.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 11","pages":"2966-2976"},"PeriodicalIF":0.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145428950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prescribed Performance Resilient Motion Coordination With Actor–Critic Reinforcement Learning Design for UAV-USV Systems
Pub Date : 2025-04-28 | DOI: 10.1109/TAI.2025.3564900 | IEEE Transactions on Artificial Intelligence, vol. 6, no. 12, pp. 3336–3350
Jawhar Ghommam;Maarouf Saad;Mohammad H. Rahman;Quanmin Zhu
In this article, we develop a virtual vehicle scheme to solve the coordination control problem under denial-of-service (DoS) attacks for heterogeneous vehicles. The system includes an unmanned surface vessel (USV) in distress, which shares its kinematic data, and a helicopter that receives these data through wireless communication. Specifically, we carefully develop an estimator to model the unmeasurable states of the USV in the presence of DoS attacks. The virtual vehicle concept is then utilized to generate a velocity reference output for the helicopter to follow. To achieve preset tracking performance, the cascade structure of the helicopter is exploited, where the backstepping control strategy is applied via a barrier Lyapunov function. To handle input constraints, auxiliary systems are built to bridge the association between input saturation errors and performance constraints. Furthermore, to mitigate the saturation effect of bounded inputs and model uncertainties in the attitude dynamics, a fixed-time reinforcement learning (FT-RL) control algorithm is designed according to an actor–critic strategy. Stability is analyzed thoroughly using Lyapunov theory, and sufficient conditions for the whole closed-loop system are obtained. Numerical simulations validate the proposed coordination strategy.
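The FT-RL component follows the general actor-critic pattern: a critic estimates the value of the current tracking error and an actor adjusts the control in the direction indicated by the temporal-difference error. The sketch below is a generic one-step actor-critic on a toy 1-D tracking task; the Gaussian policy, linear function approximators, toy dynamics, and gains are illustrative assumptions, not the paper's fixed-time update laws.

```python
import numpy as np

# Minimal sketch of a one-step actor-critic loop on a toy 1-D tracking task,
# illustrating the general mechanism behind the FT-RL attitude controller.
# Gaussian policy, linear approximators, toy dynamics, and gains are toy choices.

rng = np.random.default_rng(0)

def phi(e):
    """Simple features of the tracking error."""
    return np.array([e, np.tanh(e), 1.0])

Wc = np.zeros(3)                                 # critic (value-function) weights
Wa = np.zeros(3)                                 # actor (mean-control) weights
gamma, lr_critic, lr_actor, sigma = 0.95, 0.1, 0.01, 0.5

e = 1.5                                          # initial tracking error
for step in range(500):
    f = phi(e)
    u = Wa @ f + sigma * rng.standard_normal()   # sample control from Gaussian policy
    e_next = 0.9 * e + 0.1 * u                   # toy closed-loop error dynamics
    r = -(e_next**2 + 0.01 * u**2)               # reward penalizes error and control effort
    delta = r + gamma * (Wc @ phi(e_next)) - Wc @ f         # temporal-difference error
    Wc += lr_critic * delta * f                             # critic: TD(0) update
    Wa += lr_actor * delta * ((u - Wa @ f) / sigma**2) * f  # actor: policy-gradient update
    e = e_next

print("final tracking error:", round(float(e), 4))
```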
{"title":"Prescribed Performance Resilient Motion Coordination With Actor–Critic Reinforcement Learning Design for UAV-USV Systems","authors":"Jawhar Ghommam;Maarouf Saad;Mohammad H. Rahman;Quanmin Zhu","doi":"10.1109/TAI.2025.3564900","DOIUrl":"https://doi.org/10.1109/TAI.2025.3564900","url":null,"abstract":"In this article, we develop a virtual vehicle scheme to solve the coordination control problem under denial-of-service (DoS) attacks for heterogeneous vehicles. This system includes an unmanned surface vessel (USV) in distress, sharing kinematic data, and a helicopter receiving data from the latter through wireless communication. Specifically, we carefully develop an estimator to model the unmeasurable states of the USV in the presence of DoS attacks. The virtual vehicle concept is then utilized to generate a velocity reference output for the helicopter to follow. To achieve preset tracking performances, the cascade structure of the helicopter is exploited, where the backstepping control strategy is used via a barrier Lyapunov function. To handle input constraints, auxiliary systems are built to bridge the association between input saturation errors and performance constraints. Furthermore, to mitigate the saturation effect of bounded inputs and model uncertainties in the attitude dynamics, a fixed-time reinforcement learning (FT-RL) control algorithm is designed according to actor–critic strategy. Stability analysis is thoroughly studied with the help of Lyapunov stability where sufficient conditions for the whole closed-loop system have been obtained. Numerical simulations have been shown to validate the proposed coordination strategy.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3336-3350"},"PeriodicalIF":0.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}