
IEEE Transactions on Artificial Intelligence: Latest Publications

Large Linguistic Models: Investigating LLMs’ Metalinguistic Abilities
Pub Date : 2025-06-03 DOI: 10.1109/TAI.2025.3575745
Gašper Beguš;Maksymilian Dąbkowski;Ryan Rhodes
The performance of large language models (LLMs) has recently improved to the point where models can perform well on many language tasks. We show here that—for the first time—the models can also generate valid metalinguistic analyses of language data. We outline a research program where the behavioral interpretability of LLMs on these tasks is tested via prompting. LLMs are trained primarily on text—as such, evaluating their metalinguistic abilities improves our understanding of their general capabilities and sheds new light on theoretical models in linguistics. We show that OpenAI’s [56] o1 vastly outperforms other models on tasks involving drawing syntactic trees and phonological generalization. We speculate that OpenAI o1’s unique advantage over other models may result from the model’s chain-of-thought mechanism, which mimics the structure of human reasoning used in complex cognitive tasks, such as linguistic analysis.
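To make the prompting setup concrete, the sketch below shows the kind of metalinguistic query such a behavioral-interpretability evaluation might issue. The function `query_llm` and the prompt wording are hypothetical placeholders, not the authors' protocol.

```python
# Illustrative sketch of a metalinguistic prompt for behavioral-interpretability testing.
# query_llm is a hypothetical stand-in for any chat-completion API; it is not the paper's code.

def query_llm(prompt: str) -> str:
    # Placeholder: route the prompt to whichever LLM endpoint is available.
    raise NotImplementedError("Connect this to an actual LLM API.")

def metalinguistic_prompt(sentence: str) -> str:
    # Ask for an explicit analysis (not a fluent continuation) so the answer can be
    # checked against a linguist's reference parse.
    return (
        "Analyze the following sentence step by step, then give its constituency "
        "tree in labeled bracket notation:\n"
        f"Sentence: {sentence}"
    )

if __name__ == "__main__":
    print(metalinguistic_prompt("The cat that the dog chased slept."))
    # response = query_llm(metalinguistic_prompt(...))  # once an endpoint is configured
```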
{"title":"Large Linguistic Models: Investigating LLMs’ Metalinguistic Abilities","authors":"Gašper Beguš;Maksymilian Dąbkowski;Ryan Rhodes","doi":"10.1109/TAI.2025.3575745","DOIUrl":"https://doi.org/10.1109/TAI.2025.3575745","url":null,"abstract":"The performance of large language models (LLMs) has recently improved to the point where models can perform well on many language tasks. We show here that—for the first time—the models can also generate valid metalinguistic analyses of language data. We outline a research program where the <italic>behavioral interpretability</i> of LLMs on these tasks is tested via prompting. LLMs are trained primarily on text—as such, evaluating their metalinguistic abilities improves our understanding of their general capabilities and sheds new light on theoretical models in linguistics. We show that OpenAI’s <xref>[56]</xref> o1 vastly outperforms other models on tasks involving drawing syntactic trees and phonological generalization. We speculate that OpenAI o1’s unique advantage over other models may result from the model’s <italic>chain-of-thought</i> mechanism, which mimics the structure of human reasoning used in complex cognitive tasks, such as linguistic analysis.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3453-3467"},"PeriodicalIF":0.0,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11022724","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145674827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ICQ-TransE: LLM-Enhanced Image-Caption-Question Translating Embeddings for Knowledge-Based Visual Question Answering
Pub Date : 2025-06-03 DOI: 10.1109/TAI.2025.3575553
Heng Liu;Boyue Wang;Xiaoyan Li;Yanfeng Sun;Yongli Hu;Baocai Yin
In knowledge-based visual question answering (KB-VQA), the answer can be naturally represented by translating the visual-object embedding referred to by the question according to the cross-modality relation embedding related to both the question and the image. Though the triplet representation of cross-modality knowledge is plausible and proven effective, these methods often encounter two challenges: 1) the semantic gap between the image and the question makes it difficult to accurately embed the cross-modality relation; and 2) the visual objects in the question often have ambiguous references in the input image. To solve these challenges, we propose image-caption-question translating embeddings (ICQ-TransE), which more effectively model both the cross-modality relation and the head entity of visual objects. Specifically, for cross-modality relation embedding, the designed image-caption-question information transmission mechanism passes the information flow from the image to the question through a caption bridge, where the caption carries both the visual content and the textual form. With this bridge, cross-modality information can be fused more effectively, resulting in more precisely encoded relation embeddings. For the visual-object embedding, instead of using a fixed number of visual regions as in previous methods, the visual regions most relevant to the question are dynamically selected. Experimental results on the challenging OK-VQA and KRVQA datasets verify the effectiveness of ICQ-TransE compared with multiple state-of-the-art methods for visual question answering with knowledge. Our code will be available at https://github.com/cmcv2022/ICQ-TransE.
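As a rough illustration of the translating-embedding idea this model builds on (head entity plus relation should land near the answer embedding), the toy sketch below scores answer candidates. Names, dimensions, and the synthetic embeddings are made up; this is not ICQ-TransE's implementation.

```python
# Toy TransE-style scoring: visual object (head) + cross-modality relation ≈ answer (tail).
import numpy as np

def transe_score(head: np.ndarray, relation: np.ndarray, tail: np.ndarray) -> float:
    # Smaller translation error means a more plausible answer, so negate the distance.
    return -float(np.linalg.norm(head + relation - tail))

rng = np.random.default_rng(0)
dim = 8
visual_object = rng.normal(size=dim)   # head-entity embedding from selected image regions
relation = rng.normal(size=dim)        # relation embedding fused from image, caption, and question
candidates = {name: rng.normal(size=dim) for name in ["umbrella", "bicycle", "kite"]}
candidates["umbrella"] = visual_object + relation + 0.01 * rng.normal(size=dim)  # near-perfect translation

best = max(candidates, key=lambda a: transe_score(visual_object, relation, candidates[a]))
print(best)  # "umbrella": the candidate closest to head + relation
```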
{"title":"ICQ-TransE: LLM-Enhanced Image-Caption-Question Translating Embeddings for Knowledge-Based Visual Question Answering","authors":"Heng Liu;Boyue Wang;Xiaoyan Li;Yanfeng Sun;Yongli Hu;Baocai Yin","doi":"10.1109/TAI.2025.3575553","DOIUrl":"https://doi.org/10.1109/TAI.2025.3575553","url":null,"abstract":"In knowledge-based visual question answering (KB-VQA), the answer can be naturally represented by translating visual object embedding referred by the question according to the cross-modality relation embedding related to both the question and the image. Though the triplet representation of cross-modality knowledge is plausible and proven effective, these methods often encounter two challenges: 1) The semantic gap between the image and the question makes it difficult to accurately embed the cross-modality relation; and 2) the visual objects in the question often have ambiguous references in the input image. To solve the above challenges, we propose the image-caption-question translating embeddings (ICQ-TransE), which more effectively models both the cross-modality relation and the head entity of visual objects. Specifically, for cross-modality relation embedding, the designed image-caption-question information transmission mechanism transmits the information flow from image to question through the caption bridge, where the caption simultaneously has the visual content and the textual form. With this powerful bridge, cross-modality information can be more effectively fused, resulting in more precisely encoded relation embeddings. For the visual object embedding, instead of using a fixed number of visual regions as the previous methods, the most relevant visual regions to the question are dynamically selected. Experimental results on OK-VQA and KRVQA challenging datasets verify the effectiveness of ICQ-TransE compared with multiple state-of-the-art methods for visual question answering with knowledge. Our code will be available at <uri>https://github.com/cmcv2022/ICQ-TransE</uri>.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"7 1","pages":"412-425"},"PeriodicalIF":0.0,"publicationDate":"2025-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145898253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
IEEE Transactions on Artificial Intelligence Publication Information
Pub Date : 2025-06-02 DOI: 10.1109/TAI.2025.3569136
{"title":"IEEE Transactions on Artificial Intelligence Publication Information","authors":"","doi":"10.1109/TAI.2025.3569136","DOIUrl":"https://doi.org/10.1109/TAI.2025.3569136","url":null,"abstract":"","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 6","pages":"C2-C2"},"PeriodicalIF":0.0,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11020980","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144196882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Mitigating Bias in Opportunistic Screening for MACE with Causal Reasoning.
Pub Date : 2025-05-08 DOI: 10.1109/tai.2025.3567961
Jialu Pi, Juan Maria Farina, Chieh-Ju Chao, Chadi Ayoub, Reza Arsanjani, Imon Banerjee

Mitigating population drift is vital for developing robust AI models for clinical use. While current methodologies focus on reducing demographic bias in disease predictions, they overlook the significant impact of chronic comorbidities. Addressing these complexities is essential to enhance predictive accuracy and reliability across diverse patient demographics, ultimately improving healthcare outcomes. We propose a causal reasoning framework to address selection bias in opportunistic screening for 1-year composite MACE risk using chest X-ray images. Trained on a high-risk, primarily Caucasian patient population (43% MACE event rate), the model was evaluated in a lower-risk emergency department setting (12.8% MACE event rate) and in a relatively lower-risk external Asian patient population (23.81% MACE event rate) to assess selection-bias effects. We benchmarked our approach against a high-performance disease classification model, a propensity score matching strategy, and a debiasing model for unknown biases. The causal+confounder framework achieved AUCs of 0.75 and 0.70 on the Shift and external Shift data, outperforming the baselines, and a comparable AUC of 0.70 on internal data despite penalties for confounders. It minimized disparities in confounding factors and surpassed traditional and state-of-the-art debiasing methods. These experiments show that integrating causal reasoning and confounder adjustments into AI models enhances their effectiveness. The approach shows promise for creating fair and robust clinical decision support systems that account for population shifts, ultimately improving the reliability and ethical integrity of AI-driven clinical decision-making.
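For context, one of the benchmarked baselines, propensity-score-based adjustment, can be sketched as below on synthetic data. This is a generic inverse-probability-weighting illustration, not the paper's causal+confounder framework, and the simulated confounder, cohort, and outcome variables are invented for the example.

```python
# Generic propensity-score reweighting sketch on synthetic data (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
comorbidity = rng.binomial(1, 0.4, size=n)             # confounder, e.g., a chronic comorbidity
in_cohort = rng.binomial(1, 0.3 + 0.4 * comorbidity)   # selection into the high-risk cohort
mace = rng.binomial(1, 0.1 + 0.3 * comorbidity)        # outcome driven partly by the confounder

# Propensity of cohort membership given the confounder.
propensity = LogisticRegression().fit(
    comorbidity.reshape(-1, 1), in_cohort
).predict_proba(comorbidity.reshape(-1, 1))[:, 1]

# Inverse-probability weights down-weight over-represented confounder strata.
weights = np.where(in_cohort == 1, 1.0 / propensity, 1.0 / (1.0 - propensity))
naive = mace[in_cohort == 1].mean()
adjusted = np.average(mace[in_cohort == 1], weights=weights[in_cohort == 1])
print(f"naive MACE rate: {naive:.3f}, reweighted: {adjusted:.3f}")
```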

{"title":"Mitigating Bias in Opportunistic Screening for MACE with Causal Reasoning.","authors":"Jialu Pi, Juan Maria Farina, Chieh-Ju Chao, Chadi Ayoub, Reza Arsanjani, Imon Banerjee","doi":"10.1109/tai.2025.3567961","DOIUrl":"10.1109/tai.2025.3567961","url":null,"abstract":"<p><p>Mitigating population drift is vital for developing robust AI models for clinical use. While current methodologies focus on reducing demographic bias in disease predictions, they overlook the significant impact of chronic comorbidities. Addressing these complexities is essential to enhance predictive accuracy and reliability across diverse patient demographics, ultimately improving healthcare outcomes. We propose a causal reasoning framework to address selection bias in opportunistic screening for 1-year composite MACE risk using chest X-ray images. Training in high-risk primarily Caucasian patients (43% MACE event), the model was evaluated in a lower-risk emergency department setting (12.8% MACE event) and a relatively lower-risk external Asian patient population (23.81% MACE event) to assess selection bias effects. We benchmarked our approach against a high-performance disease classification model, a propensity score matching strategy, and a debiasing model for unknown biases. The causal+confounder framework achieved an AUC of 0.75 and 0.7 on Shift data and Shift external, outperforming baselines, and a comparable AUC of 0.7 on internal data despite penalties for confounders. It minimized disparities in confounding factors and surpassed traditional and state-of-the-art debiasing methods. Experimental data show that integrating causal reasoning and confounder adjustments in AI models enhances their effectiveness. This approach shows promise for creating fair and robust clinical decision support systems that account for population shifts, ultimately improving the reliability and ethical integrity of AI-driven clinical decision-making.</p>","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12768338/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145914250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
IEEE Transactions on Artificial Intelligence Publication Information
Pub Date : 2025-04-30 DOI: 10.1109/TAI.2025.3557987
{"title":"IEEE Transactions on Artificial Intelligence Publication Information","authors":"","doi":"10.1109/TAI.2025.3557987","DOIUrl":"https://doi.org/10.1109/TAI.2025.3557987","url":null,"abstract":"","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 5","pages":"C2-C2"},"PeriodicalIF":0.0,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10980623","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143892500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DrugMAP: Deep Multimodal Transformers for Drug-Target Mechanism of Action Prediction
Pub Date : 2025-04-30 DOI: 10.1109/TAI.2025.3565671
Rangan Das;Swadesh Jana;Anannyo Dey;Pascal Le Corre;Marc Cuggia;Ujjwal Maulik;Sanghamitra Bandyopadhyay
The development of new drugs is an expensive and time-consuming process, often hindered by the lack of reliable models to predict drug-target interactions (DTIs) and their mechanisms of action (MoA). Existing deep learning-based methods for DTI prediction typically focus only on binary classification of interactions, overlooking the complex mechanisms underlying these interactions. Moreover, the absence of comprehensive datasets for modeling MoA further complicates this task. To address these limitations, we introduce DrugMAP, a novel multimodal deep learning model that integrates graph neural networks and transformer-based architectures to predict both DTIs and their MoA. We construct a large-scale dataset from multiple public sources, adding a new level of complexity by including detailed MoA annotations for thousands of drug-target pairs. DrugMAP simultaneously leverages the molecular and atomic-level structures of drugs and target proteins, utilizing multirepresentational encoders for enhanced feature extraction. Experimental results show that DrugMAP outperforms state-of-the-art models for both DTI and MoA prediction across multiple benchmark datasets. Our model achieves a 3.5% improvement in AUC for MoA prediction, demonstrating its potential for guiding drug discovery and understanding adverse drug events.
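The sketch below illustrates the general multimodal-fusion pattern described here: precomputed drug and target embeddings are concatenated and mapped to MoA logits. The encoders themselves (a GNN over molecular graphs, a transformer over protein sequences) are abstracted away, and all names and dimensions are illustrative rather than DrugMAP's.

```python
# Minimal multimodal fusion head for MoA classification (illustrative sketch).
import torch
import torch.nn as nn

class MoAFusionHead(nn.Module):
    def __init__(self, drug_dim: int, target_dim: int, num_moa_classes: int):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(drug_dim + target_dim, 128),
            nn.ReLU(),
            nn.Linear(128, num_moa_classes),
        )

    def forward(self, drug_emb: torch.Tensor, target_emb: torch.Tensor) -> torch.Tensor:
        # Fuse the two modalities by concatenation and return MoA logits.
        return self.classifier(torch.cat([drug_emb, target_emb], dim=-1))

drug_emb = torch.randn(4, 64)     # stand-in for a GNN-encoded molecule batch
target_emb = torch.randn(4, 96)   # stand-in for a transformer-encoded protein batch
logits = MoAFusionHead(64, 96, num_moa_classes=10)(drug_emb, target_emb)
print(logits.shape)  # torch.Size([4, 10])
```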
{"title":"DrugMAP: Deep Multimodal Transformers for Drug-Target Mechanism of Action Prediction","authors":"Rangan Das;Swadesh Jana;Anannyo Dey;Pascal Le Corre;Marc Cuggia;Ujjwal Maulik;Sanghamitra Bandyopadhyay","doi":"10.1109/TAI.2025.3565671","DOIUrl":"https://doi.org/10.1109/TAI.2025.3565671","url":null,"abstract":"The development of new drugs is an expensive and time-consuming process, often hindered by the lack of reliable models to predict drug-target interactions (DTIs) and their mechanisms of action (MoA). Existing deep learning-based methods for DTI prediction typically focus only on binary classification of interactions, overlooking the complex mechanisms underlying these interactions. Moreover, the absence of comprehensive datasets for modeling MoA further complicates this task. To address these limitations, we introduce DrugMAP, a novel multimodal deep learning model that integrates graph neural networks and transformer-based architectures to predict both DTIs and their MoA. We construct a large-scale dataset from multiple public sources, adding a new level of complexity by including detailed MoA annotations for thousands of drug-target pairs. DrugMAP simultaneously leverages the molecular and atomic-level structures of drugs and target proteins, utilizing multirepresentational encoders for enhanced feature extraction. Experimental results show that DrugMAP outperforms state-of-the-art models for both DTI and MoA prediction across multiple benchmark datasets. Our model achieves a 3.5% improvement in AUC for MoA prediction, demonstrating its potential for guiding drug discovery and understanding adverse drug events.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 11","pages":"3087-3099"},"PeriodicalIF":0.0,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
EncryptFlow: Efficient and Lossless Image Encryption Network Based on Normalizing Flows
Pub Date : 2025-04-29 DOI: 10.1109/TAI.2025.3565483
Menglin Yang;Dong Xie;Guiting Zhang;Fulong Chen;Taochun Wang;Peng Hu
Compared with cryptographic image encryption schemes, neural network (NN)-based image encryption schemes exhibit a significantly larger key space and offer enhanced capabilities for parallel processing of image data. However, most existing NN-based image encryption schemes suffer from high time complexity in generating random keys, and their decryption processes often fail to fully recover the plaintext images without loss. In this article, we first propose a normalizing flows based encryption network, called EncryptFlow, designed to achieve efficient and lossless image encryption. Normalizing flows employ a special coupling structure to couple the partitioned data, thereby establishing interdependence among them. Specifically, we utilize coupling structures (e.g., additive coupling) that allow the image blocks to alternately encrypt each other during forward propagation. Additionally, we devise a key generation algorithm that produces sub-keys tailored for each layer of the encryption network. The proposed EncryptFlow network seamlessly integrates both encryption and decryption functionalities, leveraging the XOR operation as the encryption function within each layer. The experimental results and comparative analyses indicate that EncryptFlow can encrypt $256\times 256$ grayscale images in an average time of merely $0.047$ s and, similarly, requires only $0.188$ s to encrypt color images of the same dimensions.
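A toy sketch of the coupling idea follows: the image is split into two blocks, one block is XOR-encrypted with a keyed function of the other, and the step inverts exactly, so decryption is lossless. The keyed function and the sub-key value here are trivial stand-ins for the learned per-layer transform, not EncryptFlow itself.

```python
# Toy XOR coupling layer: exactly invertible, hence lossless decryption (illustrative sketch).
import numpy as np

def keyed_mix(block: np.ndarray, subkey: int) -> np.ndarray:
    # Stand-in for the learned per-layer transform; any deterministic uint8 function works.
    return (block.astype(np.uint16) * 131 + subkey).astype(np.uint8)

def couple(x1: np.ndarray, x2: np.ndarray, subkey: int):
    # Forward pass of one coupling layer: x1 passes through, x2 is XOR-encrypted.
    return x1, np.bitwise_xor(x2, keyed_mix(x1, subkey))

def uncouple(y1: np.ndarray, y2: np.ndarray, subkey: int):
    # Exact inverse: XOR with the same keyed value recovers x2 without loss.
    return y1, np.bitwise_xor(y2, keyed_mix(y1, subkey))

img = np.random.default_rng(0).integers(0, 256, size=(256, 256), dtype=np.uint8)
x1, x2 = img[:, :128], img[:, 128:]
y1, y2 = couple(x1, x2, subkey=173)
r1, r2 = uncouple(y1, y2, subkey=173)
print(np.array_equal(x2, r2))  # True: decryption is lossless
```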
{"title":"EncryptFlow: Efficient and Lossless Image Encryption Network Based on Normalizing Flows","authors":"Menglin Yang;Dong Xie;Guiting Zhang;Fulong Chen;Taochun Wang;Peng Hu","doi":"10.1109/TAI.2025.3565483","DOIUrl":"https://doi.org/10.1109/TAI.2025.3565483","url":null,"abstract":"Compared with the cryptographic image encryption schemes, neural networks (NN) based image encryption schemes exhibit a significantly larger key space and offer enhanced capabilities for parallel processing of image data. However, most existing NN-based image encryption schemes suffer from high time complexity in generating random keys, and their decryption processes often fail to fully recover the plaintext images without loss. In this article, we first propose a normalizing flows based encryption network, called <italic>EncryptFlow</i>, designed to achieve efficient and lossless image encryption. Normalizing flows employ a special coupling structure to couple the partitioned data, thereby establishing interdependence among them. Specifically, we utilize coupling structures (e.g., additive coupling) that allows the image blocks to alternately encrypt each other during forward propagation. Additionally, we devise a key generation algorithm that produces sub-keys tailored for each layer of the encryption network. The proposed EncryptFlow network seamlessly integrates both encryption and decryption functionalities, leveraging the XOR operation as the encryption function within each layer. The experimental results and comparative analyses indicate that EncryptFlow can encrypt <inline-formula> <tex-math>$256times 256$</tex-math></inline-formula> grayscale images with an average time of merely <inline-formula> <tex-math>$0.047s$</tex-math></inline-formula>, and similarly, it requires only <inline-formula> <tex-math>$0.188s$</tex-math></inline-formula> to encrypt color images of the same dimensions.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3377-3390"},"PeriodicalIF":0.0,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enhancing Facial Expression Recognition With AI Agents: A Semisupervised Guided Adaptive $\beta$-VAE Coupled With Interval Type-2 Fuzzy Classifier
Pub Date : 2025-04-29 DOI: 10.1109/TAI.2025.3565225
Mohd Aquib;Nishchal K. Verma;M. Jaleel Akhtar
Facial expression recognition (FER) is a complex task, hindered by subtle distinctions between expression classes, significant variability within each class, and external influences such as identity, pose, age, and ethnicity. As a result, achieving pure expression encodings that are resilient to exogenous factors proves elusive, thereby compromising the downstream classification tasks. This study presents a novel intelligent FER scheme that mitigates the impact of external confounders by integrating disentangled representation learning with fuzzy logic. Building on the adaptive $\beta$-variational autoencoder (VAE) [1] as a backbone, we develop a semisupervised guided adaptive $\beta$ variational autoencoder (GA-$\beta$-VAE) capable of isolating expression features from exogenous factors. Specifically, the adaptive $\beta$-VAE is augmented with two additional branches: a deformable PCA-based secondary decoder that disentangles expression-irrelevant transformations from the core expression content, and an adversarial excitation–inhibition branch that forces the “target” (expression) latent variables to be informative only of expressions. This yields well separated, expression-centric embeddings that are subsequently processed by an interval type-2 (IT2) fuzzy classification unit to predict the corresponding expression classes. By avoiding reliance on paired data or explicit annotations, this approach offers a scalable and flexible solution for FER. Experimental evaluations on benchmark datasets [extended Cohn–Kanade (CK+), facial expression recognition plus (FER+), and real-world affective faces database (RAF-DB)] demonstrate the framework’s effectiveness in addressing the challenges posed by exogenous factors, achieving superior accuracy and interpretability compared to state-of-the-art methods.
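For reference, the standard $\beta$-VAE objective that the adaptive variant modulates can be written as reconstruction error plus a $\beta$-weighted KL term; the sketch below computes it for a diagonal-Gaussian posterior. The adaptive $\beta$ schedule, the PCA-based secondary decoder, and the excitation–inhibition branch are not reproduced here, and the tensor shapes are arbitrary.

```python
# Worked sketch of the beta-VAE objective: reconstruction + beta * KL(q(z|x) || N(0, I)).
import torch
import torch.nn.functional as F

def beta_vae_loss(x: torch.Tensor, x_recon: torch.Tensor,
                  mu: torch.Tensor, logvar: torch.Tensor, beta: float) -> torch.Tensor:
    recon = F.mse_loss(x_recon, x, reduction="sum")
    # Closed-form KL divergence for a diagonal Gaussian posterior against a standard normal prior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

x = torch.rand(8, 3, 64, 64)          # input batch
x_recon = torch.rand(8, 3, 64, 64)    # decoder output (placeholder)
mu, logvar = torch.zeros(8, 32), torch.zeros(8, 32)  # encoder posterior parameters
print(beta_vae_loss(x, x_recon, mu, logvar, beta=4.0))
```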
{"title":"Enhancing Facial Expression Recognition With AI Agents: A Semisupervised Guided Adaptive $beta$-VAE Coupled With Interval Type-2 Fuzzy Classifier","authors":"Mohd Aquib;Nishchal K. Verma;M. Jaleel Akhtar","doi":"10.1109/TAI.2025.3565225","DOIUrl":"https://doi.org/10.1109/TAI.2025.3565225","url":null,"abstract":"Facial expression recognition (FER) is a complex task, hindered by subtle distinctions between expression classes, significant variability within each class, and external influences such as identity, pose, age, and ethnicity. As a result, achieving pure expression encodings that are resilient to exogenous factors proves elusive, thereby compromising the downstream classification tasks. This study presents a novel intelligent FER scheme that mitigates the impact of external confounders by integrating disentangled representation learning with fuzzy logic. Building on Adaptive <inline-formula><tex-math>$beta$</tex-math></inline-formula>-variational autoencoder (VAE) <xref>[1]</xref> as a backbone, we develop a semisupervised guided adaptive <inline-formula><tex-math>$beta$</tex-math></inline-formula> variational autoencoder (GA-<inline-formula><tex-math>$beta$</tex-math></inline-formula>-VAE) capable of isolating expression features from exogenous factors. Specifically, the adaptive <inline-formula><tex-math>$beta$</tex-math></inline-formula>-VAE is augmented with two additional branches: a deformable PCA-based secondary decoder that disentangles expression-irrelevant transformations from the core expression content, and an adversarial excitation–inhibition branch that forces the “target” (expression) latent variables to be informative only of expressions. This yields well separated, expression-centric embeddings that are subsequently processed by an interval type-2 (IT2) fuzzy classification unit to predict the corresponding expression classes. By avoiding reliance on paired data or explicit annotations, this approach offers a scalable and flexible solution for FER. Experimental evaluations on benchmark datasets [extended Cohn–Kanade (CK+), facial expression recognition plus (FER+), and real-world affective faces database (RAF-DB)] demonstrate the framework’s effectiveness in addressing the challenges posed by exogenous factors, achieving superior accuracy and interpretability compared to state-of-the-art methods.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 11","pages":"3070-3086"},"PeriodicalIF":0.0,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145428953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adaptive Head Pruning for Attention Mechanism in the Maritime Domain
Pub Date : 2025-04-28 DOI: 10.1109/TAI.2025.3558724
Walid Messaoud;Rim Trabelsi;Adnane Cabani;Fatma Abdelkefi
In this article, we introduce a novel and synergistic approach that combines attention mechanisms, low-visibility enhancement network (LVENet) for image visibility enhancement, and a tailored head pruning method for multihead self attention (MHSA) models, specifically engineered for attention augmented convolutional network (AACN) and bottleneck transformers (BoTNets). The integration of these techniques aims to comprehensively address the challenges associated with object detection in the maritime domain. The attention mechanism selectively emphasizes critical areas of the image, LVENet enhances visibility under challenging conditions, and the head pruning method optimizes model efficiency and simplicity. Employing meticulous selection and evaluation, our approach achieves precise head pruning without compromising detection performance. Validation using common and maritime datasets underscores the effectiveness of our approach. The results showcase a substantial reduction in epoch time by over 30%, while enhancing accuracy, improving computational efficiency, and streamlining model complexity. This innovation facilitates deployment in challenging maritime scenarios.
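To make the head-pruning step concrete, the following sketch scores each attention head by the mean norm of its output and masks the weakest heads. The scoring rule and keep ratio are simple placeholders rather than the meticulous selection and evaluation procedure described above.

```python
# Importance-based attention-head pruning (illustrative sketch, not the paper's method).
import torch

def prune_heads(per_head_outputs: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    # per_head_outputs: (batch, heads, tokens, dim). Score each head by its mean output norm.
    scores = per_head_outputs.norm(dim=-1).mean(dim=(0, 2))   # shape: (heads,)
    k = max(1, int(keep_ratio * scores.numel()))
    keep = torch.zeros_like(scores)
    keep[scores.topk(k).indices] = 1.0                        # binary mask over heads
    return per_head_outputs * keep.view(1, -1, 1, 1)

outputs = torch.randn(2, 8, 16, 64)            # 8 attention heads
pruned = prune_heads(outputs, keep_ratio=0.5)
print((pruned.abs().sum(dim=(0, 2, 3)) > 0).sum().item())  # 4 heads survive
```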
{"title":"Adaptive Head Pruning for Attention Mechanism in the Maritime Domain","authors":"Walid Messaoud;Rim Trabelsi;Adnane Cabani;Fatma Abdelkefi","doi":"10.1109/TAI.2025.3558724","DOIUrl":"https://doi.org/10.1109/TAI.2025.3558724","url":null,"abstract":"In this article, we introduce a novel and synergistic approach that combines attention mechanisms, low-visibility enhancement network (LVENet) for image visibility enhancement, and a tailored head pruning method for multihead self attention (MHSA) models, specifically engineered for attention augmented convolutional network (AACN) and bottleneck transformers (BoTNets). The integration of these techniques aims to comprehensively address the challenges associated with object detection in the maritime domain. The attention mechanism selectively emphasizes critical areas of the image, LVENet enhances visibility under challenging conditions, and the head pruning method optimizes model efficiency and simplicity. Employing meticulous selection and evaluation, our approach achieves precise head pruning without compromising detection performance. Validation using common and maritime datasets underscores the effectiveness of our approach. The results showcase a substantial reduction in epoch time by over 30%, while enhancing accuracy, improving computational efficiency, and streamlining model complexity. This innovation facilitates deployment in challenging maritime scenarios.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 11","pages":"2966-2976"},"PeriodicalIF":0.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145428950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Prescribed Performance Resilient Motion Coordination With Actor–Critic Reinforcement Learning Design for UAV-USV Systems
Pub Date : 2025-04-28 DOI: 10.1109/TAI.2025.3564900
Jawhar Ghommam;Maarouf Saad;Mohammad H. Rahman;Quanmin Zhu
In this article, we develop a virtual vehicle scheme to solve the coordination control problem under denial-of-service (DoS) attacks for heterogeneous vehicles. This system includes an unmanned surface vessel (USV) in distress, sharing kinematic data, and a helicopter receiving data from the latter through wireless communication. Specifically, we carefully develop an estimator to model the unmeasurable states of the USV in the presence of DoS attacks. The virtual vehicle concept is then utilized to generate a velocity reference output for the helicopter to follow. To achieve preset tracking performances, the cascade structure of the helicopter is exploited, where the backstepping control strategy is applied via a barrier Lyapunov function. To handle input constraints, auxiliary systems are built to bridge the association between input saturation errors and performance constraints. Furthermore, to mitigate the saturation effect of bounded inputs and model uncertainties in the attitude dynamics, a fixed-time reinforcement learning (FT-RL) control algorithm is designed according to an actor–critic strategy. Closed-loop stability is thoroughly analyzed using Lyapunov theory, and sufficient conditions for the whole closed-loop system are obtained. Numerical simulations validate the proposed coordination strategy.
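As background for the actor–critic design, a generic one-step actor–critic update is sketched below: the critic regresses toward a temporal-difference target and the actor follows an advantage-weighted policy gradient. The vehicle dynamics, barrier Lyapunov terms, and fixed-time guarantees of the FT-RL scheme are omitted, and the network sizes and reward are arbitrary placeholders.

```python
# Generic one-step actor–critic update (illustrative; not the paper's FT-RL control law).
import torch
import torch.nn as nn

state_dim, action_dim = 6, 2
actor_mean = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(), nn.Linear(32, action_dim))
critic = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(), nn.Linear(32, 1))
opt_actor = torch.optim.Adam(actor_mean.parameters(), lr=1e-3)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)
log_std = torch.zeros(action_dim)  # fixed exploration noise for the Gaussian policy

def actor_critic_step(s, a, r, s_next, gamma=0.99):
    # Critic: one-step temporal-difference regression toward r + gamma * V(s').
    td_target = r + gamma * critic(s_next).detach()
    advantage = td_target - critic(s)
    critic_loss = advantage.pow(2).mean()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    # Actor: raise the log-probability of the executed action in proportion to the advantage.
    dist = torch.distributions.Normal(actor_mean(s), log_std.exp())
    actor_loss = -(dist.log_prob(a).sum(-1) * advantage.detach().squeeze(-1)).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

s, s_next = torch.randn(1, state_dim), torch.randn(1, state_dim)
a = torch.randn(1, action_dim)  # action actually applied to the (omitted) vehicle dynamics
actor_critic_step(s, a, torch.tensor([[0.5]]), s_next)
```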
{"title":"Prescribed Performance Resilient Motion Coordination With Actor–Critic Reinforcement Learning Design for UAV-USV Systems","authors":"Jawhar Ghommam;Maarouf Saad;Mohammad H. Rahman;Quanmin Zhu","doi":"10.1109/TAI.2025.3564900","DOIUrl":"https://doi.org/10.1109/TAI.2025.3564900","url":null,"abstract":"In this article, we develop a virtual vehicle scheme to solve the coordination control problem under denial-of-service (DoS) attacks for heterogeneous vehicles. This system includes an unmanned surface vessel (USV) in distress, sharing kinematic data, and a helicopter receiving data from the latter through wireless communication. Specifically, we carefully develop an estimator to model the unmeasurable states of the USV in the presence of DoS attacks. The virtual vehicle concept is then utilized to generate a velocity reference output for the helicopter to follow. To achieve preset tracking performances, the cascade structure of the helicopter is exploited, where the backstepping control strategy is used via a barrier Lyapunov function. To handle input constraints, auxiliary systems are built to bridge the association between input saturation errors and performance constraints. Furthermore, to mitigate the saturation effect of bounded inputs and model uncertainties in the attitude dynamics, a fixed-time reinforcement learning (FT-RL) control algorithm is designed according to actor–critic strategy. Stability analysis is thoroughly studied with the help of Lyapunov stability where sufficient conditions for the whole closed-loop system have been obtained. Numerical simulations have been shown to validate the proposed coordination strategy.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3336-3350"},"PeriodicalIF":0.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0