Red–green–blue-depth (RGB-D) deep learning-based co-salient object detection (Co-SOD) automatically detects and segments common salient objects in images. However, such models are too computationally intensive to run on mobile devices. To help overcome this limitation, this article proposes a location, neighborhood, and semantic guidance network (LNSNet) with knowledge distillation (KD), called LNSNet-S*, for RGB-D Co-SOD that minimizes the number of parameters while improving accuracy. Apart from their backbone networks, the LNSNet student (LNSNet-S) and teacher (LNSNet-T) models share the same structure, capturing similarity knowledge in the category, channel, and pixel-point dimensions so that LNSNet-S can be trained with KD for superior lightweight performance. For optimization, a positioning-path progressive activation uses hierarchical transformers to fuse features from low to high levels, generating class-activation localization maps from the fused bimodal information to obtain location information. The high-level neighborhood-guidance information then guides the low-level features. Next, a multisource semantic-enhancement embedding module progressively fuses multiscale cross-modal semantic information under the guidance of the class-activated localization information. A class-based progressive triplet loss facilitates the transfer of category, channel, and pixel-point information. Extensive experiments demonstrated the effectiveness and robustness of the novel LNSNet-S* at different sizes, with significant improvements observed. The smallest LNSNet-S* model requires only 15.9 M parameters, a reduction of more than 92% compared with LNSNet-T.
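As a rough illustration of the distillation idea, the sketch below matches channel-level and pixel-point-level similarity structures between teacher and student features; it is our reading of the abstract, not the authors' released code, and the category-level term (which needs class labels) is omitted. Feature shapes and the equal loss weighting are assumptions.

```python
# A minimal sketch (assumed, not LNSNet's actual loss): transfer similarity
# knowledge in the channel and pixel-point dimensions from teacher to student.
import torch
import torch.nn.functional as F

def channel_similarity(feat):                  # feat: (B, C, H, W)
    b, c, h, w = feat.shape
    f = F.normalize(feat.view(b, c, h * w), dim=2)
    return f @ f.transpose(1, 2)               # (B, C, C) channel affinity

def pixel_similarity(feat):
    b, c, h, w = feat.shape
    f = F.normalize(feat.view(b, c, h * w), dim=1)
    return f.transpose(1, 2) @ f               # (B, HW, HW) pixel-point affinity

def kd_similarity_loss(student_feat, teacher_feat):
    # match affinity matrices rather than raw features, so a student with a
    # smaller backbone can still mimic the teacher's internal relations
    return (F.mse_loss(channel_similarity(student_feat),
                       channel_similarity(teacher_feat))
            + F.mse_loss(pixel_similarity(student_feat),
                         pixel_similarity(teacher_feat)))

s = torch.randn(2, 64, 16, 16)                 # student features (toy shapes)
t = torch.randn(2, 64, 16, 16)                 # teacher features
print(kd_similarity_loss(s, t).item())
```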
{"title":"Location, Neighborhood, and Semantic Guidance Network for RGB-D Co-Salient Object Detection","authors":"Wujie Zhou;Bingying Wang;Xiena Dong;Caie Xu;Fangfang Qiang","doi":"10.1109/TAI.2025.3564238","DOIUrl":"https://doi.org/10.1109/TAI.2025.3564238","url":null,"abstract":"Red–green–blue-depth (RGB-D) deep learning-based co-salient object detection (Co-SOD) automatically detects and segments common salient objects in images. However, this computationally intensive model cannot be run on mobile devices. To help overcome this limitation, this article proposes a localization, neighborhood, and semantic guidance network (LNSNet) with knowledge distillation (KD), called LNSNet-S<sup>*</sup>, for RGB-D Co-SOD to minimize the number of parameters and improve the accuracy. Apart from their backbone networks, the LNSNet student (LNSNet-S) and teacher (LNSNet-T) models use the same structure to capture similarity knowledge in category, channel, and pixel-point dimensions to train an LNSNet-S with KD for superior lightweight performance. For optimization, a positioning path progressive activation uses hierarchical transformers to fuse features from low to high levels, generating class activation localization maps using the fused bimodal information to obtain location information. The high-level neighborhood-guidance information is then used to guide the low-level features. Next, a multisource semantic enhancement embedding module progressively fuses multiscale cross-modal semantic information guided by class-activated localization information. A class-based progressive triplet loss facilitates the transfer of category, channel, and pixel-point information. Extensive experiments demonstrated the effectiveness and robustness of the novel LNSNet-S<sup>*</sup> in different sizes, and significant improvements were observed. The smallest LNSNet-S<sup>*</sup> model reduced the number of parameters by more than 92% compared to that of LNSNet-T, requiring only 15.9 M parameters.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3297-3311"},"PeriodicalIF":0.0,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-04-23. DOI: 10.1109/TAI.2025.3563144
Ghadeer A. Jaradat;Mohammed F. Tolba;Ghada Alsuhli;Hani Saleh;Mahmoud Al-Qutayri;Thanos Stouraitis
Transformer models have become central to deep learning, driving improvements in many areas, from language understanding to image recognition, across a wide range of applications. Despite their success, deploying these models in real-time applications, particularly on edge devices, poses significant challenges due to their computational intensity and memory demands. To overcome these challenges, we introduce a novel hybrid dynamic pruning (HDP) technique, an efficient algorithm-architecture codesign approach that accelerates transformers using head sparsity, block sparsity, and approximation to reduce attention computations and memory accesses. Observing the large redundancy in attention scores and attention heads, we propose a novel integer-based block pruning that prunes unimportant blocks of the attention matrix at run time, as well as integer-based head pruning that detects and prunes unimportant heads at an early stage at run time. We also propose an approximation method that further reduces attention computations. To support these methods efficiently and with low latency, we propose the HDP accelerator (HDPA), a coprocessor architecture synthesized in two configurations, HDPA-edge and HDPA-server, to meet the needs of mobile and server platforms. Extensive experiments with different transformer models and benchmarks demonstrate that HDPA-server achieves $481\times$ and $381\times$ speedup in attention-layer computation over an Intel i7-1185G7 CPU and an NVIDIA T4 GPU, respectively. Compared with other state-of-the-art (SOTA) accelerators, HDPA achieves $1.26\times$ to $2.08\times$ higher throughput, $1.3\times$ to $18\times$ greater MAC efficiency, and $1.1\times$ to $5.1\times$ better energy efficiency when normalized to the same computational load.
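To make the block-sparsity idea concrete, here is a minimal floating-point sketch of pruning unimportant blocks of the attention matrix at run time; the real HDP design uses integer arithmetic and hardware support, so the block size, importance measure, and keep ratio below are all assumptions.

```python
# A hedged sketch of run-time block pruning of attention scores (assumed
# parameters; HDP itself is integer-based and implemented in hardware).
import torch

def block_pruned_attention(q, k, v, block=16, keep_ratio=0.5):
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5      # (T, T)
    nb = scores.shape[-1] // block
    # importance of each (block x block) tile of the score matrix
    imp = scores.view(nb, block, nb, block).abs().mean(dim=(1, 3))  # (nb, nb)
    imp = imp + torch.eye(nb) * imp.max()      # keep diagonal tiles so no row is fully masked
    thresh = torch.quantile(imp.flatten(), 1 - keep_ratio)
    mask = (imp >= thresh).repeat_interleave(block, 0).repeat_interleave(block, 1)
    scores = scores.masked_fill(~mask, float("-inf"))          # skip pruned tiles
    return torch.softmax(scores, dim=-1) @ v

T, d = 64, 32
q, k, v = (torch.randn(T, d) for _ in range(3))
print(block_pruned_attention(q, k, v).shape)   # torch.Size([64, 32])
```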
{"title":"Efficient Transformer Inference Through Hybrid Dynamic Pruning","authors":"Ghadeer A. Jaradat;Mohammed F. Tolba;Ghada Alsuhli;Hani Saleh;Mahmoud Al-Qutayri;Thanos Stouraitis","doi":"10.1109/TAI.2025.3563144","DOIUrl":"https://doi.org/10.1109/TAI.2025.3563144","url":null,"abstract":"In the world of deep learning, transformer models have become very significant, leading to improvements in many areas, from understanding language to recognizing images, covering a wide range of applications. Despite their success, the deployment of these models in real-time applications, particularly on edge devices, poses significant challenges due to their computational intensity and memory demands. To overcome these challenges, we introduce a novel hybrid dynamic pruning (HDP) technique, an efficient algorithm-architecture codesign approach that accelerates transformers using head sparsity, block sparsity, and approximation to reduce computations in attention and reduce memory access. With the observation of the huge redundancy in attention scores and attention heads, we propose a novel integer-based block pruning to prune unimportant blocks in the attention matrix at run time. We also propose integer-based head pruning to detect and prune unimportant heads at an early stage at run time. Also, we propose an approximation method that reduces attention computations. To efficiently support these methods with lower latency, we propose the HDP accelerator (HDPA) as a coprocessor architecture, synthesized in two configurations—HDPA-edge and HDPA-server—to meet the needs of mobile and server platforms. Extensive experiments with different transformer models and benchmarks demonstrate that HDPA-server achieves <inline-formula> <tex-math>$481times$</tex-math></inline-formula> and <inline-formula> <tex-math>$381times$</tex-math></inline-formula> speedup in attention layer computation over Intel i7-1185G7 CPU and NVIDIA T4 GPU, respectively. Compared to other state-of-the-art (SOTA) accelerators, HDPA achieves <inline-formula> <tex-math>$1.26times$</tex-math></inline-formula> to <inline-formula> <tex-math>$2.08times$</tex-math></inline-formula> higher throughput, <inline-formula> <tex-math>$1.3times$</tex-math></inline-formula> to <inline-formula> <tex-math>$18times$</tex-math></inline-formula> greater MAC efficiency, and <inline-formula> <tex-math>$1.1times$</tex-math></inline-formula> to <inline-formula> <tex-math>$5.1times$</tex-math></inline-formula> improved energy efficiency, when normalized to the same computational load.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3273-3286"},"PeriodicalIF":0.0,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-04-22. DOI: 10.1109/TAI.2025.3563438
Umesh Kashyap;Sudev Kumar Padhi;Sk. Subidh Ali
Perceptual encryption (PE) methods are key enablers for protecting image privacy in deep learning-based applications in the cloud. In PE, the image content is obfuscated such that deep learning models can still work on the obfuscated data. The key advantage of PE over homomorphic encryption is that the features required by the target deep learning model are preserved in the encrypted data, so the model does not need to be retrained on it. Recently, a significant number of PE methods have been proposed in the literature, each improving over the others. In this article, we perform a detailed security analysis of the three best-known PE methods designed to protect image privacy, namely, adversarial visual information hiding, learnable encryption, and encryption-then-compression. We propose a new generative adversarial network (GAN)-based security evaluation framework that successfully reconstructs the original images encrypted by these methods, revealing clear security flaws. We conducted extensive experiments using different datasets and deep learning models. The results show significant vulnerabilities in the existing key-based PE methods.
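For readers unfamiliar with PE, the sketch below shows a keyed, block-based obfuscation in the spirit of learnable encryption: block positions are permuted and intensities negated under a secret key, while block-level statistics a classifier can rely on are largely preserved. It is a generic illustration, not any specific scheme analyzed in the article.

```python
# A hedged sketch of block-based perceptual encryption (generic illustration
# with assumed parameters; not the exact schemes benchmarked in this article).
import numpy as np

def pe_block_scramble(img, key, block=4):
    """Shuffle image blocks and negate intensities with a keyed RNG."""
    rng = np.random.default_rng(key)           # the secret key seeds the transforms
    h, w = img.shape
    bh, bw = h // block, w // block
    blocks = img.reshape(bh, block, bw, block).transpose(0, 2, 1, 3)
    blocks = blocks.reshape(bh * bw, block, block)
    perm = rng.permutation(len(blocks))        # keyed block permutation
    flip = rng.random(len(blocks)) < 0.5       # keyed per-block negation
    out = blocks[perm]
    out[flip] = 255 - out[flip]
    return out.reshape(bh, bw, block, block).transpose(0, 2, 1, 3).reshape(h, w)

img = np.arange(64, dtype=np.uint8).reshape(8, 8)
enc = pe_block_scramble(img, key=42)
print(enc.shape)                               # (8, 8): same size, content obfuscated
```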
{"title":"Is Perceptual Encryption Secure? A Security Benchmark for Perceptual Encryption Methods","authors":"Umesh Kashyap;Sudev Kumar Padhi;Sk. Subidh Ali","doi":"10.1109/TAI.2025.3563438","DOIUrl":"https://doi.org/10.1109/TAI.2025.3563438","url":null,"abstract":"Perceptual encryption (PE) methods are the key enablers for protecting image privacy for deep learning-based applications in the cloud. In PE, the image content is obfuscated such that the deep learning models can work on the obfuscated data. The key advantage of PE over holomorphic encryption is that, unlike holomorphic encryption, the feature required by the target deep learning model is preserved in the encrypted data. Therefore, the model is not required to be retrained on the encrypted data. Recently, a significant number of PE methods have been proposed in the literature, each improving over the others. In this article, we perform a detailed security analysis of three best-known PE methods, namely, adversarial visual information hiding, learnable encryption, and encryption-then-compression methods designed to protect the privacy of images. We proposed a new generative adversarial network (GAN)-based security evaluation framework to successfully reconstruct the original images encrypted by these methods, showing clear security flaws. We conducted extensive experiments using different datasets and deep learning models. The results show significant vulnerabilities in the existing key-based PE methods.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3287-3296"},"PeriodicalIF":0.0,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-04-21. DOI: 10.1109/TAI.2025.3562839
Zhilin Zhu;Chucai Zhang;Jianhua Dai
Feature selection is an important data-preprocessing step in artificial intelligence that aims to eliminate redundant features while retaining essential ones. Measuring feature significance and the relevance between features is a significant challenge. Fuzzy information entropy, an extension of Shannon entropy, is widely used to quantify the information of fuzzy divisions. However, it has significant limitations, notably that fuzzy conditional entropy is not monotonic when measuring decision uncertainty during feature selection. We introduce a novel measure, macrogranular entropy (ME), and construct generalized forms such as conditional ME, mutual macrogranular information, and joint ME. Conditional ME is monotonic when measuring decision uncertainty. In addition, we propose two feature selection algorithms: one based on monotonic conditional ME (MCME) and the other based on the degree of symmetric association (ADSA). The ADSA and MCME algorithms are compared against eight other feature selection algorithms in a series of experiments, based on classification performance with SVM and NB classifiers and evaluation metrics including F1-score and recall. In terms of all four evaluation metrics, ADSA and MCME achieved the top two rankings, respectively. Specifically, on the NB and SVM classifiers, ADSA improves average accuracy by 12.22% and 2.88% over the original feature set, while MCME improves accuracy by 10.07% and 1.01%, respectively. Experimental comparisons demonstrate that ADSA effectively removes redundant information from the dataset during feature selection.
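The abstract does not define ME itself, so as a stand-in the sketch below runs greedy forward selection driven by ordinary conditional entropy on discretized features; it illustrates the monotonic-uncertainty-reduction workflow rather than the MCME or ADSA criteria, which are assumptions beyond what the abstract states.

```python
# A hedged sketch: greedy forward feature selection by conditional entropy
# (plain Shannon version, standing in for the paper's conditional ME).
import numpy as np

def cond_entropy(x, y):
    """H(y | x) for discrete non-negative integer arrays."""
    h = 0.0
    for v in np.unique(x):
        yy = y[x == v]
        p = np.bincount(yy) / len(yy)
        p = p[p > 0]
        h -= (len(yy) / len(y)) * (p * np.log2(p)).sum()
    return h

def greedy_select(X, y, k):
    base = int(X.max()) + 1                    # base for jointly coding features
    selected, combo = [], np.zeros(len(y), dtype=np.int64)
    for _ in range(k):
        rest = [j for j in range(X.shape[1]) if j not in selected]
        # add the feature that most reduces the remaining uncertainty about y
        best = min(rest, key=lambda j: cond_entropy(combo * base + X[:, j], y))
        selected.append(best)
        combo = combo * base + X[:, best]
    return selected

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 6))
y = (2 * X[:, 1] + X[:, 4] % 2).astype(np.int64)   # label set by features 1 and 4
print(greedy_select(X, y, 2))                      # expected: [1, 4]
```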
{"title":"Fuzzy Information Quantity Measurement and Feature Selection by Macrogranular Entropy","authors":"Zhilin Zhu;Chucai Zhang;Jianhua Dai","doi":"10.1109/TAI.2025.3562839","DOIUrl":"https://doi.org/10.1109/TAI.2025.3562839","url":null,"abstract":"Feature selection is an important data preprocessing process in artificial intelligence, which aims to eliminate redundant features while retaining essential features. Measuring feature significance and relevance between features is a significant challenge. Fuzzy information entropy is an extension of Shannon entropy. It is widely used for quantifying the information of fuzzy divisions. However, it has significant limitations, notably the lack of monotonicity in fuzzy conditional entropy measure of decision uncertainty in the feature selection process. We introduce a novel measurement macrogranular entropy (ME) and construct some generalized forms, such as conditional ME, mutual macrogranular information, and joint ME. The conditional ME exhibits monotonicity when measuring decision uncertainty. In addition, we propose two feature selection algorithms: one based on monotonic conditional ME (MCME), and the other based on the degree of symmetric association (ADSA). The ADSA algorithm and the MCME algorithm are compared against eight other feature selection algorithms through a series of experiments. The comparison was conducted based on classification performance using SVM and NB classifiers, and evaluation metrics including F1-score and recall. In terms of all four evaluation metrics, ADSA and MCME achieved the top two rankings, respectively. Specifically, on the NB and SVM classifiers, the ADSA algorithm improves the average accuracy by 12.22% and 2.88% compared to the original feature set, while MCME improves the accuracy by 10.07% and 1.01%, respectively. Experimental comparisons demonstrate that ADSA algorithm effectively removes redundant information from the dataset during feature selection.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3258-3272"},"PeriodicalIF":0.0,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-04-21. DOI: 10.1109/TAI.2025.3562505
Clement Fung;Chen Qiu;Aodong Li;Maja Rudolph
Anomaly detection is the task of identifying abnormal samples in large unlabeled datasets. Although the advent of foundation models has produced powerful zero-shot anomaly detection methods, their deployment in practice is often hindered by the absence of labeled validation data—without it, detection performance cannot be evaluated reliably. In this work, we propose selection with synthetic anomalies (SWSA): a general-purpose framework to select image-based anomaly detectors without labeled validation data. Instead of collecting labeled validation data, we generate synthetic anomalies from a small support set of normal images without using any training or fine-tuning. Our synthetic anomalies are then used to create detection tasks that compose a validation framework for model selection. In an empirical study, we evaluate SWSA with three types of synthetic anomalies and on two selection tasks: model selection of image-based anomaly detectors and prompt selection for CLIP-based anomaly detection. SWSA often selects models and prompts that match selections made with a ground-truth validation set, outperforming baseline selection strategies.
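A minimal sketch of the SWSA workflow follows: synthesize anomalies from a small support set of normal images (here with a CutPaste-style patch transform) and rank candidate detectors by AUROC on the resulting labeled task. The toy score functions are assumptions standing in for real detectors.

```python
# A hedged sketch of SWSA-style model selection (toy detectors; patch pasting
# approximates one of the paper's synthetic-anomaly types).
import numpy as np
from sklearn.metrics import roc_auc_score

def paste_patch(img, rng, size=8):
    """CutPaste-style synthetic anomaly: copy one patch onto another spot."""
    out = img.copy()
    h, w = img.shape
    y0, x0 = rng.integers(0, h - size, 2)
    y1, x1 = rng.integers(0, h - size, 2)
    out[y1:y1 + size, x1:x1 + size] = img[y0:y0 + size, x0:x0 + size]
    return out

def select_detector(detectors, normals, anomalies):
    X = normals + anomalies
    y = [0] * len(normals) + [1] * len(anomalies)
    aucs = {name: roc_auc_score(y, [score(im) for im in X])
            for name, score in detectors.items()}
    return max(aucs, key=aucs.get), aucs       # pick the best validation AUROC

rng = np.random.default_rng(0)
support = [rng.normal(size=(32, 32)) for _ in range(20)]   # normal support set
synth = [paste_patch(im, rng) for im in support]           # synthetic anomalies

detectors = {  # assumed stand-ins: any image -> anomaly-score function works
    "variance": lambda im: float(im.var()),
    "grad-energy": lambda im: float(np.abs(np.diff(im, axis=0)).mean()),
}
print(select_detector(detectors, support, synth))
```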
{"title":"Model Selection of Anomaly Detectors in the Absence of Labeled Validation Data","authors":"Clement Fung;Chen Qiu;Aodong Li;Maja Rudolph","doi":"10.1109/TAI.2025.3562505","DOIUrl":"https://doi.org/10.1109/TAI.2025.3562505","url":null,"abstract":"Anomaly detection is the task of identifying abnormal samples in large unlabeled datasets. Although the advent of foundation models has produced powerful zero-shot anomaly detection methods, their deployment in practice is often hindered by the absence of labeled validation data—without it, detection performance cannot be evaluated reliably. In this work, we propose selection with synthetic anomalies (SWSA): a general-purpose framework to select image-based anomaly detectors without labeled validation data. Instead of collecting labeled validation data, we generate synthetic anomalies from a small support set of normal images without using any training or fine-tuning. Our synthetic anomalies are then used to create detection tasks that compose a validation framework for model selection. In an empirical study, we evaluate SWSA with three types of synthetic anomalies and on two selection tasks: model selection of image-based anomaly detectors and prompt selection for CLIP-based anomaly detection. SWSA often selects models and prompts that match selections made with a ground-truth validation set, outperforming baseline selection strategies.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3248-3257"},"PeriodicalIF":0.0,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genetic programming (GP) has been widely applied to evolve scheduling heuristics for dynamic flexible job shop scheduling (DFJSS). However, evaluating GP individuals is computationally expensive, especially in large-scale DFJSS scenarios. A k-nearest neighbor (KNN) based surrogate has been successfully used to reduce individual evaluation time for GP by predicting an individual's fitness from the most similar stored sample. In particular, the phenotypes of GP individuals have been used to generate samples for KNN-based surrogates, on the precondition that individuals with the same phenotype have the same or similar fitness. However, their real fitness may differ greatly because fitness calculations in DFJSS involve different input decision situations, so considering only phenotypes when extracting samples can decrease the accuracy of KNN surrogates. This article proposes a KNN-based surrogate-assisted GP algorithm that considers both the phenotype and the genotype of GP individuals to generate samples. Specifically, a genotypic characterization based on terminal frequency is designed to measure the similarity of individual genotypes. The results show that, with the same training time, the proposed algorithm converges quickly and achieves better scheduling heuristics than state-of-the-art algorithms in most examined scenarios. With the same number of generations, the proposed algorithm obtains comparable performance while needing only about one third of the training time of baseline GP. The effectiveness of the proposed algorithm is also verified from different aspects, e.g., the relation between genotype correlation and individuals' fitness differences, and population diversity.
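As a rough sketch of the surrogate, an individual can be characterized by a genotype vector of terminal frequencies concatenated with a phenotype vector of its decisions on fixed probe situations, with fitness predicted from the nearest archived sample. The terminal names, probe encoding, and 1-NN choice below are assumptions for illustration.

```python
# A hedged sketch of phenotype-plus-genotype KNN surrogate fitness prediction
# (assumed terminal set and probe encoding; not the paper's exact design).
import numpy as np

TERMINALS = ["PT", "WT", "NPT", "WINQ", "MWT"]      # assumed terminal set

def genotype_vec(tree_tokens):
    """Relative frequency of each terminal in the expression tree."""
    counts = np.array([tree_tokens.count(t) for t in TERMINALS], float)
    return counts / max(counts.sum(), 1.0)

def phenotype_vec(rule, probes):
    """The rule's priority ordering over fixed probe decision situations."""
    return np.argsort([rule(s) for s in probes]).astype(float)

def knn_predict(x, archive_X, archive_f):
    d = np.linalg.norm(archive_X - x, axis=1)
    return archive_f[np.argmin(d)]                  # 1-NN surrogate fitness

probes = np.random.default_rng(1).normal(size=(8, len(TERMINALS)))
def make_rule(w): return lambda s: float(w @ s)     # toy linear priority rules

x_old = np.concatenate([genotype_vec(["PT", "WT"]),
                        phenotype_vec(make_rule(np.ones(5)), probes)])
archive_X, archive_f = x_old[None, :], np.array([0.42])   # one evaluated sample
x_new = np.concatenate([genotype_vec(["PT", "WT", "WT"]),
                        phenotype_vec(make_rule(np.array([1, 1.1, 1, 1, .9])), probes)])
print(knn_predict(x_new, archive_X, archive_f))     # 0.42, the nearest stored fitness
```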
{"title":"Phenotype and Genotype Based Sample Aware Surrogate-Assisted Genetic Programming in Dynamic Flexible Job Shop Scheduling","authors":"Luyao Zhu;Fangfang Zhang;Xiaodong Zhu;Ke Chen;Mengjie Zhang","doi":"10.1109/TAI.2025.3562161","DOIUrl":"https://doi.org/10.1109/TAI.2025.3562161","url":null,"abstract":"Genetic programming (GP) has been widely applied to evolve scheduling heuristics for dynamic flexible job shop scheduling (DFJSS). However, the evaluation of GP individuals is computationally expensive, especially in large scale DFJSS scenarios. A k-nearest neighbor (KNN) based surrogate has been successfully used to reduce individual evaluation time for GP by predicting the fitness of an individual with the most similar sample in KNN. Particularly, the phenotypes of GP individuals have been utilized to generate samples for KNN-based surrogates with a precondition that the fitness of individuals with the same phenotype is the same or similar. However, their real fitness may differ greatly due to different input decision situations for fitness calculations in DFJSS. Thus, only considering phenotypes of GP individuals to extract samples could decrease the accuracy of KNN surrogates. This article proposes a KNN-based surrogate assisted GP algorithm by considering both the phenotype and genotype of GP individuals to generate samples. Specifically, a genotypic characterization based on terminal frequency is designed to measure the similarity of individual genotypes. The results show that with the same training time, the proposed algorithm can converge fast and achieve better scheduling heuristics than the state-of-the-art algorithms in most examined scenarios. With the same number of generations, the proposed algorithm can obtain comparable performance but only needs about one third of the training time of baseline GP. The effectiveness of the proposed algorithm is also verified from different aspects, e.g., relation between genotype correlation and fitness difference of individuals, and population diversity.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3232-3247"},"PeriodicalIF":0.0,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-04-17. DOI: 10.1109/TAI.2025.3562160
Yuxing Xing;Caixia Chen;Jie Wu;Jie Chen
The potential game has been widely used to describe multiagent task allocation. However, traditional game-theoretic algorithms perform unsatisfactorily in scenarios with a high agent count. We therefore employ a reinforcement learning algorithm that enables each agent to make decisions independently in response to other agents' decisions and to variations in the number of agents, ultimately working towards a desired goal. First, we construct a potential game for multiagent task allocation and design a corresponding utility function for each agent. Then, we propose a deep Q-network algorithm based on a graph neural network and enhance the agent selection mechanism in this learning algorithm. During each iteration, a task is randomly selected for an agent from the participant set, and each agent updates its strategy accordingly. Finally, numerical simulations comparing several representative game-theoretic algorithms highlight the advantages and performance of our proposed GDQ-Net algorithm across various tasks and numbers of agents under the constructed model.
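A toy version of this loop appears below: a sketch, not GDQ-Net, with tabular bandit-style Q-learning replacing the graph-based deep Q-network. Each utility is the agent's marginal contribution to its task, which yields an exact potential game, and one randomly chosen agent updates per iteration; task values and learning rates are assumptions.

```python
# A hedged sketch: Q-learning agents allocating themselves to tasks in a
# potential game with marginal-contribution ("wonderful life") utilities.
import random

TASKS, AGENTS = 3, 6
value = [10.0, 6.0, 4.0]                        # assumed value of each task

def task_reward(k, m):                          # diminishing returns in m agents
    return value[k] * (1 - 0.5 ** m)

def utility(assign, i):
    """Agent i's marginal contribution to the task it joined."""
    k = assign[i]
    m = sum(1 for a in assign if a == k)        # includes agent i itself
    return task_reward(k, m) - task_reward(k, m - 1)

Q = [[0.0] * TASKS for _ in range(AGENTS)]      # per-agent task-value estimates
assign = [random.randrange(TASKS) for _ in range(AGENTS)]
alpha, eps = 0.2, 0.1

for _ in range(5000):
    i = random.randrange(AGENTS)                # one randomly selected agent acts
    k = random.randrange(TASKS) if random.random() < eps \
        else max(range(TASKS), key=lambda t: Q[i][t])
    assign[i] = k
    r = utility(assign, i)
    Q[i][k] += alpha * (r - Q[i][k])            # stateless (bandit-style) update

print("final allocation:", assign)              # tends to balance toward task values
```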
{"title":"Reinforcement Learning for Efficient Multiagent Task Allocation in Potential Game Model","authors":"Yuxing Xing;Caixia Chen;Jie Wu;Jie Chen","doi":"10.1109/TAI.2025.3562160","DOIUrl":"https://doi.org/10.1109/TAI.2025.3562160","url":null,"abstract":"The potential game has been widely used to describe multiagent task allocation. However, the application of traditional game-theoretic algorithms has shown unsatisfactory performance in scenarios with a high agent count. For this, we employ reinforcement learning algorithm to enable each agent to independently make decision in response to other agents’ decisions and variations in the number of agents, ultimately working towards achieving a desired goal. First, we construct a potential game for multiagent task allocation and design a corresponding utility function for each agent. Then, we propose a deep q-network algorithm based on graph neural network, and enhance the agent selection mechanism in this learning algorithm. During each iteration, a task is randomly selected for an agent from the participant set, and each agent updates its strategy accordingly. Finally, by comparing several representative game theoretical algorithms, the numerical simulations highlight the advantages and performance of our proposed GDQ-Net algorithm across various tasks and numbers of agents under the constructed model.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3217-3231"},"PeriodicalIF":0.0,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-04-16. DOI: 10.1109/TAI.2025.3560921
Hailong Hu;Jun Pang
Generative adversarial networks (GANs) have shown remarkable success in image synthesis, making GAN models themselves commercially valuable to legitimate model owners. It is therefore critical to technically protect the intellectual property of GANs. Prior works need to tamper with the training set or training process to verify the ownership of a GAN. In this article, we show that these methods are not robust to emerging model extraction attacks. We then propose a new method, GAN-Guards, which utilizes the characteristics shared by a target model and its stolen copies for ownership-infringement detection. Our method is directly applicable to all well-trained GANs, as it does not require retraining target models. Extensive experimental results show that our new method achieves superior detection performance compared with watermark-based and fingerprint-based methods. Finally, we demonstrate the effectiveness of our method with respect to the number of generations of model extraction attacks, the number of generated samples, and adaptive attacks.
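The sketch below conveys the underlying intuition with toy statistics: an extracted copy inherits the target model's output distribution, so a distance between feature-space moments separates stolen models from independently trained ones. The feature extraction, distance, and calibration are all assumptions, not the GAN-Guards algorithm itself.

```python
# A hedged sketch of shared-characteristics detection (toy feature statistics
# in place of GAN-Guards' actual test).
import numpy as np

def feature_stats(samples):
    """Mean and covariance of (stand-in) features of generated samples."""
    return samples.mean(axis=0), np.cov(samples, rowvar=False)

def stats_distance(a, b):
    (m1, c1), (m2, c2) = a, b
    return np.linalg.norm(m1 - m2) + np.linalg.norm(c1 - c2, "fro")

rng = np.random.default_rng(0)
target = rng.normal(0.0, 1.0, size=(500, 16))            # target GAN features
stolen = target + rng.normal(0, 0.05, size=(500, 16))    # extracted copy: close
indep = rng.normal(0.5, 1.3, size=(500, 16))             # independent GAN: far

d_stolen = stats_distance(feature_stats(target), feature_stats(stolen))
d_indep = stats_distance(feature_stats(target), feature_stats(indep))
print(d_stolen < d_indep)   # True: the stolen model mirrors the target's statistics
```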
{"title":"Ownership Infringement Detection for Generative Adversarial Networks Against Model Stealing","authors":"Hailong Hu;Jun Pang","doi":"10.1109/TAI.2025.3560921","DOIUrl":"https://doi.org/10.1109/TAI.2025.3560921","url":null,"abstract":"Generative adversarial networks (GANs) have shown remarkable success in image synthesis, making GAN models themselves commercially valuable to legitimate model owners. Therefore, it is critical to technically protect the intellectual property of GANs. Prior works need to tamper with the training set or training process to verify the ownership of a GAN. In this article, we show that these methods are not robust to emerging model extraction attacks. Then, we propose a new method GAN-Guards which utilizes the common characteristics of a target model and its stolen models for ownership infringement detection. Our method can be directly applicable to all well-trained GANs as it does not require retraining target models. Extensive experimental results show that our new method achieves superior detection performance, compared with the watermark-based and fingerprint-based methods. Finally, we demonstrate the effectiveness of our method with respect to the number of generations of model extraction attacks, the number of generated samples, and adaptive attacks.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 11","pages":"3018-3029"},"PeriodicalIF":0.0,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning (DL) has made significant advancements in tomographic imaging, particularly in low-dose computed tomography (LDCT) denoising. A recent trend is for servers to train powerful models on enormous self-collected datasets and provide application programming interfaces (APIs) to users, as with ChatGPT. To avoid model leakage, users are required to upload their data to the server. This approach is particularly advantageous for devices with limited computational capabilities, as it offloads computation to the server and eases the workload on the devices themselves. However, it raises public concern about the risk of privacy disclosure. To alleviate this concern, we propose to denoise LDCT directly in the encrypted domain, achieving privacy-preserving cloud services without exposing private data to the server. Concretely, we employ homomorphic encryption to encrypt private LDCT images, which are then transferred to the server model trained with plaintext LDCT for denoising. Since fundamental DL operations, such as convolution and linear transformation, cannot be used directly in the encrypted domain, we transform the fundamental mathematical operations of the plaintext domain into their encrypted-domain counterparts. Moreover, we present two interactive frameworks, for linear and nonlinear models, both of which operate losslessly. In this way, the proposed method achieves two merits: data privacy is well protected, and the server model is free from the risk of model leakage. Moreover, we provide theoretical proof of the lossless property of our framework. Finally, experiments demonstrate that the transferred contents are well protected and cannot be reconstructed.[1]
[1] The code is released at https://github.com/Zi-YuanYang/Encrypt_LDCT_Recon
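To illustrate why linear denoising steps carry over to the encrypted domain, the toy sketch below runs an integer smoothing kernel over Paillier ciphertexts: multiplying ciphertexts adds plaintexts, and exponentiation scales them, so the server computes the filtered result without seeing any pixel. The tiny primes and integer kernel are for clarity only and are not the article's protocol.

```python
# A toy sketch (assumed, not this article's framework): a linear filtering
# step evaluated on additively homomorphic Paillier ciphertexts.
import math, random

class ToyPaillier:
    def __init__(self, p=293, q=433):               # tiny demo primes; insecure
        self.n = p * q
        self.n2 = self.n * self.n
        self.g = self.n + 1
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
        # mu = L(g^lam mod n^2)^{-1} mod n, where L(x) = (x - 1) // n
        self.mu = pow((pow(self.g, self.lam, self.n2) - 1) // self.n, -1, self.n)

    def enc(self, m):
        r = random.randrange(1, self.n)
        return pow(self.g, m % self.n, self.n2) * pow(r, self.n, self.n2) % self.n2

    def dec(self, c):
        return (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n

    def add(self, c1, c2):                          # Enc(m1 + m2)
        return c1 * c2 % self.n2

    def scalar_mul(self, c, k):                     # Enc(k * m), plaintext k >= 0
        return pow(c, k, self.n2)

pa = ToyPaillier()
pixels = [12, 40, 25]                               # a tiny 1-D "image" patch
kernel = [1, 2, 1]                                  # non-negative integer kernel
cts = [pa.enc(x) for x in pixels]                   # client encrypts, then uploads
acc = pa.enc(0)                                     # server-side encrypted dot product
for c, w in zip(cts, kernel):
    acc = pa.add(acc, pa.scalar_mul(c, w))
assert pa.dec(acc) == sum(w * x for w, x in zip(kernel, pixels))   # 117
```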
{"title":"A Novel Privacy-Enhancing Framework for Low-Dose CT Denoising","authors":"Ziyuan Yang;Huijie Huangfu;Maosong Ran;Zhiwen Wang;Hui Yu;Mengyu Sun;Yi Zhang","doi":"10.1109/TAI.2025.3561092","DOIUrl":"https://doi.org/10.1109/TAI.2025.3561092","url":null,"abstract":"Deep learning (DL) has made significant advancements in tomographic imaging, particularly in low-dose computed tomography (LDCT) denoising. A recent trend involves servers training powerful models with enormous self-collected data and providing application programming interfaces (APIs) for users, such as Chat-GPT. To avoid model leakage, users are required to upload their data to the server. This approach is particularly advantageous for devices with limited computational capabilities, as it offloads computation to the server, easing the workload on the devices themselves. However, this way raises public concerns about the privacy disclosure risk. Hence, to alleviate related concerns, we propose to directly denoise LDCT in the encrypted domain to achieve privacy-preserving cloud services without exposing private data to the server. Concretely, we employ homomorphic encryption to encrypt private LDCT, which is then transferred to the server model trained with plaintext LDCT for further denoising. Since fundamental DL operations, such as convolution and linear transformation, cannot be directly used in the encrypted domain, we transform the fundamental mathematic operations in the plaintext domain into the operations in the encrypted domain. Moreover, we present two interactive frameworks for linear and nonlinear models, both of which can achieve lossless operating. In this way, the proposed methods can achieve two merits, the data privacy is well protected, and the server model is free from the risk of model leakage. Moreover, we provide theoretical proof to validate the lossless property of our framework. Finally, experiments were conducted to demonstrate that the transferred contents are well protected and cannot be reconstructed.<xref><sup>1</sup></xref><fn><label><sup>1</sup></label><p>The codes are released at <uri>https://github.com/Zi-YuanYang/Encrypt_LDCT_Recon</uri></p></fn>","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 11","pages":"3043-3055"},"PeriodicalIF":0.0,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145455995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recovering the structure of causal graphical models from observational data is an essential yet challenging task for causal discovery in scientific scenarios. Domain-specific causal discovery usually relies on expert validation or prior analysis to improve the reliability of the recovered causality, but this is limited by the scarcity of expert resources. Recently, large language models (LLMs) have been used for causal analysis across various domain-specific scenarios, suggesting their potential to serve as autonomous experts guiding data-based structure learning. However, integrating LLMs into causal discovery is challenging because LLM-based reasoning about the actual causal structure can be inaccurate. To address this challenge, we propose an error-tolerant LLM-driven causal discovery framework. The error-tolerance mechanism is threefold, with careful consideration of potential inaccuracies. In the LLM-based reasoning process, an accuracy-oriented prompting strategy restricts causal analysis to a reliable range. Next, a knowledge-to-structure transition aligns LLM-derived causal statements with structural causal interactions. In the structure learning process, goodness-of-fit to data and adherence to LLM-derived priors are balanced to further address prior inaccuracies. Evaluation on eight real-world causal structures demonstrates the efficacy of our LLM-driven approach in improving data-based causal discovery, along with its robustness to inaccurate LLM-derived priors.
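As a minimal sketch of balancing fit against possibly wrong priors, the snippet below scores candidate edges with a data-fit term plus a soft penalty or bonus from mock LLM statements; the scoring function, weight, and threshold are assumptions, far simpler than the framework's actual structure learning.

```python
# A hedged sketch: folding (possibly inaccurate) LLM-derived causal priors
# into data-driven edge scoring as soft penalties rather than hard rules.
import numpy as np
from itertools import combinations

def edge_score(data, i, j):
    """Data-fit gain of an edge: squared correlation as a simple stand-in."""
    return np.corrcoef(data[:, i], data[:, j])[0, 1] ** 2

def search_edges(data, llm_prior, lam=0.15, thresh=0.2):
    """llm_prior maps (i, j) to +1 (LLM asserts causality) or -1 (LLM denies
    it); lam keeps the prior soft, so strong data evidence can override it."""
    edges = []
    for i, j in combinations(range(data.shape[1]), 2):
        s = edge_score(data, i, j) + lam * llm_prior.get((i, j), 0)
        if s > thresh:
            edges.append((i, j))
    return edges

rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 0.8 * x0 + rng.normal(size=500)          # true dependence 0 -> 1
x2 = rng.normal(size=500)
data = np.column_stack([x0, x1, x2])
llm_prior = {(0, 1): +1, (1, 2): -1}          # mock LLM causal statements
print(search_edges(data, llm_prior))          # [(0, 1)]
```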
{"title":"Integrating Large Language Model for Improved Causal Discovery","authors":"Taiyu Ban;Lyuzhou Chen;Derui Lyu;Xiangyu Wang;Qinrui Zhu;Qiang Tu;Huanhuan Chen","doi":"10.1109/TAI.2025.3560927","DOIUrl":"https://doi.org/10.1109/TAI.2025.3560927","url":null,"abstract":"Recovering the structure of causal graphical models from observational data is an essential yet challenging task for causal discovery in scientific scenarios. Domain-specific causal discovery usually relies on expert validation or prior analysis to improve the reliability of recovered causality, which is yet limited by the scarcity of expert resources. Recently, large language models (LLM) have been used for causal analysis across various domain-specific scenarios, suggesting its potential as autonomous expert roles in guiding data-based structure learning. However, integrating LLMs into causal discovery faces challenges due to inaccuracies in LLM-based reasoning on revealing the actual causal structure. To address this challenge, we propose an error-tolerant LLM-driven causal discovery framework. The error-tolerant mechanism is designed three-fold with sufficient consideration on potential inaccuracies. In the LLM-based reasoning process, an accuracy-oriented prompting strategy restricts causal analysis to a reliable range. Next, a knowledge-to-structure transition aligns LLM-derived causal statements with structural causal interactions. In the structure learning process, the goodness-of-fit to data and adherence to LLM-derived priors are balanced to further address prior inaccuracies. Evaluation of eight real-world causal structures demonstrates the efficacy of our LLM-driven approach in improving data-based causal discovery, along with its robustness to inaccurate LLM-derived priors.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 11","pages":"3030-3042"},"PeriodicalIF":0.0,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145456013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}