Pub Date : 2025-04-28 | DOI: 10.1109/TAI.2025.3564903
Emmanuel Pintelas;Ioannis E. Livieris;Panagiotis E. Pintelas
In the dynamic domain of synthetic media, deepfakes challenge trust in digital communication. Identifying manipulated content is essential to ensure the authenticity of shared information. Recent advances in deepfake detection have focused on developing sophisticated convolutional neural network (CNN)-based approaches. However, these approaches remain anchored within the continuous feature space, potentially missing manipulative signatures that might be more salient in a discrete domain. To address this, we propose a new strategy that combines insights from both continuous and discrete spaces for enhanced deepfake detection. Our hypothesis is that deepfakes may lie closer to a discrete space, potentially revealing hidden patterns that are not evident in continuous representations. In addition, we propose a new gradual-unfreezing technique, employed in the proposed framework to slowly adapt the network parameters to the new combined representation. Comprehensive experiments highlight the efficiency of the proposed approach in comparison with state-of-the-art (SoA) deepfake detection strategies.
{"title":"Quantization-Based 3D-CNNs Through Circular Gradual Unfreezing for DeepFake Detection","authors":"Emmanuel Pintelas;Ioannis E. Livieris;Panagiotis E. Pintelas","doi":"10.1109/TAI.2025.3564903","DOIUrl":"https://doi.org/10.1109/TAI.2025.3564903","url":null,"abstract":"In the dynamic domain of synthetic media, deepfakes challenge the trust in digital communication. The identification of manipulated content is essential to ensure the authenticity of shared information. Recent advances in deepfake detection have focused on developing sophisticated convolutional neural network (CNN)-based approaches. However, these approaches remain anchored within the continuous feature space, potentially missing manipulative signatures that might be more salient in a discrete domain. For this task, we propose a new strategy that combines insights from both continuous and discrete spaces for enhanced deepfake detection. Our hypothesis is that deepfakes may lie closer to a discrete space, potentially revealing hidden patterns that are not evident in continuous representations. In addition, we propose a new gradual-unfreezing technique, employed in the proposed framework to slowly adapt the network parameters to align with the new combined representation. 
Via comprehensive experimentation, the efficiency of the proposed approach is highlighted, in comparison with state-of-the-art (SoA) deepfake detection strategies.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3351-3363"},"PeriodicalIF":0.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612157","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
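To make the continuous-plus-discrete idea in the abstract above concrete, here is a minimal sketch of one way to derive a discrete view of continuous features (a uniform quantizer) and concatenate the two views. The bin count, value range, and function names are illustrative assumptions, not the authors' actual quantization scheme.

```python
# Hedged sketch: quantize continuous features into integer codes and
# concatenate both views into one combined representation.

def quantize(features, n_levels=8, lo=-1.0, hi=1.0):
    """Map each continuous feature to one of n_levels integer codes."""
    step = (hi - lo) / n_levels
    codes = []
    for x in features:
        clamped = min(max(x, lo), hi - 1e-9)  # keep the top edge in bin n-1
        codes.append(int((clamped - lo) / step))
    return codes

def combined_representation(features):
    """Continuous values followed by their discrete codes, as one vector."""
    return list(features) + [float(c) for c in quantize(features)]

feats = [-0.9, 0.0, 0.95]
print(quantize(feats))                      # [0, 4, 7]
print(len(combined_representation(feats)))  # 6
```

A real pipeline would quantize CNN feature maps rather than a short list, but the combine-two-views mechanics are the same.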
Pub Date : 2025-04-28 | DOI: 10.1109/TAI.2025.3564911
Dongfang Ma;Zhaoyang Ma;Chengying Wu;Jianmin Lin
Tropical cyclones (TCs) are destructive weather systems, and accurate prediction of their trajectories is crucial. Previous studies have focused mainly on trajectory prediction for individual TCs, which cannot take into account the interaction between different TCs, limiting prediction performance. To address this problem, this study proposes an innovative method for multi-TC trajectory prediction based on density maps. Instead of predicting the location of a TC directly, the method first predicts the density map of a sea area and then obtains TC centers from the predicted density maps. In the first step, a relation extraction module (REM) is proposed to analyze the interaction between multiple TCs. Further, a 3-D cloud feature extraction module is designed to better exploit 3-D cloud structural information on TCs via feature extraction and the fusion of density maps, satellite images, and environmental data. In addition, a long short-term memory (LSTM) fusion module is designed to adaptively select important historical information, improving the extraction of long-term spatiotemporal dependencies. In the second step, density map pixels with extreme values are identified as TC centers. The proposed method was verified by experiments using the Gridsat, IBTrACS, and ERA5 datasets. The results show that the mean distance error of TC trajectory prediction is reduced by 10.0%, 10.7%, 10.5%, and 11.7% overall, and by 21.5%, 18.0%, 19.1%, and 19.8% in the multi-TC scenario, for the 6-, 12-, 18-, and 24-h predictions, respectively, compared with state-of-the-art prediction models.
{"title":"A Multitropical Cyclone Trajectory Prediction Method Based on Density Maps With Memory and Data Fusion","authors":"Dongfang Ma;Zhaoyang Ma;Chengying Wu;Jianmin Lin","doi":"10.1109/TAI.2025.3564911","DOIUrl":"https://doi.org/10.1109/TAI.2025.3564911","url":null,"abstract":"Tropical cyclones (TCs) are destructive weather systems, and the accurate prediction of the trajectory of TCs is crucial. Previous studies have focused mainly on trajectory prediction for individual TCs, which cannot take into account the interaction between different TCs, affecting the prediction performance. To address this problem, this study proposed an innovative method for multi-TC trajectory prediction based on a density map. Instead of predicting the location of a TC directly, the article first predicts the density map of a sea area, and then obtain TC centers from the predicted density maps. In the first step, a relation extraction module (REM) is proposed in order to analyze the interaction between multiple TCs. Further, a 3-D cloud feature extraction module was designed to enhance the ability to use 3-D cloud structural information on TCs via feature extraction and the fusion of density maps, satellite images, and environmental data. In addition, a long short-term memory (LSTM) fusion module was designed to adaptively select important historical information, which improves the ability to extract long-term spatiotemporal dependencies. In the second step, those density map pixels with extreme values are identified as TC centers. The proposed method was verified by experiments using Gridsat, IBTrACS, and ERA5 datasets. 
The results show that the mean distance error of TC trajectory prediction is reduced by 10.0%, 10.7%, 10.5%, and 11.7% for overall performance, and 21.5%, 18.0%, 19.1%, and 19.8% for multi-TC scenario in the 6-, 12-, 18-, and 24-h predictions compared with state-of-the-art prediction models.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3364-3376"},"PeriodicalIF":0.0,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
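The second step described above (reading TC centers off a predicted density map) can be sketched as a local-maximum search: a pixel is a center if it exceeds a threshold and all its 4-connected neighbors. The grid values and threshold are illustrative; the paper's exact peak-extraction rule may differ.

```python
# Toy sketch: extract cyclone centers as thresholded local maxima
# of a predicted density map (list-of-lists grid).

def find_centers(density, threshold=0.5):
    """Return (row, col) of pixels above `threshold` that strictly
    exceed every 4-connected neighbor."""
    rows, cols = len(density), len(density[0])
    centers = []
    for r in range(rows):
        for c in range(cols):
            v = density[r][c]
            if v < threshold:
                continue
            neighbors = [density[rr][cc]
                         for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= rr < rows and 0 <= cc < cols]
            if all(v > n for n in neighbors):
                centers.append((r, c))
    return centers

dmap = [[0.1, 0.2, 0.1],
        [0.2, 0.9, 0.2],
        [0.1, 0.2, 0.8]]
print(find_centers(dmap))  # [(1, 1), (2, 2)]
```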
Pub Date : 2025-04-25 | DOI: 10.1109/TAI.2025.3564603
Neenu Sharma;Deepak Joshi
Advances in telehealth monitoring technology have enabled the collection of vast quantities of electrophysiological signals, including the electrocardiogram (ECG), which contains critical diagnostic information about cardiac diseases. There are two key challenges in the automatic classification of cardiac rhythms. First, addressing the specific characteristics of irregular heartbeats is critical for accurate classification. Second, the low frequency of ECG signals combined with noise interference makes it particularly difficult to efficiently detect abnormal electrical activity in the heart. To address these challenges, this article proposes an ensemble deep-learning model, the ECG_DEEPNet architecture, to enhance the delineation of ECG signals with improved accuracy for better diagnosis in telemedicine monitoring systems. The presented technique consists of a feature extraction stage using a convolutional neural network (CNN) and a sequence processing stage using a combination of gated recurrent units (GRU) and bidirectional long short-term memory (BiLSTM) networks. The proposed method comprises four parts: signal preprocessing, waveform segmentation, classification of ECG signals, and evaluation of the results on the proposed model. The proposed technique was trained and tested using the standard Lobachevsky University Electrocardiography Database (LUDB) and QT database (QTDB), which contain waveform annotations for accurate classification of ECG wave components. The presented technique achieves average accuracies of 99.82%, 98.50%, and 97.42% for QRS, P, and T on the QTDB database, and 99.96%, 98.82%, and 99.47% on the LUDB dataset, respectively, for classification and delineation of ECG signals. The proposed technique achieves better performance than state-of-the-art methods, which results in better diagnosis of heart-related problems.
{"title":"ECG_DEEPNet: A Novel Approach for Delineation and Classification of Electrocardiogram Signal Based on Ensemble Deep-Learning","authors":"Neenu Sharma;Deepak Joshi","doi":"10.1109/TAI.2025.3564603","DOIUrl":"https://doi.org/10.1109/TAI.2025.3564603","url":null,"abstract":"The advancements in telehealth monitoring technology have enabled the collection of vast quantities of electro-physiological signals, including the electrocardiogram (ECG) which contains critical diagnostic information about cardiac diseases. There are two main key challenges in the automatic classification of cardiac rhythms. First, addressing the specific characteristics of irregular heartbeats is critical for accurate classification. Second, the low frequency of ECG signals combined with noise interference makes it particularly difficult to efficiently detect abnormal electrical activity in the heart. To solve this issue, this article proposes an ensemble deep-learning model, <monospace>ECG_DEEPNet</monospace> architecture to enhance the delineation of ECG signals with improved accuracy for better diagnosis in telemedicine monitoring systems. The presented technique consists of a feature extraction stage using a convolutional neural network (CNN) and a sequence processing stage using a combination of gated recurrent units (GRU) and bidirectional long short-term memory (BiLSTM) networks. The proposed method is divided into four parts: first, the signal preprocessing, second waveform segmentation, third classification of ECG signals and lastly results are evaluated on the proposed model. The proposed technique was tested and trained using standard Lobachevsky University Electrocardiography Database (LUDB) and QT database (QTDB) containing annotation of a waveform for accurate classification of ECG wave components. 
The presented technique shows the average accuracy of 99.82%, 98.50%, and 97.42% for QRS, P, and T on the QTDB database, and 99.96%, 98.82%, and 99.47% on LUDB dataset, respectively, for classification and delineation of ECG signals. The proposed technique achieves better performance compared to state-of-the-art methods, which results in a better diagnosis of heart-related problems.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3321-3335"},"PeriodicalIF":0.0,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
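The first two of the four parts listed in the abstract (signal preprocessing, waveform segmentation) can be sketched as a moving-average denoiser followed by fixed-length windows around annotated samples. The filter width, window size, and toy signal are illustrative choices, not the paper's parameters.

```python
# Hedged sketch of ECG preprocessing + segmentation:
# smooth the raw signal, then cut windows around annotated fiducials.

def moving_average(signal, width=3):
    """Simple noise suppression: average over a sliding window."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def segment(signal, annotations, half_window=2):
    """Extract a fixed-length slice centered on each annotated sample,
    skipping annotations too close to the signal edges."""
    segments = []
    for a in annotations:
        lo, hi = a - half_window, a + half_window + 1
        if lo >= 0 and hi <= len(signal):
            segments.append(signal[lo:hi])
    return segments

sig = [0, 0, 3, 0, 0, 0, 3, 0, 0]
print(moving_average(sig)[1])      # 1.0 (the spike is spread out)
print(segment(sig, [2, 6]))        # [[0, 0, 3, 0, 0], [0, 0, 3, 0, 0]]
```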
Pub Date : 2025-04-25 | DOI: 10.1109/TAI.2025.3564605
Sardar Jaf;Basel Barakat
Despite the extensive communication benefits offered by social media platforms, numerous challenges must be addressed to ensure user safety. One of the most significant risks faced by users on these platforms is targeted hatespeech. Social media platforms are widely utilized for generating datasets employed in training and evaluating machine learning algorithms for hatespeech detection. However, existing public datasets exhibit numerous limitations, hindering the effective training of these algorithms and leading to inaccurate hatespeech classification. This study provides a systematic empirical evaluation of several public datasets commonly used in automated hatespeech classification. Through rigorous analysis, we present compelling evidence highlighting the limitations of current hatespeech datasets. Additionally, we conduct a range of statistical analyses to elucidate the strengths and weaknesses inherent in these datasets. This work aims to advance the development of more accurate and reliable machine learning models for hatespeech detection by addressing the dataset limitations identified.
{"title":"Empirical Evaluation of Public HateSpeech Datasets","authors":"Sardar Jaf;Basel Barakat","doi":"10.1109/TAI.2025.3564605","DOIUrl":"https://doi.org/10.1109/TAI.2025.3564605","url":null,"abstract":"Despite the extensive communication benefits offered by social media platforms, numerous challenges must be addressed to ensure user safety. One of the most significant risks faced by users on these platforms is targeted hatespeech. Social media platforms are widely utilized for generating datasets employed in training and evaluating machine learning algorithms for hatespeech detection. However, existing public datasets exhibit numerous limitations, hindering the effective training of these algorithms and leading to inaccurate hatespeech classification. This study provides a systematic empirical evaluation of several public datasets commonly used in automated hatespeech classification. Through rigorous analysis, we present compelling evidence highlighting the limitations of current hatespeech datasets. Additionally, we conduct a range of statistical analyses to elucidate the strengths and weaknesses inherent in these datasets. This work aims to advance the development of more accurate and reliable machine learning models for hatespeech detection by addressing the dataset limitations identified.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 11","pages":"3056-3069"},"PeriodicalIF":0.0,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145428949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-04-25 | DOI: 10.1109/TAI.2025.3564243
Xiao Yang;Zhan-Li Sun;Mengya Liu;Zhigang Zeng;Kin-Man Lam;Xin Wang
Due to the significant differences between the structural and sequence information of RNA, accurately predicting RNA-small molecule binding sites by utilizing these two attributes remains a challenging task. This study introduces a novel network for predicting RNA-small molecule binding sites, employing a two-stage approach that integrates feature extraction and fusion processes. On one hand, in order to capture the diverse characteristic information of RNA, a dual-path feature extraction module is proposed to extract features from both short-range and long-range perspectives, by incorporating convolutional and attention networks. On the other hand, a one-dimensional multiscale feature fusion module, consisting of parallel one-dimensional convolutional kernels, is proposed to extract feature information at multiple granularities and to effectively integrate the features of nucleotides on the RNA chain and their neighboring nucleotides. Experimental results demonstrate that RNA-small molecule binding sites prediction by dual-path feature extraction and one-dimensional multiscale feature fusion network (RSMBSP-DON) is competitive with some recently reported methods.
{"title":"RSMBSP-DON: RNA-Small Molecule Binding Sites Prediction by Dual-Path Feature Extraction and One-Dimensional Multiscale Feature Fusion Network","authors":"Xiao Yang;Zhan-Li Sun;Mengya Liu;Zhigang Zeng;Kin-Man Lam;Xin Wang","doi":"10.1109/TAI.2025.3564243","DOIUrl":"https://doi.org/10.1109/TAI.2025.3564243","url":null,"abstract":"Due to the significant differences between the structural and sequence information of RNA, accurately predicting RNA-small molecule binding sites by utilizing these two attributes remains a challenging task. This study introduces a novel network for predicting RNA-small molecule binding sites, employing a two-stage approach that integrates feature extraction and fusion processes. On one hand, in order to capture the diverse characteristic information of RNA, a dual-path feature extraction module is proposed to extract features from both short-range and long-range perspectives, by incorporating convolutional and attention networks. On the other hand, a one-dimensional multiscale feature fusion module, consisting of parallel one-dimensional convolutional kernels, is proposed to extract feature information at multiple granularities and to effectively integrate the features of nucleotides on the RNA chain and their neighboring nucleotides. 
Experimental results demonstrate that <bold>RNA-small molecule binding sites prediction by dual-path feature extraction and one-dimensional multiscale feature fusion network (RSMBSP-DON)</b> is competitive with some recently reported methods.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3312-3320"},"PeriodicalIF":0.0,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
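The one-dimensional multiscale fusion idea above (parallel 1-D convolutional kernels of different sizes over the same sequence, fused per position) can be sketched as follows. The kernel values and fusion-by-concatenation are illustrative assumptions, not the network's learned parameters.

```python
# Hedged sketch: parallel 1-D convolutions at several kernel sizes,
# zero-padded to 'same' length, fused into per-position feature tuples.

def conv1d(seq, kernel):
    """Valid 1-D convolution (really cross-correlation) of seq with kernel."""
    k = len(kernel)
    return [sum(seq[i + j] * kernel[j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def multiscale(seq, kernels):
    """One output channel per kernel; fuse by per-position concatenation."""
    outs = []
    for kernel in kernels:
        pad = len(kernel) // 2
        padded = [0.0] * pad + list(seq) + [0.0] * pad
        outs.append(conv1d(padded, kernel))
    return [tuple(o[i] for o in outs) for i in range(len(seq))]

seq = [1.0, 2.0, 3.0]
# identity kernel (local view) alongside a 3-tap average (wider context)
fused = multiscale(seq, [[1.0], [1 / 3, 1 / 3, 1 / 3]])
print(fused)  # per-position (identity, smoothed) pairs
```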
Red–green–blue-depth (RGB-D) deep learning-based co-salient object detection (Co-SOD) automatically detects and segments common salient objects in images. However, such computationally intensive models cannot run on mobile devices. To help overcome this limitation, this article proposes a localization, neighborhood, and semantic guidance network (LNSNet) with knowledge distillation (KD), called LNSNet-S*, for RGB-D Co-SOD, to minimize the number of parameters and improve accuracy. Apart from their backbone networks, the LNSNet student (LNSNet-S) and teacher (LNSNet-T) models use the same structure to capture similarity knowledge in the category, channel, and pixel-point dimensions, so that an LNSNet-S trained with KD achieves superior lightweight performance. For optimization, a positioning-path progressive activation uses hierarchical transformers to fuse features from low to high levels, generating class activation localization maps from the fused bimodal information to obtain location information. The high-level neighborhood-guidance information is then used to guide the low-level features. Next, a multisource semantic enhancement embedding module progressively fuses multiscale cross-modal semantic information guided by the class-activated localization information. A class-based progressive triplet loss facilitates the transfer of category, channel, and pixel-point information. Extensive experiments demonstrated the effectiveness and robustness of the novel LNSNet-S* at different sizes, with significant improvements observed. The smallest LNSNet-S* model reduces the number of parameters by more than 92% compared to LNSNet-T, requiring only 15.9 M parameters.
{"title":"Location, Neighborhood, and Semantic Guidance Network for RGB-D Co-Salient Object Detection","authors":"Wujie Zhou;Bingying Wang;Xiena Dong;Caie Xu;Fangfang Qiang","doi":"10.1109/TAI.2025.3564238","DOIUrl":"https://doi.org/10.1109/TAI.2025.3564238","url":null,"abstract":"Red–green–blue-depth (RGB-D) deep learning-based co-salient object detection (Co-SOD) automatically detects and segments common salient objects in images. However, this computationally intensive model cannot be run on mobile devices. To help overcome this limitation, this article proposes a localization, neighborhood, and semantic guidance network (LNSNet) with knowledge distillation (KD), called LNSNet-S<sup>*</sup>, for RGB-D Co-SOD to minimize the number of parameters and improve the accuracy. Apart from their backbone networks, the LNSNet student (LNSNet-S) and teacher (LNSNet-T) models use the same structure to capture similarity knowledge in category, channel, and pixel-point dimensions to train an LNSNet-S with KD for superior lightweight performance. For optimization, a positioning path progressive activation uses hierarchical transformers to fuse features from low to high levels, generating class activation localization maps using the fused bimodal information to obtain location information. The high-level neighborhood-guidance information is then used to guide the low-level features. Next, a multisource semantic enhancement embedding module progressively fuses multiscale cross-modal semantic information guided by class-activated localization information. A class-based progressive triplet loss facilitates the transfer of category, channel, and pixel-point information. Extensive experiments demonstrated the effectiveness and robustness of the novel LNSNet-S<sup>*</sup> in different sizes, and significant improvements were observed. 
The smallest LNSNet-S<sup>*</sup> model reduced the number of parameters by more than 92% compared to that of LNSNet-T, requiring only 15.9 M parameters.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3297-3311"},"PeriodicalIF":0.0,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
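The student/teacher setup above rests on a knowledge-distillation training signal; a generic sketch is the student matching the teacher's temperature-softened output distribution via a KL divergence. The temperature and logits are illustrative, and the paper's category/channel/pixel-point similarity losses are considerably richer than this.

```python
# Hedged sketch of a plain KD loss: KL(teacher || student) on
# temperature-softened softmax distributions.
import math

def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # shift for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 0.5, -1.0]
print(round(distillation_loss(teacher, teacher), 6))      # 0.0 (identical outputs)
print(distillation_loss([0.0, 0.0, 0.0], teacher) > 0)    # True (student disagrees)
```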
Pub Date : 2025-04-23 | DOI: 10.1109/TAI.2025.3563144
Ghadeer A. Jaradat;Mohammed F. Tolba;Ghada Alsuhli;Hani Saleh;Mahmoud Al-Qutayri;Thanos Stouraitis
In the world of deep learning, transformer models have become very significant, leading to improvements in many areas, from understanding language to recognizing images, covering a wide range of applications. Despite their success, the deployment of these models in real-time applications, particularly on edge devices, poses significant challenges due to their computational intensity and memory demands. To overcome these challenges, we introduce a novel hybrid dynamic pruning (HDP) technique, an efficient algorithm-architecture codesign approach that accelerates transformers using head sparsity, block sparsity, and approximation to reduce computations in attention and reduce memory access. Observing the large redundancy in attention scores and attention heads, we propose a novel integer-based block pruning to prune unimportant blocks in the attention matrix at run time. We also propose integer-based head pruning to detect and prune unimportant heads at an early stage at run time, as well as an approximation method that reduces attention computations. To efficiently support these methods with lower latency, we propose the HDP accelerator (HDPA) as a coprocessor architecture, synthesized in two configurations (HDPA-edge and HDPA-server) to meet the needs of mobile and server platforms. Extensive experiments with different transformer models and benchmarks demonstrate that HDPA-server achieves $481\times$ and $381\times$ speedup in attention-layer computation over an Intel i7-1185G7 CPU and an NVIDIA T4 GPU, respectively. Compared to other state-of-the-art (SOTA) accelerators, HDPA achieves $1.26\times$ to $2.08\times$ higher throughput, $1.3\times$ to $18\times$ greater MAC efficiency, and $1.1\times$ to $5.1\times$ improved energy efficiency, when normalized to the same computational load.
{"title":"Efficient Transformer Inference Through Hybrid Dynamic Pruning","authors":"Ghadeer A. Jaradat;Mohammed F. Tolba;Ghada Alsuhli;Hani Saleh;Mahmoud Al-Qutayri;Thanos Stouraitis","doi":"10.1109/TAI.2025.3563144","DOIUrl":"https://doi.org/10.1109/TAI.2025.3563144","url":null,"abstract":"In the world of deep learning, transformer models have become very significant, leading to improvements in many areas, from understanding language to recognizing images, covering a wide range of applications. Despite their success, the deployment of these models in real-time applications, particularly on edge devices, poses significant challenges due to their computational intensity and memory demands. To overcome these challenges, we introduce a novel hybrid dynamic pruning (HDP) technique, an efficient algorithm-architecture codesign approach that accelerates transformers using head sparsity, block sparsity, and approximation to reduce computations in attention and reduce memory access. With the observation of the huge redundancy in attention scores and attention heads, we propose a novel integer-based block pruning to prune unimportant blocks in the attention matrix at run time. We also propose integer-based head pruning to detect and prune unimportant heads at an early stage at run time. Also, we propose an approximation method that reduces attention computations. To efficiently support these methods with lower latency, we propose the HDP accelerator (HDPA) as a coprocessor architecture, synthesized in two configurations—HDPA-edge and HDPA-server—to meet the needs of mobile and server platforms. Extensive experiments with different transformer models and benchmarks demonstrate that HDPA-server achieves <inline-formula> <tex-math>$481times$</tex-math></inline-formula> and <inline-formula> <tex-math>$381times$</tex-math></inline-formula> speedup in attention layer computation over Intel i7-1185G7 CPU and NVIDIA T4 GPU, respectively. 
Compared to other state-of-the-art (SOTA) accelerators, HDPA achieves <inline-formula> <tex-math>$1.26times$</tex-math></inline-formula> to <inline-formula> <tex-math>$2.08times$</tex-math></inline-formula> higher throughput, <inline-formula> <tex-math>$1.3times$</tex-math></inline-formula> to <inline-formula> <tex-math>$18times$</tex-math></inline-formula> greater MAC efficiency, and <inline-formula> <tex-math>$1.1times$</tex-math></inline-formula> to <inline-formula> <tex-math>$5.1times$</tex-math></inline-formula> improved energy efficiency, when normalized to the same computational load.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3273-3286"},"PeriodicalIF":0.0,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
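The run-time block pruning described above can be sketched as: split an attention-score matrix into fixed-size blocks and zero out any block whose maximum score falls below a threshold. The block size, threshold, and max-based criterion are illustrative; the paper's integer-based importance test is more involved.

```python
# Toy sketch: drop low-scoring blocks of a square attention matrix.

def prune_blocks(scores, block=2, threshold=0.2):
    """Zero every block whose max score < threshold; return (matrix, kept)."""
    n = len(scores)
    pruned = [row[:] for row in scores]
    kept = 0
    for bi in range(0, n, block):
        for bj in range(0, n, block):
            cells = [(i, j)
                     for i in range(bi, min(bi + block, n))
                     for j in range(bj, min(bj + block, n))]
            if max(scores[i][j] for i, j in cells) < threshold:
                for i, j in cells:
                    pruned[i][j] = 0.0
            else:
                kept += 1
    return pruned, kept

scores = [[0.9, 0.8, 0.05, 0.1],
          [0.7, 0.6, 0.02, 0.0],
          [0.1, 0.0, 0.5, 0.4],
          [0.05, 0.1, 0.3, 0.6]]
pruned, kept = prune_blocks(scores)
print(kept)  # 2 of the 4 blocks survive (the two diagonal ones)
```

Only the surviving blocks would then participate in the softmax and value aggregation, which is where the computation and memory savings come from.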
Pub Date : 2025-04-22 | DOI: 10.1109/TAI.2025.3563438
Umesh Kashyap;Sudev Kumar Padhi;Sk. Subidh Ali
Perceptual encryption (PE) methods are key enablers for protecting image privacy in deep learning-based applications in the cloud. In PE, the image content is obfuscated such that deep learning models can still work on the obfuscated data. The key advantage of PE over homomorphic encryption is that, unlike homomorphic encryption, the features required by the target deep learning model are preserved in the encrypted data; therefore, the model does not need to be retrained on the encrypted data. Recently, a significant number of PE methods have been proposed in the literature, each improving over the others. In this article, we perform a detailed security analysis of the three best-known PE methods, namely, adversarial visual information hiding, learnable encryption, and encryption-then-compression, designed to protect the privacy of images. We propose a new generative adversarial network (GAN)-based security evaluation framework that successfully reconstructs the original images encrypted by these methods, exposing clear security flaws. We conducted extensive experiments using different datasets and deep learning models.
{"title":"Is Perceptual Encryption Secure? A Security Benchmark for Perceptual Encryption Methods","authors":"Umesh Kashyap;Sudev Kumar Padhi;Sk. Subidh Ali","doi":"10.1109/TAI.2025.3563438","DOIUrl":"https://doi.org/10.1109/TAI.2025.3563438","url":null,"abstract":"Perceptual encryption (PE) methods are the key enablers for protecting image privacy for deep learning-based applications in the cloud. In PE, the image content is obfuscated such that the deep learning models can work on the obfuscated data. The key advantage of PE over holomorphic encryption is that, unlike holomorphic encryption, the feature required by the target deep learning model is preserved in the encrypted data. Therefore, the model is not required to be retrained on the encrypted data. Recently, a significant number of PE methods have been proposed in the literature, each improving over the others. In this article, we perform a detailed security analysis of three best-known PE methods, namely, adversarial visual information hiding, learnable encryption, and encryption-then-compression methods designed to protect the privacy of images. We proposed a new generative adversarial network (GAN)-based security evaluation framework to successfully reconstruct the original images encrypted by these methods, showing clear security flaws. We conducted extensive experiments using different datasets and deep learning models. 
The results show significant vulnerabilities in the existing key-based PE methods.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3287-3296"},"PeriodicalIF":0.0,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
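A toy example of the kind of key-based perceptual encryption analyzed above is a keyed pseudorandom permutation of pixel blocks: the image is visually obscured while block-level statistics survive, which is precisely what learned reconstruction attacks can exploit. Block size, key, and the flat-list image are illustrative simplifications.

```python
# Toy key-based PE: scramble fixed-size chunks of a flat pixel list
# with a seeded shuffle; the key (seed) recovers the original order.
import random

def block_scramble(pixels, block=2, key=42):
    """Permute fixed-size chunks with a keyed shuffle; return (cipher, order)."""
    chunks = [pixels[i:i + block] for i in range(0, len(pixels), block)]
    order = list(range(len(chunks)))
    random.Random(key).shuffle(order)
    return [p for idx in order for p in chunks[idx]], order

def block_unscramble(scrambled, order, block=2):
    """Invert the permutation recorded in `order`."""
    chunks = [scrambled[i:i + block] for i in range(0, len(scrambled), block)]
    restored = [None] * len(chunks)
    for pos, idx in enumerate(order):
        restored[idx] = chunks[pos]
    return [p for c in restored for p in c]

img = [10, 20, 30, 40, 50, 60, 70, 80]
enc, order = block_scramble(img)
assert block_unscramble(enc, order) == img   # the key inverts the scramble
print(sorted(enc) == sorted(img))            # True: values survive, order hides content
```

The surviving value statistics (every pixel is still present, just relocated) illustrate why such schemes leak more than true cryptographic encryption.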
Pub Date : 2025-04-21DOI: 10.1109/TAI.2025.3562839
Zhilin Zhu;Chucai Zhang;Jianhua Dai
Feature selection is an important data preprocessing step in artificial intelligence, which aims to eliminate redundant features while retaining essential ones. Measuring feature significance and the relevance between features remains a key challenge. Fuzzy information entropy, an extension of Shannon entropy, is widely used to quantify the information of fuzzy partitions. However, it has notable limitations, in particular the lack of monotonicity of the fuzzy conditional entropy as a measure of decision uncertainty during feature selection. We introduce a novel measure, macrogranular entropy (ME), and construct generalized forms such as conditional ME, mutual macrogranular information, and joint ME. Conditional ME is monotonic when measuring decision uncertainty. In addition, we propose two feature selection algorithms: one based on monotonic conditional ME (MCME) and the other based on the degree of symmetric association (ADSA). The ADSA and MCME algorithms are compared against eight other feature selection algorithms in a series of experiments, based on classification performance with SVM and NB classifiers and evaluation metrics including F1-score and recall. Across all four evaluation metrics, ADSA and MCME achieved the top two rankings, respectively. Specifically, on the NB and SVM classifiers, the ADSA algorithm improves average accuracy by 12.22% and 2.88% over the original feature set, while MCME improves it by 10.07% and 1.01%, respectively. These experimental comparisons demonstrate that the ADSA algorithm effectively removes redundant information from the dataset during feature selection.
{"title":"Fuzzy Information Quantity Measurement and Feature Selection by Macrogranular Entropy","authors":"Zhilin Zhu;Chucai Zhang;Jianhua Dai","doi":"10.1109/TAI.2025.3562839","DOIUrl":"https://doi.org/10.1109/TAI.2025.3562839","url":null,"abstract":"Feature selection is an important data preprocessing process in artificial intelligence, which aims to eliminate redundant features while retaining essential features. Measuring feature significance and relevance between features is a significant challenge. Fuzzy information entropy is an extension of Shannon entropy. It is widely used for quantifying the information of fuzzy divisions. However, it has significant limitations, notably the lack of monotonicity in fuzzy conditional entropy measure of decision uncertainty in the feature selection process. We introduce a novel measurement macrogranular entropy (ME) and construct some generalized forms, such as conditional ME, mutual macrogranular information, and joint ME. The conditional ME exhibits monotonicity when measuring decision uncertainty. In addition, we propose two feature selection algorithms: one based on monotonic conditional ME (MCME), and the other based on the degree of symmetric association (ADSA). The ADSA algorithm and the MCME algorithm are compared against eight other feature selection algorithms through a series of experiments. The comparison was conducted based on classification performance using SVM and NB classifiers, and evaluation metrics including F1-score and recall. In terms of all four evaluation metrics, ADSA and MCME achieved the top two rankings, respectively. Specifically, on the NB and SVM classifiers, the ADSA algorithm improves the average accuracy by 12.22% and 2.88% compared to the original feature set, while MCME improves the accuracy by 10.07% and 1.01%, respectively. 
Experimental comparisons demonstrate that the ADSA algorithm effectively removes redundant information from the dataset during feature selection.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3258-3272"},"PeriodicalIF":0.0,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612201","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
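The abstract does not define macrogranular entropy, so the sketch below uses plain Shannon conditional entropy as a stand-in to illustrate the general shape of entropy-driven forward feature selection: at each step, add the feature whose joint value with the already-selected subset leaves the least uncertainty about the decision. All function names are mine; this is not the MCME or ADSA algorithm itself.

```python
import numpy as np
from collections import Counter

def cond_entropy(x: np.ndarray, y: np.ndarray) -> float:
    """Shannon conditional entropy H(y|x) for discrete arrays: the decision
    uncertainty that remains once the (joint) feature value x is known."""
    n = len(x)
    h = 0.0
    for xv, cnt in Counter(x.tolist()).items():
        p_y = np.bincount(y[x == xv]) / cnt
        p_y = p_y[p_y > 0]
        h += (cnt / n) * -(p_y * np.log2(p_y)).sum()
    return h

def greedy_select(X: np.ndarray, y: np.ndarray, k: int) -> list:
    """Forward selection: at each step add the feature that most reduces the
    conditional entropy of the decision given the selected subset."""
    selected = []
    combo = np.zeros(len(y), dtype=np.int64)  # joint code of selected features
    for _ in range(k):
        best_j, best_h = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # Pair the running joint code with candidate feature j.
            cand = combo * (X[:, j].max() + 1) + X[:, j]
            h = cond_entropy(cand, y)
            if h < best_h:
                best_j, best_h = j, h
        selected.append(best_j)
        combo = combo * (X[:, best_j].max() + 1) + X[:, best_j]
    return selected
```

The monotonicity issue the article targets shows up exactly here: with fuzzy conditional entropy, adding a feature is not guaranteed to keep the uncertainty non-increasing, which is what the proposed conditional ME fixes.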
Pub Date : 2025-04-21DOI: 10.1109/TAI.2025.3562505
Clement Fung;Chen Qiu;Aodong Li;Maja Rudolph
Anomaly detection is the task of identifying abnormal samples in large unlabeled datasets. Although the advent of foundation models has produced powerful zero-shot anomaly detection methods, their deployment in practice is often hindered by the absence of labeled validation data—without it, detection performance cannot be evaluated reliably. In this work, we propose selection with synthetic anomalies (SWSA): a general-purpose framework to select image-based anomaly detectors without labeled validation data. Instead of collecting labeled validation data, we generate synthetic anomalies from a small support set of normal images without using any training or fine-tuning. Our synthetic anomalies are then used to create detection tasks that compose a validation framework for model selection. In an empirical study, we evaluate SWSA with three types of synthetic anomalies and on two selection tasks: model selection of image-based anomaly detectors and prompt selection for CLIP-based anomaly detection. SWSA often selects models and prompts that match selections made with a ground-truth validation set, outperforming baseline selection strategies.
{"title":"Model Selection of Anomaly Detectors in the Absence of Labeled Validation Data","authors":"Clement Fung;Chen Qiu;Aodong Li;Maja Rudolph","doi":"10.1109/TAI.2025.3562505","DOIUrl":"https://doi.org/10.1109/TAI.2025.3562505","url":null,"abstract":"Anomaly detection is the task of identifying abnormal samples in large unlabeled datasets. Although the advent of foundation models has produced powerful zero-shot anomaly detection methods, their deployment in practice is often hindered by the absence of labeled validation data—without it, detection performance cannot be evaluated reliably. In this work, we propose selection with synthetic anomalies (SWSA): a general-purpose framework to select image-based anomaly detectors without labeled validation data. Instead of collecting labeled validation data, we generate synthetic anomalies from a small support set of normal images without using any training or fine-tuning. Our synthetic anomalies are then used to create detection tasks that compose a validation framework for model selection. In an empirical study, we evaluate SWSA with three types of synthetic anomalies and on two selection tasks: model selection of image-based anomaly detectors and prompt selection for CLIP-based anomaly detection. 
SWSA often selects models and prompts that match selections made with a ground-truth validation set, outperforming baseline selection strategies.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3248-3257"},"PeriodicalIF":0.0,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612205","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
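The SWSA recipe described above (generate synthetic anomalies from a small support set of normal images, then rank candidate detectors on the resulting labeled task) can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' code: the CutPaste-style patch-swap generator, the function names, and the toy detectors are my own choices, and the paper evaluates several generator types.

```python
import numpy as np

def make_synthetic_anomalies(normals, rng, patch=8):
    """CutPaste-style corruption: copy a random patch of each normal image onto
    a random destination. One of many possible synthetic-anomaly generators."""
    anomalies = []
    for img in normals:
        h, w = img.shape
        ys, yd = rng.integers(0, h - patch, 2)
        xs, xd = rng.integers(0, w - patch, 2)
        out = img.copy()
        out[yd:yd + patch, xd:xd + patch] = img[ys:ys + patch, xs:xs + patch]
        anomalies.append(out)
    return anomalies

def auroc(scores_normal, scores_anomaly):
    """P(random anomaly scores higher than random normal), ties counted 1/2."""
    s_n = np.asarray(scores_normal)[None, :]
    s_a = np.asarray(scores_anomaly)[:, None]
    return float((s_a > s_n).mean() + 0.5 * (s_a == s_n).mean())

def select_detector(detectors, support_normals, seed=0):
    """SWSA-style selection: score every candidate detector on the synthetic
    validation task and return the best, with no training or fine-tuning."""
    rng = np.random.default_rng(seed)
    anomalies = make_synthetic_anomalies(support_normals, rng)
    scored = {name: auroc([f(x) for x in support_normals],
                          [f(x) for x in anomalies])
              for name, f in detectors.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]
```

The same loop works for prompt selection: each CLIP prompt induces a scoring function, and the prompt whose scores best separate the support set from its synthetic corruptions is kept.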