Pub Date : 2025-04-30 | DOI: 10.1109/TAI.2025.3565671
Rangan Das;Swadesh Jana;Anannyo Dey;Pascal Le Corre;Marc Cuggia;Ujjwal Maulik;Sanghamitra Bandyopadhyay
The development of new drugs is an expensive and time-consuming process, often hindered by the lack of reliable models to predict drug-target interactions (DTIs) and their mechanisms of action (MoA). Existing deep learning-based methods for DTI prediction typically focus only on binary classification of interactions, overlooking the complex mechanisms underlying these interactions. Moreover, the absence of comprehensive datasets for modeling MoA further complicates this task. To address these limitations, we introduce DrugMAP, a novel multimodal deep learning model that integrates graph neural networks and transformer-based architectures to predict both DTIs and their MoA. We construct a large-scale dataset from multiple public sources, adding a new level of complexity by including detailed MoA annotations for thousands of drug-target pairs. DrugMAP simultaneously leverages the molecular and atomic-level structures of drugs and target proteins, utilizing multirepresentational encoders for enhanced feature extraction. Experimental results show that DrugMAP outperforms state-of-the-art models for both DTI and MoA prediction across multiple benchmark datasets. Our model achieves a 3.5% improvement in AUC for MoA prediction, demonstrating its potential for guiding drug discovery and understanding adverse drug events.
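The multimodal design described above — separate drug and target encoders whose embeddings are fused into a joint prediction — can be illustrated with a minimal late-fusion scoring head. This is only a sketch with hypothetical names: a linear layer with a sigmoid stands in for DrugMAP's actual fusion and prediction layers.

```python
import numpy as np

def fuse_and_score(drug_emb, target_emb, w, b):
    """Concatenate drug and protein embeddings (however they were encoded)
    and map the joint vector to an interaction probability."""
    z = np.concatenate([drug_emb, target_emb])
    logit = float(w @ z + b)
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid -> probability in (0, 1)
```

With a zero-weight head the score is exactly 0.5, i.e., maximally uncertain before training.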
Title: DrugMAP: Deep Multimodal Transformers for Drug-Target Mechanism of Action Prediction
IEEE Transactions on Artificial Intelligence, vol. 6, no. 11, pp. 3087-3099.
Pub Date : 2025-04-29 | DOI: 10.1109/TAI.2025.3565483
Menglin Yang;Dong Xie;Guiting Zhang;Fulong Chen;Taochun Wang;Peng Hu
Compared with cryptographic image encryption schemes, neural network (NN)-based image encryption schemes exhibit a significantly larger key space and offer enhanced capabilities for parallel processing of image data. However, most existing NN-based image encryption schemes suffer from high time complexity in generating random keys, and their decryption processes often fail to recover the plaintext images losslessly. In this article, we propose a normalizing-flows-based encryption network, called EncryptFlow, designed to achieve efficient and lossless image encryption. Normalizing flows employ a special coupling structure to couple the partitioned data, thereby establishing interdependence among them. Specifically, we utilize coupling structures (e.g., additive coupling) that allow the image blocks to alternately encrypt each other during forward propagation. Additionally, we devise a key generation algorithm that produces sub-keys tailored for each layer of the encryption network. The proposed EncryptFlow network seamlessly integrates both encryption and decryption functionalities, leveraging the XOR operation as the encryption function within each layer. The experimental results and comparative analyses indicate that EncryptFlow can encrypt $256\times 256$ grayscale images in an average time of merely 0.047 s, and requires only 0.188 s to encrypt color images of the same dimensions.
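The alternating XOR-coupling idea in the abstract — each image half encrypts the other through a keyed function, and the structure is invertible by construction because XOR is its own inverse — can be sketched as follows. Here `keyed_mix` is a hypothetical stand-in for the learned per-layer transform, and the sub-keys are plain integer seeds rather than the paper's generated keys.

```python
import numpy as np

def keyed_mix(half, subkey):
    """Toy keyed transform of one image half (stand-in for a learned network)."""
    rng = np.random.default_rng(subkey)
    return half ^ rng.integers(0, 256, size=half.shape, dtype=np.uint8)

def coupling_encrypt(img, subkeys):
    """Alternate XOR-coupling layers over the two halves of a flattened image."""
    h = img.size // 2
    x1, x2 = img.flat[:h].copy(), img.flat[h:].copy()
    for i, k in enumerate(subkeys):
        if i % 2 == 0:
            x2 ^= keyed_mix(x1, k)  # one half is updated from the other
        else:
            x1 ^= keyed_mix(x2, k)
    return x1, x2

def coupling_decrypt(x1, x2, subkeys):
    """Undo the layers in reverse order; since the conditioning half is
    untouched by its own layer, recovery is exact (lossless)."""
    for i, k in reversed(list(enumerate(subkeys))):
        if i % 2 == 0:
            x2 ^= keyed_mix(x1, k)
        else:
            x1 ^= keyed_mix(x2, k)
    return x1, x2
```

A round trip over any `uint8` image returns the plaintext bit-for-bit, which is the lossless property the abstract emphasizes.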
Title: EncryptFlow: Efficient and Lossless Image Encryption Network Based on Normalizing Flows
IEEE Transactions on Artificial Intelligence, vol. 6, no. 12, pp. 3377-3390.
Pub Date : 2025-04-29 | DOI: 10.1109/TAI.2025.3565225
Mohd Aquib;Nishchal K. Verma;M. Jaleel Akhtar
Facial expression recognition (FER) is a complex task, hindered by subtle distinctions between expression classes, significant variability within each class, and external influences such as identity, pose, age, and ethnicity. As a result, achieving pure expression encodings that are resilient to exogenous factors proves elusive, thereby compromising downstream classification tasks. This study presents a novel intelligent FER scheme that mitigates the impact of external confounders by integrating disentangled representation learning with fuzzy logic. Building on the adaptive $\beta$-variational autoencoder (VAE) [1] as a backbone, we develop a semisupervised guided adaptive $\beta$-VAE (GA-$\beta$-VAE) capable of isolating expression features from exogenous factors. Specifically, the adaptive $\beta$-VAE is augmented with two additional branches: a deformable PCA-based secondary decoder that disentangles expression-irrelevant transformations from the core expression content, and an adversarial excitation–inhibition branch that forces the “target” (expression) latent variables to be informative only of expressions. This yields well-separated, expression-centric embeddings that are subsequently processed by an interval type-2 (IT2) fuzzy classification unit to predict the corresponding expression classes. By avoiding reliance on paired data or explicit annotations, this approach offers a scalable and flexible solution for FER. Experimental evaluations on benchmark datasets [extended Cohn–Kanade (CK+), facial expression recognition plus (FER+), and real-world affective faces database (RAF-DB)] demonstrate the framework’s effectiveness in addressing the challenges posed by exogenous factors, achieving superior accuracy and interpretability compared to state-of-the-art methods.
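For readers unfamiliar with the $\beta$-VAE backbone being adapted here, the core objective is a reconstruction term plus a $\beta$-weighted KL term that pressures the latent code toward a factorized prior. The sketch below is the generic $\beta$-VAE loss with a Gaussian posterior, not the paper's adaptive scheme.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta):
    """Generic beta-VAE objective: MSE reconstruction plus beta times the
    closed-form KL divergence between N(mu, diag(exp(logvar))) and N(0, I)."""
    recon = np.mean((x - x_recon) ** 2)
    kl = np.mean(-0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1))
    return recon + beta * kl
```

Raising $\beta$ above 1 trades reconstruction fidelity for stronger disentanglement pressure, which is the lever the adaptive scheme tunes.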
Title: Enhancing Facial Expression Recognition With AI Agents: A Semisupervised Guided Adaptive $\beta$-VAE Coupled With Interval Type-2 Fuzzy Classifier
IEEE Transactions on Artificial Intelligence, vol. 6, no. 11, pp. 3070-3086.
Pub Date : 2025-04-28 | DOI: 10.1109/TAI.2025.3558724
In this article, we introduce a novel and synergistic approach that combines attention mechanisms, a low-visibility enhancement network (LVENet), and a tailored head pruning method for multihead self-attention (MHSA) models, specifically engineered for attention-augmented convolutional networks (AACNs) and bottleneck transformers (BoTNets). The integration of these techniques aims to comprehensively address the challenges associated with object detection in the maritime domain. The attention mechanism selectively emphasizes critical areas of the image, LVENet enhances visibility under challenging conditions, and the head pruning method optimizes model efficiency and simplicity. Employing meticulous selection and evaluation, our approach achieves precise head pruning without compromising detection performance. Validation using common and maritime datasets underscores the effectiveness of our approach. The results showcase a substantial reduction in epoch time by over 30%, while enhancing accuracy, improving computational efficiency, and streamlining model complexity. This innovation facilitates deployment in challenging maritime scenarios.
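Head pruning of the kind described above boils down to scoring each attention head's importance and discarding the weakest. The sketch below uses the L2 norm of each head's output as a simple importance proxy; the paper's actual selection criterion is not specified in the abstract and may differ.

```python
import numpy as np

def prune_heads(head_outputs, keep_ratio=0.5):
    """Rank attention heads by the L2 norm of their outputs (a hypothetical
    importance proxy) and return indices of heads to keep, ascending."""
    flat = head_outputs.reshape(head_outputs.shape[0], -1)
    scores = np.linalg.norm(flat, axis=1)
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    keep = np.argsort(scores)[::-1][:n_keep]  # highest-scoring heads first
    return np.sort(keep)
```

Dropping low-scoring heads shrinks the MHSA projection matrices, which is where the reported reduction in epoch time would come from.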
Title: Adaptive Head Pruning for Attention Mechanism in the Maritime Domain
Authors: Walid Messaoud;Rim Trabelsi;Adnane Cabani;Fatma Abdelkefi
IEEE Transactions on Artificial Intelligence, vol. 6, no. 11, pp. 2966-2976; published 2025-04-28.
Pub Date : 2025-04-28 | DOI: 10.1109/TAI.2025.3564900
Jawhar Ghommam;Maarouf Saad;Mohammad H. Rahman;Quanmin Zhu
In this article, we develop a virtual vehicle scheme to solve the coordination control problem for heterogeneous vehicles under denial-of-service (DoS) attacks. The system comprises an unmanned surface vessel (USV) in distress, which shares its kinematic data, and a helicopter that receives these data through wireless communication. Specifically, we carefully develop an estimator to model the unmeasurable states of the USV in the presence of DoS attacks. The virtual vehicle concept is then utilized to generate a velocity reference output for the helicopter to follow. To achieve preset tracking performance, the cascade structure of the helicopter is exploited, where a backstepping control strategy is applied via a barrier Lyapunov function. To handle input constraints, auxiliary systems are built to bridge the association between input saturation errors and performance constraints. Furthermore, to mitigate the saturation effect of bounded inputs and model uncertainties in the attitude dynamics, a fixed-time reinforcement learning (FT-RL) control algorithm is designed according to an actor–critic strategy. Stability is analyzed thoroughly using Lyapunov theory, and sufficient conditions for the whole closed-loop system are obtained. Numerical simulations validate the proposed coordination strategy.
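The barrier Lyapunov function mentioned above is what enforces the preset tracking performance: it is finite only while the error stays inside its bound. A standard symmetric form (a common textbook choice, not necessarily the paper's exact construction) is:

```latex
% Symmetric barrier Lyapunov function for a tracking error z constrained
% to |z| < k_b:
V(z) = \frac{1}{2}\ln\frac{k_b^2}{k_b^2 - z^2}
% V is positive definite on |z| < k_b and grows without bound as |z| \to k_b,
% so showing \dot{V} \le 0 along closed-loop trajectories guarantees the
% error never reaches the preset performance bound.
```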
Title: Prescribed Performance Resilient Motion Coordination With Actor–Critic Reinforcement Learning Design for UAV-USV Systems
IEEE Transactions on Artificial Intelligence, vol. 6, no. 12, pp. 3336-3350.
Pub Date : 2025-04-28 | DOI: 10.1109/TAI.2025.3564903
Emmanuel Pintelas;Ioannis E. Livieris;Panagiotis E. Pintelas
In the dynamic domain of synthetic media, deepfakes challenge trust in digital communication. Identifying manipulated content is essential to ensure the authenticity of shared information. Recent advances in deepfake detection have focused on developing sophisticated convolutional neural network (CNN)-based approaches. However, these approaches remain anchored within the continuous feature space, potentially missing manipulative signatures that might be more salient in a discrete domain. For this task, we propose a new strategy that combines insights from both continuous and discrete spaces for enhanced deepfake detection. Our hypothesis is that deepfakes may lie closer to a discrete space, potentially revealing hidden patterns that are not evident in continuous representations. In addition, we propose a new gradual-unfreezing technique, employed in the proposed framework to slowly adapt the network parameters to align with the new combined representation. Via comprehensive experimentation, the effectiveness of the proposed approach is demonstrated in comparison with state-of-the-art (SoA) deepfake detection strategies.
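Mapping continuous features into a discrete space, as hypothesized above, can be done with a simple uniform quantizer. This is a minimal stand-in for whatever quantization the paper actually uses; the level count and min-max normalization are assumptions for illustration.

```python
import numpy as np

def quantize(features, n_levels=16):
    """Map continuous activations onto a uniform grid of n_levels discrete
    bins between the feature min and max."""
    lo, hi = features.min(), features.max()
    if hi == lo:  # constant input: everything falls in bin 0
        return np.zeros_like(features, dtype=np.int64)
    idx = np.floor((features - lo) / (hi - lo) * n_levels).astype(np.int64)
    return np.clip(idx, 0, n_levels - 1)  # the max value lands in the top bin
```

Bin indices (rather than raw activations) would then feed the detector, exposing patterns that only show up in the discrete representation.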
Title: Quantization-Based 3D-CNNs Through Circular Gradual Unfreezing for DeepFake Detection
IEEE Transactions on Artificial Intelligence, vol. 6, no. 12, pp. 3351-3363.
Pub Date : 2025-04-28 | DOI: 10.1109/TAI.2025.3564911
Dongfang Ma;Zhaoyang Ma;Chengying Wu;Jianmin Lin
Tropical cyclones (TCs) are destructive weather systems, and accurate prediction of TC trajectories is crucial. Previous studies have focused mainly on trajectory prediction for individual TCs, which cannot account for the interaction between different TCs, degrading prediction performance. To address this problem, this study proposes an innovative method for multi-TC trajectory prediction based on density maps. Instead of predicting the location of a TC directly, the method first predicts the density map of a sea area and then obtains TC centers from the predicted density maps. In the first step, a relation extraction module (REM) is proposed to analyze the interaction between multiple TCs. Further, a 3-D cloud feature extraction module is designed to enhance the use of 3-D cloud structural information on TCs via feature extraction and the fusion of density maps, satellite images, and environmental data. In addition, a long short-term memory (LSTM) fusion module is designed to adaptively select important historical information, improving the ability to extract long-term spatiotemporal dependencies. In the second step, density map pixels with extreme values are identified as TC centers. The proposed method was verified by experiments using the GridSat, IBTrACS, and ERA5 datasets. The results show that the mean distance error of TC trajectory prediction is reduced by 10.0%, 10.7%, 10.5%, and 11.7% overall, and by 21.5%, 18.0%, 19.1%, and 19.8% in multi-TC scenarios, for the 6-, 12-, 18-, and 24-h predictions, respectively, compared with state-of-the-art prediction models.
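The second step above — reading TC centers off the predicted density map as pixels with extreme values — amounts to local-maximum detection. A minimal sketch (the 3x3 neighborhood and threshold are illustrative assumptions, not the paper's stated parameters):

```python
import numpy as np

def extract_centers(density, thresh=0.5):
    """Return (row, col) pairs where the density exceeds thresh and is the
    unique maximum of its 3x3 neighborhood."""
    H, W = density.shape
    centers = []
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            patch = density[i - 1:i + 2, j - 1:j + 2]
            peak = patch.max()
            if density[i, j] >= thresh and density[i, j] == peak \
                    and (patch == peak).sum() == 1:  # strict local maximum
                centers.append((i, j))
    return centers
```

Because each peak is extracted independently, one predicted map can yield several centers at once, which is what enables the multi-TC setting.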
Title: A Multitropical Cyclone Trajectory Prediction Method Based on Density Maps With Memory and Data Fusion
IEEE Transactions on Artificial Intelligence, vol. 6, no. 12, pp. 3364-3376.
Pub Date : 2025-04-25 | DOI: 10.1109/TAI.2025.3564603
Neenu Sharma;Deepak Joshi
Advancements in telehealth monitoring technology have enabled the collection of vast quantities of electrophysiological signals, including the electrocardiogram (ECG), which contains critical diagnostic information about cardiac diseases. There are two key challenges in the automatic classification of cardiac rhythms. First, addressing the specific characteristics of irregular heartbeats is critical for accurate classification. Second, the low frequency of ECG signals combined with noise interference makes it particularly difficult to efficiently detect abnormal electrical activity in the heart. To address these issues, this article proposes an ensemble deep-learning model, the ECG_DEEPNet architecture, to enhance the delineation of ECG signals with improved accuracy for better diagnosis in telemedicine monitoring systems. The presented technique consists of a feature extraction stage using a convolutional neural network (CNN) and a sequence processing stage using a combination of gated recurrent units (GRU) and bidirectional long short-term memory (BiLSTM) networks. The proposed method comprises four parts: signal preprocessing, waveform segmentation, classification of ECG signals, and evaluation of the results on the proposed model. The technique was trained and tested using the standard Lobachevsky University Electrocardiography Database (LUDB) and the QT database (QTDB), which contain waveform annotations for accurate classification of ECG wave components. The presented technique achieves average accuracies of 99.82%, 98.50%, and 97.42% for the QRS, P, and T waves on the QTDB database, and 99.96%, 98.82%, and 99.47% on the LUDB dataset, respectively, for classification and delineation of ECG signals. The proposed technique achieves better performance than state-of-the-art methods, resulting in better diagnosis of heart-related problems.
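The waveform segmentation stage in the pipeline above is commonly implemented by cutting fixed-width windows around detected R-peaks before the CNN sees them. The sketch below is that generic step, not the paper's specific segmentation; the 180-sample window and the R-peak inputs are illustrative assumptions.

```python
import numpy as np

def segment_beats(ecg, r_peaks, half_width=90):
    """Cut a fixed window of 2*half_width samples centered on each R-peak,
    dropping peaks too close to either end of the recording."""
    beats = []
    for r in r_peaks:
        if r - half_width >= 0 and r + half_width <= len(ecg):
            beats.append(ecg[r - half_width:r + half_width])
    return np.stack(beats) if beats else np.empty((0, 2 * half_width))
```

Each fixed-length beat can then be fed to the CNN feature extractor, with the GRU/BiLSTM stage consuming the resulting per-beat feature sequence.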
Title: ECG_DEEPNet: A Novel Approach for Delineation and Classification of Electrocardiogram Signal Based on Ensemble Deep-Learning
IEEE Transactions on Artificial Intelligence, vol. 6, no. 12, pp. 3321-3335.
Pub Date: 2025-04-25 DOI: 10.1109/TAI.2025.3564605
Sardar Jaf;Basel Barakat
Despite the extensive communication benefits offered by social media platforms, numerous challenges must be addressed to ensure user safety. One of the most significant risks faced by users on these platforms is targeted hatespeech. Social media platforms are widely utilized for generating datasets employed in training and evaluating machine learning algorithms for hatespeech detection. However, existing public datasets exhibit numerous limitations, hindering the effective training of these algorithms and leading to inaccurate hatespeech classification. This study provides a systematic empirical evaluation of several public datasets commonly used in automated hatespeech classification. Through rigorous analysis, we present compelling evidence highlighting the limitations of current hatespeech datasets. Additionally, we conduct a range of statistical analyses to elucidate the strengths and weaknesses inherent in these datasets. This work aims to advance the development of more accurate and reliable machine learning models for hatespeech detection by addressing the dataset limitations identified.
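One statistical analysis such an evaluation typically includes is a class-balance measure. The sketch below (using hypothetical labels, not data from any of the evaluated corpora) computes normalized label entropy, where values well below 1.0 flag the class-imbalance limitation commonly found in public hatespeech datasets:

```python
from collections import Counter
import math

def dataset_stats(labels):
    """Class counts plus normalized Shannon entropy of the label distribution.
    A balance of 1.0 means perfectly even classes; values near 0 mean severe imbalance."""
    counts = Counter(labels)
    n = sum(counts.values())
    probs = [c / n for c in counts.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    max_entropy = math.log2(len(counts)) if len(counts) > 1 else 1.0
    return {"size": n, "class_counts": dict(counts), "balance": entropy / max_entropy}

# Hypothetical label distribution standing in for a public hatespeech corpus
stats = dataset_stats(["hate"] * 120 + ["offensive"] * 700 + ["neither"] * 180)
print(stats["balance"])  # well below 1.0 -> noticeably imbalanced
```

Comparing this statistic across datasets makes one of the limitations discussed above directly measurable.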
{"title":"Empirical Evaluation of Public HateSpeech Datasets","authors":"Sardar Jaf;Basel Barakat","doi":"10.1109/TAI.2025.3564605","DOIUrl":"https://doi.org/10.1109/TAI.2025.3564605","url":null,"abstract":"Despite the extensive communication benefits offered by social media platforms, numerous challenges must be addressed to ensure user safety. One of the most significant risks faced by users on these platforms is targeted hatespeech. Social media platforms are widely utilized for generating datasets employed in training and evaluating machine learning algorithms for hatespeech detection. However, existing public datasets exhibit numerous limitations, hindering the effective training of these algorithms and leading to inaccurate hatespeech classification. This study provides a systematic empirical evaluation of several public datasets commonly used in automated hatespeech classification. Through rigorous analysis, we present compelling evidence highlighting the limitations of current hatespeech datasets. Additionally, we conduct a range of statistical analyses to elucidate the strengths and weaknesses inherent in these datasets. This work aims to advance the development of more accurate and reliable machine learning models for hatespeech detection by addressing the dataset limitations identified.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 11","pages":"3056-3069"},"PeriodicalIF":0.0,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145428949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-04-25 DOI: 10.1109/TAI.2025.3564243
Xiao Yang;Zhan-Li Sun;Mengya Liu;Zhigang Zeng;Kin-Man Lam;Xin Wang
Due to the significant differences between the structural and sequence information of RNA, accurately predicting RNA-small molecule binding sites by utilizing these two attributes remains a challenging task. This study introduces a novel network for predicting RNA-small molecule binding sites, employing a two-stage approach that integrates feature extraction and fusion processes. On one hand, in order to capture the diverse characteristic information of RNA, a dual-path feature extraction module is proposed to extract features from both short-range and long-range perspectives, by incorporating convolutional and attention networks. On the other hand, a one-dimensional multiscale feature fusion module, consisting of parallel one-dimensional convolutional kernels, is proposed to extract feature information at multiple granularities and to effectively integrate the features of nucleotides on the RNA chain and their neighboring nucleotides. Experimental results demonstrate that RNA-small molecule binding sites prediction by dual-path feature extraction and one-dimensional multiscale feature fusion network (RSMBSP-DON) is competitive with some recently reported methods.
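The dual-path idea (a convolutional short-range path alongside an attention-based long-range path, concatenated per nucleotide) can be sketched as follows. This is an illustrative single-head simplification with random weights, not the RSMBSP-DON implementation:

```python
import numpy as np

def local_path(x, w):
    """Short-range path: 1-D convolution (same padding, kernel shared across
    embedding dims) captures each nucleotide's immediate neighborhood."""
    pad = len(w) // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[i:i + len(w)] * w[:, None]).sum(axis=0)
                     for i in range(len(x))])

def global_path(x):
    """Long-range path: single-head self-attention lets every position
    attend to the entire RNA chain."""
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ x

rng = np.random.default_rng(1)
emb = rng.standard_normal((40, 8))   # 40 nucleotides, 8-dim embeddings
w = rng.standard_normal(5) * 0.2     # width-5 conv kernel (random stand-in)
# Fuse both views per nucleotide before a downstream binding-site classifier
fused = np.concatenate([local_path(emb, w), global_path(emb)], axis=1)
print(fused.shape)  # (40, 16)
```

A per-position classifier head would then score each fused nucleotide vector as binding or non-binding.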
{"title":"RSMBSP-DON: RNA-Small Molecule Binding Sites Prediction by Dual-Path Feature Extraction and One-Dimensional Multiscale Feature Fusion Network","authors":"Xiao Yang;Zhan-Li Sun;Mengya Liu;Zhigang Zeng;Kin-Man Lam;Xin Wang","doi":"10.1109/TAI.2025.3564243","DOIUrl":"https://doi.org/10.1109/TAI.2025.3564243","url":null,"abstract":"Due to the significant differences between the structural and sequence information of RNA, accurately predicting RNA-small molecule binding sites by utilizing these two attributes remains a challenging task. This study introduces a novel network for predicting RNA-small molecule binding sites, employing a two-stage approach that integrates feature extraction and fusion processes. On one hand, in order to capture the diverse characteristic information of RNA, a dual-path feature extraction module is proposed to extract features from both short-range and long-range perspectives, by incorporating convolutional and attention networks. On the other hand, a one-dimensional multiscale feature fusion module, consisting of parallel one-dimensional convolutional kernels, is proposed to extract feature information at multiple granularities and to effectively integrate the features of nucleotides on the RNA chain and their neighboring nucleotides. 
Experimental results demonstrate that <b>RNA-small molecule binding sites prediction by dual-path feature extraction and one-dimensional multiscale feature fusion network (RSMBSP-DON)</b> is competitive with some recently reported methods.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 12","pages":"3312-3320"},"PeriodicalIF":0.0,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}