Anting Zhu, Xingxing Jia, Longfei Yang, Huiyu Zhou, Wei Su
Facial expression recognition (FER) remains a challenging task in computer vision. Recent works have shown excellent performance in overall recognition accuracy, but their accuracy drops significantly when recognizing similar expressions, owing to interclass homogeneity and intraclass heterogeneity. To address these issues, we propose a novel dual-stage network called DUAL, inspired by contrastive learning. First, we increase the distance between negative samples while reducing the distance between positive ones, which is achieved by dynamically updating pairs of comparison samples. Second, we introduce a two-stage network architecture. The first stage uses two branches to extract image features and facial keypoint features; these branches interact to learn coarse-grained features through mutual guidance. The second stage focuses on fine-grained features using scale-specific residual blocks, allowing the model to identify facial regions that are critical for recognizing expressions. We conducted extensive experiments on multiple datasets. The results show that DUAL surpasses state-of-the-art models in terms of performance. Additionally, the model maintains high accuracy even in noisy conditions, highlighting its robustness.
DUAL: A Dual-Stage Approach for Facial Expression Recognition Based on Contrastive Learning. International Journal of Intelligent Systems, 2025. https://doi.org/10.1155/int/7401168 (published 2025-11-27).
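The contrastive objective described above, pulling same-class (positive) pairs together while pushing different-class (negative) pairs apart, can be illustrated with a classic margin-based pair loss. This is a minimal sketch of the general principle only, not the authors' DUAL implementation; the function name, margin value, and embeddings are illustrative:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_pair_loss(za, zb, same_class, margin=1.0):
    """Margin-based contrastive loss for one embedding pair:
    same-class pairs are penalized by their squared distance,
    different-class pairs only if closer than `margin`."""
    d = euclidean(za, zb)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Positive pair (same expression class): loss grows with distance.
pos = contrastive_pair_loss([0.0, 0.0], [0.3, 0.4], same_class=True)
# Negative pair already farther apart than the margin: zero loss.
neg = contrastive_pair_loss([0.0, 0.0], [3.0, 4.0], same_class=False)
```

Minimizing such a loss over dynamically re-paired samples is what shrinks intraclass and widens interclass distances.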
Marilyn Bello, Rosalís Amador, María-Matilde García, Rafael Bello, Óscar Cordón, Francisco Herrera
Artificial intelligence (AI) systems are increasingly adopted in high-stakes domains such as healthcare and finance, so the demand for transparency and interpretability has grown substantially. EXplainable AI (XAI) methods have emerged to address this challenge, but individual techniques often offer limited, fragmented insights. This paper introduces Meta-explainers, a novel ensemble-based XAI framework that integrates multiple explanation types—specifically relevance-based and counterfactual methods—into unified, multifaceted and complementary meta-explanations. Inspired by meta-classification principles, our approach structures the explanation process into five stages: generation, grouping, evaluation, aggregation, and visualization. Each stage is designed to preserve the unique strengths of individual XAI techniques while enhancing their interpretability and coherence when combined. Experimental results on both image (MNIST) and tabular (Breast Cancer) datasets show that Meta-explainers consistently outperform individual and state-of-the-art ensemble explanation methods in terms of explanation quality, as measured by established metrics. This work paves the way toward more holistic and user-centered AI explainability with a flexible methodology that can be extended to incorporate additional explanation paradigms.
Meta-Explainers: A Unified Ensemble Approach for Multifaceted XAI. International Journal of Intelligent Systems, 2025. https://doi.org/10.1155/int/4841666 (published 2025-11-26).
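The aggregation stage could, for instance, min–max-normalize each method's per-feature relevance scores and average them. This is a purely hypothetical sketch of one plausible aggregation rule, not the paper's actual method; `normalize`, `aggregate_explanations`, and the score values are all illustrative:

```python
def normalize(scores):
    """Min-max normalize one method's relevance scores to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def aggregate_explanations(per_method_scores):
    """Average the normalized relevance scores of several XAI
    methods into one meta-explanation, one value per feature."""
    normed = [normalize(s) for s in per_method_scores]
    n = len(normed)
    return [sum(col) / n for col in zip(*normed)]

# Two methods scoring three features on different scales agree
# once normalized, so the meta-explanation preserves their ranking.
meta = aggregate_explanations([[0.1, 0.9, 0.5], [2.0, 10.0, 6.0]])
```

Normalizing first is the key design point: it lets methods with incommensurable score scales contribute equally.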
This paper introduces a cost-effective prompt optimization strategy for ancient Chinese word segmentation using large language models, aiming to mitigate the substantial computational resources and training expenses of fine-tuning. We developed two knowledge-enhanced frameworks, a General Knowledge Prompt framework and a Domain-Specific Knowledge Prompt framework, and evaluated their effectiveness across various ancient Chinese corpora using seven mainstream LLMs, including ERNIE Bot, Qwen, SparkDesk, DeepSeek, ChatGPT, Gemini, and Copilot. Our findings confirm that both prompt frameworks enhance the segmentation capability of LLMs to varying extents, with the Domain-Specific Knowledge Prompt framework yielding the most significant improvements. Notably, the DeepSeek model achieves 94.01% F1 score (94.24% precision, 93.79% recall) on the test set, while the Qwen model demonstrates a remarkable 15.73% increase in the F1 score with the Domain-Specific Knowledge Prompt framework. Our ablation studies indicate that the entries Rules and Examples are the most crucial to the success of prompt frameworks, effectively addressing the challenges of rule inconsistency and insufficient annotated data.
Meng-Tian Tang, Cheng-Gang Mi. Improving Ancient Chinese Word Segmentation With Knowledge-Enhanced Prompting for Large Language Models. International Journal of Intelligent Systems, 2025. https://doi.org/10.1155/int/9612240 (published 2025-11-26).
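The reported DeepSeek figures are internally consistent: the F1 score is the harmonic mean of precision and recall, which can be checked directly:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (in percent here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Reported DeepSeek figures: 94.24% precision, 93.79% recall.
f1 = f1_score(94.24, 93.79)  # rounds to the reported 94.01
```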
With the rise of neural networks and pre-trained models such as BERT, abstractive text summarization techniques have received widespread attention. Nevertheless, traditional extractive text summarization methods still hold substantial research value due to their low computational cost, interpretability, and robustness. In algorithms like TextRank and its variants, graph nodes are typically constructed based on surface-level lexical features. These graphs often fail to incorporate many contextual relationships, such as coreference relationships among nodes, resulting in fragmented representations of key concepts. For edge construction, a sliding window of size T is commonly used to connect word nodes within the window. However, these methods often fall short in modeling the rich contextual dependencies embedded in the document. Several recent studies have demonstrated that semantic graphs can effectively improve the accuracy of text summarization. In this paper, we construct a more interpretable semantic graph from syntax trees and propose a novel unsupervised algorithm based on the personalized PageRank algorithm for summary extraction. We utilize tree transformation methods to enrich word-level information for graph construction, define node-merging rules to reduce graph complexity, use coreference chains to merge coreferring entities across sentences for enriching contextual links, and introduce the concept of Meta Node sets to capture thematic relationships that are not fully represented by syntactic dependencies or coreference chains alone. By clustering semantically related words, Meta Nodes enhance the graph’s ability to reflect deeper contextual coherence across the document. Compared with previous TextRank-based methods, our improvement yields significant ROUGE score boosts on the CNN-DM dataset. 
While the method was developed and evaluated using English-language datasets, its underlying design is language agnostic and can be adapted to other languages with suitable linguistic tools.
Zhenhao Li, Miao Liu, Wenbin Chen, Ligang Zheng. A Method of Extractive Text Summarization Using Document Semantic Graph With Node Ranking. International Journal of Intelligent Systems, 2025. https://doi.org/10.1155/int/5530784 (published 2025-11-21).
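Personalized PageRank, the ranking backbone named above, differs from standard PageRank only in where the teleportation mass goes: to a chosen distribution rather than uniformly to all nodes. A minimal pure-Python power iteration on a toy word graph (the graph and personalization vector are illustrative, not the authors' semantic-graph construction):

```python
def personalized_pagerank(adj, personalization, damping=0.85, iters=100):
    """Power iteration for personalized PageRank on an adjacency
    dict {node: [out-neighbors]}. The (1 - damping) teleport mass
    is distributed by `personalization` instead of uniformly.
    (Assumes every node has at least one out-edge.)"""
    nodes = list(adj)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - damping) * personalization.get(n, 0.0) for n in nodes}
        for n in nodes:
            out = adj[n]
            if not out:
                continue  # dangling nodes would need extra handling
            share = damping * rank[n] / len(out)
            for m in out:
                nxt[m] += share
        rank = nxt
    return rank

# Tiny word graph; teleportation biased toward "summary".
adj = {"text": ["summary"], "summary": ["text", "graph"], "graph": ["summary"]}
rank = personalized_pagerank(adj, {"summary": 1.0})
```

Biasing the teleport vector (e.g., toward Meta Nodes or topic words) is what lets the ranking reflect document-specific importance.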
Jun Wen, Long Liu, Xiaoli Li, Xiusheng Li, Hang Mao
Brain tumors account for approximately 2.5% of cancer-related deaths. Accurate classification of brain tumor types is essential for timely diagnosis and enhancing survival rates. Convolutional neural networks (CNNs) have demonstrated state-of-the-art performance in computer-aided diagnosis of brain tumors; however, the quality and availability of medical data significantly influence this process. Medical data must adhere to stringent privacy regulations, such as the General Data Protection Regulation (GDPR) in the European Union and the Health Insurance Portability and Accountability Act (HIPAA) in the United States. Federated learning (FL) enables the sharing of only model update parameters during collaborative training on locally stored data. However, these parameters may inadvertently enable reconstruction of the original data. Furthermore, medical data often exhibit nonindependent and nonidentically distributed (non-IID) characteristics, impeding model training performance. To address these challenges, this paper proposes a scheme that partitions confidential data into multiple segments during FL training, ensuring that only a subset exceeding a predefined threshold can reconstruct the data. The proposed scheme guarantees enhanced security, distributed control, and fault tolerance. In addition, this paper introduces a Conditional Mutual Information (CMI) regularizer to mitigate variability in model predictions. By minimizing the Kullback–Leibler (KL) divergence between local and global feature distributions, the CMI regularizer substantially enhances performance and convergence stability. Extensive experiments conducted on the Figshare dataset with varying α-values for data distributions validate the efficacy of the proposed model. 
Compared to FedAvg, FedProx, and FedDyn at α = 0.3, as well as the central model, the proposed model achieves a top-1 accuracy of 92.94% on the Figshare dataset, surpassing FedProx, FedAvg, and FedDyn by 2.42%, 2.82%, and 3.53%, respectively. Federated IID achieves performance comparable to that of the central model, further demonstrating its viability for practical applications.
Pragmatic Brain Tumor Imaging Classification Using Federated Learning. International Journal of Intelligent Systems, 2025. https://doi.org/10.1155/int/8817677 (published 2025-11-20).
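The KL-divergence term minimized between local and global feature distributions can be illustrated for the discrete case. This is a generic sketch only; the paper's CMI regularizer has its own exact formulation, and the two distributions below are made up:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as probability
    lists; `eps` guards against zero entries."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

local_feats  = [0.7, 0.2, 0.1]   # hypothetical local feature distribution
global_feats = [0.5, 0.3, 0.2]   # hypothetical global feature distribution

# The regularization penalty shrinks as local features align
# with the global ones, stabilizing convergence across clients.
penalty = kl_divergence(local_feats, global_feats)
```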
De Li, Zhewei Zhang, Xuanyou Li, Xun Jin, Yanwei Wang
As copyright concerns over hyperspectral images continue to rise, we propose a neural network–based watermarking model to protect their copyright. By applying a normalization-based attention module (NAM) to deep dispersed watermarking with synchronization and fusion (DWSF), we obtain an NDWSF model for robust hyperspectral image watermarking. It consists of encoding, decoding, discrimination, and attack modules. The encoding and decoding modules embed and extract watermarks. The discrimination module improves the quality of the watermarked image; it stands in an adversarial relationship with the encoding module, motivating the encoder to generate watermarks with stronger invisibility. The attack module is placed between embedding and extraction to improve robustness against compression, noise, and geometric attacks. To exploit image features more effectively for watermarking, a hybrid attention mechanism is introduced into embedding and extraction by adding the NAM. Experimental results show improved loss convergence and stability during training. The peak signal-to-noise ratio of the proposed method is 48.08 dB, about 2.5 dB higher than that of other methods. The bit error rate of the proposed method is less than 2.5% under various hybrid attacks, showing good robustness.
A Robust Watermarking Method for Hyperspectral Images Based on Hybrid Attention Mechanism. International Journal of Intelligent Systems, 2025. https://doi.org/10.1155/int/8844705 (published 2025-11-19).
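The two reported metrics, PSNR and bit error rate, have standard definitions that are easy to reproduce (generic formulas, not tied to the authors' test images; the pixel and bit values below are illustrative):

```python
import math

def psnr(orig, marked, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-size
    images given as flat pixel lists; higher = less distortion."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, marked)) / len(orig)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

def bit_error_rate(sent, received):
    """Fraction of watermark bits flipped after an attack."""
    errors = sum(1 for a, b in zip(sent, received) if a != b)
    return errors / len(sent)

# One flipped bit out of eight -> BER of 12.5%.
ber = bit_error_rate([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 1, 0, 0, 0, 1, 0])
```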
M. Kathiravan, Ashwini A., Balasubramaniam S., T. D. Subha, Gururama Senthilvel P., Sivakumar T. A.
Neovascularization, the pathological growth of abnormal blood vessels, is a major vision-threatening complication of diabetic retinopathy. It is driven mainly by oxygen depletion in the retinal capillaries, which triggers abnormal vascular development patterns. Detecting these abnormalities in fundus images early and precisely enables ophthalmologists to make correct diagnoses and provide effective treatment. We address this problem with a multistep image processing pipeline. First, a fusion-based contrast enhancement method improves the brightness and contrast of diabetic retinopathy fundus images, after which detail-weighted histogram equalization is applied to the green channel for better visualization of structural detail. In the second stage, the proposed online tiger-claw algorithm segments abnormal neovascularization from normal blood vessels; fuzzy zone-based clustering, combined with optimization and classifier thresholding, then locally identifies and highlights neovascularized areas. In the third stage, a YOLOv5 neural network performs feature extraction and classification for neovascularization detection, and multistage gray wolf optimization refines the segmentation. The proposed algorithm was tested on the public datasets STARE, DRIVE, MESSIDOR, and DIARETDB1. Experiments show that neovascularization region marking achieved 98.19% sensitivity, 96.56% specificity, and 99.27% accuracy, while neovascularization detection achieved 97.03% accuracy, 98.94% sensitivity, and 97.17% specificity.
Optimization Enabled Online Tiger-Claw Fuzzy Region With Clustering Based Neovascularization Segmentation and Classification Using YOLO-V5 From Retinal Fundus Images. International Journal of Intelligent Systems, 2025. https://doi.org/10.1155/int/6119924 (published 2025-11-17).
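As a baseline for the green-channel step, plain histogram equalization works as follows. The paper applies a detail-weighted variant, which this sketch does not implement; the 8-pixel channel is illustrative:

```python
def equalize_channel(pixels, levels=256):
    """Plain histogram equalization for one channel, given as a
    flat list of intensities in [0, levels). Maps the cumulative
    distribution of intensities onto the full dynamic range."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    n = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:
        return pixels[:]  # constant image: nothing to equalize
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1))
            for p in pixels]

# A low-contrast green channel is stretched to span 0..255.
green = [52, 55, 61, 59, 79, 61, 76, 61]
flat = equalize_channel(green)
```

Stretching the intensity histogram this way is what makes faint vessel structure in the green channel stand out before segmentation.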
Fernando Sevilla Martínez, Jordi Casas-Roma, Laia Subirats, Raúl Parada
As autonomous driving (AD) systems grow more complex, their rising computational demands pose significant energy and sustainability challenges. This paper investigates spiking neural networks (SNNs) as low-power alternatives to convolutional neural networks (CNNs) for regression tasks in AD. We introduce a membrane-potential (Vmem) decoding framework that converts binary spike trains into continuous outputs and propose the energy-to-error ratio (EER), a unified metric combining prediction error with energy consumption. Three CNN architectures (PilotNet, LaksNet, and MiniNet) and their corresponding SNN variants are trained and evaluated using delta, latency, and rate encoding across varied parameter settings, with energy use and emissions logged. Delta-encoded SNNs achieve the highest EER, substantial energy savings with minimal performance loss, whereas CNNs, despite slightly better MSE, incur 10–20 × higher energy costs. Rate encoding underperforms, and latency encoding, though improving relative error, demands excessive energy. Parameter tuning (threshold θ, temporal dynamics (S), membrane time constant (τ), and gain G) directly influences eco-efficiency. All experiments run on standard GPUs, showing SNNs can surpass CNNs in eco-efficiency without specialized hardware. Paired statistical tests confirm that only delta-encoded SNNs achieve significant EER improvements. This work presents a practical, energy-aware evaluation framework for neural architectures, establishing EER as a critical metric for sustainable machine learning in intelligent transport and beyond.
{"title":"Energy-Aware Regression in Spiking Neural Networks for Autonomous Driving: A Comparative Study With Convolutional Networks","authors":"Fernando Sevilla Martínez, Jordi Casas-Roma, Laia Subirats, Raúl Parada","doi":"10.1155/int/4879993","DOIUrl":"https://doi.org/10.1155/int/4879993","url":null,"abstract":"<p>As autonomous driving (AD) systems grow more complex, their rising computational demands pose significant energy and sustainability challenges. This paper investigates spiking neural networks (SNNs) as low-power alternatives to convolutional neural networks (CNNs) for regression tasks in AD. We introduce a membrane-potential (<i>V</i><sub>mem</sub>) decoding framework that converts binary spike trains into continuous outputs and propose the energy-to-error ratio (EER), a unified metric combining prediction error with energy consumption. Three CNN architectures (PilotNet, LaksNet, and MiniNet) and their corresponding SNN variants are trained and evaluated using delta, latency, and rate encoding across varied parameter settings, with energy use and emissions logged. Delta-encoded SNNs achieve the highest EER, substantial energy savings with minimal performance loss, whereas CNNs, despite slightly better MSE, incur 10–20 × higher energy costs. Rate encoding underperforms, and latency encoding, though improving relative error, demands excessive energy. Parameter tuning (threshold <i>θ</i>, temporal dynamics (<i>S</i>), membrane time constant (<i>τ</i>), and gain <i>G</i>) directly influences eco-efficiency. All experiments run on standard GPUs, showing SNNs can surpass CNNs in eco-efficiency without specialized hardware. Paired statistical tests confirm that only delta-encoded SNNs achieve significant EER improvements. 
This work presents a practical, energy-aware evaluation framework for neural architectures, establishing EER as a critical metric for sustainable machine learning in intelligent transport and beyond.</p>","PeriodicalId":14089,"journal":{"name":"International Journal of Intelligent Systems","volume":"2025 1","pages":""},"PeriodicalIF":3.7,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1155/int/4879993","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145522267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
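The abstract describes EER as a unified metric combining prediction error with energy consumption but does not give its formula. The sketch below is an illustrative assumption, not the authors' definition: it treats EER as inversely proportional to the product of MSE and measured energy, so that lower error and lower energy both raise the score. The function names and example values are hypothetical.

```python
# Hedged sketch of an energy-to-error ratio (EER) style metric.
# Assumption: higher EER is better, penalizing both prediction error
# and energy use; the paper's exact formulation may differ.

def mse(y_true, y_pred):
    """Mean squared error over paired regression outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def energy_to_error_ratio(y_true, y_pred, energy_joules):
    """Illustrative EER: inverse of (error x energy). Hypothetical form."""
    return 1.0 / (mse(y_true, y_pred) * energy_joules)

# Toy comparison mirroring the abstract's finding: a CNN with slightly
# better MSE but ~13x the energy cost can still score a lower EER.
cnn_eer = energy_to_error_ratio([0.1, 0.2], [0.12, 0.18], energy_joules=20.0)
snn_eer = energy_to_error_ratio([0.1, 0.2], [0.15, 0.25], energy_joules=1.5)
```

Under this toy formulation, the noisier but far cheaper SNN comes out ahead, which is the trade-off the EER metric is designed to expose.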
Laiqiao Qin, Tianqing Zhu, Wanlei Zhou, Philip S. Yu
Federated learning (FL) is a distributed and privacy-preserving machine learning paradigm that coordinates multiple clients to train a model while keeping the raw data localized. However, traditional FL faces several challenges, including privacy risks, data heterogeneity, communication bottlenecks, and system heterogeneity. To tackle these challenges, knowledge distillation (KD) has been widely applied in FL since 2020. KD is a well-established and effective technique for model compression and enhancement. The core concept of KD is to transfer knowledge between models by exchanging logits at intermediate or output layers. These properties make KD an excellent solution for the long-standing challenges in FL. To date, few reviews have summarized and analyzed the current trends and methods for applying KD efficiently in FL. This article aims to provide a comprehensive survey of KD-based FL, focusing on addressing the above challenges. First, we provide an overview of KD-based FL, including its motivation, basics, taxonomy, a comparison with traditional FL, and where KD should be executed. We also analyze the critical factors in KD-based FL in the Appendix, including teachers, knowledge, data, and methods. We discuss how KD can address the challenges in FL, including privacy protection, data heterogeneity, communication efficiency, and personalization. Finally, we discuss the challenges facing KD-based FL algorithms and future research directions. We hope this survey can provide insights and guidance for researchers and practitioners in the FL area.
{"title":"Knowledge Distillation in Federated Learning: A Survey on Long Lasting Challenges and New Solutions","authors":"Laiqiao Qin, Tianqing Zhu, Wanlei Zhou, Philip S. Yu","doi":"10.1155/int/7406934","DOIUrl":"https://doi.org/10.1155/int/7406934","url":null,"abstract":"<p>Federated learning (FL) is a distributed and privacy-preserving machine learning paradigm that coordinates multiple clients to train a model while keeping the raw data localized. However, this traditional FL poses some challenges, including privacy risks, data heterogeneity, communication bottlenecks, and system heterogeneity issues. To tackle these challenges, knowledge distillation (KD) has been widely applied in FL since 2020. KD is a validated and efficacious model compression and enhancement algorithm. The core concept of KD involves facilitating knowledge transfer between models by exchanging logits at intermediate or output layers. These properties make KD an excellent solution for the long-lasting challenges in FL. Up to now, there have been few reviews that summarize and analyze the current trend and methods for how KD can be applied in FL efficiently. This article aims to provide a comprehensive survey of KD-based FL, focusing on addressing the above challenges. First, we provide an overview of KD-based FL, including its motivation, basics, taxonomy, and a comparison with traditional FL and where KD should execute. We also analyze the critical factors in KD-based FL in the Appendix, including teachers, knowledge, data, and methods. We discuss how KD can address the challenges in FL, including privacy protection, data heterogeneity, communication efficiency, and personalization. Finally, we discuss the challenges facing KD-based FL algorithms and future research directions. 
We hope this survey can provide insights and guidance for researchers and practitioners in the FL area.</p>","PeriodicalId":14089,"journal":{"name":"International Journal of Intelligent Systems","volume":"2025 1","pages":""},"PeriodicalIF":3.7,"publicationDate":"2025-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1155/int/7406934","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145521665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
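The knowledge-transfer mechanism the survey centers on, exchanging logits between a teacher and a student, is commonly implemented with the classic Hinton-style distillation loss. The sketch below shows that standard formulation as background; the temperature value and the KL direction are conventional choices, not details drawn from this survey.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    """Classic distillation loss: KL(teacher || student) on softened
    distributions, scaled by T^2 as in Hinton et al.'s formulation.
    A KD-based FL client could minimize this against server-aggregated
    teacher logits instead of exchanging raw model weights."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T * T) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Because only logits (a few floats per class) cross the network rather than full parameter tensors, this style of exchange is what lets KD ease the communication bottlenecks the survey discusses.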
Effective health monitoring of electrical equipment is critical for industrial reliability. Although infrared thermal imaging offers a powerful noncontact diagnostic method, accurately interpreting its complex and often noisy thermal patterns remains a significant challenge. Entropy-based analysis is well suited for quantifying this complexity, but its application to images has been limited. Existing two-dimensional entropy methods are not only less developed than their one-dimensional counterparts but also typically require converting thermal images to grayscale, which discards vital diagnostic information from the color channels. To overcome these limitations, this study introduces the modified multiscale two-dimensional color distribution entropy (MMCDEn2D). This novel method directly integrates the attributes of the RGB color channels, preserving a richer feature set for analysis. The effectiveness of the proposed method is demonstrated first through synthetic signals, showing low sensitivity to image size and high computational efficiency. The study further extends the application of entropy-based analysis to noncontact health monitoring scenarios, implementing MMCDEn2D for thermal image-based fault diagnosis of induction motors and power transformers. The method achieves a diagnostic accuracy that exceeds 95%, significantly outperforming traditional approaches. Crucially, it demonstrates superior robustness in challenging scenarios, improving accuracy by 2%–5% under high-noise conditions and with small sample sizes. These results establish MMCDEn2D as a highly effective and reliable tool to advance noncontact fault diagnosis in critical electrical equipment.
{"title":"Noncontact Fault Diagnosis of Electrical Equipment Using Modified Multiscale Two-Dimensional Color Distribution Entropy and Thermal Imaging","authors":"Shun Wang, Yolanda Vidal, Francesc Pozo","doi":"10.1155/int/4805844","DOIUrl":"https://doi.org/10.1155/int/4805844","url":null,"abstract":"<p>Effective health monitoring of electrical equipment is critical for industrial reliability. Although infrared thermal imaging offers a powerful noncontact diagnostic method, accurately interpreting its complex and often noisy thermal patterns remains a significant challenge. Entropy-based analysis is well suited for quantifying this complexity, but its application to images has been limited. Existing two-dimensional entropy methods are not only less developed than their one-dimensional counterparts but also typically require converting thermal images to grayscale, which discards vital diagnostic information from color channels. To overcome these limitations, this study introduces the modified multiscale two-dimensional color distribution entropy (MMCDEn<sub>2D</sub>). This novel method directly integrates the attributes of the RGB, preserving a richer feature set for analysis. The effectiveness of the proposed method is demonstrated first through synthetic signals, showing low sensitivity to image size and high computational efficiency. The study further extends the application of entropy-based analysis to noncontact health monitoring scenarios, implementing MMCDEn<sub>2D</sub> for thermal image-based fault diagnosis of induction motors and power transformers. The method achieves a diagnostic accuracy that exceeds 95%, significantly outperforming traditional approaches. Crucially, it demonstrates superior robustness in challenging scenarios, improving accuracy by 2%–5% under high-noise conditions and with small sample sizes. 
These results establish MMCDEn<sub>2D</sub> as a highly effective and reliable tool to advance noncontact fault diagnosis in critical electrical equipment.</p>","PeriodicalId":14089,"journal":{"name":"International Journal of Intelligent Systems","volume":"2025 1","pages":""},"PeriodicalIF":3.7,"publicationDate":"2025-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1155/int/4805844","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145521741","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
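The abstract does not specify the MMCDEn2D algorithm itself, so it is not reproduced here. As background for the underlying idea, the minimal sketch below computes a Shannon entropy per RGB channel and averages the three, echoing the paper's motivation to quantify image complexity without a grayscale conversion. The bin count, averaging scheme, and function names are illustrative assumptions only.

```python
import math

def shannon_entropy(values, bins=8):
    """Histogram the values, then return the Shannon entropy (in bits)
    of the empirical distribution over bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant channel
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def rgb_image_entropy(pixels):
    """pixels: list of (r, g, b) tuples. Average the per-channel
    entropies rather than converting to grayscale, so color
    information contributes to the complexity measure."""
    channels = list(zip(*pixels))
    return sum(shannon_entropy(list(ch)) for ch in channels) / 3.0
```

A uniform (single-color) image scores zero under this measure, while varied thermal patterns score higher; the paper's multiscale, modified formulation builds considerably more structure on top of this basic idea.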