We present a novel computer vision method for detecting insect pests on plant and tree leaves under real-world conditions by combining deep learning with classical image processing. Detecting small, sparsely distributed, or camouflaged insects is challenging because current state-of-the-art object detectors, designed primarily for larger objects, often overlook them. Our approach has two parts. A deep learning model analyzes suspicious leaves for anomalies, a task well suited to deep learning. Because deep models struggle with tiny objects in complex backgrounds, conventional image processing first pre-identifies potentially infested foliage, guiding the model toward relevant areas and mitigating its limitations. The combined strategy is effective and competitive with other methods across diverse datasets and real-world scenarios. We also analyze the model’s predictions in detail to interpret them, which strengthens confidence in its effectiveness.
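The two-stage idea above can be illustrated with a minimal sketch. The abstract does not say which image processing operations the authors use, so everything here is an assumption made for illustration: the "green dominates" leaf test, the tile size, the thresholds, and the function names `is_leaf_like` and `suspicious_tiles` are all hypothetical. The sketch only shows how a cheap classical step can narrow an image down to a few tiles worth passing to a deep model.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0-255


def is_leaf_like(pixel: Pixel) -> bool:
    """Assumed leaf test: the green channel dominates red and blue."""
    r, g, b = pixel
    return g > r and g > b


def suspicious_tiles(
    image: List[List[Pixel]],
    tile: int = 4,
    min_fraction: float = 0.05,
    max_fraction: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return the (top, left) corner of every tile that is mostly leaf
    but contains a small share of non-leaf pixels.

    Tiles with almost no odd pixels look healthy; tiles dominated by odd
    pixels are treated as background. Both kinds are skipped.
    """
    height, width = len(image), len(image[0])
    flagged = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            pixels = [
                image[y][x]
                for y in range(top, min(top + tile, height))
                for x in range(left, min(left + tile, width))
            ]
            odd = sum(1 for p in pixels if not is_leaf_like(p))
            fraction = odd / len(pixels)
            if min_fraction <= fraction <= max_fraction:
                flagged.append((top, left))
    return flagged


# Usage: an 8x8 all-green "leaf" with one dark speck in the top-left tile.
green = (40, 160, 40)
leaf = [[green for _ in range(8)] for _ in range(8)]
leaf[1][1] = (30, 20, 20)
candidates = suspicious_tiles(leaf)  # tiles to hand to the deep model
```

Only the flagged tiles would then be cropped and sent to the anomaly model, which is the sense in which the classical step guides the network toward relevant areas.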
Title: AI-driven detection of tiny pests in foliage: Integrating image processing and deep learning
Authors: Lucía Baeza-Moreno, Pedro Blanco-Carmona, Eduardo Hidalgo-Fort, Rubén Martín-Clemente, Ramón González-Carvajal
Journal: Machine learning with applications, volume 23, Article 100834
Pub Date: 2025-12-31; DOI: 10.1016/j.mlwa.2025.100834
Pub Date: 2025-12-30; DOI: 10.1016/j.mlwa.2025.100833
Chandramohan Abhishek, Nadimpalli Raghukiran
This research presents a machine-interactive approach to decision-making that uses a pre-trained natural language processing (NLP) model. The method is developed for 4D (four-dimensional) printing technique selection, which involves many variables, such as process, material, design, and sequence selections. Because numerous options are available, arriving at a preferred technique requires expertise and time; the developed method provides assistance from a single source. The approach incorporates bidirectional encoder representations from transformers (BERT), which accommodates alternative phrasings of user requests, such as synonyms and adjectives. The closed-loop system is programmed with a set of 7 prompts. It also introduces additional affirmation prompts to handle ambiguous phrasing and out-of-scope requests so that the user receives a meaningful recommendation. A lightweight rule set guides the selection of the conforming request during each prompt. The inference-based approach takes user requests, performs objective classification using BERT according to the selected criteria, dynamically filters the data, and returns recommendations, with an inference time of 0.79 s. The modified model also establishes multi-level relationships among prompts for text classification. In k-fold validation, the model reached the highest possible accuracy when trained with optimal hyperparameters. The fine-tuned method, developed in a Python environment, can be generalized to other systems. The research demonstrates that an openly accessible model can be adapted into a decision-assistance system with minimal personal computational resources.
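The classify-then-filter loop described above might look roughly like the sketch below. It is not the authors' system: the keyword matcher `classify` is a plain stand-in for the fine-tuned BERT classifier, and the catalog rows, criterion names, and option names are invented placeholders. The sketch shows only the control flow, where each prompt narrows the candidate set and an unrecognized request leaves it unchanged so that an affirmation prompt can follow.

```python
from typing import Dict, List, Optional

# Placeholder catalog; the rows are illustrative, not data from the paper.
CATALOG = [
    {"name": "option-1", "process": "extrusion", "material": "polymer"},
    {"name": "option-2", "process": "extrusion", "material": "hydrogel"},
    {"name": "option-3", "process": "vat photopolymerization", "material": "polymer"},
]


def classify(request: str, labels: List[str]) -> Optional[str]:
    """Stand-in for the BERT classifier: return the first label that the
    request mentions, or None when the request is out of scope."""
    text = request.lower()
    for label in labels:
        if label in text:
            return label
    return None


def recommend(requests: Dict[str, str], catalog: List[Dict[str, str]]) -> List[str]:
    """Apply one prompt per criterion and filter the catalog step by step."""
    candidates = list(catalog)
    for criterion, request in requests.items():
        labels = sorted({row[criterion] for row in candidates})
        label = classify(request, labels)
        if label is None:
            # Out of scope or ambiguous: keep the candidates as they are.
            # A real system would issue an affirmation prompt here.
            continue
        candidates = [row for row in candidates if row[criterion] == label]
    return [row["name"] for row in candidates]


# Usage: two prompts, one per criterion.
answers = {
    "process": "I need an extrusion based route",
    "material": "a hydrogel please",
}
shortlist = recommend(answers, CATALOG)
```

Deriving the label set from the remaining candidates at each step is one way to express the multi-level relationship among prompts: later prompts only offer choices that are still compatible with earlier answers.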
Title: Machine-interactive decision-assistance using a pre-trained natural language processing model for 4D printing technique selection
Journal: Machine learning with applications, volume 23, Article 100833
Background/Objectives
Social-emotional learning (SEL) plays a crucial role in special education, yet current assessment approaches rely heavily on subjective teacher observation, which can be time-consuming and difficult to standardize. Music provides a meaningful medium for evaluating emotional competencies, creating an opportunity for artificial intelligence to support more objective and scalable SEL assessment.
Methods
We propose a lightweight social-emotional music classification model, termed LSEL, designed to identify three SEL-related competencies: Empathetic Perspective-Taking, Outlook, and Problem-Solving. LSEL takes a 40×128 matrix of mel-frequency cepstral coefficients (MFCCs) as input to capture the core spectral–temporal characteristics relevant to SEL perception. We also provide an open-source SEM dataset for domain experts and use its 591 samples (194 Empathetic Perspective-Taking, 214 Outlook, and 183 Problem-Solving) to analyze LSEL performance.
Results
LSEL reached an average accuracy of 96.55% and an mAP of 99.29% across experiments, with Cohen’s κ averaging 94.32% and R² reaching 94.15%, indicating high consistency with the ground truth. Per-category accuracies were similarly stable: 96.95% for Empathetic Perspective-Taking, 95.16% for Outlook, and 95.36% for Problem-Solving.
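For readers unfamiliar with Cohen’s κ, the sketch below shows how it is computed from a confusion matrix: observed agreement is corrected by the agreement expected from the row and column totals alone. The function name `cohens_kappa` and the example matrix are illustrative assumptions; the matrix is made up and is not taken from the paper’s experiments.

```python
from typing import List


def cohens_kappa(confusion: List[List[int]]) -> float:
    """Cohen's kappa for a square confusion matrix
    (rows: true class, columns: predicted class)."""
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Chance agreement: product of matching row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


# Usage with a made-up three-class matrix (not the paper's data).
example = [
    [18, 1, 1],
    [2, 19, 0],
    [1, 0, 18],
]
kappa = cohens_kappa(example)
```

Because κ discounts chance agreement, it is lower than raw accuracy on the same predictions, which is consistent with the reported κ sitting below the reported accuracy.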
Conclusions
The lightweight LSEL framework offers an effective and robust solution for social-emotional music classification, supporting objective SEL assessment in educational contexts. The release of the SEM dataset also provides a valuable resource for advancing AI-assisted SEL research.
Title: LSEL: A lightweight deep learning model for social-emotional classification of classical music
Authors: Yuan-Jin Lin, Yu-Chi Chou, Shan-Ken Chien, Pen-Chiang Chao, Kuang-Kai Yeh, Yen-Chia Peng, Chen-Hao Tsao, Chih-Yun Chen, Shih-Lun Chen, Kuo-Chen Li, Wei-Chen Tu
Journal: Machine learning with applications, volume 23, Article 100832
Pub Date: 2025-12-30; DOI: 10.1016/j.mlwa.2025.100832
Pub Date: 2025-12-29; DOI: 10.1016/j.mlwa.2025.100828
Jungmin Eom, Minjun Kang, Myungkeun Yoon, Nikil Dutt, Jinkyu Kim, Jaekoo Lee
Deep learning-based medical AI systems are increasingly deployed for disease diagnosis in decentralized healthcare environments where data are siloed across hospitals and IoT devices and cannot be freely shared due to strict privacy and security regulations. However, most existing continual learning and distributed learning approaches either assume centrally aggregated data or overlook incremental clinical changes, leading to catastrophic forgetting when applied to real-world medical data streams.
This paper introduces a novel healthcare-specific framework that integrates continual learning and distributed learning methods to utilize medical AI models effectively by addressing the practical constraints of the healthcare and medical ecosystem, such as data privacy, security, and changing clinical environments. Through the proposed framework, medical clients, such as hospital devices and IoT-based smart devices, can collaboratively train deep learning-based models on distributed computing resources without sharing sensitive data. Additionally, by considering incremental characteristics in medical environments such as mutations, new diseases, and abnormalities, the proposed framework can improve the disease diagnosis of medical AI models in actual clinical scenarios.
We propose Privacy-preserving Rehearsal-based Continual Split Learning (PRCSL), a healthcare-specific continual split learning framework that combines differential-privacy-based exemplar sharing, a mutual information alignment (MIA) module to correct representation shifts induced by noisy exemplars, and a parameter-free nearest-mean-of-exemplars (NME) classifier to mitigate task-recency bias under non-IID data distributions. Across eight benchmark datasets, including four MedMNIST subsets, HAM10000, CCH5000, CIFAR-100, and SVHN, PRCSL achieves competitive performance compared with representative continual learning baselines in terms of average accuracy and average forgetting. In particular, PRCSL achieves up to 3.62 percentage points higher average accuracy than the best baseline. These results indicate that PRCSL enables privacy-preserving, communication-efficient, and continually adaptable medical AI in realistic decentralized clinical and IoT-enabled ecosystems. Our code is publicly available at our repository.
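A nearest-mean-of-exemplars classifier is simple enough to sketch in a few lines: each class is represented by the mean of its stored exemplar features, and a new feature vector is assigned to the class with the closest mean. The version below is a generic illustration under simplifying assumptions, not PRCSL itself: it uses raw Euclidean distance on hand-written two-dimensional features, omits the differential-privacy noise and the MIA module, and the names `class_means` and `nme_predict` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def class_means(exemplars: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Average the exemplar feature vectors stored for each class."""
    means = {}
    for label, vectors in exemplars.items():
        dim = len(vectors[0])
        means[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return means


def nme_predict(feature: Sequence[float], means: Dict[str, List[float]]) -> str:
    """Assign the class whose exemplar mean is nearest in Euclidean distance."""
    return min(means, key=lambda label: math.dist(feature, means[label]))


# Usage with toy features; in practice these would be network embeddings.
stored = {
    "benign": [[0.0, 0.0], [2.0, 0.0]],
    "malignant": [[10.0, 10.0], [12.0, 10.0]],
}
means = class_means(stored)
label = nme_predict([2.0, 1.0], means)
```

The classifier has no trainable weights of its own, so there is no output layer that can drift toward the most recently learned task, which is the usual argument for using it against task-recency bias.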
Title: PRCSL: A privacy-preserving continual split learning framework for decentralized medical diagnosis
Journal: Machine learning with applications, volume 23, Article 100828
Pub Date: 2025-12-29; DOI: 10.1016/j.mlwa.2025.100829
Seerat Kaur, Sukhjit Singh Sehra, Darisuh Ebrahimi, Emad A. Mohammed
Reliable traffic predictions are essential for managing congestion, optimizing routes, improving commuter safety, and advancing the performance of intelligent transportation systems (ITS). However, existing centralized systems often adapt poorly to real-world traffic patterns and fail to capture spatio-temporal variability and client-level heterogeneity. They also require large amounts of sensitive data to be collected on central servers, which intensifies privacy risks. This study proposes a privacy-preserving Federated Learning (FL) framework for traffic flow and speed prediction (5 to 60 min ahead) using non-independent and identically distributed (non-IID) traffic data. The study has three objectives: (1) design a client-aware custom FL aggregation strategy that accounts for traffic heterogeneity and client-specific dynamics, which standard FL methods ignore; (2) improve personalization by grouping clients according to real-world traffic pattern similarity via a clustering-based approach; and (3) enhance the convergence and predictive performance of global aggregation using dynamic, traffic-aware aggregation scores. The proposed framework uses a hybrid FL long short-term memory (FedLSTM) model augmented with an attention mechanism to model both temporal and spatial traffic variations across junctions while ensuring that all raw data remain local. To improve learning under traffic diversity and imbalanced traffic distribution patterns, we propose a custom traffic-aware aggregation strategy that dynamically weights client contributions based on six traffic-based metrics. Evaluations on clustered client partitions demonstrate that our custom aggregation consistently outperformed the baseline strategies across multiple evaluation metrics. These results highlight the effectiveness of integrating traffic-aware aggregation in enhancing the performance and generalization capability of FL-based traffic prediction frameworks.
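Score-weighted aggregation can be sketched as a weighted average of client parameter vectors. The abstract does not name the six traffic-based metrics or say how they are combined, so the per-client `scores` below are assumed to be already computed, and the function name `aggregate` and the numbers in the usage lines are illustrative only. With scores proportional to client sample counts, the same code reduces to standard federated averaging.

```python
from typing import List, Sequence


def aggregate(client_params: List[Sequence[float]], scores: Sequence[float]) -> List[float]:
    """Combine client parameter vectors into a global vector, weighting
    each client by its (assumed precomputed) aggregation score."""
    if len(client_params) != len(scores):
        raise ValueError("one score is required per client")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    size = len(client_params[0])
    return [
        sum(score * params[i] for score, params in zip(scores, client_params)) / total
        for i in range(size)
    ]


# Usage: two clients, the second judged three times as informative.
global_params = aggregate([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
```

Recomputing the scores every round is what would make the weighting dynamic: a junction whose traffic pattern shifts can gain or lose influence on the global model from one round to the next.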
Title: A traffic-aware federated learning prediction framework with custom aggregation
Journal: Machine learning with applications, volume 23, Article 100829
Pub Date: 2025-12-29; DOI: 10.1016/j.mlwa.2025.100831
Xianyuan Zhu
Accurate identification of crop growth stages is crucial for precision agriculture and automated field management. This study designed and developed an improved Swin Transformer-based detection system for wheat growth stages, with an emphasis on real-time deployment on embedded edge devices. Specifically, we incorporate a Progressive Transfer Learning strategy to ensure robust generalization on agricultural data and introduce an Ordinal Regression Loss to mitigate misclassifications in transitional growth stages. The proposed approach integrates a hierarchical Transformer backbone with an optimized deployment pipeline for NVIDIA Jetson Orin NX, supporting gallery images, video streams, and live camera inputs. Experimental evaluation demonstrated that the system achieves consistently high recognition accuracy (above 93%) while maintaining real-time performance (above 12 FPS) under different modes, with moderate power consumption (6–8 W). Compared with baseline CNNs (ResNet-50, MobileNetV3) and Transformer models (ViT), the proposed design achieves a favorable balance among accuracy, efficiency, and robustness. These results suggest that the system can contribute to the development of practical agricultural monitoring and provide a step toward intelligent control strategies in precision farming.
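The abstract does not give the form of its Ordinal Regression Loss, so the sketch below shows one common formulation as an assumption: a stage label is encoded as a set of cumulative "is the stage later than k?" binary targets, and the loss is the sum of binary cross-entropies over those targets. The names `ordinal_targets` and `ordinal_loss` are hypothetical. The property worth noticing is that predicting a neighboring stage costs less than predicting a distant one, which plain cross-entropy does not guarantee.

```python
import math
from typing import List, Sequence


def ordinal_targets(stage: int, num_stages: int) -> List[float]:
    """Encode stage s as num_stages - 1 binary answers to 'is stage > k?'."""
    return [1.0 if stage > k else 0.0 for k in range(num_stages - 1)]


def ordinal_loss(logits: Sequence[float], stage: int) -> float:
    """Sum of binary cross-entropies between threshold logits and the
    cumulative targets, written in a numerically stable form."""
    targets = ordinal_targets(stage, len(logits) + 1)
    loss = 0.0
    for z, t in zip(logits, targets):
        # Stable BCE with logits: max(z, 0) - z * t + log(1 + exp(-|z|))
        loss += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return loss


# Usage: logits that confidently say "stage 2 of 4" (stages are 0-indexed).
predicted_stage_2 = [5.0, 5.0, -5.0]
near_miss = ordinal_loss(predicted_stage_2, stage=1)  # true stage is adjacent
far_miss = ordinal_loss(predicted_stage_2, stage=0)   # true stage is two away
```

Here `near_miss` is smaller than `far_miss`, so confusing two adjacent, visually similar growth stages is penalized less than confusing stages that are far apart in the growth sequence.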
Title: Real-time wheat growth stage detection via improved Swin transformer for edge devices
Journal: Machine learning with applications, volume 23, Article 100831