Enhancing skin cancer diagnosis using late discrete wavelet transform and new swarm-based optimizers
Pub Date: 2025-12-03 | DOI: 10.1016/j.mlwa.2025.100811
Ramin Mousa, Saeed Chamani, Mohammad Morsali, Mohammad Kazzazi, Parsa Hatami, Soroush Sarabi
Skin cancer (SC) is a life-threatening disease in which early diagnosis is critical for effective treatment and survival. While deep learning (DL) has advanced skin cancer diagnosis (SCD), current methods generally yield suboptimal accuracy and efficiency due to the difficulty of extracting multiscale features from dermoscopic images and of efficiently exploring the hyperparameter space when optimizing complex models. To address this, we propose an approach integrating a late Discrete Wavelet Transform (DWT) with pre-trained convolutional neural networks (CNNs) and swarm-based optimization. The late DWT decomposes CNN-extracted feature maps into low- and high-frequency components to improve the detection of subtle lesion patterns, while a self-attention mechanism further refines the result by weighting feature importance and focusing on diagnostically relevant information. To tune hyperparameters, three novel swarm-based optimizers – the Modified Gorilla Troops Optimizer (MGTO), Improved Gray Wolf Optimization (IGWO), and Fox Optimization (FOX) – search the hyperparameter space to fine-tune the model. Experiments on the ISIC-2016 and ISIC-2017 datasets show improved classification performance over existing methods, with at least a 1% accuracy gain. The proposed framework thus offers a reliable and effective way to diagnose skin cancer automatically.
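A minimal sketch of the late-DWT idea, assuming PyWavelets and NumPy: CNN feature maps are decomposed into low- and high-frequency sub-bands with a 2-D discrete wavelet transform, and a toy softmax weighting stands in for the paper's learned self-attention. All shapes, the wavelet choice, and the pooling rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import pywt

def late_dwt(feature_map: np.ndarray, wavelet: str = "haar") -> np.ndarray:
    """Decompose (C, H, W) CNN activations into (4C, H/2, W/2) sub-bands."""
    bands = []
    for channel in feature_map:
        ll, (lh, hl, hh) = pywt.dwt2(channel, wavelet)  # approximation + 3 detail bands
        bands.extend([ll, lh, hl, hh])
    return np.stack(bands)

def attention_pool(bands: np.ndarray) -> np.ndarray:
    """Toy stand-in for self-attention: softmax-weight sub-bands by mean activation."""
    scores = bands.reshape(bands.shape[0], -1).mean(axis=1)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return (bands * weights[:, None, None]).sum(axis=0)

features = np.random.rand(8, 32, 32).astype(np.float32)  # stand-in for backbone output
print(attention_pool(late_dwt(features)).shape)  # (16, 16)
```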
{"title":"Enhancing skin cancer diagnosis using late discrete wavelet transform and new swarm-based optimizers","authors":"Ramin Mousa , Saeed Chamani , Mohammad Morsali , Mohammad Kazzazi , Parsa Hatami , Soroush Sarabi","doi":"10.1016/j.mlwa.2025.100811","DOIUrl":"10.1016/j.mlwa.2025.100811","url":null,"abstract":"<div><div>Skin cancer (SC) is a life-threatening disease where early diagnosis is critical for effective treatment and survival. While deep learning (DL) has advanced skin cancer diagnosis (SCD), current methods generally yield suboptimal accuracy and efficiency due to challenges in extracting multiscale features from dermoscopic images and optimizing complex model parameters through efficient exploration of the space of hyperparameters. To address this, we propose an approach integrating late Discrete Wavelet Transform (DWT) with pre-trained convolutional neural networks (CNNs) and swarm-based optimization. The late DWT decomposes CNN-extracted feature maps into low- and high-frequency components to improve the detection of subtle lesion patterns, while a self-attention mechanism further refines this by weighing feature importance, focusing on relevant diagnostic information. To refine hyperparameters, three novel swarm-based optimizers – Modified Gorilla Troops Optimizer (MGTO), Improved Gray Wolf Optimization (IGWO), and Fox Optimization (FOX) – are employed searching the space of the hyperparameters to fine-tune the model for superior performance. In comparison to existing methods, experiments on the ISIC-2016 and ISIC-2017 datasets show enhanced classification performance, obtaining at least a 1% accuracy gain. Thus, the suggested framework offers a reliable and effective way to diagnose skin cancer automatically.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100811"},"PeriodicalIF":4.9,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ISO-DeTr: A novel detection transformer for industrial small object detection
Pub Date: 2025-12-02 | DOI: 10.1016/j.mlwa.2025.100809
Faisal Saeed, Anand Paul
Effectively detecting and assessing real-time structural and ecological parameters in contemporary manufacturing environments poses significant challenges, particularly in identifying minute objects within product images. The swift evolution of the industrial sector underscores the necessity for intelligent manufacturing environments to uphold stringent product quality standards. However, running production processes at high speed heightens the risk of defective products. This research addresses the challenges inherent in small object detection within industrial contexts, proposing an innovative detection transformer model tailored to modern manufacturing environments. The proposed model integrates a feature-enhanced multi-head self-attention block (FEMSA), merging a cross-channel communication network with multiple multi-head self-attention (MSA) components to refine image features. A query proposal network is also introduced within the detection transformer framework to discern high-ranking proposals using Intersection over Union (IoU) and Non-Maximum Suppression (NMS) algorithms. Through extensive experimentation on a custom industrial small-object dataset, our proposed model demonstrates superior performance compared to existing models based on Non-Maximum Suppression and transformers. By tackling the challenges associated with small object detection, our model contributes to the dynamic synchronization between virtual and physical manufacturing realms, enhancing quality control in industrial production.
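The query-proposal filtering step can be pictured with a few lines of torchvision; the IoU threshold, top-k budget, and toy boxes below are assumptions for illustration, not the paper's settings.

```python
import torch
from torchvision.ops import nms

def select_proposals(boxes: torch.Tensor, scores: torch.Tensor,
                     iou_thresh: float = 0.5, top_k: int = 100) -> torch.Tensor:
    """Rank proposals by score and suppress overlaps with IoU-based NMS."""
    keep = nms(boxes, scores, iou_thresh)  # indices of surviving boxes, best first
    return keep[:top_k]                    # retain only the highest-ranking queries

boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.], [50., 50., 60., 60.]])
scores = torch.tensor([0.9, 0.8, 0.7])
print(select_proposals(boxes, scores))  # tensor([0, 2]); the near-duplicate box 1 is suppressed
```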
{"title":"ISO-DeTr: A novel detection transformer for industrial small object detection","authors":"Faisal Saeed , Anand Paul","doi":"10.1016/j.mlwa.2025.100809","DOIUrl":"10.1016/j.mlwa.2025.100809","url":null,"abstract":"<div><div>Effectively detecting and assessing real-time structural and ecological parameters in contemporary manufacturing environments poses significant challenges, particularly in identifying minute objects within product images. The swift evolution of the industrial sector underscores the necessity for intelligent manufacturing environments to uphold stringent product quality standards. However, accelerating production processes at high speeds heightens the risk of defective product outcomes. This research addresses the challenges inherent in small object detection within industrial contexts, proposing an innovative detection transformer model tailored to modern manufacturing environments. The proposed model integrates a feature-enhanced multi-head self-attention block (FEMSA), merging cross-channel communication network and multiple multi-head self-attention (MSA) components to refine image features. A query proposal network is also introduced within the detection transformer framework to discern high-ranking proposals using Intersection over Union (IoU) and Non-Maximum Suppression (NMS) algorithms. Through extensive experimentation on custom industrial small objects, our proposed model demonstrates superior performance compared to existing models based on Non-Maximum Suppression and transformers. By tackling the challenges associated with small object detection, our model contributes to the dynamic synchronization between virtual and physical manufacturing realms, enhancing quality control in industrial production.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100809"},"PeriodicalIF":4.9,"publicationDate":"2025-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145748778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AdAPT: Advertisement detector adaptation under newspaper domain shift with null-based pseudo-labeling
Faeze Zakaryapour Sayyad, Tobias Pettersson, Seyed Jalaleddin Mousavirad, Irida Shallari, Mattias O’Nils
Pub Date: 2025-12-01 | DOI: 10.1016/j.mlwa.2025.100806
Detecting advertisements in digitized newspapers is a key step in large-scale media analytics and digital archiving. However, variations in layout, typography, and advertisement design across publishers and time periods cause significant domain shifts that reduce the generalization ability of supervised detectors. This paper presents AdAPT, a confidence-guided pseudo-labeling pipeline for unsupervised domain adaptation in advertisement detection. The proposed method leverages both advertisement-free (Null) and advertisement-containing pages from unlabeled target domains to generate reliable pseudo-labels. By retraining a YOLO-based detector using labeled source data combined with filtered pseudo-labeled target samples, AdAPT achieves robust adaptation without requiring manual annotation. Experiments conducted on two unseen newspapers (Adresseavisen and iTromsø) demonstrate that Null-based pseudo-labeling provides the most stable and accurate adaptation, yielding up to 38% error reduction compared to the baseline. The results highlight AdAPT as a simple, scalable, and annotation-efficient solution for maintaining high-performance advertisement detection across diverse newspaper collections.
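A hedged sketch of the confidence-guided pseudo-labeling loop, assuming the Ultralytics YOLO API; the weights file, confidence threshold, and data layout are assumptions, not the paper's configuration.

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # hypothetical source-trained detector weights

def pseudo_label(image_paths, conf_thresh: float = 0.8):
    """Keep only high-confidence detections as pseudo-labels for retraining.
    Pages with no surviving boxes act as Null (advertisement-free) samples."""
    labels = {}
    for path in image_paths:
        result = model(path)[0]
        labels[path] = [b for b in result.boxes if float(b.conf) >= conf_thresh]
    return labels
```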
{"title":"AdAPT: Advertisement detector adaptation under newspaper domain shift with null-based pseudo-labeling","authors":"Faeze Zakaryapour Sayyad , Tobias Pettersson , Seyed Jalaleddin Mousavirad , Irida Shallari , Mattias O’Nils","doi":"10.1016/j.mlwa.2025.100806","DOIUrl":"10.1016/j.mlwa.2025.100806","url":null,"abstract":"<div><div>Detecting advertisements in digitized newspapers is a key step in large-scale media analytics and digital archiving. However, variations in layout, typography, and advertisement design across publishers and time periods cause significant domain shifts that reduce the generalization ability of supervised detectors. This paper presents AdAPT, a confidence-guided pseudo-labeling pipeline for unsupervised domain adaptation in advertisement detection. The proposed method leverages both advertisement-free (Null) and advertisement-containing pages from unlabeled target domains to generate reliable pseudo-labels. By retraining a YOLO-based detector using labeled source data combined with filtered pseudo-labeled target samples, AdAPT achieves robust adaptation without requiring manual annotation. Experiments conducted on two unseen newspapers (Adresseavisen and iTromsø) demonstrate that Null-based pseudo-labeling provides the most stable and accurate adaptation, yielding up to 38% error reduction compared to the baseline. The results highlight AdAPT as a simple, scalable, and annotation-efficient solution for maintaining high-performance advertisement detection across diverse newspaper collections.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100806"},"PeriodicalIF":4.9,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explainable DEA–ensemble approach with golden jackal optimization: efficiency evaluation and prediction for United States information technology firms
Pub Date: 2025-11-29 | DOI: 10.1016/j.mlwa.2025.100798
Temitope Olubanjo Kehinde, Azeez A. Oyedele, Morenikeji Kabirat Kareem, Joseph Akpan, Oludolapo A. Olanrewaju
This study presents an integrated Data Envelopment Analysis (DEA) and ensemble learning framework optimized with the Golden Jackal Optimization (GJO) algorithm to evaluate and predict the efficiency of United States information technology firms. Both the Constant Returns to Scale (CCR) and Variable Returns to Scale (BCC) DEA models were applied to measure firm efficiency and compute scale efficiency, providing a clearer distinction between managerial and scale-related effects. Using data from 3940 firms over the period 2013 to 2023, a robustness test introducing ±20% random noise to a 10% random sample confirmed that the CCR model achieved stronger stability, with a correlation coefficient of 0.795 compared to 0.773 for the BCC model. Consequently, the CCR results were adopted as the basis for predictive modeling. DEA efficiency scores were predicted using six ensemble learners, namely XGBoost, Gradient Boosting Regressor, AdaBoost, Extra Trees Regressor, Random Forest, and LightGBM, with GJO employed for hyperparameter tuning. The Gradient Boosting Regressor optimized with GJO achieved the best predictive performance, accurately reproducing the observed efficiency scores. SHAP and feature importance analyses revealed that Total Equity, Operating Income, and Total Assets were the most influential determinants of efficiency. This research contributes a scalable and interpretable approach to efficiency prediction, offering actionable insights for managers, investors, and policymakers in volatile financial markets.
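To make the prediction stage concrete, here is a minimal sketch that fits a Gradient Boosting Regressor to DEA efficiency scores; a plain random search stands in for the GJO metaheuristic, and the synthetic features and parameter ranges are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 3))   # stand-ins for Total Equity, Operating Income, Total Assets
y = rng.random(200)        # stand-in CCR efficiency scores in [0, 1]

best_score, best_params = -np.inf, None
for _ in range(20):        # a metaheuristic such as GJO would guide this sampling
    params = {"n_estimators": int(rng.integers(50, 400)),
              "learning_rate": float(rng.uniform(0.01, 0.3)),
              "max_depth": int(rng.integers(2, 6))}
    score = cross_val_score(GradientBoostingRegressor(**params), X, y, cv=3).mean()
    if score > best_score:
        best_score, best_params = score, params
print(best_params)
```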
{"title":"Explainable DEA–ensemble approach with golden jackal optimization: efficiency evaluation and prediction for United States information technology firms","authors":"Temitope Olubanjo Kehinde , Azeez A. Oyedele , Morenikeji Kabirat Kareem , Joseph Akpan , Oludolapo A. Olanrewaju","doi":"10.1016/j.mlwa.2025.100798","DOIUrl":"10.1016/j.mlwa.2025.100798","url":null,"abstract":"<div><div>This study presents an integrated Data Envelopment Analysis (DEA) and ensemble learning framework optimized with the Golden Jackal Optimization (GJO) algorithm to evaluate and predict the efficiency of United States information technology firms. Both Constant Returns to Scale and Variable Returns to Scale models were applied to measure firm efficiency and compute scale efficiency, providing a clearer distinction between managerial and scale-related effects. Using data from 3940 firms over the period 2013 to 2023, a robustness test introducing ±20% random noise to a 10% random sample confirmed that the CCR model achieved stronger stability, with a correlation coefficient of 0.795 compared to 0.773 for the BCC model. Consequently, the CCR results were adopted as the basis for predictive modeling. DEA efficiency scores were predicted using six ensemble learners, including XGBoost, Gradient Boosting Regressor, AdaBoost, Extra Trees Regressor, Random Forest, and LightGBM, with GJO employed for hyperparameter tuning. The Gradient Boosting Regressor optimized with GJO achieved the best predictive performance, accurately reproducing the observed efficiency scores. SHAP and feature importance analyses revealed that Total Equity, Operating Income, and Total Assets were the most influential determinants of efficiency. This research contributes a scalable and interpretable approach to efficiency prediction, offering actionable insights for managers, investors, and policymakers in volatile financial markets.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100798"},"PeriodicalIF":4.9,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhanced synthesis of passively heard speech from electrocorticography signals using image-to-image spectrogram translation
Pub Date: 2025-11-28 | DOI: 10.1016/j.mlwa.2025.100805
Hongsang Lee, Jihun Hwang, Kyungjun Kim, Gyuwon Lee, Chun Kee Chung, Chang-Hwan Im
Speech synthesis from neural signals offers a promising avenue for restoring communication in individuals with speech impairments. Recent deep learning advances have improved decoding of neural activity into intelligible speech, yet further enhancement is required to improve the quality of synthesized speech. Here, we investigate whether an image-to-image translation approach can further refine Mel spectrograms synthesized from electrocorticography (ECoG) signals recorded while participants passively listened to spoken sentences. ECoG data were collected from volunteers performing an auditory speech perception task. A three-layer bidirectional long short-term memory (Bi-LSTM) network was first trained to predict Mel-spectrogram features from neural signals. Comparison with the Conformer model indicated that Bi-LSTM was more effective as the initial synthesis model under our limited data conditions. To further enhance the quality of the Bi-LSTM-synthesized Mel spectrograms, we applied Pix2pixHD, a high-resolution conditional GAN, as a post-processing module. The impact of Pix2pixHD was evaluated using Log-Spectral Distance (LSD), Scale-Invariant Signal-to-Distortion Ratio (SI-SDR), and Short-Time Objective Intelligibility (STOI), comparing outputs against the ground-truth spectrograms. Furthermore, subjective listening tests (2AFC similarity judgment) were conducted to assess perceptual improvements. Across objective metrics, Pix2pixHD post-processing yielded consistent improvements in spectral fidelity, waveform similarity, and estimated intelligibility (lower LSD, higher SI-SDR and STOI), and subjective tests confirmed significantly enhanced perceived similarity to the original speech. These gains were supported by non-parametric significance testing (Wilcoxon signed-rank test, p < 0.005). The results indicate that high-resolution image-to-image translation is an effective means of refining neural signal-based speech synthesis, complementing sequence models and improving the overall perceived quality of the synthesized speech.
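A minimal sketch of the first synthesis stage, assuming PyTorch: a three-layer bidirectional LSTM that maps ECoG feature sequences to mel-spectrogram frames. The input dimensionality, hidden size, and mel-bin count are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class EcogToMel(nn.Module):
    def __init__(self, ecog_dim: int = 64, hidden: int = 256, mel_bins: int = 80):
        super().__init__()
        self.lstm = nn.LSTM(ecog_dim, hidden, num_layers=3,
                            bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, mel_bins)  # 2x for forward + backward states

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, time, ecog_dim)
        out, _ = self.lstm(x)
        return self.head(out)                             # (batch, time, mel_bins)

mel = EcogToMel()(torch.randn(4, 100, 64))
print(mel.shape)  # torch.Size([4, 100, 80])
```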
{"title":"Enhanced synthesis of passively heard speech from electrocorticography signals using image-to-image spectrogram translation","authors":"Hongsang Lee , Jihun Hwang , Kyungjun Kim , Gyuwon Lee , Chun Kee Chung , Chang-Hwan Im","doi":"10.1016/j.mlwa.2025.100805","DOIUrl":"10.1016/j.mlwa.2025.100805","url":null,"abstract":"<div><div>Speech synthesis from neural signals offers a promising avenue for restoring communication in individuals with speech impairments. Recent deep learning advances have improved decoding of neural activity into intelligible speech, yet further enhancement is required to improve the quality of synthesized speech. Here, we investigate whether an image-to-image translation approach can further refine Mel spectrograms synthesized from electrocorticography (ECoG) signals recorded while participants passively listened to spoken sentences. ECoG data were collected from volunteers performing an auditory speech perception task. A three-layer bidirectional long short-term memory (Bi-LSTM) network was first trained to predict Mel-spectrogram features from neural signals. Comparison with the Conformer model indicated that Bi-LSTM was more effective as the initial synthesis model under our limited data conditions. To further enhance the quality of the Bi-LSTM-synthesized Mel spectrograms, we applied Pix2pixHD, a high-resolution conditional GAN, as a post-processing module. The impact of Pix2pixHD was evaluated using Log-Spectral Distance (LSD), Scale-Invariant Signal-to-Distortion Ratio (SI-SDR), and Short-Time Objective Intelligibility (STOI) comparing outputs against the original ground truth. Furthermore, subjective listening tests (2AFC similarity judgment) were conducted to assess perceptual improvements. Across objective metrics, Pix2pixHD post-processing yielded consistent improvements in spectral fidelity, waveform similarity, and estimated intelligibility (lower LSD, higher SI-SDR and STOI), and subjective tests confirmed significantly enhanced perceived similarity to the original speech. These gains were supported by non-parametric significance testing (Wilcoxon signed-rank test, <em>p</em> < 0.005). The results indicate that high-resolution image-to-image translation is an effective vehicle to refine neural signal-based speech synthesis, complementing sequence models and improving the overall perceived quality of the synthesized speech.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100805"},"PeriodicalIF":4.9,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145748780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benchmarking a time-series foundation model (TimeGPT) for real-world forecasting applications
Pub Date: 2025-11-27 | DOI: 10.1016/j.mlwa.2025.100801
Xiao Zhang, Srinath Sridharan, Nur Hakim Bin Zahrin, Narayan Venkataraman, Siang Hiong Goh
Accurate forecasting of hospital demand is essential for operational resilience, yet traditional statistical and machine learning approaches often require extensive feature engineering and tuning, limiting adoption in resource-constrained environments. Foundation models for time-series forecasting offer the potential for robust, zero-shot performance across domains. This study evaluates the feasibility of TimeGPT, a general-purpose time-series foundation model, for forecasting daily Emergency Department (ED) arrivals.
We benchmarked TimeGPT against Seasonal Autoregressive Integrated Moving Average (SARIMAX), Prophet, and XGBoost under univariate and multivariate configurations. The experimental design simulated operational constraints by limiting the training window to 30 days and using a rolling forecast over a 60-day holdout period. Forecast accuracy was assessed using root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and directional accuracy.
TimeGPT consistently ranked among the top-performing models. In the univariate setting, it achieved a MAPE of 7.7% and directional accuracy of 75%, comparable to or exceeding traditional models with extensive feature engineering. TimeGPT required no model-specific tuning and maintained accuracy without exogenous features such as weather or calendar variables. SARIMAX achieved the best results in the temporal-plus-weather configuration (MAPE 7.0%, RMSE 31.0) but required substantially more configuration. TimeGPT recorded zero large-error days (>30% deviation), while SARIMAX had 5 such days, underscoring the trade-off between accuracy and robustness.
This benchmark demonstrates that foundation models can deliver accurate, reliable forecasts in healthcare operations with minimal data preparation. TimeGPT’s zero-shot capability highlights its potential as a scalable solution for diverse operational forecasting challenges.
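As a sketch of the setup, a zero-shot rolling forecast with Nixtla's TimeGPT client might look as follows; the API key, column names, and one-day horizon mirror the paper's 30-day-window design, but the exact data layout is an assumption.

```python
import pandas as pd
from nixtla import NixtlaClient

client = NixtlaClient(api_key="YOUR_API_KEY")  # placeholder credential

def rolling_forecast(df: pd.DataFrame, horizon: int = 1, window: int = 30) -> list:
    """df has columns ['ds', 'y'] of daily ED arrivals; forecast one day ahead
    from each 30-day training window across the holdout period."""
    preds = []
    for end in range(window, len(df)):
        train = df.iloc[end - window:end]
        fc = client.forecast(df=train, h=horizon, time_col="ds", target_col="y")
        preds.append(float(fc["TimeGPT"].iloc[-1]))
    return preds
```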
{"title":"Benchmarking a time-series foundation model (TimeGPT) for real-world forecasting applications","authors":"Xiao Zhang , Srinath Sridharan , Nur Hakim Bin Zahrin , Narayan Venkataraman , Siang Hiong Goh","doi":"10.1016/j.mlwa.2025.100801","DOIUrl":"10.1016/j.mlwa.2025.100801","url":null,"abstract":"<div><div>Accurate forecasting of hospital demand is essential for operational resilience, yet traditional statistical and machine learning approaches often require extensive feature engineering and tuning, limiting adoption in resource-constrained environments. Foundation models for time-series forecasting offer the potential for robust, zero-shot performance across domains. This study evaluates the feasibility of TimeGPT, a general-purpose time-series foundation model, for forecasting daily Emergency Department (ED) arrivals.</div><div>We benchmarked TimeGPT against Seasonal Autoregressive Integrated Moving Average (SARIMAX), Prophet, and XGBoost under univariate and multivariate configurations. The experimental design simulated operational constraints by limiting the training window to 30 days and using a rolling forecast over a 60-day holdout period. Forecast accuracy was assessed using root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and directional accuracy.</div><div>TimeGPT consistently ranked among the top-performing models. In the univariate setting, it achieved a MAPE of 7.7% and directional accuracy of 75%, comparable to or exceeding traditional models with extensive feature engineering. TimeGPT required no model-specific tuning and maintained accuracy without exogenous features such as weather or calendar variables. SARIMAX achieved the best results in the temporal-plus-weather configuration (MAPE 7.0%, RMSE 31.0) but required substantially more configuration. TimeGPT recorded zero large-error days (>30% deviation), while SARIMAX had 5 such days, underscoring the trade-off between accuracy and robustness.</div><div>This benchmark demonstrates that foundation models can deliver accurate, reliable forecasts in healthcare operations with minimal data preparation. TimeGPT’s zero-shot capability highlights its potential as a scalable solution for diverse operational forecasting challenges.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100801"},"PeriodicalIF":4.9,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benchmarking and validation of prompting techniques for AI-assisted industrial PLC programming
Pub Date: 2025-11-27 | DOI: 10.1016/j.mlwa.2025.100804
Ketut Adnyana, Andreas Schwung
Industrial automation in Industry 5.0 demands deterministic, safety-compliant PLC code across heterogeneous vendor ecosystems. Prompt-engineered large language models (LLMs) offer a path forward but require reproducible methods and rigorous validation. This study introduces LLM-PLC-AS, a hybrid, prompt-invariant framework for IEC 61131-3 PLC code generation that addresses these needs. We benchmark 21 fixed prompting techniques on 25 real-world use cases (simple, medium, complex), using a standardized dataset and workflow spanning Siemens TIA Portal and Beckhoff TwinCAT. The quality of the generated code is evaluated through a layered validation pipeline: Bilingual Evaluation Understudy (BLEU) for lexical similarity, LLM-in-the-Loop (LITL) for scalable semantic checks across four dimensions (functional correctness, readability, safety compliance, and modularity), and Human-in-the-Loop (HITL) for expert safety-critical review. DeepSeek and Gemini 2.5 Pro generate ST/IL; syntax is cross-checked by ChatGPT-4o and Copilot Pro. The framework achieved a very high degree of accuracy, with Structured Text (ST) programs reaching near-perfect scores and Instruction List (IL) programs also performing exceptionally well on our scoring rubric, reducing manual correction effort by nearly half compared to ad-hoc methods. Across tasks, our approach more than doubled Safety Compliance and significantly improved Functional Correctness against unstructured baselines. A key finding is that prompt structure itself influences determinism and correctness more than the choice of LLM. Fixed-prompt reasoning combined with the BLEU/LITL/HITL validation stack provides a scalable, reproducible, and safety-aware method for PLC code generation: BLEU serves rapid lexical triage and regression tracking, LITL provides structured semantic verification, and HITL ensures final compliance. The framework establishes a standardized basis for AI-assisted PLC programming and transparent benchmarking. Future work will extend the pipeline to graphical languages such as Ladder Diagram (LAD) and Function Block Diagram (FBD) using multimodal, graph-aware models, and will incorporate runtime validation to further close the gap to real-world deployment. Safety verification in this study is limited to logical and semantic validation; real-time behavior, communication latency, and physical safety-fault recovery require Hardware-in-the-Loop (HIL) simulation or deployment on industrial test benches, which is likewise left to future work.
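For the lexical-triage layer, BLEU over tokenized Structured Text can be computed with NLTK; the whitespace tokenization and the toy snippet below are assumptions, not the paper's rubric.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "IF Start AND NOT Stop THEN Motor := TRUE ; END_IF".split()
candidate = "IF Start AND NOT Stop THEN Motor := TRUE ; END_IF".split()

smooth = SmoothingFunction().method1  # avoids zero scores on short sequences
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(f"BLEU = {score:.3f}")  # 1.000 for an exact lexical match
```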
{"title":"Benchmarking and validation of prompting techniques for AI-assisted industrial PLC programming","authors":"Ketut Adnyana , Andreas Schwung","doi":"10.1016/j.mlwa.2025.100804","DOIUrl":"10.1016/j.mlwa.2025.100804","url":null,"abstract":"<div><div>Industrial automation in Industry 5.0 demands deterministic, safety-compliant PLC code across heterogeneous vendor ecosystems. Prompt-engineered large language models (LLMs) offer a path forward but require reproducible methods and rigorous validation. This study introduces LLM-PLC-AS, a hybrid, prompt-invariant framework for IEC 61131-3 PLC code generation addressing these needs. We benchmark 21 fixed prompting techniques on 25 real-world use cases (simple, medium, complex), using a standardized dataset and workflow spanning Siemens TIA Portal and Beckhoff TwinCAT. The quality of the generated code is evaluated through a layered validation pipeline: Bilingual Evaluation Understudy (BLEU) for lexical similarity, LLM-in-the-Loop (LITL) for scalable semantic checks across four dimensions (functional correctness, readability, safety compliance, and modularity), and Human-in-the-Loop (HITL) for expert safety-critical review. DeepSeek and Gemini 2.5 Pro generate ST/IL; syntax is cross-checked by ChatGPT-4o and Copilot Pro. The framework achieved a very high degree of accuracy, with Structured Text (ST) programs reaching near-perfect scores and Instruction List (IL) programs also performing exceptionally well on our scoring rubric. This resulted in a substantial reduction in manual correction effort, decreasing it by nearly half compared to ad-hoc methods. Across tasks, our approach led to a more than twofold increase in Safety Compliance and a significant improvement in Functional Correctness against unstructured baselines. A key finding is that the structure of the prompt itself was found to have a greater influence on determinism and correctness than the choice of LLM. The fixed-prompt reasoning combined with the BLEU/LITL/HITL validation stack provides a scalable, reproducible, and safety-aware method for PLC code generation. BLEU is utilized for rapid lexical triage and regression tracking, LITL provides structured semantic verification, and HITL ensures final compliance. The framework establishes a standardized basis for AI-assisted PLC programming and transparent benchmarking. Future work will extend the pipeline to include graphical languages, such as Ladder Diagram (LAD) and Function Block Diagram (FBD), using multimodal/graph-aware models, and will incorporate runtime validation to further close the gap to real-world deployment. Safety verification in this study is limited to logical and semantic validation. Real-time behavior, communication latency, and physical safety-fault recovery require Hardware-in-the-Loop (HIL) simulation or deployment on industrial test benches, which is identified as future work.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100804"},"PeriodicalIF":4.9,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145748729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Exploring multimodal, non-invasive stress assessment through audio-visual and textual cues integrated with psychometric survey data
Pub Date: 2025-11-26 | DOI: 10.1016/j.mlwa.2025.100803
Xin Yu Huang, Venkat Margapuri
Stress is a widespread psychological concern that often manifests alongside conditions such as anxiety and depression. Traditional self-report tools like the Perceived Stress Scale (PSS-10) may not fully capture an individual’s stress experience. This study explores whether integrating multimodal biometric data from video, audio, and transcriptions can enhance stress detection by providing a more comprehensive and interpretable picture. Participants completed the PSS-10 while being recorded, and emotional features were extracted using machine learning models across the three biometric modalities. Results revealed weak correlations among the modalities, indicating that each captures distinct aspects of stress. Notably, the combined biometric score demonstrated greater sensitivity than the PSS-10 alone, suggesting that multimodal models may detect stress-related states that self-reports overlook. These findings support the development of more comprehensive stress assessment tools, although they are not intended to replace professional clinical evaluation.
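A toy sketch of the fusion analysis, assuming pandas and synthetic stand-in scores: per-modality stress estimates are correlated with one another and with PSS-10 totals, then averaged into a combined score. Column names and values are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "video": [0.2, 0.5, 0.7, 0.4],   # synthetic per-participant modality scores
    "audio": [0.3, 0.4, 0.8, 0.2],
    "text":  [0.1, 0.6, 0.6, 0.5],
    "pss10": [12, 21, 28, 17],       # synthetic PSS-10 totals
})
print(df.corr())                                              # cross-modality correlations
df["combined"] = df[["video", "audio", "text"]].mean(axis=1)  # naive fused score
```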
{"title":"Exploring multimodal, non-invasive stress assessment through audio-visual and textual cues integrated with psychometric survey data","authors":"Xin Yu Huang , Venkat Margapuri","doi":"10.1016/j.mlwa.2025.100803","DOIUrl":"10.1016/j.mlwa.2025.100803","url":null,"abstract":"<div><div>Stress is a widespread psychological concern that often manifests alongside conditions such as anxiety and depression. Traditional self-report tools like the Perceived Stress Scale (PSS-10) may not fully capture an individual’s stress experience. This study explores whether integrating multimodal biometric data through video, audio, and transcriptions can enhance stress detection by providing a more comprehensive and interpretive point of view. Participants completed the PSS-10 while being recorded, and emotional features were extracted using machine learning models across the three biometric modalities. Results revealed weak correlations among the modalities, indicating that each captures distinct aspects of stress. Notably, the combined biometric score demonstrated greater sensitivity than the PSS-10 alone, suggesting that multimodal models may detect stress-related states that self-reports overlook. These findings support the development of more comprehensive stress assessment tools, although they are not intended to replace professional clinical evaluation.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100803"},"PeriodicalIF":4.9,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145624849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TQC: An intelligent clustering approach for large-scale, noisy, and imbalanced data
Pub Date: 2025-11-26 | DOI: 10.1016/j.mlwa.2025.100800
Ali Asghari
As an unsupervised learning method, clustering is a critical technique in artificial intelligence for organizing raw data into meaningful groups. In this process, data is partitioned so that members of the same cluster are maximally similar to one another and maximally distant from other clusters. Clustering has been widely applied across disciplines, including business analytics, healthcare, and economics. Extracting practical knowledge from large datasets relies on an effective clustering technique. The main challenges in clustering are processing speed (especially for large datasets), handling noisy data and outliers, and ensuring high accuracy. These problems are especially significant in contemporary applications, where heterogeneous and inherently noisy datasets are prevalent. The proposed approach, TQC (Tree-Queue Clustering), addresses these problems by combining the Trees Social Relation (TSR) algorithm with the Queue Learning (QL) algorithm. While the QL algorithm enhances clustering accuracy, the TSR method accelerates clustering. The suggested approach first divides the data into smaller groups; then, by efficiently computing group memberships, TSR's migration process grows clusters progressively. By handling noise and outliers, the QL algorithm avoids local optima and improves clustering efficiency. This hybrid approach ensures the formation of high-quality clusters and accelerates convergence. The method is validated on several real-world datasets of varying sizes and properties. Experimental results, evaluated using five performance metrics (MICD, ARI, NMI, ET, and ODR) and compared with eight state-of-the-art algorithms, demonstrate the proposed method's superior performance in both speed and accuracy.
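A generic stand-in for the divide-then-migrate idea, assuming scikit-learn and NumPy: start from many small k-means groups, then repeatedly migrate points to their nearest centroid until assignments stabilize. This illustrates the migration loop only, not the TSR or QL update rules.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
k = 10
labels = KMeans(n_clusters=k, n_init=5, random_state=0).fit_predict(X)  # small initial groups

for _ in range(10):  # migration passes
    centroids = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                          else X[rng.integers(len(X))]  # reseed empty clusters
                          for j in range(k)])
    new_labels = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1).argmin(1)
    if np.array_equal(new_labels, labels):
        break  # assignments stabilized
    labels = new_labels
print(np.bincount(labels, minlength=k))  # final cluster sizes
```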
{"title":"TQC: An intelligent clustering approach for large-scale, noisy, and imbalanced data","authors":"Ali Asghari","doi":"10.1016/j.mlwa.2025.100800","DOIUrl":"10.1016/j.mlwa.2025.100800","url":null,"abstract":"<div><div>As an unsupervised learning method, clustering is a critical technique in artificial intelligence for organizing raw data into meaningful groups. In this process, data is partitioned based on the internal similarity of members within the same cluster and the maximum external distance from other clusters. Beyond business analytics, healthcare, economics, and other fields, clustering has been widely applied across disciplines. Extracting practical knowledge from large datasets relies on an effective clustering technique. Processing speed, especially for large datasets, handling noisy data and outliers, and ensuring high accuracy are the main challenges in clustering. These problems are especially significant in contemporary applications, where heterogeneous and inherently noisy datasets are prevalent. Combining the Trees Social Relation Algorithm (TSR) with the Queue Learning (QL) algorithm, the proposed approach, TQC (Tree-Queue Clustering), addresses these problems. While the QL algorithm enhances clustering accuracy, the TSR method focuses on accelerating clustering. The suggested approach first divides the data into smaller groups. Then, by effectively computing group memberships, TSR's migration process causes clusters to develop progressively. Handling noise and outliers helps the QL algorithm prevent local optima and improve clustering efficiency. This hybrid approach ensures the formation of high-quality clusters and accelerates convergence. The suggested method is validated across several real-world datasets of varying sizes and properties. Experimental results, evaluated using five performance metrics — MICD, ARI, NMI, ET, and ODR — and compared with eight state-of-the-art algorithms, demonstrate the proposed method's superior performance in both speed and accuracy.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100800"},"PeriodicalIF":4.9,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145624851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid-hierarchical fashion graph attention network for compatibility-oriented and personalized outfit recommendation
Pub Date: 2025-11-26 | DOI: 10.1016/j.mlwa.2025.100802
Sajjad Saed, Babak Teimourpour
The rapid expansion of the fashion industry and the growing variety of products have made it increasingly challenging for users to identify compatible items on e-commerce platforms. Effective fashion recommendation systems are therefore crucial for filtering irrelevant options and suggesting suitable ones. However, simultaneously addressing outfit compatibility and personalized recommendations remains a significant challenge, as these aspects are typically treated independently in existing studies, thereby overlooking the complex interactions between items and user preferences. This research introduces a new framework named FGAT, which leverages a hierarchical graph representation together with attention mechanisms to address this problem. The framework constructs a three-tier graph of users, outfits, and items, integrating visual and textual features to jointly model outfit compatibility and user preferences. By dynamically weighting node importance during representation propagation, the graph attention mechanism captures key interactions and produces precise embeddings for both user preferences and outfit compatibility. Evaluated on the POG dataset, FGAT outperforms strong baselines such as HFGN, achieving notable improvements in accuracy, precision, hit ratio (HR), recall, and NDCG. These results demonstrate that combining multimodal visual–textual features with a hierarchical graph structure and attention mechanisms significantly enhances the effectiveness and efficiency of personalized fashion recommendation systems.
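A hedged sketch of the hierarchical attention encoder, assuming PyTorch Geometric: one GAT layer propagates item features to outfits and a second propagates outfit features to users. Node counts, feature sizes, and edges are toy assumptions, not the FGAT architecture's exact configuration.

```python
import torch
from torch_geometric.nn import GATConv

class HierGAT(torch.nn.Module):
    def __init__(self, in_dim: int = 64, hid: int = 64, out: int = 32, heads: int = 4):
        super().__init__()
        self.item_to_outfit = GATConv(in_dim, hid, heads=heads, concat=False)
        self.outfit_to_user = GATConv(hid, out, heads=heads, concat=False)

    def forward(self, x, edge_item_outfit, edge_outfit_user):
        h = self.item_to_outfit(x, edge_item_outfit).relu()  # items -> outfits
        return self.outfit_to_user(h, edge_outfit_user)      # outfits -> users

x = torch.randn(100, 64)               # stacked item/outfit/user node features
ei1 = torch.randint(0, 100, (2, 300))  # toy item-outfit edges
ei2 = torch.randint(0, 100, (2, 200))  # toy outfit-user edges
print(HierGAT()(x, ei1, ei2).shape)    # torch.Size([100, 32])
```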
{"title":"Hybrid-hierarchical fashion graph attention network for compatibility-oriented and personalized outfit recommendation","authors":"Sajjad Saed, Babak Teimourpour","doi":"10.1016/j.mlwa.2025.100802","DOIUrl":"10.1016/j.mlwa.2025.100802","url":null,"abstract":"<div><div>The rapid expansion of the fashion industry and the growing variety of products have made it increasingly challenging for users to identify compatible items on e-commerce platforms. Effective fashion recommendation systems are therefore crucial for filtering irrelevant options and suggesting suitable ones. However, simultaneously addressing outfit compatibility and personalized recommendations remains a significant challenge, as these aspects are typically treated independently in existing studies, thereby overlooking the complex interactions between items and user preferences. This research introduces a new framework named FGAT, which leverages a hierarchical graph representation together with attention mechanisms to address this problem. The framework constructs a three-tier graph of users, outfits, and items, integrating visual and textual features to jointly model outfit compatibility and user preferences. By dynamically weighting node importance during representation propagation, the graph attention mechanism captures key interactions and produces precise embeddings for both user preferences and outfit compatibility. Evaluated on the POG dataset, FGAT outperforms strong baselines such as HFGN, achieving notable improvements in accuracy, precision, hit ratio (HR), recall, and NDCG. These results demonstrate that combining multimodal visual–textual features with a hierarchical graph structure and attention mechanisms significantly enhances the effectiveness and efficiency of personalized fashion recommendation systems.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100802"},"PeriodicalIF":4.9,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145748775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}