Pub Date: 2026-03-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.mlwa.2025.100799
Jeonghoe Lee, Lin Cai
Predicting stock prices is crucial for making informed investment decisions, as stock markets significantly influence the global economy. Although previous studies have explored feature importance methods for stock price prediction, comprehensive comparisons of those methods have been limited. This study provides a detailed comparison of feature importance methods for selecting technical indicators to predict stock prices. Specifically, the research analyzed financial data from the 11 sectors of the NASDAQ. A moving-window forecasting framework was implemented to dynamically capture the evolving patterns of financial markets over time. Model-specific feature importance methods were compared with model-agnostic approaches. Multiple machine learning algorithms, including Random Forest (RF) and Multi-layer Neural Networks (MNNs), were employed to forecast stock prices. Additionally, extensive hyperparameter tuning was conducted to improve model explainability, contributing to the field of Explainable Artificial Intelligence (XAI). The results highlight the predictive effectiveness of different feature importance methods in selecting optimal technical indicators, offering valuable insights for enhancing stock price forecasting accuracy and model transparency. In summary, this research offers a comprehensive comparison of feature importance methods, emphasizing their application to the selection of technical indicators in a dynamic, rolling prediction setting.
"Comparing model-specific and model-agnostic features importance methods using machine learning with technical indicators: A NASDAQ sector-based study," Machine learning with applications, vol. 23, Article 100799.
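The core comparison can be sketched in a few lines: inside each moving window, a model-specific ranking (the forest's impurity importances) is set against a model-agnostic one (permutation importance on held-out data). All data below are synthetic and the window sizes are illustrative; the paper's NASDAQ indicators and tuned models are not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)

# Synthetic stand-ins for technical indicators and next-period price;
# only features 0 and 3 are informative by construction.
T, n_feat = 600, 8
X = rng.normal(size=(T, n_feat))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=T)

window, horizon = 200, 50
spec_rank, agno_rank = [], []
for start in range(0, T - window - horizon + 1, horizon):
    tr = slice(start, start + window)
    te = slice(start + window, start + window + horizon)
    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[tr], y[tr])
    # Model-specific: impurity-based importances from the fitted forest.
    spec_rank.append(np.argsort(rf.feature_importances_)[::-1])
    # Model-agnostic: permutation importance scored on the held-out window.
    pi = permutation_importance(rf, X[te], y[te], n_repeats=5, random_state=0)
    agno_rank.append(np.argsort(pi.importances_mean)[::-1])

top2_spec = set(spec_rank[-1][:2])
top2_agno = set(agno_rank[-1][:2])
print(top2_spec, top2_agno)
```

On this toy series both rankings should recover the two informative indicators; the study quantifies the analogous agreement (and disagreement) across NASDAQ sectors.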
Pub Date: 2026-03-01 | Epub Date: 2025-12-09 | DOI: 10.1016/j.mlwa.2025.100818
Sara Fanati Rashidi, Maryam Olfati, Seyedali Mirjalili, Crina Grosan, Jan Platoš, Vaclav Snášel
This study integrates Data Envelopment Analysis (DEA) with Machine Learning (ML) to address key limitations of traditional DEA in identifying reference sets for inefficient Decision-Making Units (DMUs). In DEA, inefficient units are evaluated against benchmark units; however, some benchmarks may be inappropriate or even outliers, which can distort the efficiency frontier. Moreover, when a new DMU is added, the entire model must be recalculated, resulting in high computational costs for large datasets. To overcome these issues, we propose a hybrid approach that combines Fuzzy C-Means (FCM) and Possibilistic Fuzzy C-Means (PFCM) clustering. By leveraging Euclidean distance and membership degrees, the method identifies closer and more relevant reference units, while a sensitivity threshold is introduced to control the number of benchmarks according to practical requirements. The effectiveness of the proposed method is validated on two datasets: a banking dataset and a banknote authentication dataset with 1,372 samples. Results show that the reference sets derived from this ML-based framework achieve 71.6%–98.3% agreement with DEA, while overcoming two major drawbacks: (1) sensitivity to dataset size and (2) inclusion of inappropriate reference units. Furthermore, statistical analyses, including confidence intervals and McNemar’s test, confirm the robustness and practical significance of the findings.
"A hybrid DEA–fuzzy clustering approach for accurate reference set identification," Machine learning with applications, vol. 23, Article 100818.
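A minimal sketch of the membership idea, assuming synthetic DMU data and an illustrative sensitivity threshold of 0.8: fuzzy c-means memberships (driven by Euclidean distance) decide which units are close enough to serve as benchmarks for a target unit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy input/output vectors for 12 DMUs; values are synthetic,
# not taken from the paper's banking dataset.
dmus = np.vstack([rng.normal(0, 0.3, (6, 2)), rng.normal(3, 0.3, (6, 2))])

def fcm(X, c=2, m=2.0, iters=100, seed=0):
    """Minimal fuzzy c-means: returns cluster centers and membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        p = 2 / (m - 1)
        # Standard FCM update: u_ik = 1 / sum_j (d_ik / d_ij)^p
        U = 1.0 / (d ** p * (1.0 / d ** p).sum(axis=1, keepdims=True))
    return centers, U

centers, U = fcm(dmus)

# Reference-set rule sketched from the abstract: a candidate benchmark is kept
# for a target DMU only if its membership in the target's cluster exceeds a
# sensitivity threshold, so distant or outlying benchmarks are excluded.
target = 0
cluster = U[target].argmax()
threshold = 0.8  # assumed sensitivity threshold, tunable per application
reference_set = [i for i in range(len(dmus)) if i != target and U[i, cluster] > threshold]
print(reference_set)
```

Raising the threshold shrinks the reference set toward the closest peers, which is how the abstract's "control the number of benchmarks" knob works in this simplified form.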
Efficient identification of informative inputs is critical when training Machine Learning (ML) surrogates on large, multi-sensor datasets. In this paper, we benchmark several input selection methods from the literature alongside new methods proposed here. A baseline method based on expert-driven (human) selection is used as a reference. All methods are evaluated on a challenging inverse problem, in which Computational Fluid Dynamic (CFD) simulations are used to train a Deep Neural Network (DNN) to infer unknown momentum source terms from discrete velocity measurements. The proposed methodology does not explicitly depend on the geometry of the domain and is therefore transferable to other problems involving sparse sensor measurements, although domain-specific validation may still be required. The results show that four input selection methods reduce the number of inputs to as few as five, with minimal impact on the mean average predictive error. This corresponds to a forty-fold reduction relative to the original number of inputs. Analysis of the top four inputs shows that each method selects different locations, indicating that multiple combinations can yield similar accurate results. The top four methods significantly outperform the baseline method based on human selection. This study demonstrates that input selection methods reduce computational costs during both training and inference stages. They also lower experimental demands by identifying high-value sensor locations, thereby reducing the number of required sampling points. These findings suggest that input selection methods should be considered standard practice in ML applications with complex scenarios constrained by limited experimental data.
"Comparison of input selection methods for neural networks applied to complex fluid dynamic inverse problem," Jaume Luis-Gómez, Guillem Monrós-Andreu, Sergio Iserte, Sergio Chiva, Raúl Martínez-Cuenca. Machine learning with applications, vol. 23, Article 100842. DOI: 10.1016/j.mlwa.2026.100842.
Pub Date: 2026-03-01 | Epub Date: 2026-01-31 | DOI: 10.1016/j.mlwa.2026.100858
Anthony Chan Chan
Anomaly detection in multispectral imagery must cope with high-dimensional inputs, scarce labeled anomalies and operational constraints. Tensor decompositions offer a structured way to compress such data, but their impact on anomaly detection performance and cost is not well quantified. This work studies how low-rank tensor representations affect one-class detectors on multispectral imagery.
Two decomposition strategies are evaluated as feature extractors: a global CP (PARAFAC) model fitted on training tiles and a per-tile Tucker model. The resulting coefficients are used to train one-class support vector machines, autoencoders and isolation forests on typical examples only, and anomalies are identified through their detector scores. The study uses multispectral Mastcam images from the Mars Science Laboratory Curiosity rover, a dataset with rare labeled novelties.
Experiments cover two evaluation regimes (randomized across subclasses and subclass-specific) and four training sizes n ∈ {500, 1000, 1500, 3000}. Under randomized sampling, CP and Tucker improve ROC–AUC over PCA for OC-SVM by approximately 10 to 15 percent at n ≤ 1500, while autoencoder gains span approximately 2 to 13 percent depending on decomposition and sample size. In subclass-specific tests for structured subclasses (drill-hole, DRT, dump-pile), CP and Tucker yield larger improvements at n ≤ 1500, with absolute ROC–AUC increases over PCA ranging from approximately 14 to 56 percent, whereas for visually homogeneous subclasses (bedrock, broken rock, float, veins) decompositions rarely improve over PCA and can reduce performance. Computationally, CP can require peak RSS memory above 50 GB, whereas Tucker often remains below 10 GB (8.67 GB vs. 52.53 GB in the reported runs, an 84% reduction), albeit with longer runtimes. Overall, the results indicate that tensor decompositions are most valuable as selective enhancements to PCA in multispectral anomaly detection pipelines when multiway structure is informative and training data are limited.
"From tensors to novelties: Low-dimensional representations for anomaly detection in multispectral imagery," Machine learning with applications, vol. 23, Article 100858.
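The Tucker route can be illustrated with a truncated HOSVD on a synthetic tile (the paper's fitting algorithm and ranks may differ, e.g. HOOI); the flattened core tensor plays the role of the low-dimensional feature vector fed to a one-class detector.

```python
import numpy as np

rng = np.random.default_rng(3)

def unfold(T, mode):
    """Matricize tensor T along one mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd(T, ranks):
    """Truncated HOSVD: one standard way to fit a Tucker model."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):
        # Mode-n product with U.T shrinks that mode to its Tucker rank.
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

# Synthetic multispectral tile: 32x32 pixels x 6 bands with low-rank structure.
A = rng.normal(size=(32, 4)) @ rng.normal(size=(4, 32))
tile = A[:, :, None] * rng.normal(size=6)[None, None, :]
core, factors = hosvd(tile, ranks=(8, 8, 3))

# The flattened core is the feature vector a one-class detector
# (OC-SVM, autoencoder, isolation forest) would consume.
features = core.ravel()
compression = tile.size / features.size
print(features.shape, compression)
```

Because the synthetic tile's multilinear ranks fit inside (8, 8, 3), the compression here is lossless; on real Mastcam tiles the truncation is lossy and the choice of ranks trades memory against detection accuracy, which is the trade-off the paper measures.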
Pub Date: 2026-03-01 | Epub Date: 2025-11-25 | DOI: 10.1016/j.mlwa.2025.100797
Nicholas Maurer, Mohammed Abdallah
This work presents a novel adaptive framework for soft error mitigation in space-based systems, designed to resolve the fundamental conflict between system performance and radiation protection. By leveraging a Long Short-Term Memory (LSTM) model to predict real-time solar particle flux, our approach dynamically enables or disables software-based mitigation techniques. This contrasts with the static, "always-on" methods of existing systems, offering a significant improvement in computational efficiency. The proposed LSTM model was trained on NASA solar particle flux data, achieving a mean average error of 7.65 × 10⁻⁶ and demonstrating high accuracy in predicting nonlinear particle events. Our simulation, which applies this predictive model to a tiered system of redundant processing, checkpointing, and watchdog timers, shows a substantial reduction in overhead. During the 18,414-second test period, the combined adaptive mitigation methods introduced only 20.75–51.6 s of overhead, a 99.4% reduction compared to continuous, static mitigation. The primary contribution of this research is a demonstrated proof of concept for an intelligent, self-adaptive system that can maintain high reliability while drastically improving performance.
This approach provides a pathway for utilizing more cost-effective commercial-off-the-shelf (COTS) processors in radiation-intensive environments.
"Machine learning based adaptive soft error mitigation efficiency," Machine learning with applications, vol. 23, Article 100797.
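The gating logic, stripped of the LSTM, reduces to a threshold test per time step; the flux values, threshold, and per-step overhead cost below are invented for illustration, with the predicted series standing in for the LSTM forecast of NASA flux data.

```python
import math

def adaptive_overhead(predicted_flux, threshold, step_cost):
    """Total mitigation overhead when protection runs only on flagged steps."""
    return sum(step_cost for f in predicted_flux if f > threshold)

# Quiet background flux with one short solar particle event.
flux = [1e-7 * (1 + 0.1 * math.sin(t / 10)) for t in range(1000)]
for t in range(400, 420):
    flux[t] = 5e-5  # simulated event spike

threshold, step_cost = 1e-6, 0.05  # seconds of overhead per protected step
adaptive = adaptive_overhead(flux, threshold, step_cost)
static = step_cost * len(flux)  # "always-on" mitigation for comparison
reduction = 1 - adaptive / static
print(adaptive, static, round(reduction, 3))
```

Mitigation runs only during the 20-step event, so the adaptive scheme pays a small fraction of the always-on cost; the paper's 99.4% figure comes from the same mechanism driven by a real forecast.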
Pub Date: 2026-03-01 | Epub Date: 2025-11-26 | DOI: 10.1016/j.mlwa.2025.100800
Ali Asghari
As an unsupervised learning method, clustering is a critical technique in artificial intelligence for organizing raw data into meaningful groups. In this process, data are partitioned so that members of the same cluster are internally similar while the distance to other clusters is maximized. Clustering has been widely applied across disciplines, including business analytics, healthcare, and economics. Extracting practical knowledge from large datasets relies on an effective clustering technique, and the main challenges are processing speed (especially for large datasets), handling noisy data and outliers, and ensuring high accuracy. These problems are especially significant in contemporary applications, where heterogeneous and inherently noisy datasets are prevalent. The proposed approach, TQC (Tree-Queue Clustering), addresses these problems by combining the Trees Social Relation (TSR) algorithm with the Queue Learning (QL) algorithm. While the QL algorithm enhances clustering accuracy, the TSR method focuses on accelerating clustering. The approach first divides the data into smaller groups; then, by efficiently computing group memberships, TSR's migration process causes clusters to develop progressively. By handling noise and outliers, the QL algorithm helps avoid local optima and improves clustering efficiency. This hybrid approach ensures the formation of high-quality clusters and accelerates convergence. The method is validated on several real-world datasets of varying sizes and properties. Experimental results, evaluated using five performance metrics (MICD, ARI, NMI, ET, and ODR) and compared with eight state-of-the-art algorithms, demonstrate the proposed method's superior performance in both speed and accuracy.
"TQC: An intelligent clustering approach for large-scale, noisy, and imbalanced data," Machine learning with applications, vol. 23, Article 100800.
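TSR and QL are the paper's own algorithms and are not reproduced here; what can be sketched generically is the divide-then-merge workflow the abstract describes, with the seed choice, the union-find merge, and the merge radius all being assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two well-separated blobs of points.
X = np.vstack([rng.normal(0, 0.2, (30, 2)), rng.normal(4, 0.2, (30, 2))])

# Step 1: divide the data into many small groups (here: nearest of 8
# deterministic seed points, a crude stand-in for the initial partition).
seeds = X[::8]
group = np.linalg.norm(X[:, None] - seeds[None], axis=2).argmin(axis=1)

# Step 2: progressively merge groups whose centroids are close, so clusters
# develop from the small groups (union-find over centroid distances).
centroids = np.array([X[group == g].mean(axis=0) for g in range(len(seeds))])
parent = list(range(len(seeds)))
def find(i):
    while parent[i] != i:
        i = parent[i]
    return i
for i in range(len(seeds)):
    for j in range(i + 1, len(seeds)):
        if np.linalg.norm(centroids[i] - centroids[j]) < 1.0:  # assumed radius
            parent[find(j)] = find(i)

labels = np.array([find(g) for g in group])
print(len(set(labels)))
```

The eight micro-groups collapse into the two underlying clusters; TQC's TSR migration and QL steps replace the naive distance rule with mechanisms that also handle noise and outliers.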
In many domains, including online education, healthcare, security, and human–computer interaction, facial emotion recognition (FER) is essential. Real-world FER remains difficult because of factors like head poses, occlusions, illumination shifts, and demographic diversity. Engagement detection systems, which are essential in virtual learning platforms, are severely challenged by these factors. In this article, we propose ExpressNet-MoE, a novel hybrid deep learning architecture that combines Convolutional Neural Networks (CNNs) with a Mixture of Experts (MoE) framework to address these challenges. The proposed model dynamically selects the most relevant expert networks for each input, thereby improving generalization and adaptability across diverse datasets. Our methodology involves training ExpressNet-MoE independently on several benchmark datasets after preprocessing facial images using BlazeFace for face detection and alignment. To maintain class distribution, stratified sampling is used to divide each dataset into training and testing groups. Our model improves the accuracy of emotion recognition by utilizing multi-scale feature extraction to collect both global and local facial features. ExpressNet-MoE includes numerous CNN-based feature extractors, a MoE module for adaptive feature selection, and a residual network backbone for deep feature learning. To demonstrate the efficacy of our proposed model, we evaluated it on four widely used datasets: AffectNet-7, AffectNet-8, RAF-DB, and FER-2013, and compared it with current state-of-the-art methods. Our model achieves accuracies of 74.40% ± 0.45 on AffectNet-7, 71.98% ± 0.66 on AffectNet-8, 83.41% ± 1.06 on RAF-DB, and 67.05% ± 2.08 on FER-2013. Overall, the findings indicate that adaptive expert selection and multi-scale feature extraction significantly enhance the robustness of facial emotion recognition across diverse real-world conditions and show how the model may be used to develop end-to-end emotion recognition systems in practical settings.
Reproducible code and results are publicly accessible at https://github.com/DeeptimaanB/ExpressNet-MoE.
"ExpressNet-MoE: A hybrid deep neural network for emotion recognition," Deeptimaan Banerjee, Prateek Gothwal, Ashis Kumer Biswas. Machine learning with applications, vol. 23, Article 100830. DOI: 10.1016/j.mlwa.2025.100830.
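The MoE mechanism itself is compact; a toy forward pass (with linear experts standing in for the paper's CNN feature extractors, and all dimensions invented) shows how softmax gate weights adaptively blend expert outputs per input.

```python
import numpy as np

rng = np.random.default_rng(5)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy mixture-of-experts layer: a gating network scores each expert per input,
# and the layer output is the gate-weighted sum of the expert outputs.
d_in, d_out, n_experts = 16, 8, 4
W_gate = rng.normal(size=(d_in, n_experts))
W_experts = rng.normal(size=(n_experts, d_in, d_out))

def moe_forward(x):
    gates = softmax(x @ W_gate)                          # (batch, n_experts)
    expert_out = np.einsum('bi,eio->beo', x, W_experts)  # every expert's output
    return np.einsum('be,beo->bo', gates, expert_out), gates

x = rng.normal(size=(3, d_in))
y, gates = moe_forward(x)
print(y.shape, gates.shape)
```

Because the gate weights depend on the input, different samples lean on different experts, which is the "dynamically selects the most relevant expert networks" behavior described in the abstract.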
Pub Date : 2026-03-01Epub Date: 2025-12-30DOI: 10.1016/j.mlwa.2025.100833
Chandramohan Abhishek , Nadimpalli Raghukiran
The present research showcases a machine-interactive approach to decision-making using a pre-trained natural language processing (NLP) model. The method is developed for 4D (4-dimensional) printing technique selection, where many variables are involved, such as process, material, design, and sequence selections. Because numerous options are available, arriving at a preferred technique requires expertise and time. The developed method consolidates this assistance into a single source. The approach incorporates bidirectional encoder representations from transformers (BERT), which accommodates parallel meanings of user requests, such as synonyms and adjectives. The closed-loop system is programmed with a set of 7 prompts. It also introduces additional affirmation prompts to handle both ambiguous phrasing and out-of-scope requests so that the machine returns a meaningful recommendation. A lightweight, rule-governed technique guides the selection of the conforming request at each prompt. The inference-based approach takes user requests, performs objective classification using BERT according to selected criteria, dynamically filters the data, and recommends suggestions, with an inference time of 0.79 s. The modified model also establishes multi-level relationships among prompts for text classification. k-fold validation reached its highest accuracy when the model was trained with optimal hyperparameters. The fine-tuned method, developed in a Python environment, can be generalized to other systems. The present research demonstrates the possibility of adapting an openly accessible model to develop a decision-assistance system with minimal personal computational resources.
{"title":"Machine-interactive decision-assistance using a pre-trained natural language processing model for 4D printing technique selection","authors":"Chandramohan Abhishek , Nadimpalli Raghukiran","doi":"10.1016/j.mlwa.2025.100833","DOIUrl":"10.1016/j.mlwa.2025.100833","url":null,"abstract":"<div><div>The present research showcases a machine-interactive approach to decision-making using a pre-trained natural language processing (NLP) model. The method is developed for 4D (4-dimensional) printing technique selection, where many variables are involved, such as process, material, design, and sequence selections. Because numerous options are available, arriving at a preferred technique requires expertise and time. The developed method consolidates this assistance into a single source. The approach incorporates bidirectional encoder representations from transformers (BERT), which accommodates parallel meanings of user requests, such as synonyms and adjectives. The closed-loop system is programmed with a set of 7 prompts. It also introduces additional affirmation prompts to handle both ambiguous phrasing and out-of-scope requests so that the machine returns a meaningful recommendation. A lightweight, rule-governed technique guides the selection of the conforming request at each prompt. The inference-based approach takes user requests, performs objective classification using BERT according to selected criteria, dynamically filters the data, and recommends suggestions, with an inference time of 0.79 s. The modified model also establishes multi-level relationships among prompts for text classification. k-fold validation reached its highest accuracy when the model was trained with optimal hyperparameters. The fine-tuned method, developed in a Python environment, can be generalized to other systems. 
The present research demonstrates the possibility of adapting an openly accessible model for developing a decision-assistance system with minimal personal computational resources.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100833"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145925172","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
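The classify-then-filter loop described above can be sketched with a simple bag-of-words similarity standing in for the BERT encoder: the user request is matched to a criterion, and the candidate table is filtered accordingly. All names here (`criteria`, `techniques`, the example entries) are hypothetical illustrations, not the paper's actual prompt set or data.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def classify(request, criteria):
    """Map a free-text request to the best-matching criterion label.
    A real system would embed both sides with BERT; here word-overlap
    similarity stands in for those embeddings."""
    req = Counter(request.lower().split())
    return max(criteria, key=lambda label: cosine(req, Counter(criteria[label].split())))

# Hypothetical criteria descriptions and technique table (not from the paper).
criteria = {
    "material": "polymer hydrogel shape memory alloy material",
    "process": "extrusion stereolithography printing process",
}
techniques = [
    {"name": "SLA", "process": "stereolithography", "material": "resin"},
    {"name": "FDM", "process": "extrusion", "material": "shape memory polymer"},
]
label = classify("which shape memory polymer should I use", criteria)
matches = [t["name"] for t in techniques if "polymer" in t[label]]
print(label, matches)  # → material ['FDM']
```

In the paper's closed-loop design this classify-and-filter step would repeat once per prompt, narrowing the candidate set each time.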
Conducting prior patent searches before developing technologies and filing patent applications in companies or universities is essential for understanding technological trends among competitors and academic institutions, as well as for increasing the likelihood of obtaining patent rights. In these searches, it is important not only to include relevant keywords in the search queries but also to incorporate related terms retrieved from a thesaurus. To support this, methods using word embeddings for automatically extracting such synonyms have recently been proposed. However, patent documents often contain unique expressions and compound terms, such as specialized technical terminology and abstract conceptual terms, which are difficult to accurately capture using existing large language models trained at the token level.
In this study, we investigate a method for extracting synonyms from patent documents by embedding the definition sentences that explain technical terms. The experimental results demonstrate that the proposed method achieves more precise synonym extraction than conventional word embedding approaches, and it can contribute to the expansion of existing thesauri.
Thus, this research is expected to improve the recall of prior art searches and support the automatic extraction of technical elements for identifying technological trends.
{"title":"Synonym extraction from Japanese patent documents using term definition sentences","authors":"Koji Marusaki , Seiya Kawano , Asahi Hentona , Hirofumi Nonaka","doi":"10.1016/j.mlwa.2026.100848","DOIUrl":"10.1016/j.mlwa.2026.100848","url":null,"abstract":"<div><div>Conducting prior patent searches before developing technologies and filing patent applications in companies or universities is essential for understanding technological trends among competitors and academic institutions, as well as for increasing the likelihood of obtaining patent rights. In these searches, it is important not only to include relevant keywords in the search queries but also to incorporate related terms retrieved from a thesaurus. To support this, methods using word embeddings for automatically extracting such synonyms have recently been proposed. However, patent documents often contain unique expressions and compound terms, such as specialized technical terminology and abstract conceptual terms, which are difficult to accurately capture using existing large language models trained at the token level.</div><div>In this study, we investigate a method for extracting synonyms from patent documents by embedding the definition sentences that explain technical terms. 
The experimental results demonstrate that the proposed method achieves more precise synonym extraction than conventional word embedding approaches, and it can contribute to the expansion of existing thesauri.</div><div>Thus, this research is expected to improve the recall of prior art searches and support the automatic extraction of technical elements for identifying technological trends.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100848"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
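The core idea above, comparing technical terms by embedding the definition sentences that explain them rather than the terms themselves, can be sketched as follows. A bag-of-words vector is a crude stand-in for the sentence embedding, and the glossary is invented for illustration; the paper works with Japanese patent terminology and a learned encoder.

```python
import math
from collections import Counter

def embed(definition):
    """Stand-in embedding: a bag-of-words Counter over the definition.
    Any sentence encoder could be substituted here."""
    return Counter(definition.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def synonyms(term, definitions, top_n=1):
    """Rank the other terms by similarity of their definition embeddings."""
    query = embed(definitions[term])
    others = [t for t in definitions if t != term]
    return sorted(others, key=lambda t: cosine(query, embed(definitions[t])),
                  reverse=True)[:top_n]

# Hypothetical glossary (illustrative, not from the paper's patent corpus).
defs = {
    "accumulator": "a register that stores intermediate arithmetic results",
    "buffer": "a memory region that stores data temporarily during transfer",
    "register": "a small fast storage location that stores intermediate results",
}
print(synonyms("accumulator", defs))  # → ['register']
```

The advantage of comparing definitions is that compound or domain-specific terms with no shared surface tokens can still be linked when their explanations overlap.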
Pub Date : 2026-03-01Epub Date: 2025-12-29DOI: 10.1016/j.mlwa.2025.100828
Jungmin Eom , Minjun Kang , Myungkeun Yoon , Nikil Dutt , Jinkyu Kim , Jaekoo Lee
Deep learning-based medical AI systems are increasingly deployed for disease diagnosis in decentralized healthcare environments where data are siloed across hospitals and IoT devices and cannot be freely shared due to strict privacy and security regulations. However, most existing continual learning and distributed learning approaches either assume centrally aggregated data or overlook incremental clinical changes, leading to catastrophic forgetting when applied to real-world medical data streams.
This paper introduces a novel healthcare-specific framework that integrates continual learning and distributed learning methods to utilize medical AI models effectively by addressing the practical constraints of the healthcare and medical ecosystem, such as data privacy, security, and changing clinical environments. Through the proposed framework, medical clients, such as hospital devices and IoT-based smart devices, can collaboratively train deep learning-based models on distributed computing resources without sharing sensitive data. Additionally, by considering incremental characteristics in medical environments such as mutations, new diseases, and abnormalities, the proposed framework can improve the disease diagnosis of medical AI models in actual clinical scenarios.
We propose Privacy-preserving Rehearsal-based Continual Split Learning (PRCSL), a healthcare-specific continual split learning framework that combines differential-privacy-based exemplar sharing, a mutual information alignment (MIA) module to correct representation shifts induced by noisy exemplars, and a parameter-free nearest-mean-of-exemplars (NME) classifier to mitigate task-recency bias under non-IID data distributions. Across eight benchmark datasets, including four MedMNIST subsets, HAM10000, CCH5000, CIFAR-100, and SVHN, PRCSL achieves competitive performance compared with representative continual learning baselines in terms of average accuracy and average forgetting. In particular, PRCSL achieves up to 3.62 percentage points higher average accuracy than the best baseline. These results indicate that PRCSL enables privacy-preserving, communication-efficient, and continually adaptable medical AI in realistic decentralized clinical and IoT-enabled ecosystems. Our code is publicly available at our repository.
{"title":"PRCSL: A privacy-preserving continual split learning framework for decentralized medical diagnosis","authors":"Jungmin Eom , Minjun Kang , Myungkeun Yoon , Nikil Dutt , Jinkyu Kim , Jaekoo Lee","doi":"10.1016/j.mlwa.2025.100828","DOIUrl":"10.1016/j.mlwa.2025.100828","url":null,"abstract":"<div><div>Deep learning-based medical AI systems are increasingly deployed for disease diagnosis in decentralized healthcare environments where data are siloed across hospitals and IoT devices and cannot be freely shared due to strict privacy and security regulations. However, most existing continual learning and distributed learning approaches either assume centrally aggregated data or overlook incremental clinical changes, leading to catastrophic forgetting when applied to real-world medical data streams.</div><div>This paper introduces a novel healthcare-specific framework that integrates continual learning and distributed learning methods to utilize medical AI models effectively by addressing the practical constraints of the healthcare and medical ecosystem, such as data privacy, security, and changing clinical environments. Through the proposed framework, medical clients, such as hospital devices and IoT-based smart devices, can collaboratively train deep learning-based models on distributed computing resources without sharing sensitive data. 
Additionally, by considering incremental characteristics in medical environments such as mutations, new diseases, and abnormalities, the proposed framework can improve the disease diagnosis of medical AI models in actual clinical scenarios.</div><div>We propose Privacy-preserving Rehearsal-based Continual Split Learning (PRCSL), a healthcare-specific continual split learning framework that combines differential-privacy-based exemplar sharing, a mutual information alignment (MIA) module to correct representation shifts induced by noisy exemplars, and a parameter-free nearest-mean-of-exemplars (NME) classifier to mitigate task-recency bias under non-IID data distributions. Across eight benchmark datasets, including four MedMNIST subsets, HAM10000, CCH5000, CIFAR-100, and SVHN, PRCSL achieves competitive performance compared with representative continual learning baselines in terms of average accuracy and average forgetting. In particular, PRCSL achieves up to 3.62 percentage points higher average accuracy than the best baseline. These results indicate that PRCSL enables privacy-preserving, communication-efficient, and continually adaptable medical AI in realistic decentralized clinical and IoT-enabled ecosystems. Our code is publicly available at our repository.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100828"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
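The parameter-free nearest-mean-of-exemplars classifier used in PRCSL can be sketched in a few lines: each class is represented by the mean of its stored exemplar features, and a sample is assigned to the class with the closest mean. This is a generic NME sketch under assumed 2-dimensional features; the class labels and exemplar vectors are hypothetical, not the PRCSL implementation.

```python
import math

def class_means(exemplars):
    """Compute the per-class mean feature vector from stored exemplars."""
    means = {}
    for label, vectors in exemplars.items():
        dim = len(vectors[0])
        means[label] = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    return means

def nme_predict(feature, means):
    """Nearest-mean-of-exemplars: assign the class whose exemplar mean is
    closest in Euclidean distance. Because there is no trained classifier
    head, the decision rule cannot be biased toward recently seen tasks."""
    def dist(m):
        return math.sqrt(sum((f - x) ** 2 for f, x in zip(feature, m)))
    return min(means, key=lambda label: dist(means[label]))

# Toy exemplar sets in a 2-d feature space (illustrative labels only).
exemplars = {
    "benign": [[0.0, 0.0], [0.2, 0.1]],
    "malignant": [[1.0, 1.0], [0.9, 1.1]],
}
means = class_means(exemplars)
print(nme_predict([0.8, 0.9], means))  # → malignant
```

In a rehearsal-based setting the means are simply recomputed whenever the exemplar memory is updated, so adding a new class requires no retraining of the classifier.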