
Machine learning with applications: latest articles

Enhancing skin cancer diagnosis using late discrete wavelet transform and new swarm-based optimizers
IF 4.9 Pub Date : 2025-12-03 DOI: 10.1016/j.mlwa.2025.100811
Ramin Mousa , Saeed Chamani , Mohammad Morsali , Mohammad Kazzazi , Parsa Hatami , Soroush Sarabi
Skin cancer (SC) is a life-threatening disease where early diagnosis is critical for effective treatment and survival. While deep learning (DL) has advanced skin cancer diagnosis (SCD), current methods generally yield suboptimal accuracy and efficiency because it is difficult to extract multiscale features from dermoscopic images and to explore the hyperparameter space efficiently when optimizing complex models. To address this, we propose an approach integrating late Discrete Wavelet Transform (DWT) with pre-trained convolutional neural networks (CNNs) and swarm-based optimization. The late DWT decomposes CNN-extracted feature maps into low- and high-frequency components to improve the detection of subtle lesion patterns, while a self-attention mechanism refines the result by weighting feature importance and focusing on diagnostically relevant information. To tune hyperparameters, three novel swarm-based optimizers – Modified Gorilla Troops Optimizer (MGTO), Improved Gray Wolf Optimization (IGWO), and Fox Optimization (FOX) – search the hyperparameter space to fine-tune the model. Compared with existing methods, experiments on the ISIC-2016 and ISIC-2017 datasets show improved classification performance, with an accuracy gain of at least 1%. Thus, the suggested framework offers a reliable and effective way to diagnose skin cancer automatically.
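The decomposition of a feature map into low- and high-frequency components can be illustrated with a one-level 2D Haar transform. This is only a sketch: the abstract does not name the wavelet family, so the choice of Haar, the function name `haar_dwt2`, and the plain-list representation of a feature map are all assumptions made for illustration.

```python
def haar_dwt2(fmap):
    """One-level 2D Haar DWT of a single-channel feature map.

    fmap is a list of equal-length rows with even height and width.
    Returns four sub-bands, each half the size in both directions:
    a low-frequency approximation and three high-frequency detail bands.
    (Haar is an assumed wavelet; the abstract does not specify one.)
    """
    h, w = len(fmap), len(fmap[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    approx, d_rows, d_cols, d_diag = [], [], [], []
    for i in range(0, h, 2):
        ra, rr, rc, rd = [], [], [], []
        for j in range(0, w, 2):
            a, b = fmap[i][j], fmap[i][j + 1]
            c, d = fmap[i + 1][j], fmap[i + 1][j + 1]
            ra.append((a + b + c + d) / 2)  # local average (low frequency)
            rr.append((a + b - c - d) / 2)  # top rows minus bottom rows
            rc.append((a - b + c - d) / 2)  # left columns minus right columns
            rd.append((a - b - c + d) / 2)  # diagonal difference
        approx.append(ra)
        d_rows.append(rr)
        d_cols.append(rc)
        d_diag.append(rd)
    return approx, d_rows, d_cols, d_diag
```

With the 1/2 scaling used here the transform is orthonormal, so the sum of squared values is the same before and after the decomposition; a flat region produces zeros in all three detail bands.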
Machine learning with applications, Volume 23, Article 100811
Citations: 0
ISO-DeTr: A novel detection transformer for industrial small object detection
IF 4.9 Pub Date : 2025-12-02 DOI: 10.1016/j.mlwa.2025.100809
Faisal Saeed , Anand Paul
Effectively detecting and assessing real-time structural and ecological parameters in contemporary manufacturing environments poses significant challenges, particularly in identifying minute objects within product images. The swift evolution of the industrial sector underscores the necessity for intelligent manufacturing environments to uphold stringent product quality standards. However, running production processes at higher speeds heightens the risk of defective products. This research addresses the challenges inherent in small object detection within industrial contexts, proposing an innovative detection transformer model tailored to modern manufacturing environments. The proposed model integrates a feature-enhanced multi-head self-attention block (FEMSA), which merges a cross-channel communication network with multiple multi-head self-attention (MSA) components to refine image features. A query proposal network is also introduced within the detection transformer framework to select high-ranking proposals using Intersection over Union (IoU) and Non-Maximum Suppression (NMS). In extensive experiments on custom industrial small objects, the proposed model outperforms existing models based on Non-Maximum Suppression and on transformers. By tackling the challenges associated with small object detection, our model contributes to the dynamic synchronization between virtual and physical manufacturing realms, enhancing quality control in industrial production.
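The two standard building blocks named in the abstract, IoU and NMS, can be sketched as follows. This is the generic greedy NMS procedure, not the paper's query proposal network; the function names, the `(x1, y1, x2, y2)` box format, and the default threshold of 0.5 are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it does not overlap an already kept box by more than the threshold.
    Returns the indices of the kept boxes in descending score order."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

For small objects the IoU threshold matters more than usual, since a shift of a few pixels changes the overlap ratio of a tiny box considerably.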
Machine learning with applications, Volume 23, Article 100809
Citations: 0
AdAPT: Advertisement detector adaptation under newspaper domain shift with null-based pseudo-labeling
IF 4.9 Pub Date : 2025-12-01 DOI: 10.1016/j.mlwa.2025.100806
Faeze Zakaryapour Sayyad , Tobias Pettersson , Seyed Jalaleddin Mousavirad , Irida Shallari , Mattias O’Nils
Detecting advertisements in digitized newspapers is a key step in large-scale media analytics and digital archiving. However, variations in layout, typography, and advertisement design across publishers and time periods cause significant domain shifts that reduce the generalization ability of supervised detectors. This paper presents AdAPT, a confidence-guided pseudo-labeling pipeline for unsupervised domain adaptation in advertisement detection. The proposed method leverages both advertisement-free (Null) and advertisement-containing pages from unlabeled target domains to generate reliable pseudo-labels. By retraining a YOLO-based detector using labeled source data combined with filtered pseudo-labeled target samples, AdAPT achieves robust adaptation without requiring manual annotation. Experiments conducted on two unseen newspapers (Adresseavisen and iTromsø) demonstrate that Null-based pseudo-labeling provides the most stable and accurate adaptation, yielding up to 38% error reduction compared to the baseline. The results highlight AdAPT as a simple, scalable, and annotation-efficient solution for maintaining high-performance advertisement detection across diverse newspaper collections.
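One way confidence-guided filtering of pseudo-labels could look is sketched below. The abstract does not give the selection rule, so everything here is hypothetical: the function name `select_pseudo_labels`, the two thresholds `low` and `high`, and the rule that a page is kept only when the detector is either confidently silent (a Null page) or confident about every box.

```python
def select_pseudo_labels(pages, low=0.1, high=0.8):
    """Pick target-domain pages whose detector output is trustworthy.

    pages maps a page id to a list of (box, confidence) detections.
    Returns a dict of page id -> list of pseudo-label boxes, where an empty
    list marks an advertisement-free (Null) page. Ambiguous pages are dropped.
    The thresholds are illustrative values, not taken from the paper.
    """
    selected = {}
    for page_id, detections in pages.items():
        confidences = [conf for _, conf in detections]
        if not confidences or max(confidences) < low:
            selected[page_id] = []  # nothing plausible detected: Null page
        elif min(confidences) >= high:
            selected[page_id] = [box for box, _ in detections]
        # otherwise the page mixes confident and doubtful boxes; skip it
    return selected
```

The selected pages would then be merged with the labeled source data before retraining the detector.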
Machine learning with applications, Volume 23, Article 100806
Citations: 0
Explainable DEA–ensemble approach with golden jackal optimization: efficiency evaluation and prediction for United States information technology firms
IF 4.9 Pub Date : 2025-11-29 DOI: 10.1016/j.mlwa.2025.100798
Temitope Olubanjo Kehinde , Azeez A. Oyedele , Morenikeji Kabirat Kareem , Joseph Akpan , Oludolapo A. Olanrewaju
This study presents an integrated Data Envelopment Analysis (DEA) and ensemble learning framework optimized with the Golden Jackal Optimization (GJO) algorithm to evaluate and predict the efficiency of United States information technology firms. Both Constant Returns to Scale and Variable Returns to Scale models were applied to measure firm efficiency and compute scale efficiency, providing a clearer distinction between managerial and scale-related effects. Using data from 3940 firms over the period 2013 to 2023, a robustness test introducing ±20% random noise to a 10% random sample confirmed that the CCR model achieved stronger stability, with a correlation coefficient of 0.795 compared to 0.773 for the BCC model. Consequently, the CCR results were adopted as the basis for predictive modeling. DEA efficiency scores were predicted using six ensemble learners, including XGBoost, Gradient Boosting Regressor, AdaBoost, Extra Trees Regressor, Random Forest, and LightGBM, with GJO employed for hyperparameter tuning. The Gradient Boosting Regressor optimized with GJO achieved the best predictive performance, accurately reproducing the observed efficiency scores. SHAP and feature importance analyses revealed that Total Equity, Operating Income, and Total Assets were the most influential determinants of efficiency. This research contributes a scalable and interpretable approach to efficiency prediction, offering actionable insights for managers, investors, and policymakers in volatile financial markets.
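The relation between the two DEA models and scale efficiency can be sketched briefly. In the standard decomposition, the Constant Returns to Scale (CCR) score equals the Variable Returns to Scale (BCC) score multiplied by scale efficiency, so scale efficiency is their ratio. The function name and the example scores below are illustrative assumptions; the efficiency scores themselves would come from solving the DEA linear programs, which is not shown.

```python
def scale_efficiency(crs_score, vrs_score):
    """Scale efficiency = CRS (CCR) efficiency / VRS (BCC) efficiency.

    Both scores are assumed to lie in (0, 1], with crs_score <= vrs_score,
    as holds for efficiency scores computed on the same firm and data.
    """
    if not 0 < vrs_score <= 1:
        raise ValueError("vrs_score must be in (0, 1]")
    if not 0 < crs_score <= vrs_score:
        raise ValueError("crs_score must be in (0, vrs_score]")
    return crs_score / vrs_score


# Hypothetical firm: overall efficiency 0.6, pure managerial efficiency 0.8.
se = scale_efficiency(0.6, 0.8)
print(f"scale efficiency: {se:.2f}")
```

A value of 1 means the firm operates at its most productive scale, so any remaining inefficiency is managerial rather than scale-related.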
Machine learning with applications, Volume 23, Article 100798
Citations: 0
Enhanced synthesis of passively heard speech from electrocorticography signals using image-to-image spectrogram translation
IF 4.9 Pub Date : 2025-11-28 DOI: 10.1016/j.mlwa.2025.100805
Hongsang Lee , Jihun Hwang , Kyungjun Kim , Gyuwon Lee , Chun Kee Chung , Chang-Hwan Im
Speech synthesis from neural signals offers a promising avenue for restoring communication in individuals with speech impairments. Recent deep learning advances have improved decoding of neural activity into intelligible speech, yet further enhancement is required to improve the quality of synthesized speech. Here, we investigate whether an image-to-image translation approach can further refine Mel spectrograms synthesized from electrocorticography (ECoG) signals recorded while participants passively listened to spoken sentences. ECoG data were collected from volunteers performing an auditory speech perception task. A three-layer bidirectional long short-term memory (Bi-LSTM) network was first trained to predict Mel-spectrogram features from neural signals. Comparison with the Conformer model indicated that Bi-LSTM was more effective as the initial synthesis model under our limited data conditions. To further enhance the quality of the Bi-LSTM-synthesized Mel spectrograms, we applied Pix2pixHD, a high-resolution conditional GAN, as a post-processing module. The impact of Pix2pixHD was evaluated using Log-Spectral Distance (LSD), Scale-Invariant Signal-to-Distortion Ratio (SI-SDR), and Short-Time Objective Intelligibility (STOI) comparing outputs against the original ground truth. Furthermore, subjective listening tests (2AFC similarity judgment) were conducted to assess perceptual improvements. Across objective metrics, Pix2pixHD post-processing yielded consistent improvements in spectral fidelity, waveform similarity, and estimated intelligibility (lower LSD, higher SI-SDR and STOI), and subjective tests confirmed significantly enhanced perceived similarity to the original speech. These gains were supported by non-parametric significance testing (Wilcoxon signed-rank test, p < 0.005). 
The results indicate that high-resolution image-to-image translation is an effective vehicle to refine neural signal-based speech synthesis, complementing sequence models and improving the overall perceived quality of the synthesized speech.
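Of the objective metrics listed, SI-SDR is the simplest to write down. The sketch below follows the usual definition: project the estimate onto the reference, then compare the energy of that projection with the energy of the residual. It assumes equal-length waveforms given as plain lists and omits the mean removal that many implementations apply first; the function name is illustrative.

```python
import math


def si_sdr(estimate, reference):
    """Scale-Invariant Signal-to-Distortion Ratio in dB.

    The estimate is projected onto the reference, so rescaling the estimate
    does not change the score. Mean removal is omitted in this sketch.
    """
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy                    # optimal scaling factor
    target = [alpha * r for r in reference]     # part explained by reference
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    if residual_energy == 0:
        return math.inf                         # perfect reconstruction
    return 10 * math.log10(target_energy / residual_energy)
```

Higher values mean the synthesized waveform is closer to the original, which matches the direction of improvement reported in the abstract.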
Machine learning with applications, Volume 23, Article 100805
Citations: 0
Benchmarking a time-series foundation model (TimeGPT) for real-world forecasting applications
IF 4.9 Pub Date : 2025-11-27 DOI: 10.1016/j.mlwa.2025.100801
Xiao Zhang , Srinath Sridharan , Nur Hakim Bin Zahrin , Narayan Venkataraman , Siang Hiong Goh
Accurate forecasting of hospital demand is essential for operational resilience, yet traditional statistical and machine learning approaches often require extensive feature engineering and tuning, limiting adoption in resource-constrained environments. Foundation models for time-series forecasting offer the potential for robust, zero-shot performance across domains. This study evaluates the feasibility of TimeGPT, a general-purpose time-series foundation model, for forecasting daily Emergency Department (ED) arrivals.
We benchmarked TimeGPT against Seasonal Autoregressive Integrated Moving Average (SARIMAX), Prophet, and XGBoost under univariate and multivariate configurations. The experimental design simulated operational constraints by limiting the training window to 30 days and using a rolling forecast over a 60-day holdout period. Forecast accuracy was assessed using root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and directional accuracy.
TimeGPT consistently ranked among the top-performing models. In the univariate setting, it achieved a MAPE of 7.7% and directional accuracy of 75%, comparable to or exceeding traditional models with extensive feature engineering. TimeGPT required no model-specific tuning and maintained accuracy without exogenous features such as weather or calendar variables. SARIMAX achieved the best results in the temporal-plus-weather configuration (MAPE 7.0%, RMSE 31.0) but required substantially more configuration. TimeGPT recorded zero large-error days (>30% deviation), while SARIMAX had 5 such days, underscoring the trade-off between accuracy and robustness.
This benchmark demonstrates that foundation models can deliver accurate, reliable forecasts in healthcare operations with minimal data preparation. TimeGPT’s zero-shot capability highlights its potential as a scalable solution for diverse operational forecasting challenges.
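The accuracy measures used in the benchmark can be sketched in a few lines. RMSE, MAE, and MAPE follow their usual definitions. Two parts are assumptions, since the abstract does not define them: directional accuracy is taken as the share of days on which the forecast moves in the same direction as the actual value relative to the previous day's actual, and a large-error day is one whose absolute percentage error exceeds 30%. The function name and sample numbers are illustrative.

```python
import math


def forecast_metrics(actual, forecast, large_error=0.30):
    """Return RMSE, MAE, MAPE (%), directional accuracy (%), and the number
    of large-error days for two equal-length series of daily values.
    Actual values must be non-zero for the percentage errors."""
    n = len(actual)
    errors = [f - a for a, f in zip(actual, forecast)]
    ape = [abs(e) / abs(a) for e, a in zip(errors, actual)]
    # A hit: forecast and actual both rise or both fall versus yesterday.
    hits = sum(
        1
        for t in range(1, n)
        if (actual[t] - actual[t - 1]) * (forecast[t] - actual[t - 1]) > 0
    )
    return {
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(ape) / n,
        "directional_accuracy": 100 * hits / (n - 1),
        "large_error_days": sum(1 for p in ape if p > large_error),
    }


# Made-up arrivals for four days, for illustration only.
print(forecast_metrics([100, 110, 105, 120], [100, 108, 107, 80]))
```

Counting large-error days separately from the averages is what exposes the trade-off noted above: a model can have the lowest MAPE and still produce more days with large misses.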
Machine learning with applications, Volume 23, Article 100801
Citations: 0
Benchmarking and validation of prompting techniques for AI-assisted industrial PLC programming
IF 4.9 Pub Date : 2025-11-27 DOI: 10.1016/j.mlwa.2025.100804
Ketut Adnyana , Andreas Schwung
Industrial automation in Industry 5.0 demands deterministic, safety-compliant PLC code across heterogeneous vendor ecosystems. Prompt-engineered large language models (LLMs) offer a path forward but require reproducible methods and rigorous validation. This study introduces LLM-PLC-AS, a hybrid, prompt-invariant framework for IEC 61131-3 PLC code generation that addresses these needs. We benchmark 21 fixed prompting techniques on 25 real-world use cases (simple, medium, complex), using a standardized dataset and workflow spanning Siemens TIA Portal and Beckhoff TwinCAT. The quality of the generated code is evaluated through a layered validation pipeline: Bilingual Evaluation Understudy (BLEU) for lexical similarity, LLM-in-the-Loop (LITL) for scalable semantic checks across four dimensions (functional correctness, readability, safety compliance, and modularity), and Human-in-the-Loop (HITL) for expert safety-critical review. DeepSeek and Gemini 2.5 Pro generate Structured Text (ST) and Instruction List (IL) code; syntax is cross-checked by ChatGPT-4o and Copilot Pro. The framework achieved a very high degree of accuracy, with ST programs reaching near-perfect scores and IL programs also performing exceptionally well on our scoring rubric. This cut manual correction effort by nearly half compared to ad-hoc methods. Across tasks, our approach led to a more than twofold increase in Safety Compliance and a significant improvement in Functional Correctness against unstructured baselines. A key finding is that the structure of the prompt itself had a greater influence on determinism and correctness than the choice of LLM. The fixed-prompt reasoning combined with the BLEU/LITL/HITL validation stack provides a scalable, reproducible, and safety-aware method for PLC code generation. BLEU is used for rapid lexical triage and regression tracking, LITL provides structured semantic verification, and HITL ensures final compliance. The framework establishes a standardized basis for AI-assisted PLC programming and transparent benchmarking. Future work will extend the pipeline to include graphical languages, such as Ladder Diagram (LAD) and Function Block Diagram (FBD), using multimodal/graph-aware models, and will incorporate runtime validation to further close the gap to real-world deployment. Safety verification in this study is limited to logical and semantic validation. Real-time behavior, communication latency, and physical safety-fault recovery require Hardware-in-the-Loop (HIL) simulation or deployment on industrial test benches, which are left to future work.
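The abstract names BLEU as the first, lexical layer of the validation stack. As a rough illustration of what that layer measures, the sketch below computes a simplified single-reference BLEU between a reference program and a generated one. The whitespace tokenization, the add-one smoothing, the function names `ngrams` and `bleu`, and the Structured Text snippets are all assumptions made for illustration; the abstract does not state the tokenizer or BLEU configuration the authors used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, candidate, max_n=4):
    """Simplified single-reference BLEU (assumed setup, not the paper's)."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it
        # appears in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if n == 1:
            if matches == 0:
                return 0.0
            precision = matches / total
        else:
            # Add-one smoothing keeps short programs from scoring zero.
            precision = (matches + 1) / (total + 1)
        log_precisions.append(math.log(precision))
    # Brevity penalty discourages candidates shorter than the reference.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical Structured Text lines, pre-tokenized with spaces.
reference = "Motor := Start AND NOT Stop ;"
candidate = "Motor := Start AND Stop ;"
print(f"BLEU = {bleu(reference, candidate):.3f}")
```

The candidate above drops a single `NOT`, which inverts the logic yet still leaves most n-grams matching. This is one way to see why a lexical score is only suited to triage and needs the semantic and expert layers behind it.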
Machine learning with applications, Volume 23, Article 100804
Citations: 0
Exploring multimodal, non-invasive stress assessment through audio-visual and textual cues integrated with psychometric survey data
IF 4.9 Pub Date : 2025-11-26 DOI: 10.1016/j.mlwa.2025.100803
Xin Yu Huang , Venkat Margapuri
Stress is a widespread psychological concern that often manifests alongside conditions such as anxiety and depression. Traditional self-report tools like the Perceived Stress Scale (PSS-10) may not fully capture an individual’s stress experience. This study explores whether integrating multimodal biometric data through video, audio, and transcriptions can enhance stress detection by providing a more comprehensive and interpretive point of view. Participants completed the PSS-10 while being recorded, and emotional features were extracted using machine learning models across the three biometric modalities. Results revealed weak correlations among the modalities, indicating that each captures distinct aspects of stress. Notably, the combined biometric score demonstrated greater sensitivity than the PSS-10 alone, suggesting that multimodal models may detect stress-related states that self-reports overlook. These findings support the development of more comprehensive stress assessment tools, although they are not intended to replace professional clinical evaluation.
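The finding that the modalities correlate only weakly can be made concrete with a pairwise correlation check. The sketch below computes Pearson correlations between per-participant scores from the three modalities. The choice of Pearson's coefficient, the helper name `pearson`, and every number in `scores` are assumptions for illustration only; the abstract does not say which correlation measure was used, and no study data is reproduced here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)


# Made-up stress scores for six participants, one list per modality.
scores = {
    "video": [0.20, 0.55, 0.40, 0.70, 0.30, 0.60],
    "audio": [0.50, 0.35, 0.65, 0.45, 0.25, 0.40],
    "text": [0.30, 0.60, 0.20, 0.50, 0.70, 0.35],
}

# Correlate every pair of modalities.
pairwise = {
    (a, b): pearson(scores[a], scores[b])
    for a, b in combinations(scores, 2)
}

for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```

Coefficients near zero across all pairs would be consistent with the abstract's reading that each modality captures a distinct aspect of stress.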
Machine learning with applications, Volume 23, Article 100803
Citations: 0
TQC: An intelligent clustering approach for large-scale, noisy, and imbalanced data
IF 4.9 Pub Date : 2025-11-26 DOI: 10.1016/j.mlwa.2025.100800
Ali Asghari
As an unsupervised learning method, clustering is a critical technique in artificial intelligence for organizing raw data into meaningful groups. Data is partitioned so that members of the same cluster are as similar as possible and clusters are as far apart from one another as possible. Clustering has been widely applied across disciplines, including business analytics, healthcare, and economics. Extracting practical knowledge from large datasets relies on an effective clustering technique. The main challenges in clustering are processing speed (especially for large datasets), handling noisy data and outliers, and ensuring high accuracy. These problems are especially significant in contemporary applications, where heterogeneous and inherently noisy datasets are prevalent. The proposed approach, TQC (Tree-Queue Clustering), addresses these problems by combining the Trees Social Relation Algorithm (TSR) with the Queue Learning (QL) algorithm. The TSR method focuses on accelerating clustering, while the QL algorithm enhances clustering accuracy. The approach first divides the data into smaller groups. Then, by efficiently computing group memberships, TSR's migration process grows the clusters progressively. Handling noise and outliers helps the QL algorithm avoid local optima and improve clustering efficiency. This hybrid approach ensures the formation of high-quality clusters and accelerates convergence. The method is validated across several real-world datasets of varying sizes and properties. Experimental results, evaluated using five performance metrics (MICD, ARI, NMI, ET, and ODR) and compared with eight state-of-the-art algorithms, demonstrate the proposed method's superior performance in both speed and accuracy.
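Of the five metrics listed, ARI (taken here to mean the Adjusted Rand Index, its usual expansion) is simple enough to show in full. The sketch below implements the standard pair-counting formula in plain Python. The function name `adjusted_rand_index` and the toy label lists are assumptions for illustration, and nothing here reproduces TQC itself or the paper's evaluation code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same points."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0  # No pairs to disagree on.
    # Pairs of points that share a cluster in both labelings.
    contingency = Counter(zip(labels_true, labels_pred))
    same_in_both = sum(comb(c, 2) for c in contingency.values())
    # Pairs that share a cluster in each labeling separately.
    same_in_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_in_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    # Agreement expected from random labelings with the same cluster sizes.
    expected = same_in_true * same_in_pred / comb(n, 2)
    maximum = (same_in_true + same_in_pred) / 2
    if maximum == expected:
        return 1.0  # Degenerate case, e.g. one cluster in both labelings.
    return (same_in_both - expected) / (maximum - expected)


# Toy example: cluster ids differ, but the grouping is identical.
truth = [0, 0, 1, 1, 2, 2]
found = [5, 5, 9, 9, 7, 7]
print(adjusted_rand_index(truth, found))
```

Because the index compares pairs of points, not label values, renaming the clusters does not change the score. This is the property that makes it suitable for comparing a clustering against ground-truth classes.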
Machine learning with applications, Volume 23, Article 100800
Citations: 0
Hybrid-hierarchical fashion graph attention network for compatibility-oriented and personalized outfit recommendation
IF 4.9 Pub Date : 2025-11-26 DOI: 10.1016/j.mlwa.2025.100802
Sajjad Saed, Babak Teimourpour
The rapid expansion of the fashion industry and the growing variety of products have made it increasingly challenging for users to identify compatible items on e-commerce platforms. Effective fashion recommendation systems are therefore crucial for filtering irrelevant options and suggesting suitable ones. However, simultaneously addressing outfit compatibility and personalized recommendations remains a significant challenge, as these aspects are typically treated independently in existing studies, thereby overlooking the complex interactions between items and user preferences. This research introduces a new framework named FGAT, which leverages a hierarchical graph representation together with attention mechanisms to address this problem. The framework constructs a three-tier graph of users, outfits, and items, integrating visual and textual features to jointly model outfit compatibility and user preferences. By dynamically weighting node importance during representation propagation, the graph attention mechanism captures key interactions and produces precise embeddings for both user preferences and outfit compatibility. Evaluated on the POG dataset, FGAT outperforms strong baselines such as HFGN, achieving notable improvements in accuracy, precision, hit ratio (HR), recall, and NDCG. These results demonstrate that combining multimodal visual–textual features with a hierarchical graph structure and attention mechanisms significantly enhances the effectiveness and efficiency of personalized fashion recommendation systems.
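Two of the ranking metrics named above, hit ratio (HR) and NDCG, can be sketched for a single user as follows. The binary-relevance setting, the cutoff `k`, the function names, and the outfit ids are assumptions for illustration; the abstract does not give the exact evaluation protocol used on the POG dataset.

```python
import math


def hit_ratio_at_k(ranked, relevant, k):
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked, relevant, k):
    """NDCG@k with binary relevance (an item is relevant or it is not)."""
    # Discounted gain: a hit at 0-based position p is worth 1 / log2(p + 2).
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal ordering puts every relevant item at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ranked outfit ids for one user; "o1" is the held-out outfit.
ranked_outfits = ["o3", "o1", "o7", "o4", "o9"]
held_out = {"o1"}

print(hit_ratio_at_k(ranked_outfits, held_out, k=3))
print(ndcg_at_k(ranked_outfits, held_out, k=3))
```

HR only records whether the held-out outfit made the top k, while NDCG also rewards placing it higher, which is why the two are commonly reported together.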
Machine learning with applications, Volume 23, Article 100802
Citations: 0