
Machine learning with applications: Latest Articles

Enhanced synthesis of passively heard speech from electrocorticography signals using image-to-image spectrogram translation
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2025-11-28 DOI: 10.1016/j.mlwa.2025.100805
Hongsang Lee , Jihun Hwang , Kyungjun Kim , Gyuwon Lee , Chun Kee Chung , Chang-Hwan Im
Speech synthesis from neural signals offers a promising avenue for restoring communication in individuals with speech impairments. Recent deep learning advances have improved decoding of neural activity into intelligible speech, yet further enhancement is required to improve the quality of synthesized speech. Here, we investigate whether an image-to-image translation approach can further refine Mel spectrograms synthesized from electrocorticography (ECoG) signals recorded while participants passively listened to spoken sentences. ECoG data were collected from volunteers performing an auditory speech perception task. A three-layer bidirectional long short-term memory (Bi-LSTM) network was first trained to predict Mel-spectrogram features from neural signals. Comparison with the Conformer model indicated that Bi-LSTM was more effective as the initial synthesis model under our limited data conditions. To further enhance the quality of the Bi-LSTM-synthesized Mel spectrograms, we applied Pix2pixHD, a high-resolution conditional GAN, as a post-processing module. The impact of Pix2pixHD was evaluated using Log-Spectral Distance (LSD), Scale-Invariant Signal-to-Distortion Ratio (SI-SDR), and Short-Time Objective Intelligibility (STOI), comparing outputs against the original ground truth. Furthermore, subjective listening tests (2AFC similarity judgment) were conducted to assess perceptual improvements. Across objective metrics, Pix2pixHD post-processing yielded consistent improvements in spectral fidelity, waveform similarity, and estimated intelligibility (lower LSD, higher SI-SDR and STOI), and subjective tests confirmed significantly enhanced perceived similarity to the original speech. These gains were supported by non-parametric significance testing (Wilcoxon signed-rank test, p < 0.005). The results indicate that high-resolution image-to-image translation is an effective vehicle for refining neural signal-based speech synthesis, complementing sequence models and improving the overall perceived quality of the synthesized speech.
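One of the objective metrics, Log-Spectral Distance (LSD), can be illustrated with a minimal NumPy sketch; this is a generic formulation (RMS log-ratio per frame, averaged over frames) and may differ in framing or normalization from the authors' exact computation:

```python
import numpy as np

def log_spectral_distance(ref, est, eps=1e-10):
    """Mean log-spectral distance (dB) between two magnitude spectrograms.

    ref, est: arrays of shape (frames, mel_bins); lower LSD = closer match.
    """
    ref = np.maximum(np.asarray(ref, dtype=float), eps)
    est = np.maximum(np.asarray(est, dtype=float), eps)
    diff = 20.0 * np.log10(ref / est)                # per-bin log difference in dB
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=1))))  # RMS per frame, then mean

# Identical spectrograms give exactly zero distance.
s = np.abs(np.random.default_rng(0).normal(size=(10, 80)))
print(log_spectral_distance(s, s))  # 0.0
```

A uniformly halved spectrogram, for instance, yields a constant per-bin difference of about 6.02 dB, which this metric reports directly.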
Citations: 0
Cross-domain convergence of generative models: From biomedical to astronomical applications
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2026-01-15 DOI: 10.1016/j.mlwa.2026.100841
Hajer Ghodhbani , Suvendi Rimer , Khmaies Ouahada , Adel M. Alimi
This paper investigates the convergence of generative modeling techniques across diverse image analysis tasks by examining their application in two data-intensive scientific domains: biomedical imaging and astronomy. Although the two domains are scientifically distinct in scale and aims, they share common challenges, including noise corruption, limited availability of annotated data, and the demand for high-fidelity image reconstruction. This study provides a critical review of the various variants of generative models, with a particular focus on cross-domain applications. Unlike existing surveys that predominantly focus on a single discipline, this study emphasises the transferability and adaptability of generative models across biomedical and astronomical imaging. The review highlights the potential offered by generative models, particularly Generative Adversarial Networks (GANs), in enhancing data generation, image restoration, and analysis in both biomedical and astronomical studies.
Citations: 0
Multimodal information fusion for financial forecasting via cross-attention and calibrated uncertainty
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2026-01-12 DOI: 10.1016/j.mlwa.2026.100840
Josué Bustarviejo, Carlos Bousoño-Calzón
Forecasting financial markets requires synthesizing heterogeneous information sources such as historical prices, company indicators, and unstructured news, whose interactions are nonlinear and regime dependent. We investigate a cross-attention transformer framework that fuses these modalities for probabilistic financial forecasting with calibrated uncertainty. We propose a framework that anchors fusion on Chronos-T5, a transformer pretrained on large-scale time series and used here as a frozen encoder for market dynamics. Parameter-efficient projection layers map company-level indicators and daily news embeddings into a shared representation space, while bidirectional cross-attention learns how to align and weight the different sources. We evaluate the approach on daily EUR/USD forecasting, with additional experiments across currency pairs and market regimes. The multimodal model consistently outperforms autoregressive and deep learning baselines in point prediction, as measured by mean squared error and Diebold–Mariano tests, and delivers sharper probabilistic forecasts according to the continuous ranked probability score (CRPS), weighted interval score (WIS), and empirical coverage. Raw predictive distributions tend to be overconfident, but a post-hoc split conformal recalibration restores nominal coverage and improves interval quality without retraining the backbone. From a soft computing perspective, the system combines approximate Bayesian inference via Monte Carlo dropout with distribution-free calibration, within a structured cross-modal fusion architecture that improves the reliability and interpretability of multimodal financial forecasts.
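The post-hoc split conformal recalibration step can be sketched generically. The sketch below is a textbook split-conformal construction on absolute residuals, not the authors' exact procedure; variable names and the synthetic data are illustrative:

```python
import numpy as np

def conformal_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split-conformal prediction intervals from held-out calibration residuals.

    Returns (lower, upper) bands with roughly (1 - alpha) marginal coverage
    under exchangeability, without retraining the underlying model.
    """
    resid = np.abs(np.asarray(cal_true, float) - np.asarray(cal_pred, float))
    n = len(resid)
    # Finite-sample-corrected empirical quantile of the calibration residuals.
    q = np.quantile(resid, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    test_pred = np.asarray(test_pred, float)
    return test_pred - q, test_pred + q

rng = np.random.default_rng(2)
cal_pred = rng.normal(size=500)
cal_true = cal_pred + rng.normal(scale=0.3, size=500)  # noisy "ground truth"
lo, hi = conformal_interval(cal_pred, cal_true, np.zeros(3), alpha=0.1)
print(hi[0] - lo[0])  # interval width implied by the calibration residuals
```

Because the correction only widens or narrows the bands around existing point forecasts, it restores nominal coverage exactly as described in the abstract: the backbone model is left untouched.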
Citations: 0
Conformalized classifiers with reject option
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2026-01-06 DOI: 10.1016/j.mlwa.2026.100838
Ulf Johansson, Cecilia Sönströd
In data-driven decision support, predictive models built using machine learning aid in making informed decisions. In this context, models with a reject option may refrain from making predictions for certain instances. Accurately assessing the trade-off between predictive performance and throughput requires the ability to estimate performance at different rejection levels in advance. In this paper, we demonstrate how conformal prediction can be used for this purpose. Under exchangeability, the proposed conformalized classifiers can perfectly estimate accuracy or precision for any rejection level. In an empirical investigation using 41 publicly available datasets, the conformalized classifiers with a reject option are shown to clearly outperform probabilistic predictors calibrated with state-of-the-art techniques.
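The core idea of estimating performance at a rejection level in advance from a held-out calibration set can be sketched as follows. This is an illustrative simplification (confidence-ranked rejection with an exchangeability argument), not the paper's exact conformal construction:

```python
import numpy as np

def accuracy_at_rejection(conf, correct, reject_frac):
    """Estimate accuracy after rejecting the least confident fraction of inputs.

    conf:    calibration-set confidence of the predicted class
    correct: boolean array, predicted class == true class
    Under exchangeability, the estimate transfers to fresh test data.
    """
    order = np.argsort(conf)                       # ascending confidence
    n_keep = int(round(len(conf) * (1.0 - reject_frac)))
    kept = order[len(conf) - n_keep:]              # most confident examples
    return float(np.mean(correct[kept]))

# Synthetic calibration data: higher confidence -> more often correct.
rng = np.random.default_rng(1)
conf = rng.uniform(size=1000)
correct = rng.uniform(size=1000) < conf
print(accuracy_at_rejection(conf, correct, 0.0))   # baseline accuracy
print(accuracy_at_rejection(conf, correct, 0.5))   # accuracy at 50% rejection
```

Sweeping `reject_frac` over a grid yields the accuracy/throughput trade-off curve the abstract refers to.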
Citations: 0
Advancing marine mammal monitoring: Large-scale UAV delphinidae datasets and robust motion tracking for group size estimation
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2025-12-04 DOI: 10.1016/j.mlwa.2025.100808
Leonardo Viegas Filipe , João Canelas , Mário Vieira , Francisco Correia da Fonseca , André Cid , Joana Castro , Inês Machado
Reliable estimates of dolphin abundance are essential for conservation and impact assessment, yet manual analysis of aerial surveys is time-consuming and difficult to scale. This paper presents an end-to-end pipeline for automatic dolphin counting from unmanned aerial vehicle (UAV) video that combines modern object detection and multi-object tracking. We construct a large detection dataset of 64,705 images with 225,305 dolphin bounding boxes and a tracking dataset of 54,274 frames with 207,850 boxes and 603 unique tracks, derived from UAV line-transect surveys. Using these data, we train a YOLO11-based detector that achieves a precision of approximately 0.93 across a range of sea states. For tracking, we adopt BoT-SORT and tune its parameters with a genetic algorithm using a multi-metric objective, reducing ID fragmentation by about 29% relative to default settings. Recent YOLO-based cetacean detectors trained on UAV imagery of beluga whales report precision/recall around 0.92/0.92 for adults and 0.94/0.89 for calves, but rely on DeepSORT tracking whose MOTA remains below 0.5 and must be boosted to roughly 0.7 with post-hoc trajectory post-processing. In this context, our pipeline offers competitive detection performance, substantially larger and fully documented detection and tracking benchmarks, and GA-optimized tracking without manual post-processing. Applied to dolphin group counting, the full pipeline attains a mean absolute error of 1.24 on a held-out validation set, demonstrating that UAV-based automated counting can support robust, scalable monitoring of coastal dolphin populations.
Citations: 0
Benchmarking a time-series foundation model (TimeGPT) for real-world forecasting applications
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2025-11-27 DOI: 10.1016/j.mlwa.2025.100801
Xiao Zhang , Srinath Sridharan , Nur Hakim Bin Zahrin , Narayan Venkataraman , Siang Hiong Goh
Accurate forecasting of hospital demand is essential for operational resilience, yet traditional statistical and machine learning approaches often require extensive feature engineering and tuning, limiting adoption in resource-constrained environments. Foundation models for time-series forecasting offer the potential for robust, zero-shot performance across domains. This study evaluates the feasibility of TimeGPT, a general-purpose time-series foundation model, for forecasting daily Emergency Department (ED) arrivals.
We benchmarked TimeGPT against Seasonal Autoregressive Integrated Moving Average (SARIMAX), Prophet, and XGBoost under univariate and multivariate configurations. The experimental design simulated operational constraints by limiting the training window to 30 days and using a rolling forecast over a 60-day holdout period. Forecast accuracy was assessed using root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and directional accuracy.
TimeGPT consistently ranked among the top-performing models. In the univariate setting, it achieved a MAPE of 7.7% and directional accuracy of 75%, comparable to or exceeding traditional models with extensive feature engineering. TimeGPT required no model-specific tuning and maintained accuracy without exogenous features such as weather or calendar variables. SARIMAX achieved the best results in the temporal-plus-weather configuration (MAPE 7.0%, RMSE 31.0) but required substantially more configuration. TimeGPT recorded zero large-error days (>30% deviation), while SARIMAX had 5 such days, underscoring the trade-off between accuracy and robustness.
This benchmark demonstrates that foundation models can deliver accurate, reliable forecasts in healthcare operations with minimal data preparation. TimeGPT’s zero-shot capability highlights its potential as a scalable solution for diverse operational forecasting challenges.
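Two of the headline metrics, MAPE and directional accuracy, can be computed as follows (a minimal sketch with illustrative numbers, not the study's data):

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error (%)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

def directional_accuracy(actual, forecast):
    """Share of steps where forecast and reality move in the same direction."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.sign(np.diff(actual)) == np.sign(np.diff(forecast))))

y  = [100, 110, 105, 120]   # illustrative daily arrivals
yh = [ 98, 108, 109, 118]   # illustrative forecasts
print(mape(y, yh))                  # ~2.32 (%)
print(directional_accuracy(y, yh))  # 2 of 3 day-over-day moves match
```

Note that directional accuracy compares consecutive differences, so it rewards getting the up/down movement right even when the level is off; this is why the abstract reports it alongside the error metrics.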
Citations: 0
A novel hybrid model of flying geese optimization and attention-LSTM for predicting survival outcomes in clear cell renal cell carcinoma
IF 4.9 Pub Date: 2026-03-01 Epub Date: 2026-01-16 DOI: 10.1016/j.mlwa.2026.100846
Cheng-Hong Yang , Tin-Ho Cheung , Yi-Ling Chen , Sin-Hua Moi , Li-Yeh Chuang
Clear Cell Renal Cell Carcinoma (ccRCC) is the most aggressive and metastatic subtype of renal cell carcinoma and also the type with the highest mortality rate. To enhance survival prediction accuracy and facilitate informed clinical decision-making, this study presents a hybrid model that combines the Flying Geese Optimization Algorithm (FGOA) with an attention-based Long Short-Term Memory (A-LSTM) network. The proposed framework is trained and evaluated using data from the Cancer Genome Atlas Kidney Clear Cell Carcinoma (TCGA-KIRC) database. The feature selection process employed seven representative optimization algorithms covering evolutionary, swarm intelligence, and bio-inspired paradigms. The selected features were then analyzed using the attention-based A-LSTM network to predict survival outcomes in patients with ccRCC. Evaluation metrics for model performance included accuracy, precision, recall, and F1 score. The results showed that the FGOA-A-LSTM model performed best, with an accuracy of 80.8%, precision of 81.5%, recall of 86.9%, and F1 score of 84.1%, outperforming the other models. This result also indicates that on imbalanced datasets, the F1 score may be higher than the accuracy. Furthermore, Cox proportional hazards regression analysis showed that survival outcomes were significantly correlated with factors such as gender, tumor stage, previous treatment, and treatment method. This study introduces an innovative FGOA-A-LSTM framework that improves survival prediction in ccRCC. By integrating optimization-driven feature selection with an attention-enhanced deep learning architecture, the work makes a contribution to improving clinical risk assessment.
Citations: 0
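The ccRCC abstract above notes that on imbalanced datasets the F1 score can exceed accuracy (here 84.1% vs. 80.8%). A minimal, self-contained sketch (our illustration, not the authors' code) reproduces this effect on a toy imbalanced test set:

```python
# Toy demonstration: on an imbalanced test set where the positive class
# dominates, F1 (driven by precision/recall on positives) can exceed
# accuracy (which is dragged down by rare negative-class errors).

def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# 8 positives, 2 negatives; the classifier makes one error of each kind.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]

acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")
# f1 (0.875) > accuracy (0.800), mirroring the pattern reported in the paper
```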
Enhanced Heart disease prediction using LLM ranked feature selection, Dynamic custom Kernel
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2026-01-31 DOI: 10.1016/j.mlwa.2026.100860
Nikesh P.L. , Sebastian Terence , Anishin Raj , Jude Immaculate , Deepak Mishra
Heart disease, a major cause of death worldwide, accounts for millions of deaths each year. This makes it critical to detect heart disease at an earlier stage so that a treatment plan, including medications and counseling, can be started. Machine learning (ML) algorithms trained on large datasets have made it possible to predict heart disease more effectively. Traditional machine learning approaches provide statistical correlations but often lack explicit integration of clinical knowledge, which limits their usefulness in real-world scenarios. This paper investigates the use of a Large Language Model (LLM) combined with Retrieval-Augmented Generation (RAG) to derive clinically grounded feature relevance based on medical guidelines. A curated corpus of medical guidelines and practice protocols from internationally approved organizations was used to train the RAG pipeline. The features were ranked using the RAG-powered LLM, and the most important features were selected and used in a Support Vector Machine (SVM) with a custom kernel. A custom formulation combining linear and non-linear functions was explored as an auxiliary modeling component. This lets the model preserve the clinical importance of the features and the transparency of a linear term while capturing complex interactions through a polynomial term. The approach is evaluated on the UCI Heart Disease dataset, which includes data from Cleveland, Hungary, Switzerland, and the VA Medical Center in Long Beach. The study was conducted in two parts: one using the Cleveland subset alone and one using the full dataset covering all four regions. This integration of statistical learning with LLM-driven reasoning supports cardiovascular risk assessment in a clinically informed manner and helps identify clinically relevant features for the learning process.
On the Cleveland dataset the model achieved an accuracy of 95%, an F1 score of 0.936, and an AUC-ROC of 0.973, though, given the limited size of that subset, performance was comparable to traditional models and to the unweighted kernel. On the combined dataset, using the entire UCI collection, the model achieved an accuracy of 93.3%, an F1 score of 0.923, and an AUC-ROC of 0.961. Statistical testing showed that the weighted and unweighted kernels performed similarly, suggesting that the primary contribution arises from clinically guided feature selection rather than kernel weighting. The combination of statistical methods and reasoning from LLM models improves both the effectiveness and clarity of predictions, supporting the development of clinically informed AI systems for cardiovascular risk assessment. This paper also includes a comparative study of logistic regression, decision tree, random forest, gradient boosting, and support vector machines with RBF, sigmoid, linear, and polynomial kernels.
{"title":"Enhanced Heart disease prediction using LLM ranked feature selection, Dynamic custom Kernel","authors":"Nikesh P.L. ,&nbsp;Sebastian Terence ,&nbsp;Anishin Raj ,&nbsp;Jude Immaculate ,&nbsp;Deepak Mishra","doi":"10.1016/j.mlwa.2026.100860","DOIUrl":"10.1016/j.mlwa.2026.100860","url":null,"abstract":"<div><div>Heart disease, a major cause of death worldwide, accounts for millions of deaths each year. This makes it critical to detect heart disease at an earlier stage so that a treatment plan, including medications and counseling, can be started. Machine learning (ML) algorithms trained on large datasets have made it possible to predict heart disease more effectively. Traditional machine learning approaches provide statistical correlations, but often lack explicit integration of clinical knowledge, which limits their usefulness in real-world scenarios. This paper investigates the use of Large Language Model (LLM) combined with Retrieval-Augmented Generation (RAG) to derive clinically grounded feature relevance based on medical guidelines. A curated corpus of medical guidelines and practice protocols from internationally approved organizations was used to train the RAG pipeline. The features were ranked using LLM powered by RAG, and themost important features were selected and used in a Support Vector Machine (SVM) with a custom kernel. A custom formulation combining linear and non linear functions were explored as an auxiliary modeling component. This enables the model to keep the clinical importance of the features, linear transparency and also captures complex interactions using a polynomial function. This approach is evaluated on the UCI Heart Disease dataset, which includes data from Cleveland, Hungary, Switzerland, and VA Medical Center in Long Beach. This study conducted in two parts one using only Cleveland alone and a full set of data using all 4 regions. 
This integration of statistical learning with LLM driven reasoning supports cardiovascular risk assessment in a clinically informed manner. This approach helps to identify clinically relevant features for the learning process. On the Cleveland dataset the model achieved an accuracy of 95%, an F1 score of 0.936, and an AUC-ROC of 0.973, but it was comparable with traditional models and without weighted kernel due to the size of the data. When applied on the combined dataset, using the entire UCI dataset, the model achieved an accuracy of 93.3%, F1 score 0.923 and AUC-ROC of 0.961. Statistical testing showed that the weighted and unweighted kernels performed similarly, suggesting that the primary contribution arises from clinically guided feature selection rather than kernel weighting. The combination of statistical methods and reasoning from LLM models improves both the effectiveness and clarity of predictions. This process helps develop clinically informed AI systems for cardiovascular risk assessment. This paper also includes a comparative study of logistic regression, decision tree, random forest, gradient boosting, and support vector machine with RBF, sigmoid, linear and polynomial kernels.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100860"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146188223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
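The custom kernel described above combines a linear term (for transparency) with a polynomial term (for feature interactions). The sketch below is a hedged reconstruction of that idea, not the authors' exact formulation; the weights `alpha`/`beta`, the `degree`, and the offset `c0` are illustrative assumptions. A non-negative combination of valid kernels remains a valid (positive semidefinite) kernel, so such a function can be supplied to an SVM, e.g. as a precomputed Gram matrix.

```python
# Illustrative custom SVM kernel: k(x, z) = alpha * <x, z> + beta * (<x, z> + c0)^degree.
# alpha/beta/degree/c0 are assumed values for demonstration, not the paper's.

def custom_kernel(x, z, alpha=0.5, beta=0.5, degree=2, c0=1.0):
    """Weighted sum of a linear kernel and a polynomial kernel."""
    dot = sum(xi * zi for xi, zi in zip(x, z))
    return alpha * dot + beta * (dot + c0) ** degree

def gram_matrix(X, Z, **kw):
    """Pairwise kernel evaluations; usable as a precomputed SVM kernel matrix."""
    return [[custom_kernel(x, z, **kw) for z in Z] for x in X]

X = [[1.0, 0.0], [0.0, 2.0]]
K = gram_matrix(X, X)
print(K)  # symmetric 2x2 Gram matrix
```

With scikit-learn, such a callable (or the precomputed matrix) can be passed to `SVC(kernel=...)`, keeping the clinically selected features while the polynomial term captures their interactions.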
Decoding vision transformer variations for image classification: A guide to performance and usability
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2026-01-14 DOI: 10.1016/j.mlwa.2026.100844
João Montrezol , Hugo S. Oliveira , Hélder P. Oliveira
With the rise of Transformers, Vision Transformers (ViTs) have become a new standard in visual recognition, leading to numerous architectures with diverse designs and applications. This survey identifies 22 key ViT and hybrid CNN–ViT models, along with 5 top Convolutional Neural Network (CNN) models, selected for architectural novelty, benchmark relevance, and overall impact. The models are organised using a defined taxonomy of CNN-based, pure Transformer-based, and hybrid architectures. We analyse their main components, training methods, and computational features, assessing performance using reported results on standard benchmarks such as ImageNet and CIFAR, along with our own training and fine-tuning evaluations on specific imaging datasets. In addition to accuracy, we examine real-world deployment issues by analysing the trade-offs between accuracy and efficiency in embedded, mobile, and clinical settings. The results indicate that modern CNNs remain very competitive in resource-limited environments, while advanced ViT variants perform well after large-scale pretraining, especially in domains with high variability. Hybrid CNN–ViT architectures, in turn, tend to offer the best balance between accuracy, data efficiency, and computational cost. This survey establishes a consolidated benchmark and reference framework for understanding the evolution, capabilities, and practical applicability of contemporary vision architectures.
{"title":"Decoding vision transformer variations for image classification: A guide to performance and usability","authors":"João Montrezol ,&nbsp;Hugo S. Oliveira ,&nbsp;Hélder P. Oliveira","doi":"10.1016/j.mlwa.2026.100844","DOIUrl":"10.1016/j.mlwa.2026.100844","url":null,"abstract":"<div><div>With the rise of Transformers, Vision Transformers (ViTs) have become a new standard in visual recognition. This has led to the development of numerous architectures with diverse designs and applications. This survey identifies 22 key ViT and hybrid CNN–ViT models, along with 5 top Convolutional Neural Network (CNN) models. These were selected based on their new architecture, relevance to benchmarks, and overall impact. The models are organised using a defined taxonomy formed by CNN-based, pure Transformer-based, and hybrid architectures. We analyse their main components, training methods, and computational features, while assessing performance using reported results on standard benchmarks such as ImageNet and CIFAR, along with our training and fine-tuning evaluations on specific imaging datasets. In addition to accuracy, we look at real-world deployment issues by analysing the trade-offs between accuracy and efficiency in embedded, mobile, and clinical settings. The results indicate that modern CNNs are still very competitive in limited-resource environments, while advanced ViT variants perform well after large-scale pretraining, especially in areas with high variability. Hybrid CNN–ViT architectures, on the other hand, tend to offer the best balance between accuracy, data efficiency, and computational cost. 
This survey establishes a consolidated benchmark and reference framework for understanding the evolution, capabilities, and practical applicability of contemporary vision architectures.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100844"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
AI-driven detection of tiny pests in foliage: Integrating image processing and deep learning
IF 4.9 Pub Date : 2026-03-01 Epub Date: 2025-12-31 DOI: 10.1016/j.mlwa.2025.100834
Lucía Baeza-Moreno , Pedro Blanco-Carmona , Eduardo Hidalgo-Fort , Rubén Martín-Clemente , Ramón González-Carvajal
We present a novel computer vision method for detecting insect pests on plant and tree leaves under real-world conditions, combining deep learning with classical image processing techniques. Detecting small, sparsely distributed, or camouflaged insects is challenging, as current state-of-the-art object detection methods, primarily designed for larger objects, often overlook them. Our approach to this problem is twofold. First, we employ a deep learning model to analyze suspicious leaves for anomalies (a task well suited to deep learning). However, since deep models struggle with tiny objects in complex backgrounds, we complement them with conventional image processing to pre-identify potentially infested foliage, guiding the model toward relevant areas and mitigating its limitations. This combined strategy proves effective and competitive with other methods across diverse datasets and real-world scenarios. Furthermore, we also conduct a detailed analysis to interpret the model’s predictions, strengthening confidence in its effectiveness.
{"title":"AI-driven detection of tiny pests in foliage: Integrating image processing and deep learning","authors":"Lucía Baeza-Moreno ,&nbsp;Pedro Blanco-Carmona ,&nbsp;Eduardo Hidalgo-Fort ,&nbsp;Rubén Martín-Clemente ,&nbsp;Ramón González-Carvajal","doi":"10.1016/j.mlwa.2025.100834","DOIUrl":"10.1016/j.mlwa.2025.100834","url":null,"abstract":"<div><div>We present a novel computer vision method for detecting insect pests on plant and tree leaves under real-world conditions, combining deep learning with classical image processing techniques. Detecting small, sparsely distributed, or camouflaged insects is challenging, as current state-of-the-art object detection methods, primarily designed for larger objects, often overlook them. Our approach to this problem is twofold. First, we employ a deep learning model to analyze suspicious leaves for anomalies (a task well suited to deep learning). However, since deep models struggle with tiny objects in complex backgrounds, we complement them with conventional image processing to pre-identify potentially infested foliage, guiding the model toward relevant areas and mitigating its limitations. This combined strategy proves effective and competitive with other methods across diverse datasets and real-world scenarios. 
Furthermore, we also conduct a detailed analysis to interpret the model’s predictions, strengthening confidence in its effectiveness.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100834"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145924638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
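The pest-detection paper above describes a classical image-processing step that pre-identifies potentially infested foliage so the deep detector can focus on relevant regions. The toy sketch below illustrates that general idea only; the darkness-thresholding heuristic and the threshold value are our assumptions, not the authors' pipeline:

```python
# Toy pre-filtering sketch: flag anomalously dark pixels on a bright leaf
# patch, then report a candidate bounding box for a deep detector to inspect.
# Threshold and heuristic are illustrative assumptions.

def candidate_pixels(gray, thresh=60):
    """Return (row, col) of pixels darker than `thresh` (0=black, 255=white)."""
    return [(r, c)
            for r, row in enumerate(gray)
            for c, v in enumerate(row)
            if v < thresh]

def bounding_box(points):
    """Tight (min_row, min_col, max_row, max_col) box around the points."""
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    return (min(rows), min(cols), max(rows), max(cols))

# Toy 5x5 grayscale leaf patch: mostly bright foliage with a small dark cluster.
patch = [
    [200, 198, 201, 199, 197],
    [202,  40,  35, 200, 198],
    [199,  38, 201, 197, 200],
    [201, 200, 199, 198, 202],
    [198, 199, 200, 201, 199],
]

pts = candidate_pixels(patch)
print(pts)                 # dark-pixel coordinates
print(bounding_box(pts))   # candidate region to hand to the deep model
```

In a real system this role would typically be filled by more robust operations (e.g. color-space thresholding and morphology in OpenCV), but the division of labor is the same: cheap classical filtering proposes regions, the deep network classifies them.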