Intrinsic Dimension Estimating Autoencoder (IDEA) using CancelOut layer and a projected loss
Pub Date: 2026-01-20 | DOI: 10.1016/j.mlwa.2026.100850
Antoine Oriou, Philipp Krah, Julian Koellermeier
This paper introduces the Intrinsic Dimension Estimating Autoencoder (IDEA), which identifies the underlying intrinsic dimension of a wide range of datasets whose samples lie on either linear or nonlinear manifolds. Beyond estimating the intrinsic dimension, IDEA is also able to reconstruct the original dataset after projecting it onto the corresponding latent space, which is structured using re-weighted double CancelOut layers. Our key contribution is the introduction of the projected reconstruction loss term, guiding the training of the model by continuously assessing the reconstruction quality under the removal of an additional latent dimension.
We first assess the performance of IDEA on a series of theoretical benchmarks to validate its robustness. These experiments allow us to test its reconstruction ability and compare its performance with state-of-the-art intrinsic dimension estimators. The benchmarks show good accuracy and high versatility of our approach. Subsequently, we apply our model to data generated from the numerical solution of a vertically resolved one-dimensional free-surface flow, following a pointwise discretization of the vertical velocity profile in the horizontal direction, vertical direction, and time. IDEA succeeds in estimating the dataset’s intrinsic dimension and then reconstructs the original solution by working directly within the projection space identified by the network.
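The abstract's two ingredients, a trainable gating layer and a loss that probes reconstruction with one more latent dimension removed, can be pictured with a short PyTorch sketch. This is a hypothetical reconstruction, not the authors' code: it uses a single gate where the paper describes re-weighted double CancelOut layers, and it assumes the removed dimension is the one with the weakest gate.

    import torch
    import torch.nn as nn

    class CancelOut(nn.Module):
        # Elementwise gate: each latent dimension is scaled by sigmoid(w_i),
        # so dimensions the data does not need can be driven toward zero.
        def __init__(self, dim):
            super().__init__()
            self.w = nn.Parameter(torch.zeros(dim))

        def forward(self, z):
            return z * torch.sigmoid(self.w)

    def projected_reconstruction_loss(encoder, decoder, gate, x,
                                      mse=nn.functional.mse_loss):
        z = gate(encoder(x))                        # re-weighted latent code
        # Remove one additional latent dimension: the most weakly gated one.
        weakest = torch.argmin(torch.sigmoid(gate.w))
        z_proj = z.clone()
        z_proj[:, weakest] = 0.0
        # Full reconstruction plus reconstruction under the extra projection.
        return mse(decoder(z), x) + mse(decoder(z_proj), x)

Intuitively, if the projected term stays low the zeroed dimension was redundant; the number of active dimensions at which it begins to degrade sharply signals the intrinsic dimension.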
{"title":"Intrinsic Dimension Estimating Autoencoder (IDEA) using CancelOut layer and a projected loss","authors":"Antoine Oriou , Philipp Krah , Julian Koellermeier","doi":"10.1016/j.mlwa.2026.100850","DOIUrl":"10.1016/j.mlwa.2026.100850","url":null,"abstract":"<div><div>This paper introduces the Intrinsic Dimension Estimating Autoencoder (IDEA), which identifies the underlying intrinsic dimension of a wide range of datasets whose samples lie on either linear or nonlinear manifolds. Beyond estimating the intrinsic dimension, IDEA is also able to reconstruct the original dataset after projecting it onto the corresponding latent space, which is structured using re-weighted double CancelOut layers. Our key contribution is the introduction of the <em>projected reconstruction loss</em> term, guiding the training of the model by continuously assessing the reconstruction quality under the removal of an additional latent dimension.</div><div>We first assess the performance of IDEA on a series of theoretical benchmarks to validate its robustness. These experiments allow us to test its reconstruction ability and compare its performance with state-of-the-art intrinsic dimension estimators. The benchmarks show good accuracy and high versatility of our approach. Subsequently, we apply our model to data generated from the numerical solution of a vertically resolved one-dimensional free-surface flow, following a pointwise discretization of the vertical velocity profile in the horizontal direction, vertical direction, and time. IDEA succeeds in estimating the dataset’s intrinsic dimension and then reconstructs the original solution by working directly within the projection space identified by the network.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100850"},"PeriodicalIF":4.9,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Trust but verify: Image-aware evaluation of radiology report generators
Pub Date: 2026-01-20 | DOI: 10.1016/j.mlwa.2026.100851
Sayeh Gholipour Picha, Dawood Al Chanti, Alice Caplier
Large language and vision-language models have greatly advanced automated chest X-ray radiology report generation (RRG), yet current evaluation practices remain largely text-based and detached from image evidence. Traditional machine translation metrics fail to determine whether generated findings are clinically correct or visually grounded, limiting their suitability for medical applications.
This study introduces a comprehensive, image-aware evaluation framework that integrates the VICCA (Visual Interpretation and Comprehension of Chest X-ray Anomalies) protocol with the domain-specific semantic metric MCSE (Medical Corpus Similarity Evaluation). VICCA combines visual grounding and text-guided image generation to assess visual-textual consistency, while MCSE measures semantic and factual fidelity through clinically meaningful entities, negations, and modifiers. Together, they provide a unified, semi-reference-free assessment of pathology-level accuracy, semantic coherence, and visual consistency.
Five representative RRG models (R2Gen, M2Trans, CXR-RePaiR, RGRG, and MedGemma) are benchmarked on 2461 MIMIC-CXR studies using a standardized pipeline. Results reveal systematic trade-offs: models with high pathology agreement often generate semantically weak or visually inconsistent reports, whereas textually fluent models may lack proper image grounding. By integrating clinical semantics and visual reliability within a single multimodal framework, VICCA establishes a robust paradigm for evaluating the trustworthiness and interpretability of AI-generated radiology reports.
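MCSE measures fidelity through clinically meaningful entities, negations, and modifiers. As a loose, hypothetical stand-in for that idea (not the authors' metric), one can score the overlap of (entity, negated) pairs that some upstream clinical NER and negation detector are assumed to extract from the generated and reference reports:

    def entity_f1(pred_entities, ref_entities):
        """pred_entities / ref_entities: sets of (entity, is_negated) tuples,
        e.g. {("pleural effusion", True), ("cardiomegaly", False)}, produced
        by an assumed upstream clinical NER + negation detector."""
        if not pred_entities and not ref_entities:
            return 1.0
        tp = len(pred_entities & ref_entities)
        precision = tp / len(pred_entities) if pred_entities else 0.0
        recall = tp / len(ref_entities) if ref_entities else 0.0
        if precision + recall == 0.0:
            return 0.0
        return 2 * precision * recall / (precision + recall)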
{"title":"Trust but verify: Image-aware evaluation of radiology report generators","authors":"Sayeh Gholipour Picha, Dawood Al Chanti, Alice Caplier","doi":"10.1016/j.mlwa.2026.100851","DOIUrl":"10.1016/j.mlwa.2026.100851","url":null,"abstract":"<div><div>Large language and vision-language models have greatly advanced automated chest X-ray report generation (RRG),. yet current evaluation practices remain largely text-based and detached from image evidence. Traditional machine translation metrics fail to determine whether generated findings are clinically correct or visually grounded, limiting their suitability for medical applications.</div><div>This study introduces a comprehensive, image-aware evaluation framework that integrates the VICCA (<em>Visual Interpretation and Comprehension of Chest X-ray Anomalies</em>) protocol with the domain-specific semantic metric MCSE (<em>Medical Corpus Similarity Evaluation</em>). VICCA combines visual grounding and text-guided image generation to assess visual-textual consistency, while MCSE measures semantic and factual fidelity through clinically meaningful entities, negations, and modifiers. Together, they provide a unified, semi-reference-free assessment of pathology-level accuracy, semantic coherence, and visual consistency.</div><div>Five representative RRG models, R2Gen, M2Trans, CXR-RePaiR, RGRG, and MedGemma, are benchmarked on 2461 MIMIC-CXR studies using a standardized pipeline. Results reveal systematic trade-offs: models with high pathology agreement often generate semantically weak or visually inconsistent reports, whereas textually fluent models may lack proper image grounding. By integrating clinical semantics and visual reliability within a single multimodal framework, VICCA establishes a robust paradigm for evaluating the trustworthiness and interpretability of AI-generated radiology reports.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100851"},"PeriodicalIF":4.9,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analysis of major segmentation models for intracranial artery time-of-flight magnetic resonance angiography images
Pub Date: 2026-01-19 | DOI: 10.1016/j.mlwa.2026.100843
Mekhla Sarkar, Yen-Chu Huang, Tsong-Hai Lee, Jiann-Der Lee, Prasan Kumar Sahoo
Intracranial arterial stenosis (ICAS) is a leading cause of cerebrovascular accidents, and accurate morphological assessment of intracranial arteries is critical for diagnosis and treatment planning. Complex vascular structures, imaging noise, and variability in time-of-flight magnetic resonance angiography (TOF-MRA) images make manual delineation challenging, motivating the use of deep learning (DL) for automatic segmentation of the intracranial arteries. DL-based automatic segmentation offers a promising solution by providing consistent and noise-reduced vessel delineation. However, selecting an optimal segmentation architecture remains challenging due to the diversity of network designs and encoder backbones. Therefore, this study presents a systematic benchmarking of five widely used DL segmentation architectures (UNet, LinkNet, Feature Pyramid Networks (FPN), Pyramid Scene Parsing Network (PSPNet), and DeepLabV3+), each combined with nine backbone networks, yielding 45 model variants, including previously unexplored configurations for intracranial artery segmentation in TOF-MRA. Models were trained and cross-validated on four datasets (in-house, CereVessMRA, IXI, and ADAM) and evaluated on a held-out independent test set. Performance metrics included Intersection over Union (IoU), Dice Similarity Coefficient (DSC), and a Stability Score combining the coefficients of variation of IoU and DSC to quantify segmentation consistency and reproducibility. Experimental results showed that the highest DSC scores were achieved with UNet–SE-ResNeXt50, LinkNet–SE-ResNeXt50, FPN–DenseNet169, and FPN–SENet154. The most stable configurations were LinkNet–EfficientNetB6, LinkNet–SENet154, UNet–DenseNet169, and UNet–EfficientNetB6. Conversely, DeepLabV3+ and PSPNet variants consistently underperformed. These findings provide actionable guidance for selecting backbone–segmentation pairs and highlight trade-offs between accuracy, robustness, and reproducibility for complex intracranial artery TOF-MRA segmentation tasks.
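The three reported metrics are easy to state precisely. The NumPy sketch below computes IoU and DSC from binary vessel masks; the Stability Score is written here as one minus the mean coefficient of variation of the two metrics across folds, which is an assumed instantiation of the combination described in the abstract.

    import numpy as np

    def iou(pred, gt):
        # pred, gt: boolean arrays of the same shape (binary vessel masks)
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        return inter / union if union else 1.0

    def dsc(pred, gt):
        inter = np.logical_and(pred, gt).sum()
        denom = pred.sum() + gt.sum()
        return 2 * inter / denom if denom else 1.0

    def stability_score(iou_per_fold, dsc_per_fold):
        # Coefficient of variation (std/mean) across cross-validation folds;
        # lower variation means more stable, so report 1 - mean CV (assumed form).
        cv_iou = np.std(iou_per_fold) / np.mean(iou_per_fold)
        cv_dsc = np.std(dsc_per_fold) / np.mean(dsc_per_fold)
        return 1.0 - 0.5 * (cv_iou + cv_dsc)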
{"title":"Analysis of major segmentation models for intracranial artery time-of-flight magnetic resonance angiography images","authors":"Mekhla Sarkar , Yen-Chu Huang , Tsong-Hai Lee , Jiann-Der Lee , Prasan Kumar Sahoo","doi":"10.1016/j.mlwa.2026.100843","DOIUrl":"10.1016/j.mlwa.2026.100843","url":null,"abstract":"<div><div>Intracranial arterial stenosis (ICAS) is a leading cause of cerebrovascular accidents, and accurate morphological assessment of intracranial arteries is critical for diagnosis and treatment planning. Complex vascular structures, imaging noise, and variability in time-of-flight magnetic resonance angiography (TOF-MRA) images are challenging issues for the manual delineation that motivates the use of deep learning (DL) for automatic segmentation of the intracranial arteries. DL based automatic segmentation offers a promising solution by providing consistent and noise-reduced vessel delineation. However, selecting an optimal segmentation architecture remains challenging due to the diversity of network designs and encoder backbones. Therefore, this study presents a systematic benchmarking of five widely used DL segmentation architectures, UNet, LinkNet, Feature Pyramid Networks (FPN), Pyramid Scene Parsing Network (PSPNet), and DeepLabV3+, each combined with nine backbone networks, yielding 45 model variants, including previously unexplored configurations for intracranial artery segmentation in TOF-MRA. Models were trained and cross-validated on four datasets: in-house, CereVessMRA, IXI and ADAM, and evaluated on held-out independent test set. Performance metrics included Intersection over Union (IoU), Dice Similarity Coefficient (DSC), and a Stability Score, combining the coefficient of variation of IoU and DSC to quantify segmentation consistency and reproducibility. Experimental results demonstrated highest DSC score was achieved with UNet–SE-ResNeXt50, LinkNet-SE-ResNeXt50, FPN-DenseNet169, FPN-SENet154. The most stable configurations were LinkNet–EfficientNetB6, LinkNet–SENet154, UNet–DenseNet169, and UNet–EfficientNetB6. Conversely, DeepLabV3+ and PSPNet variants consistently underperformed. These findings provide actionable guidance for selecting backbone–segmentation pairs and highlight trade-offs between accuracy, robustness, and reproducibility for complex intracranial artery TOF-MRA segmentation tasks.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100843"},"PeriodicalIF":4.9,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A hybrid machine learning and IoT system for driver fatigue monitoring in connected electric vehicles
Pub Date: 2026-01-18 | DOI: 10.1016/j.mlwa.2026.100845
Obaida AlHousrya, Aseel Bennagi, Petru A. Cotfas, Daniel T. Cotfas
Driver fatigue remains a critical factor in road accidents, particularly in long-duration or cognitively demanding driving scenarios. This study presents a comprehensive, low-cost, real-time system for monitoring driver health and electric vehicle status through physiological signal analysis. By integrating heart rate, eye movement, and breathing rate sensors, both simulated and real, this hybrid framework detects signs of fatigue using machine learning classifiers trained on publicly available datasets including OpenDriver, DriveDB, MAUS, YawDD, TinyML, and the Driver Respiration Dataset. The system architecture combines Arduino-based hardware, cloud integration via Microsoft Azure, and advanced classification and anomaly detection algorithms such as Random Forest and Isolation Forest. Evaluation across diverse datasets revealed robust fatigue detection capabilities, with OpenDriver achieving 97.6% cross-validation accuracy and a 95.8% F1-score, while image- and respiration-based models complemented the electrocardiogram-based analysis. These results demonstrate the feasibility of affordable, multimodal health monitoring in EVs, offering a scalable and deployable solution for enhancing road safety.
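A minimal sketch of the classifier/anomaly-detector pairing named in the abstract, using scikit-learn. The synthetic feature matrix stands in for windowed heart-rate, eye-movement, and breathing-rate statistics; the feature layout and split are illustrative, not the authors' pipeline.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, IsolationForest
    from sklearn.model_selection import train_test_split

    # X: per-window physiological features (e.g. mean heart rate, blink rate,
    # breathing rate); y: fatigue labels. Random data stands in for the sensors.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 6))
    y = rng.integers(0, 2, size=1000)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("fatigue accuracy:", clf.score(X_te, y_te))

    # Isolation Forest flags physiologically anomalous windows (-1 = anomaly),
    # e.g. sensor faults or abrupt health events, independently of the labels.
    iso = IsolationForest(contamination=0.05, random_state=0).fit(X_tr)
    flags = iso.predict(X_te)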
{"title":"A hybrid machine learning and IoT system for driver fatigue monitoring in connected electric vehicles","authors":"Obaida AlHousrya, Aseel Bennagi, Petru A. Cotfas, Daniel T. Cotfas","doi":"10.1016/j.mlwa.2026.100845","DOIUrl":"10.1016/j.mlwa.2026.100845","url":null,"abstract":"<div><div>Driver fatigue remains a critical factor in road accidents, particularly in long duration or cognitively demanding driving scenarios. This study presents a comprehensive, low cost, and real time system for monitoring driver health and electric vehicle status through physiological signal analysis. By integrating heart rate, eye movement, and breathing rate sensors, both simulated and real, this hybrid framework detects signs of fatigue using machine learning classifiers trained on publicly available datasets including OpenDriver, DriveDB, MAUS, YawDD, TinyML, and the Driver Respiration Dataset. The system architecture combines Arduino based hardware, cloud integration via Microsoft Azure, and advanced classification and anomaly detection algorithms such as Random Forest and Isolation Forest. Evaluation across diverse datasets revealed robust fatigue detection capabilities, with OpenDriver achieving 97.6% cross validation accuracy and 95.8% F1-score, while image and respiration-based models complemented the electrocardiogram-based analysis. These results demonstrate the feasibility of affordable, multimodal health monitoring in EVs, offering a scalable and deployable solution for enhancing road safety.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100845"},"PeriodicalIF":4.9,"publicationDate":"2026-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel hybrid model of flying geese optimization and attention-LSTM for predicting survival outcomes in clear cell renal cell carcinoma
Pub Date: 2026-01-16 | DOI: 10.1016/j.mlwa.2026.100846
Cheng-Hong Yang, Tin-Ho Cheung, Yi-Ling Chen, Sin-Hua Moi, Li-Yeh Chuang
Clear Cell Renal Cell Carcinoma (ccRCC) is the most aggressive and metastatic subtype of renal cell carcinoma, and the one with the highest mortality rate. To enhance survival prediction accuracy and facilitate informed clinical decision-making, this study presents a hybrid model that combines the Flying Geese Optimization Algorithm (FGOA) with an attention-based Long Short-Term Memory (A-LSTM) network. The proposed framework is trained and evaluated on data from The Cancer Genome Atlas Kidney Renal Clear Cell Carcinoma (TCGA-KIRC) database. The feature selection process employed seven representative optimization algorithms covering evolutionary, swarm intelligence, and bio-inspired paradigms. The selected features were then analyzed using the A-LSTM network to predict survival outcomes in patients with ccRCC. Evaluation metrics for model performance included accuracy, precision, recall, and F1 score. The FGOA-A-LSTM model performed best, with an accuracy of 80.8%, precision of 81.5%, recall of 86.9%, and F1 score of 84.1%, outperforming the other models. This result also illustrates that on imbalanced datasets the F1 score can exceed the accuracy. Furthermore, Cox proportional hazards regression analysis showed that survival outcomes were significantly correlated with factors such as gender, tumor stage, previous treatment, and treatment method. By integrating optimization-driven feature selection with an attention-enhanced deep learning architecture, the proposed FGOA-A-LSTM framework improves survival prediction in ccRCC and contributes to better clinical risk assessment.
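Read generically, an attention-based LSTM pools its hidden states over time with learned attention weights before classification. The PyTorch sketch below is one plausible reading with placeholder layer sizes; it is not the authors' architecture, and the FGOA feature-selection stage is assumed to have already produced the input features.

    import torch
    import torch.nn as nn

    class ALSTM(nn.Module):
        def __init__(self, n_features, hidden=64, n_classes=2):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.attn = nn.Linear(hidden, 1)        # scores each time step
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, x):                       # x: (batch, time, features)
            h, _ = self.lstm(x)                     # (batch, time, hidden)
            a = torch.softmax(self.attn(h), dim=1)  # attention over time steps
            context = (a * h).sum(dim=1)            # weighted sum of hidden states
            return self.head(context)

    # E.g. survival-outcome logits from sequences of 20 selected features:
    logits = ALSTM(n_features=20)(torch.randn(8, 30, 20))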
{"title":"A novel hybrid model of flying geese optimization and attention-LSTM for predicting survival outcomes in clear cell renal cell carcinoma","authors":"Cheng-Hong Yang , Tin-Ho Cheung , Yi-Ling Chen , Sin-Hua Moi , Li-Yeh Chuang","doi":"10.1016/j.mlwa.2026.100846","DOIUrl":"10.1016/j.mlwa.2026.100846","url":null,"abstract":"<div><div>Clear Cell Renal Cell Carcinoma (ccRCC) is the most aggressive and metastatic subtype of renal cell carcinoma and also the type with the highest mortality rate. To enhance survival prediction accuracy and facilitate informed clinical decision-making, this study presents a hybrid model that combines the Flying Geese Optimization Algorithm (FGOA) with an attention-based Long Short-Term Memory (A-LSTM) network. The proposed framework is trained and evaluated using data from the Cancer Genome Atlas Kidney Clear Cell Carcinoma (TCGA-KIRC) database. The feature selection process employed seven representative optimization algorithms covering evolutionary, swarm intelligence, and bio-inspired paradigms. The selected features were then analyzed using the attention-based A-LSTM network to predict survival outcomes in patients with ccRCC. Evaluation metrics for model performance included accuracy, precision, recall, and F1 score. The results showed that the FGOA-A-LSTM model performed best, with an accuracy of 80.8%, precision of 81.5%, recall of 86.9%, and F1 score of 84.1%, outperforming the other models. This result also indicates that on imbalanced datasets, the F1 score may be higher than the accuracy. Furthermore, Cox proportional hazards regression analysis showed that survival outcomes were significantly correlated with factors such as gender, tumor stage, previous treatment, and treatment method. This study introduces an innovative FGOA-A-LSTM framework that improves survival prediction in ccRCC. By integrating optimization-driven feature selection with an attention-enhanced deep learning architecture, the work makes a contribution to improving clinical risk assessment.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100846"},"PeriodicalIF":4.9,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SK-DGCNN: Human activity recognition from point cloud data with skeleton transformation
Pub Date: 2026-01-16 | DOI: 10.1016/j.mlwa.2026.100847
Zihan Zhang, Aman Anand, Farhana Zulkernine
Human Activity Recognition (HAR) has become a prominent research topic in artificial intelligence, with applications in surveillance, healthcare, and human–computer interaction. Among various data modalities used for HAR, skeleton and point cloud data offer strong potential due to their privacy-preserving and environment-agnostic properties. However, point cloud-based HAR faces challenges like data sparsity, high computation cost, and a lack of large annotated datasets. In this paper, we propose a novel two-stage framework that first transforms radar-based point cloud data into skeleton data using a Skeletal Dynamic Graph Convolutional Neural Network (SK-DGCNN), and then classifies the estimated skeletons using an efficient Spatial Temporal Graph Convolutional Network++ (ST-GCN++). The SK-DGCNN leverages dynamic edge convolution, attention mechanisms, and a custom loss function that combines Mean Square Error and Kullback–Leibler divergence to preserve the structural integrity of the human pose. Our pipeline achieves state-of-the-art performance on the MMActivity and DGUHA datasets, with Top-1 accuracy of 99.73% and 99.25%, and F1-scores of 99.62% and 99.25%, respectively. The proposed method provides an effective, lightweight, and privacy-conscious solution for real-world HAR applications using radar point cloud data.
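One plausible form of the combined MSE and Kullback–Leibler loss mentioned in the abstract, in PyTorch. The weighting alpha and the softmax normalization used to turn joint coordinates into distributions for the KL term are assumptions, not details from the paper.

    import torch
    import torch.nn.functional as F

    def skeleton_loss(pred, target, alpha=0.5):
        # pred, target: (batch, joints, 3) estimated vs. reference keypoints.
        mse = F.mse_loss(pred, target)
        # Treat normalized joint coordinates as distributions so the KL term
        # penalizes structural (shape-level) mismatch, not just pointwise error.
        p = F.log_softmax(pred.flatten(1), dim=1)
        q = F.softmax(target.flatten(1), dim=1)
        kl = F.kl_div(p, q, reduction="batchmean")
        return mse + alpha * kl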
{"title":"SK-DGCNN: Human activity recognition from point cloud data with skeleton transformation","authors":"Zihan Zhang, Aman Anand, Farhana Zulkernine","doi":"10.1016/j.mlwa.2026.100847","DOIUrl":"10.1016/j.mlwa.2026.100847","url":null,"abstract":"<div><div>Human Activity Recognition (HAR) has become a prominent research topic in artificial intelligence, with applications in surveillance, healthcare, and human–computer interaction. Among various data modalities used for HAR, skeleton and point cloud data offer strong potential due to their privacy-preserving and environment-agnostic properties. However, point cloud-based HAR faces challenges like data sparsity, high computation cost, and a lack of large annotated datasets. In this paper, we propose a novel two-stage framework that first transforms radar-based point cloud data into skeleton data using a Skeletal Dynamic Graph Convolutional Neural Network (SK-DGCNN), and then classifies the estimated skeletons using an efficient Spatial Temporal Graph Convolutional Network++ (ST-GCN++). The SK-DGCNN leverages dynamic edge convolution, attention mechanisms, and a custom loss function that combines Mean Square Error and Kullback–Leibler divergence to preserve the structural integrity of the human pose. Our pipeline achieves state-of-the-art performance on the MMActivity and DGUHA datasets, with Top-1 accuracy of 99.73% and 99.25%, and F1-scores of 99.62% and 99.25%, respectively. The proposed method provides an effective, lightweight, and privacy-conscious solution for real-world HAR applications using radar point cloud data.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100847"},"PeriodicalIF":4.9,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-domain convergence of generative models: From biomedical to astronomical applications
Pub Date: 2026-01-15 | DOI: 10.1016/j.mlwa.2026.100841
Hajer Ghodhbani, Suvendi Rimer, Khmaies Ouahada, Adel M. Alimi
This paper investigates the convergence of generative modeling techniques across diverse image analysis tasks by examining their application in two data-intensive scientific domains: biomedical imaging and astronomy. Although these two domains are scientifically distinct in scale and aims, they share common challenges, including noise corruption, limited availability of annotated data, and the demand for high-fidelity image reconstruction. This study provides a critical review of the main variants of generative models, with a particular focus on cross-domain applications. Unlike existing surveys that predominantly focus on a single discipline, this study emphasises the transferability and adaptability of generative models across biomedical and astronomical imaging. The review highlights the potential of generative models, particularly Generative Adversarial Networks (GANs), in enhancing data generation, image restoration, and analysis in both biomedical and astronomical studies.
{"title":"Cross-domain convergence of generative models: From biomedical to astronomical applications","authors":"Hajer Ghodhbani , Suvendi Rimer , Khmaies Ouahada , Adel M. Alimi","doi":"10.1016/j.mlwa.2026.100841","DOIUrl":"10.1016/j.mlwa.2026.100841","url":null,"abstract":"<div><div>This paper investigates the convergence of generative modeling techniques across diverse image analysis tasks by examining their application in two data-intensive scientific domains: biomedical imaging and astronomy. In these two domains, which tend to be scientifically distinct due to their size and aims, they share common challenges, including noise corruption, limited availability of annotated data, and the demand for high-fidelity image reconstruction. This study provides a critical review of the various variants of generative models, with a particular focus on cross-domain applications. Unlike existing surveys that predominantly focus on a single discipline, this study emphasises the transferability and adaptability of generative models across biomedical and astronomical imaging. The proposed review highlights the potential offered by generative models, particularly Generative Adversarial Networks (GANS), in enhancing data generation, image restoration, and analysis in both biomedical and astronomical studies.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100841"},"PeriodicalIF":4.9,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Decoding vision transformer variations for image classification: A guide to performance and usability
Pub Date: 2026-01-14 | DOI: 10.1016/j.mlwa.2026.100844
João Montrezol, Hugo S. Oliveira, Hélder P. Oliveira
With the rise of Transformers, Vision Transformers (ViTs) have become a new standard in visual recognition, leading to numerous architectures with diverse designs and applications. This survey identifies 22 key ViT and hybrid CNN–ViT models, along with 5 top Convolutional Neural Network (CNN) models, selected for their architectural novelty, relevance to benchmarks, and overall impact. The models are organised using a taxonomy of CNN-based, pure Transformer-based, and hybrid architectures. We analyse their main components, training methods, and computational features, assessing performance using reported results on standard benchmarks such as ImageNet and CIFAR, along with our own training and fine-tuning evaluations on specific imaging datasets. Beyond accuracy, we examine real-world deployment issues by analysing the trade-offs between accuracy and efficiency in embedded, mobile, and clinical settings. The results indicate that modern CNNs remain very competitive in limited-resource environments, while advanced ViT variants perform well after large-scale pretraining, especially in areas with high variability. Hybrid CNN–ViT architectures, on the other hand, tend to offer the best balance between accuracy, data efficiency, and computational cost. This survey establishes a consolidated benchmark and reference framework for understanding the evolution, capabilities, and practical applicability of contemporary vision architectures.
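The accuracy-efficiency trade-offs discussed here ultimately rest on simple profiling of each architecture. As an illustration of that kind of measurement (not the survey's own pipeline), one can compare a CNN and a pure-ViT baseline with torchvision constructors:

    import time
    import torch
    from torchvision.models import resnet50, vit_b_16

    def profile(model, name, reps=10):
        model.eval()
        x = torch.randn(1, 3, 224, 224)
        with torch.no_grad():
            model(x)                                 # warm-up pass
            t0 = time.perf_counter()
            for _ in range(reps):
                model(x)
        ms = (time.perf_counter() - t0) / reps * 1e3
        params = sum(p.numel() for p in model.parameters()) / 1e6
        print(f"{name}: {params:.1f}M params, {ms:.1f} ms/image (CPU)")

    profile(resnet50(), "ResNet-50")                 # CNN baseline
    profile(vit_b_16(), "ViT-B/16")                  # pure-Transformer baseline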
{"title":"Decoding vision transformer variations for image classification: A guide to performance and usability","authors":"João Montrezol , Hugo S. Oliveira , Hélder P. Oliveira","doi":"10.1016/j.mlwa.2026.100844","DOIUrl":"10.1016/j.mlwa.2026.100844","url":null,"abstract":"<div><div>With the rise of Transformers, Vision Transformers (ViTs) have become a new standard in visual recognition. This has led to the development of numerous architectures with diverse designs and applications. This survey identifies 22 key ViT and hybrid CNN–ViT models, along with 5 top Convolutional Neural Network (CNN) models. These were selected based on their new architecture, relevance to benchmarks, and overall impact. The models are organised using a defined taxonomy formed by CNN-based, pure Transformer-based, and hybrid architectures. We analyse their main components, training methods, and computational features, while assessing performance using reported results on standard benchmarks such as ImageNet and CIFAR, along with our training and fine-tuning evaluations on specific imaging datasets. In addition to accuracy, we look at real-world deployment issues by analysing the trade-offs between accuracy and efficiency in embedded, mobile, and clinical settings. The results indicate that modern CNNs are still very competitive in limited-resource environments, while advanced ViT variants perform well after large-scale pretraining, especially in areas with high variability. Hybrid CNN–ViT architectures, on the other hand, tend to offer the best balance between accuracy, data efficiency, and computational cost. This survey establishes a consolidated benchmark and reference framework for understanding the evolution, capabilities, and practical applicability of contemporary vision architectures.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100844"},"PeriodicalIF":4.9,"publicationDate":"2026-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimodal information fusion for financial forecasting via cross-attention and calibrated uncertainty
Pub Date: 2026-01-12 | DOI: 10.1016/j.mlwa.2026.100840
Josué Bustarviejo, Carlos Bousoño-Calzón
Forecasting financial markets requires synthesizing heterogeneous information sources such as historical prices, company indicators, and unstructured news, whose interactions are nonlinear and regime-dependent. We propose a cross-attention transformer framework that fuses these modalities for probabilistic financial forecasting with calibrated uncertainty, anchoring the fusion on Chronos-T5, a transformer pretrained on large-scale time series and used here as a frozen encoder for market dynamics. Parameter-efficient projection layers map company-level indicators and daily news embeddings into a shared representation space, while bidirectional cross-attention learns how to align and weight the different sources. We evaluate the approach on daily EUR/USD forecasting, with additional experiments across currency pairs and market regimes. The multimodal model consistently outperforms autoregressive and deep learning baselines in point prediction, as measured by mean squared error and Diebold–Mariano tests, and delivers sharper probabilistic forecasts according to the continuous ranked probability score (CRPS), weighted interval score (WIS), and empirical coverage. Raw predictive distributions tend to be overconfident, but a post-hoc split conformal recalibration restores nominal coverage and improves interval quality without retraining the backbone. From a soft computing perspective, the system combines approximate Bayesian inference via Monte Carlo dropout with distribution-free calibration, within a structured cross-modal fusion architecture that improves the reliability and interpretability of multimodal financial forecasts.
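The post-hoc split conformal step is standard enough to sketch. Assuming symmetric intervals around a point forecast and absolute residuals as the nonconformity score (the paper's exact parameterization may differ), the calibrated half-width on a held-out split is:

    import numpy as np

    def split_conformal_halfwidth(y_cal, mu_cal, alpha=0.1):
        # Nonconformity score: absolute residual on the calibration split.
        scores = np.abs(y_cal - mu_cal)
        n = len(scores)
        # Finite-sample-corrected quantile gives 1 - alpha marginal coverage.
        q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
        return q

    # Test-time interval: [mu - q, mu + q], with no retraining of the backbone.
    y_cal = np.random.default_rng(1).normal(size=500)
    mu_cal = y_cal + np.random.default_rng(2).normal(scale=0.2, size=500)
    q = split_conformal_halfwidth(y_cal, mu_cal)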
{"title":"Multimodal information fusion for financial forecasting via cross-attention and calibrated uncertainty","authors":"Josué Bustarviejo, Carlos Bousoño-Calzón","doi":"10.1016/j.mlwa.2026.100840","DOIUrl":"10.1016/j.mlwa.2026.100840","url":null,"abstract":"<div><div>Forecasting financial markets requires synthesizing heterogeneous information sources such as historical prices, company indicators, and unstructured news, whose interactions are nonlinear and regime dependent. We investigate a cross-attention transformer framework that fuses these modalities for probabilistic financial forecasting with calibrated uncertainty. We propose a framework that anchors fusion on Chronos-T5, a transformer pretrained on large-scale time series and used here as a frozen encoder for market dynamics. Parameter-efficient projection layers map company-level indicators and daily news embeddings into a shared representation space, while bidirectional cross-attention learns how to align and weight the different sources. We evaluate the approach on daily EUR/USD forecasting, with additional experiments across currency pairs and market regimes. The multimodal model consistently outperforms autoregressive and deep learning baselines in point prediction, as measured by mean squared error and Diebold–Mariano tests, and delivers sharper probabilistic forecasts according to the continuous ranked probability score (CRPS), weighted interval score (WIS), and empirical coverage. Raw predictive distributions tend to be overconfident, but a post-hoc split conformal recalibration restores nominal coverage and improves interval quality without retraining the backbone. From a soft computing perspective, the system combines approximate Bayesian inference via Monte Carlo dropout with distribution-free calibration, within a structured cross-modal fusion architecture that improves the reliability and interpretability of multimodal financial forecasts.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100840"},"PeriodicalIF":4.9,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparison of input selection methods for neural networks applied to complex fluid dynamic inverse problem
Jaume Luis-Gómez, Guillem Monrós-Andreu, Sergio Iserte, Sergio Chiva, Raúl Martínez-Cuenca
Pub Date: 2026-01-12 | DOI: 10.1016/j.mlwa.2026.100842
Efficient identification of informative inputs is critical when training Machine Learning (ML) surrogates on large, multi-sensor datasets. In this paper, we benchmark several input selection methods from the literature alongside new methods proposed here. A baseline method based on expert-driven (human) selection is used as a reference. All methods are evaluated on a challenging inverse problem, in which Computational Fluid Dynamics (CFD) simulations are used to train a Deep Neural Network (DNN) to infer unknown momentum source terms from discrete velocity measurements. The proposed methodology does not explicitly depend on the geometry of the domain and is therefore transferable to other problems involving sparse sensor measurements, although domain-specific validation may still be required. The results show that four input selection methods reduce the number of inputs to as few as five, with minimal impact on the mean predictive error. This corresponds to a forty-fold reduction relative to the original number of inputs. Analysis of the top four inputs shows that each method selects different locations, indicating that multiple combinations can yield similarly accurate results. The top four methods significantly outperform the baseline method based on human selection. This study demonstrates that input selection methods reduce computational costs during both training and inference. They also lower experimental demands by identifying high-value sensor locations, thereby reducing the number of required sampling points. These findings suggest that input selection methods should be considered standard practice in ML applications in complex scenarios constrained by limited experimental data.
{"title":"Comparison of input selection methods for neural networks applied to complex fluid dynamic inverse problem","authors":"Jaume Luis-Gómez , Guillem Monrós-Andreu , Sergio Iserte , Sergio Chiva , Raúl Martínez-Cuenca","doi":"10.1016/j.mlwa.2026.100842","DOIUrl":"10.1016/j.mlwa.2026.100842","url":null,"abstract":"<div><div>Efficient identification of informative inputs is critical when training Machine Learning (ML) surrogates on large, multi-sensor datasets. In this paper, we benchmark several input selection methods from the literature alongside new methods proposed here. A baseline method based on expert-driven (human) selection is used as a reference. All methods are evaluated on a challenging inverse problem, in which Computational Fluid Dynamic (CFD) simulations are used to train a Deep Neural Network (DNN) to infer unknown momentum source terms from discrete velocity measurements. The proposed methodology does not explicitly depend on the geometry of the domain and is therefore transferable to other problems involving sparse sensor measurements, although domain-specific validation may still be required. The results show that four input selection methods reduce the number of inputs to as few as five, with minimal impact on the mean average predictive error. This corresponds to a forty-fold reduction relative to the original number of inputs. Analysis of the top four inputs shows that each method selects different locations, indicating that multiple combinations can yield similar accurate results. The top four methods significantly outperform the baseline method based on human selection. This study demonstrates that input selection methods reduce computational costs during both training and inference stages. They also lower experimental demands by identifying high-value sensor locations, thereby reducing the number of required sampling points. These findings suggest that input selection methods should be considered standard practice in ML applications with complex scenarios constrained by limited experimental data.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100842"},"PeriodicalIF":4.9,"publicationDate":"2026-01-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}