Pub Date: 2026-02-05. DOI: 10.1016/j.mlwa.2026.100852
Khafiizh Hastuti, Erwin Yudi Hidayat, Abu Salam, Usman Sudibyo
Fine-grained recognition of cultural artifacts remains challenging because of the scarcity of annotated data, subtle intra-class differences, and heterogeneous imaging conditions. This study addresses these issues through a domain-specific deep learning pipeline, demonstrated on Indonesian keris classification across three tasks: pamor (27 classes), dhapur (42), and tangguh (5). The pipeline integrates background homogenization, orientation normalization, and YOLOv8-based blade cropping with mask-aware augmentation restricted to the blade regions. For classification, we propose KerisRDNet, which extends InceptionResNetV2 with Inception-Residual-Dilated (IRD) blocks and squeeze-and-excitation to model the elongated geometries and subtle forging motifs. Experiments show that baseline networks collapse under fine-grained settings, with macro-F1 near zero, whereas the proposed approach achieves 0.268 (pamor), 0.276 (dhapur), and 0.635 (tangguh) with Top-3 accuracy above 0.5 and AUC up to 0.853. Across three stratified resamplings, paired non-parametric tests (Wilcoxon signed-rank) indicated directionally consistent improvements; given the small number of repetitions (n = 3), these results are interpreted conservatively. These results demonstrate the feasibility of keris recognition as a practical decision-support tool for cultural heritage curation, while also offering a transferable workflow for low-data fine-grained recognition tasks.
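The squeeze-and-excitation mechanism mentioned above can be sketched in a few lines. This is a generic illustration of SE channel recalibration with random weights, not the authors' KerisRDNet implementation:

```python
import numpy as np

def squeeze_excite(feature_map, w1, w2):
    """Channel recalibration as in squeeze-and-excitation (SE) blocks.

    feature_map: (H, W, C) activations; w1: (C, C//r) and w2: (C//r, C)
    are the bottleneck weights (random here, for illustration only).
    """
    # Squeeze: global average pooling collapses spatial dims to one value per channel.
    z = feature_map.mean(axis=(0, 1))                              # (C,)
    # Excite: bottleneck MLP with ReLU, then sigmoid gives per-channel gates in (0, 1).
    s = 1.0 / (1.0 + np.exp(-(np.maximum(z @ w1, 0.0) @ w2)))      # (C,)
    # Re-scale: each channel is multiplied by its gate.
    return feature_map * s

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 8, 16))
out = squeeze_excite(fmap, rng.standard_normal((16, 4)), rng.standard_normal((4, 16)))
print(out.shape)  # (8, 8, 16)
```

Because every gate lies in (0, 1), the block can only attenuate channels, which is what lets the network emphasize motif-bearing channels relative to background ones.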
{"title":"KerisRDNet: Mask-aware augmentation and residual dilated networks for cultural heritage blade classification","authors":"Khafiizh Hastuti, Erwin Yudi Hidayat, Abu Salam, Usman Sudibyo","doi":"10.1016/j.mlwa.2026.100852","DOIUrl":"10.1016/j.mlwa.2026.100852","url":null,"abstract":"<div><div>Fine-grained recognition of cultural artifacts remains challenging because of the scarcity of annotated data, subtle intra-class differences, and heterogeneous imaging conditions. This study addresses these issues through a domain-specific deep learning pipeline, demonstrated on Indonesian keris classification across three tasks: <em>pamor</em> (27 classes), <em>dhapur</em> (42), and <em>tangguh</em> (5). The pipeline integrates background homogenization, orientation normalization, and YOLOv8-based blade cropping with mask-aware augmentation restricted to the blade regions. For classification, we propose KerisRDNet, which extends InceptionResNetV2 with Inception-Residual-Dilated (IRD) blocks and squeeze-and-excitation to model the elongated geometries and subtle forging motifs. Experiments show that baseline networks collapse under fine-grained settings, with macro-F1 near zero, whereas the proposed approach achieves 0.268 (<em>pamor</em>), 0.276 (<em>dhapur</em>), and 0.635 (<em>tangguh</em>) with Top-3 accuracy above 0.5 and AUC up to 0.853. Across three stratified resamplings, paired non-parametric tests (Wilcoxon signed-rank) indicated directionally consistent improvements; given the small number of repetitions (<span><math><mrow><mi>n</mi><mo>=</mo><mn>3</mn></mrow></math></span>), these results are interpreted conservatively. 
These results demonstrate the feasibility of practically viable keris recognition as a decision-support tool for cultural heritage curation, while also offering a transferable workflow for low-data fine-grained recognition tasks.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"24 ","pages":"Article 100852"},"PeriodicalIF":4.9,"publicationDate":"2026-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146161644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The rise of online rental platforms has led to an overwhelming amount of user-generated content, making it difficult for prospective consumers to discern which reviews are helpful. Existing approaches often rely on raw helpfulness votes, which are sparse, subjective, and temporally inconsistent. Moreover, labeled datasets for rental review helpfulness prediction are lacking. This paper introduces a novel dataset of apartment reviews collected from an online rental website and proposes an intelligent machine learning framework to predict the helpfulness of rental reviews. To address the challenge of obtaining reliable labels from sparse and subjective user votes, a scoring-based labeling strategy is developed that uses helpful vote counts and timeliness. A diverse set of features, including TF–IDF vectors, sentiment polarity, rating deviation, and review length, is used to capture both textual and behavioral aspects of the reviews. Multiple classifiers, including Logistic Regression, Naive Bayes, and XGBoost, are systematically evaluated under 5-fold cross-validation, along with rule-based and deep learning models.
Experimental results show that XGBoost consistently achieves the best overall performance, with an accuracy of 0.71 and ROC-AUC of 0.75 when leveraging all features. This research makes three key contributions: (i) the first large-scale dataset for rental reviews, (ii) an automatic annotation technique that clusters reviews using a score derived from user votes and time since posting, and (iii) a comprehensive evaluation pipeline spanning rule-based, traditional, and deep learning classifiers. Together, these advances establish a foundation for intelligent rental review helpfulness estimation, with broader implications for e-commerce, hospitality, and user-generated content analysis.
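The feature-plus-classifier setup described above can be sketched as follows. The reviews and labels are invented for illustration, and scikit-learn's LogisticRegression stands in for the paper's classifier bank (LR, Naive Bayes, XGBoost):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy reviews with hypothetical helpfulness labels (1 = helpful).
reviews = ["spacious apartment, honest landlord, quick repairs",
           "bad", "great location near transit, thin walls though",
           "ok", "detailed lease terms, deposit returned in full", "meh"] * 5
labels = np.array([1, 0, 1, 0, 1, 0] * 5)

# Textual features (TF-IDF) concatenated with a simple behavioral feature (review length).
tfidf = TfidfVectorizer().fit_transform(reviews).toarray()
length = np.array([[len(r.split())] for r in reviews], dtype=float)
X = np.hstack([tfidf, length])

# 5-fold cross-validation, mirroring the evaluation protocol described above.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, labels, cv=5)
print(scores.mean())
```

Sentiment polarity and rating deviation would simply be further columns appended to `X` in the same way as the length feature.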
{"title":"Towards an intelligent review helpfulness estimation: A novel dataset and machine learning framework","authors":"Rakibul Hassan, Shubhashish Kar, Jorge Fonseca Cacho, Shaikh Arifuzzaman","doi":"10.1016/j.mlwa.2026.100849","DOIUrl":"10.1016/j.mlwa.2026.100849","url":null,"abstract":"<div><div>The rise of online rental platforms has led to an overwhelming amount of user-generated content, making it difficult for prospective consumers to discern which reviews are helpful. Existing approaches often rely on raw helpfulness votes, which are sparse, subjective, and temporally inconsistent. Also, there is lack of labeled dataset in the field of rental review usefulness prediction. This paper introduces a novel dataset of apartment reviews collected from online website and proposes an intelligent machine learning framework to predict the helpfulness of rental reviews. To address the challenge of obtaining reliable labels from sparse and subjective user votes, a scoring-based labeling strategy is developed that uses helpful vote count and timeliness. A diverse set of features including TF–IDF vectors, sentiment polarity, rating deviation, and review length are used to capture both textual and behavioral aspects of the reviews. Multiple classifiers, including Logistic Regression, Naive Bayes, and XGBoost, are systematically evaluated under 5-fold cross-validation, along with a rule-based and deep learning models.</div><div>Experimental results show that XGBoost consistently achieves the best overall performance with an accuracy of 0.71 and ROC-AUC of 0.75 when leveraging all features. This research makes three key contributions: (i) the first large-scale dataset for rental review, (ii) auto annotation technique that uses clustering approach with score from user votes and time since posted, and (iii) comprehensive evaluation pipeline spanning rule-based, traditional, and deep learning classifiers. 
Together, these advances establish a foundation for intelligent rental review helpfulness estimation, with broader implications for e-commerce, hospitality, and user-generated content analysis.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100849"},"PeriodicalIF":4.9,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Conducting prior patent searches before developing technologies and filing patent applications in companies or universities is essential for understanding technological trends among competitors and academic institutions, as well as for increasing the likelihood of obtaining patent rights. In these searches, it is important not only to include relevant keywords in the search queries but also to incorporate related terms retrieved from a thesaurus. To support this, methods using word embeddings for automatically extracting such synonyms have recently been proposed. However, patent documents often contain unique expressions and compound terms, such as specialized technical terminology and abstract conceptual terms, which are difficult to accurately capture using existing large language models trained at the token level.
In this study, we investigate a method for extracting synonyms from patent documents by embedding the definition sentences that explain technical terms. The experimental results demonstrate that the proposed method achieves more precise synonym extraction than conventional word embedding approaches, and it can contribute to the expansion of existing thesauri.
Thus, this research is expected to improve the recall of prior art searches and support the automatic extraction of technical elements for identifying technological trends.
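The underlying retrieval idea, comparing terms by embeddings of their definition sentences rather than of the terms themselves, can be illustrated with a toy bag-of-words stand-in. The glossary entries are invented, and the paper's actual sentence-embedding model is not reproduced here:

```python
import numpy as np

def embed(definition, vocab):
    """Toy bag-of-words embedding of a term's definition sentence."""
    words = definition.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

# Hypothetical glossary: terms paired with definition sentences (illustrative only).
glossary = {
    "fastener":  "a device that mechanically joins two or more parts together",
    "bolt":      "a threaded device that joins parts together with a nut",
    "capacitor": "a component that stores electrical charge in a circuit",
}
vocab = sorted({w for d in glossary.values() for w in d.lower().split()})
vecs = {t: embed(d, vocab) for t, d in glossary.items()}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Rank candidate synonyms for "fastener" by definition-sentence similarity.
query = vecs["fastener"]
ranked = sorted((t for t in vecs if t != "fastener"),
                key=lambda t: cosine(query, vecs[t]), reverse=True)
print(ranked[0])  # "bolt": its definition overlaps far more than "capacitor"'s
```

Note that the surface terms "fastener" and "bolt" share no characters; the match comes entirely from their definitions, which is the point of the approach for opaque compound terms in patents.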
{"title":"Synonym extraction from Japanese patent documents using term definition sentences","authors":"Koji Marusaki , Seiya Kawano , Asahi Hentona , Hirofumi Nonaka","doi":"10.1016/j.mlwa.2026.100848","DOIUrl":"10.1016/j.mlwa.2026.100848","url":null,"abstract":"<div><div>Conducting prior patent searches before developing technologies and filing patent applications in companies or universities is essential for understanding technological trends among competitors and academic institutions, as well as for increasing the likelihood of obtaining patent rights. In these searches, it is important not only to include relevant keywords in the search queries but also to incorporate related terms retrieved from a thesaurus. To support this, methods using word embeddings for automatically extracting such synonyms have recently been proposed. However, patent documents often contain unique expressions and compound terms, such as specialized technical terminology and abstract conceptual terms, which are difficult to accurately capture using existing large language models trained at the token level.</div><div>In this study, we investigate a method for extracting synonyms from patent documents by embedding the definition sentences that explain technical terms. 
The experimental results demonstrate that the proposed method achieves more precise synonym extraction than conventional word embedding approaches, and it can contribute to the expansion of existing thesauri.</div><div>Thus, this research is expected to improve the recall of prior art searches and support the automatic extraction of technical elements for identifying technological trends.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100848"},"PeriodicalIF":4.9,"publicationDate":"2026-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-20. DOI: 10.1016/j.mlwa.2026.100850
Antoine Oriou , Philipp Krah , Julian Koellermeier
This paper introduces the Intrinsic Dimension Estimating Autoencoder (IDEA), which identifies the underlying intrinsic dimension of a wide range of datasets whose samples lie on either linear or nonlinear manifolds. Beyond estimating the intrinsic dimension, IDEA is also able to reconstruct the original dataset after projecting it onto the corresponding latent space, which is structured using re-weighted double CancelOut layers. Our key contribution is the introduction of the projected reconstruction loss term, guiding the training of the model by continuously assessing the reconstruction quality under the removal of an additional latent dimension.
We first assess the performance of IDEA on a series of theoretical benchmarks to validate its robustness. These experiments allow us to test its reconstruction ability and compare its performance with state-of-the-art intrinsic dimension estimators. The benchmarks show good accuracy and high versatility of our approach. Subsequently, we apply our model to data generated from the numerical solution of a vertically resolved one-dimensional free-surface flow, following a pointwise discretization of the vertical velocity profile in the horizontal direction, vertical direction, and time. IDEA succeeds in estimating the dataset’s intrinsic dimension and then reconstructs the original solution by working directly within the projection space identified by the network.
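The projected reconstruction loss can be illustrated with a linear (SVD-based) stand-in for the autoencoder: reconstruction error is scored both at the current latent size and with one more dimension removed, and the penalty for the smaller projection stays near zero until the latent size drops below the data's intrinsic dimension:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data whose intrinsic dimension is 2: a plane embedded in 5-D.
basis = rng.standard_normal((2, 5))
data = rng.standard_normal((200, 2)) @ basis

def recon_error(X, k):
    """Squared reconstruction error of a rank-k linear encode/decode (via SVD)."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    Xk = (U[:, :k] * S[:k]) @ Vt[:k]
    return float(np.linalg.norm(X - Xk) ** 2)

# Compare the loss at latent size k with the loss after removing one more dimension:
# the gap is negligible for k > 2 and explodes at the intrinsic dimension k = 2.
for k in (3, 2, 1):
    print(k, recon_error(data, k), recon_error(data, k - 1))
```

This is only the intuition behind the projected loss term; IDEA itself applies it inside a nonlinear autoencoder whose latent space is ordered by the re-weighted CancelOut layers.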
{"title":"Intrinsic Dimension Estimating Autoencoder (IDEA) using CancelOut layer and a projected loss","authors":"Antoine Oriou , Philipp Krah , Julian Koellermeier","doi":"10.1016/j.mlwa.2026.100850","DOIUrl":"10.1016/j.mlwa.2026.100850","url":null,"abstract":"<div><div>This paper introduces the Intrinsic Dimension Estimating Autoencoder (IDEA), which identifies the underlying intrinsic dimension of a wide range of datasets whose samples lie on either linear or nonlinear manifolds. Beyond estimating the intrinsic dimension, IDEA is also able to reconstruct the original dataset after projecting it onto the corresponding latent space, which is structured using re-weighted double CancelOut layers. Our key contribution is the introduction of the <em>projected reconstruction loss</em> term, guiding the training of the model by continuously assessing the reconstruction quality under the removal of an additional latent dimension.</div><div>We first assess the performance of IDEA on a series of theoretical benchmarks to validate its robustness. These experiments allow us to test its reconstruction ability and compare its performance with state-of-the-art intrinsic dimension estimators. The benchmarks show good accuracy and high versatility of our approach. Subsequently, we apply our model to data generated from the numerical solution of a vertically resolved one-dimensional free-surface flow, following a pointwise discretization of the vertical velocity profile in the horizontal direction, vertical direction, and time. 
IDEA succeeds in estimating the dataset’s intrinsic dimension and then reconstructs the original solution by working directly within the projection space identified by the network.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100850"},"PeriodicalIF":4.9,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-20. DOI: 10.1016/j.mlwa.2026.100851
Sayeh Gholipour Picha, Dawood Al Chanti, Alice Caplier
Large language and vision-language models have greatly advanced automated chest X-ray report generation (RRG), yet current evaluation practices remain largely text-based and detached from image evidence. Traditional machine translation metrics fail to determine whether generated findings are clinically correct or visually grounded, limiting their suitability for medical applications.
This study introduces a comprehensive, image-aware evaluation framework that integrates the VICCA (Visual Interpretation and Comprehension of Chest X-ray Anomalies) protocol with the domain-specific semantic metric MCSE (Medical Corpus Similarity Evaluation). VICCA combines visual grounding and text-guided image generation to assess visual-textual consistency, while MCSE measures semantic and factual fidelity through clinically meaningful entities, negations, and modifiers. Together, they provide a unified, semi-reference-free assessment of pathology-level accuracy, semantic coherence, and visual consistency.
Five representative RRG models, R2Gen, M2Trans, CXR-RePaiR, RGRG, and MedGemma, are benchmarked on 2461 MIMIC-CXR studies using a standardized pipeline. Results reveal systematic trade-offs: models with high pathology agreement often generate semantically weak or visually inconsistent reports, whereas textually fluent models may lack proper image grounding. By integrating clinical semantics and visual reliability within a single multimodal framework, VICCA establishes a robust paradigm for evaluating the trustworthiness and interpretability of AI-generated radiology reports.
{"title":"Trust but verify: Image-aware evaluation of radiology report generators","authors":"Sayeh Gholipour Picha, Dawood Al Chanti, Alice Caplier","doi":"10.1016/j.mlwa.2026.100851","DOIUrl":"10.1016/j.mlwa.2026.100851","url":null,"abstract":"<div><div>Large language and vision-language models have greatly advanced automated chest X-ray report generation (RRG),. yet current evaluation practices remain largely text-based and detached from image evidence. Traditional machine translation metrics fail to determine whether generated findings are clinically correct or visually grounded, limiting their suitability for medical applications.</div><div>This study introduces a comprehensive, image-aware evaluation framework that integrates the VICCA (<em>Visual Interpretation and Comprehension of Chest X-ray Anomalies</em>) protocol with the domain-specific semantic metric MCSE (<em>Medical Corpus Similarity Evaluation</em>). VICCA combines visual grounding and text-guided image generation to assess visual-textual consistency, while MCSE measures semantic and factual fidelity through clinically meaningful entities, negations, and modifiers. Together, they provide a unified, semi-reference-free assessment of pathology-level accuracy, semantic coherence, and visual consistency.</div><div>Five representative RRG models, R2Gen, M2Trans, CXR-RePaiR, RGRG, and MedGemma, are benchmarked on 2461 MIMIC-CXR studies using a standardized pipeline. Results reveal systematic trade-offs: models with high pathology agreement often generate semantically weak or visually inconsistent reports, whereas textually fluent models may lack proper image grounding. 
By integrating clinical semantics and visual reliability within a single multimodal framework, VICCA establishes a robust paradigm for evaluating the trustworthiness and interpretability of AI-generated radiology reports.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100851"},"PeriodicalIF":4.9,"publicationDate":"2026-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-19. DOI: 10.1016/j.mlwa.2026.100843
Mekhla Sarkar , Yen-Chu Huang , Tsong-Hai Lee , Jiann-Der Lee , Prasan Kumar Sahoo
Intracranial arterial stenosis (ICAS) is a leading cause of cerebrovascular accidents, and accurate morphological assessment of intracranial arteries is critical for diagnosis and treatment planning. Complex vascular structures, imaging noise, and variability in time-of-flight magnetic resonance angiography (TOF-MRA) images make manual delineation challenging, motivating the use of deep learning (DL) for automatic segmentation of the intracranial arteries. DL-based automatic segmentation offers a promising solution by providing consistent and noise-reduced vessel delineation. However, selecting an optimal segmentation architecture remains challenging due to the diversity of network designs and encoder backbones. Therefore, this study presents a systematic benchmarking of five widely used DL segmentation architectures (UNet, LinkNet, Feature Pyramid Networks (FPN), Pyramid Scene Parsing Network (PSPNet), and DeepLabV3+), each combined with nine backbone networks, yielding 45 model variants, including previously unexplored configurations for intracranial artery segmentation in TOF-MRA. Models were trained and cross-validated on four datasets (in-house, CereVessMRA, IXI, and ADAM) and evaluated on a held-out independent test set. Performance metrics included Intersection over Union (IoU), Dice Similarity Coefficient (DSC), and a Stability Score, which combines the coefficients of variation of IoU and DSC to quantify segmentation consistency and reproducibility. Experimental results demonstrated that the highest DSC scores were achieved by UNet–SE-ResNeXt50, LinkNet–SE-ResNeXt50, FPN–DenseNet169, and FPN–SENet154. The most stable configurations were LinkNet–EfficientNetB6, LinkNet–SENet154, UNet–DenseNet169, and UNet–EfficientNetB6. Conversely, DeepLabV3+ and PSPNet variants consistently underperformed.
These findings provide actionable guidance for selecting backbone–segmentation pairs and highlight trade-offs between accuracy, robustness, and reproducibility for complex intracranial artery TOF-MRA segmentation tasks.
{"title":"Analysis of major segmentation models for intracranial artery time-of-flight magnetic resonance angiography images","authors":"Mekhla Sarkar , Yen-Chu Huang , Tsong-Hai Lee , Jiann-Der Lee , Prasan Kumar Sahoo","doi":"10.1016/j.mlwa.2026.100843","DOIUrl":"10.1016/j.mlwa.2026.100843","url":null,"abstract":"<div><div>Intracranial arterial stenosis (ICAS) is a leading cause of cerebrovascular accidents, and accurate morphological assessment of intracranial arteries is critical for diagnosis and treatment planning. Complex vascular structures, imaging noise, and variability in time-of-flight magnetic resonance angiography (TOF-MRA) images are challenging issues for the manual delineation that motivates the use of deep learning (DL) for automatic segmentation of the intracranial arteries. DL based automatic segmentation offers a promising solution by providing consistent and noise-reduced vessel delineation. However, selecting an optimal segmentation architecture remains challenging due to the diversity of network designs and encoder backbones. Therefore, this study presents a systematic benchmarking of five widely used DL segmentation architectures, UNet, LinkNet, Feature Pyramid Networks (FPN), Pyramid Scene Parsing Network (PSPNet), and DeepLabV3+, each combined with nine backbone networks, yielding 45 model variants, including previously unexplored configurations for intracranial artery segmentation in TOF-MRA. Models were trained and cross-validated on four datasets: in-house, CereVessMRA, IXI and ADAM, and evaluated on held-out independent test set. Performance metrics included Intersection over Union (IoU), Dice Similarity Coefficient (DSC), and a Stability Score, combining the coefficient of variation of IoU and DSC to quantify segmentation consistency and reproducibility. Experimental results demonstrated highest DSC score was achieved with UNet–SE-ResNeXt50, LinkNet-SE-ResNeXt50, FPN-DenseNet169, FPN-SENet154. 
The most stable configurations were LinkNet–EfficientNetB6, LinkNet–SENet154, UNet–DenseNet169, and UNet–EfficientNetB6. Conversely, DeepLabV3+ and PSPNet variants consistently underperformed. These findings provide actionable guidance for selecting backbone–segmentation pairs and highlight trade-offs between accuracy, robustness, and reproducibility for complex intracranial artery TOF-MRA segmentation tasks.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100843"},"PeriodicalIF":4.9,"publicationDate":"2026-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-18. DOI: 10.1016/j.mlwa.2026.100845
Obaida AlHousrya, Aseel Bennagi, Petru A. Cotfas, Daniel T. Cotfas
Driver fatigue remains a critical factor in road accidents, particularly in long-duration or cognitively demanding driving scenarios. This study presents a comprehensive, low-cost, and real-time system for monitoring driver health and electric vehicle status through physiological signal analysis. By integrating heart rate, eye movement, and breathing rate sensors, both simulated and real, this hybrid framework detects signs of fatigue using machine learning classifiers trained on publicly available datasets including OpenDriver, DriveDB, MAUS, YawDD, TinyML, and the Driver Respiration Dataset. The system architecture combines Arduino-based hardware, cloud integration via Microsoft Azure, and advanced classification and anomaly detection algorithms such as Random Forest and Isolation Forest. Evaluation across diverse datasets revealed robust fatigue detection capabilities, with OpenDriver achieving 97.6% cross-validation accuracy and a 95.8% F1-score, while image- and respiration-based models complemented the electrocardiogram-based analysis. These results demonstrate the feasibility of affordable, multimodal health monitoring in EVs, offering a scalable and deployable solution for enhancing road safety.
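The anomaly-detection component can be sketched with scikit-learn's IsolationForest. The physiological feature values below are simulated for illustration and are not drawn from the cited datasets:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
# Hypothetical feature rows: [heart rate (bpm), breathing rate (/min), blink interval (s)].
normal = np.column_stack([rng.normal(72, 4, 200),
                          rng.normal(16, 1.5, 200),
                          rng.normal(0.3, 0.05, 200)])
fatigued = np.array([[52.0, 9.0, 1.4]])  # slowed vitals and prolonged eye closure

# Fit on alert-driver data only; fatigue then shows up as an anomaly.
model = IsolationForest(random_state=0).fit(normal)
print(model.predict(fatigued))  # [-1] flags the sample as anomalous
```

In the deployed system a classifier such as Random Forest would handle labeled fatigue states, while this unsupervised detector catches out-of-distribution readings without labels.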
{"title":"A hybrid machine learning and IoT system for driver fatigue monitoring in connected electric vehicles","authors":"Obaida AlHousrya, Aseel Bennagi, Petru A. Cotfas, Daniel T. Cotfas","doi":"10.1016/j.mlwa.2026.100845","DOIUrl":"10.1016/j.mlwa.2026.100845","url":null,"abstract":"<div><div>Driver fatigue remains a critical factor in road accidents, particularly in long duration or cognitively demanding driving scenarios. This study presents a comprehensive, low cost, and real time system for monitoring driver health and electric vehicle status through physiological signal analysis. By integrating heart rate, eye movement, and breathing rate sensors, both simulated and real, this hybrid framework detects signs of fatigue using machine learning classifiers trained on publicly available datasets including OpenDriver, DriveDB, MAUS, YawDD, TinyML, and the Driver Respiration Dataset. The system architecture combines Arduino based hardware, cloud integration via Microsoft Azure, and advanced classification and anomaly detection algorithms such as Random Forest and Isolation Forest. Evaluation across diverse datasets revealed robust fatigue detection capabilities, with OpenDriver achieving 97.6% cross validation accuracy and 95.8% F1-score, while image and respiration-based models complemented the electrocardiogram-based analysis. 
These results demonstrate the feasibility of affordable, multimodal health monitoring in EVs, offering a scalable and deployable solution for enhancing road safety.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100845"},"PeriodicalIF":4.9,"publicationDate":"2026-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-01-16. DOI: 10.1016/j.mlwa.2026.100846
Cheng-Hong Yang , Tin-Ho Cheung , Yi-Ling Chen , Sin-Hua Moi , Li-Yeh Chuang
Clear Cell Renal Cell Carcinoma (ccRCC) is the most aggressive and metastatic subtype of renal cell carcinoma and also the type with the highest mortality rate. To enhance survival prediction accuracy and facilitate informed clinical decision-making, this study presents a hybrid model that combines the Flying Geese Optimization Algorithm (FGOA) with an attention-based Long Short-Term Memory (A-LSTM) network. The proposed framework is trained and evaluated using data from the Cancer Genome Atlas Kidney Clear Cell Carcinoma (TCGA-KIRC) database. The feature selection process employed seven representative optimization algorithms covering evolutionary, swarm intelligence, and bio-inspired paradigms. The selected features were then analyzed using the attention-based A-LSTM network to predict survival outcomes in patients with ccRCC. Evaluation metrics for model performance included accuracy, precision, recall, and F1 score. The results showed that the FGOA-A-LSTM model performed best, with an accuracy of 80.8%, precision of 81.5%, recall of 86.9%, and F1 score of 84.1%, outperforming the other models. This result also indicates that on imbalanced datasets, the F1 score may be higher than the accuracy. Furthermore, Cox proportional hazards regression analysis showed that survival outcomes were significantly correlated with factors such as gender, tumor stage, previous treatment, and treatment method. This study introduces an innovative FGOA-A-LSTM framework that improves survival prediction in ccRCC. By integrating optimization-driven feature selection with an attention-enhanced deep learning architecture, the work makes a contribution to improving clinical risk assessment.
{"title":"A novel hybrid model of flying geese optimization and attention-LSTM for predicting survival outcomes in clear cell renal cell carcinoma","authors":"Cheng-Hong Yang , Tin-Ho Cheung , Yi-Ling Chen , Sin-Hua Moi , Li-Yeh Chuang","doi":"10.1016/j.mlwa.2026.100846","DOIUrl":"10.1016/j.mlwa.2026.100846","url":null,"abstract":"<div><div>Clear Cell Renal Cell Carcinoma (ccRCC) is the most aggressive and metastatic subtype of renal cell carcinoma and also the type with the highest mortality rate. To enhance survival prediction accuracy and facilitate informed clinical decision-making, this study presents a hybrid model that combines the Flying Geese Optimization Algorithm (FGOA) with an attention-based Long Short-Term Memory (A-LSTM) network. The proposed framework is trained and evaluated using data from the Cancer Genome Atlas Kidney Clear Cell Carcinoma (TCGA-KIRC) database. The feature selection process employed seven representative optimization algorithms covering evolutionary, swarm intelligence, and bio-inspired paradigms. The selected features were then analyzed using the attention-based A-LSTM network to predict survival outcomes in patients with ccRCC. Evaluation metrics for model performance included accuracy, precision, recall, and F1 score. The results showed that the FGOA-A-LSTM model performed best, with an accuracy of 80.8%, precision of 81.5%, recall of 86.9%, and F1 score of 84.1%, outperforming the other models. This result also indicates that on imbalanced datasets, the F1 score may be higher than the accuracy. Furthermore, Cox proportional hazards regression analysis showed that survival outcomes were significantly correlated with factors such as gender, tumor stage, previous treatment, and treatment method. This study introduces an innovative FGOA-A-LSTM framework that improves survival prediction in ccRCC. 
By integrating optimization-driven feature selection with an attention-enhanced deep learning architecture, the work contributes to improving clinical risk assessment.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100846"},"PeriodicalIF":4.9,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
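The abstract above names an attention-based LSTM but does not reproduce the architecture. As a rough illustration of the attention mechanism it refers to, the following NumPy sketch shows softmax attention pooling over a sequence of LSTM hidden states; the sequence length, hidden size, and scoring vector `w` are all hypothetical placeholders, not values from the paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(hidden_states, w):
    """Softmax attention pooling over a sequence of hidden states.

    hidden_states: (T, d) array, e.g. one LSTM output per timestep.
    w: (d,) scoring vector (learned in a real model; random here).
    """
    scores = hidden_states @ w        # (T,) one relevance score per timestep
    alpha = softmax(scores)           # attention weights: non-negative, sum to 1
    context = alpha @ hidden_states   # (d,) weighted summary fed to a classifier head
    return context, alpha

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))   # 5 timesteps, 8 hidden units (illustrative sizes)
w = rng.normal(size=8)
ctx, alpha = attention_pool(H, w)
```

In a full model, `w` would be trained jointly with the LSTM so that informative timesteps receive higher weights before the pooled context is passed to the survival classifier.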
Pub Date : 2026-01-16DOI: 10.1016/j.mlwa.2026.100847
Zihan Zhang, Aman Anand, Farhana Zulkernine
Human Activity Recognition (HAR) has become a prominent research topic in artificial intelligence, with applications in surveillance, healthcare, and human–computer interaction. Among various data modalities used for HAR, skeleton and point cloud data offer strong potential due to their privacy-preserving and environment-agnostic properties. However, point cloud-based HAR faces challenges like data sparsity, high computation cost, and a lack of large annotated datasets. In this paper, we propose a novel two-stage framework that first transforms radar-based point cloud data into skeleton data using a Skeletal Dynamic Graph Convolutional Neural Network (SK-DGCNN), and then classifies the estimated skeletons using an efficient Spatial Temporal Graph Convolutional Network++ (ST-GCN++). The SK-DGCNN leverages dynamic edge convolution, attention mechanisms, and a custom loss function that combines Mean Square Error and Kullback–Leibler divergence to preserve the structural integrity of the human pose. Our pipeline achieves state-of-the-art performance on the MMActivity and DGUHA datasets, with Top-1 accuracy of 99.73% and 99.25%, and F1-scores of 99.62% and 99.25%, respectively. The proposed method provides an effective, lightweight, and privacy-conscious solution for real-world HAR applications using radar point cloud data.
{"title":"SK-DGCNN: Human activity recognition from point cloud data with skeleton transformation","authors":"Zihan Zhang, Aman Anand, Farhana Zulkernine","doi":"10.1016/j.mlwa.2026.100847","DOIUrl":"10.1016/j.mlwa.2026.100847","url":null,"abstract":"<div><div>Human Activity Recognition (HAR) has become a prominent research topic in artificial intelligence, with applications in surveillance, healthcare, and human–computer interaction. Among various data modalities used for HAR, skeleton and point cloud data offer strong potential due to their privacy-preserving and environment-agnostic properties. However, point cloud-based HAR faces challenges like data sparsity, high computation cost, and a lack of large annotated datasets. In this paper, we propose a novel two-stage framework that first transforms radar-based point cloud data into skeleton data using a Skeletal Dynamic Graph Convolutional Neural Network (SK-DGCNN), and then classifies the estimated skeletons using an efficient Spatial Temporal Graph Convolutional Network++ (ST-GCN++). The SK-DGCNN leverages dynamic edge convolution, attention mechanisms, and a custom loss function that combines Mean Square Error and Kullback–Leibler divergence to preserve the structural integrity of the human pose. Our pipeline achieves state-of-the-art performance on the MMActivity and DGUHA datasets, with Top-1 accuracy of 99.73% and 99.25%, and F1-scores of 99.62% and 99.25%, respectively. 
The proposed method provides an effective, lightweight, and privacy-conscious solution for real-world HAR applications using radar point cloud data.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100847"},"PeriodicalIF":4.9,"publicationDate":"2026-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145976867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
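The SK-DGCNN abstract describes a custom loss combining Mean Square Error and Kullback–Leibler divergence, but does not give the formulation. The sketch below only illustrates the general shape of such a combined objective; the weighting `lam` and the normalization of coordinate magnitudes into distributions for the KL term are assumptions, not the paper's definition.

```python
import numpy as np

def mse_kl_loss(pred, target, lam=0.1, eps=1e-8):
    """Combined objective: coordinate MSE plus a weighted KL term.

    pred, target: (J, 3) arrays of predicted / ground-truth joint positions.
    lam: assumed trade-off weight between the two terms.
    """
    # Mean squared error on the raw joint coordinates.
    mse = np.mean((pred - target) ** 2)
    # Normalize coordinate magnitudes into distributions for KL divergence.
    p = np.abs(target).ravel() + eps
    p = p / p.sum()
    q = np.abs(pred).ravel() + eps
    q = q / q.sum()
    kl = np.sum(p * np.log(p / q))
    return mse + lam * kl

rng = np.random.default_rng(1)
gt = rng.normal(size=(17, 3))                    # e.g. 17 skeleton joints
noisy = gt + 0.05 * rng.normal(size=gt.shape)    # a slightly perturbed estimate
loss = mse_kl_loss(noisy, gt)
```

The MSE term penalizes per-joint position error, while the KL term penalizes distortion of the overall pose distribution, which is one plausible way a combined loss can help preserve structural integrity.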
Pub Date : 2026-01-15DOI: 10.1016/j.mlwa.2026.100841
Hajer Ghodhbani , Suvendi Rimer , Khmaies Ouahada , Adel M. Alimi
This paper investigates the convergence of generative modeling techniques across diverse image analysis tasks by examining their application in two data-intensive scientific domains: biomedical imaging and astronomy. Although the two domains are scientifically distinct in scale and aims, they share common challenges, including noise corruption, limited availability of annotated data, and the demand for high-fidelity image reconstruction. This study provides a critical review of the main variants of generative models, with a particular focus on cross-domain applications. Unlike existing surveys that predominantly focus on a single discipline, this study emphasises the transferability and adaptability of generative models across biomedical and astronomical imaging. The review highlights the potential of generative models, particularly Generative Adversarial Networks (GANs), for enhancing data generation, image restoration, and analysis in both biomedical and astronomical studies.
{"title":"Cross-domain convergence of generative models: From biomedical to astronomical applications","authors":"Hajer Ghodhbani , Suvendi Rimer , Khmaies Ouahada , Adel M. Alimi","doi":"10.1016/j.mlwa.2026.100841","DOIUrl":"10.1016/j.mlwa.2026.100841","url":null,"abstract":"<div><div>This paper investigates the convergence of generative modeling techniques across diverse image analysis tasks by examining their application in two data-intensive scientific domains: biomedical imaging and astronomy. Although the two domains are scientifically distinct in scale and aims, they share common challenges, including noise corruption, limited availability of annotated data, and the demand for high-fidelity image reconstruction. This study provides a critical review of the main variants of generative models, with a particular focus on cross-domain applications. Unlike existing surveys that predominantly focus on a single discipline, this study emphasises the transferability and adaptability of generative models across biomedical and astronomical imaging. The review highlights the potential of generative models, particularly Generative Adversarial Networks (GANs), for enhancing data generation, image restoration, and analysis in both biomedical and astronomical studies.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100841"},"PeriodicalIF":4.9,"publicationDate":"2026-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}