Forecasting the forced van der Pol equation with frequent phase shifts using Reservoir Computing
Pub Date: 2025-04-18 | DOI: 10.1016/j.mlwa.2025.100654
Sho Kuno, Hiroshi Kori
We tested the performance of reservoir computing (RC) in predicting the dynamics of a specific nonautonomous dynamical system. Specifically, we considered a van der Pol oscillator subjected to a periodic external force with frequent phase shifts. The reservoir computer, trained and optimized using simulation data generated for a specific phase shift, was designed to predict the oscillation dynamics under periodic external forces with different phase shifts. The results suggest that if the training data exhibit sufficient complexity, the oscillation dynamics under different phase shifts can be predicted quantitatively. This study was motivated by the challenge of predicting the circadian rhythms of shift workers and optimizing their shift schedules individually. Our results suggest that RC could be utilized for such applications.
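As a rough illustration of this setup, the sketch below drives a small echo state network with the known forcing signal of a van der Pol oscillator whose phase jumps at fixed intervals, and trains a ridge-regression readout to reproduce the oscillation. The reservoir size, spectral radius, shift schedule, and open-loop readout are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)

# Forced van der Pol: x'' - mu*(1 - x^2)*x' + x = A*sin(w*t + phi(t)),
# where phi(t) jumps every `shift_every` time units (a stand-in for the
# frequent phase shifts of a shift-work schedule).
mu, A, w, shift_every = 1.0, 0.8, 1.1, 50.0
shifts = np.cumsum(rng.uniform(-np.pi / 2, np.pi / 2, 40))   # hypothetical schedule
phase = lambda t: shifts[min(int(t // shift_every), len(shifts) - 1)]
force = lambda t: A * np.sin(w * t + phase(t))

def rhs(t, y):
    x, v = y
    return [v, mu * (1 - x**2) * v - x + force(t)]

t = np.linspace(0.0, 1000.0, 20000)
x = solve_ivp(rhs, (t[0], t[-1]), [1.0, 0.0], t_eval=t, max_step=0.05).y[0]

# Echo state network driven by the known forcing, trained to reproduce x(t).
N, rho, ridge = 300, 0.9, 1e-6
Win = rng.uniform(-0.5, 0.5, (N, 2))                         # input: [bias, force]
W = rng.uniform(-0.5, 0.5, (N, N))
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))              # fix spectral radius

u = np.array([force(ti) for ti in t])
states = np.empty((len(u), N))
r = np.zeros(N)
for i, ui in enumerate(u):
    r = np.tanh(W @ r + Win @ np.array([1.0, ui]))
    states[i] = r

split, washout = len(t) // 2, 1000                           # discard transient
Rtr, xtr = states[washout:split], x[washout:split]
Wout = np.linalg.solve(Rtr.T @ Rtr + ridge * np.eye(N), Rtr.T @ xtr)  # ridge readout

pred = states[split:] @ Wout                                 # later, unseen phase shifts
print("test RMSE:", np.sqrt(np.mean((pred - x[split:]) ** 2)))
```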
{"title":"Forecasting the forced van der Pol equation with frequent phase shifts using Reservoir Computing","authors":"Sho Kuno , Hiroshi Kori","doi":"10.1016/j.mlwa.2025.100654","DOIUrl":"10.1016/j.mlwa.2025.100654","url":null,"abstract":"<div><div>We tested the performance of reservoir computing (RC) in predicting the dynamics of a specific nonautonomous dynamical system. Specifically, we considered a van der Pol oscillator subjected to a periodic external force with frequent phase shifts. The reservoir computer, trained and optimized using simulation data generated for a specific phase shift, was designed to predict the oscillation dynamics under periodic external forces with different phase shifts. The results suggest that if the training data exhibit sufficient complexity, it is possible to quantitatively predict the oscillation dynamics subjected to different phase shifts. This study was motivated by the challenge of predicting the circadian rhythm of shift workers and optimizing their shift schedules individually. Our results suggest that RC could be utilized for such applications.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100654"},"PeriodicalIF":0.0,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143860446","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective multimodal hate speech detection on Facebook hate memes dataset using incremental PCA, SMOTE, and adversarial learning
Pub Date: 2025-04-17 | DOI: 10.1016/j.mlwa.2025.100647
Emmanuel Ludivin Tchuindjang Tchokote, Elie Fute Tagne
Harmful information, such as hate speech and online harassment, has proliferated in recent years with social media's explosive expansion. Using the Facebook Hate Meme Dataset (FBHM), we build a reliable model for identifying multimodal hate speech on online platforms. To address class imbalance and improve classification accuracy, our hybrid model combines ResNet for image processing with RoBERTa for text analysis, leveraging the Synthetic Minority Over-sampling Technique (SMOTE) and Incremental Principal Component Analysis (PCA) together with adversarial machine learning techniques. Incremental PCA's dimensionality reduction and SMOTE's synthetic sample creation jointly enrich the training dataset and improve feature representation, leading to better online content moderation. We achieved an accuracy of 81.80% and a Macro-F1 score of 81.53% on the FBHM dataset, an 18% improvement in accuracy over the base model. These results demonstrate the potential of adversarial approaches for building reliable automated hate speech detectors, which can help create a safer online environment and significantly reduce the emotional burden on human content moderators by handling content quickly and accurately. The study also highlights the mutually beneficial effect of combining SMOTE and Incremental PCA, showing how together they correct class imbalance and boost performance. The source code and dataset are publicly available on GitHub to facilitate reproducibility and further research: https://github.com/ludivintchokote/HatePostDetection
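A minimal sketch of the imbalance-handling stage is shown below, with random features standing in for the RoBERTa and ResNet embeddings. The dimensionalities, component count, and classifier are assumptions, and the paper's adversarial-training component is omitted.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)

# Stand-ins for the paper's encoders: in the real pipeline these would be
# RoBERTa pooled text embeddings (768-d) and ResNet image features (2048-d).
n, text_dim, img_dim = 5000, 768, 2048
X_text = rng.normal(size=(n, text_dim))
X_img = rng.normal(size=(n, img_dim))
y = (rng.random(n) < 0.3).astype(int)            # imbalanced: ~30% hateful

X = np.hstack([X_text, X_img])                   # simple late fusion by concatenation

# Incremental PCA: fit batch by batch so the 2816-d fused features never
# have to be decomposed in a single pass.
ipca = IncrementalPCA(n_components=256, batch_size=512)
for start in range(0, n, 512):
    ipca.partial_fit(X[start:start + 512])
X_red = ipca.transform(X)

# SMOTE synthesizes minority-class samples in the reduced feature space.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_red, y)
print("before:", np.bincount(y), "after:", np.bincount(y_bal))

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
```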
{"title":"Effective multimodal hate speech detection on Facebook hate memes dataset using incremental PCA, SMOTE, and adversarial learning","authors":"Emmanuel Ludivin Tchuindjang Tchokote , Elie Fute Tagne","doi":"10.1016/j.mlwa.2025.100647","DOIUrl":"10.1016/j.mlwa.2025.100647","url":null,"abstract":"<div><div>The proliferation of harmful information, such as hate speech and online harassment, has increased in recent years due to social media's explosive expansion. Using the Facebook Hate Meme Dataset (FBHM), we create a reliable model in this work for identifying multimodal hate speech on online platforms. To effectively address class imbalance and improve classification accuracy, our hybrid model combines ResNet for image processing with RoBERTa for text analysis, leveraging Synthetic Minority Over-sampling Technique (SMOTE) and Incremental Principal Component Analysis (PCA) combined with adversarial machine learning techniques. The combination of Incremental PCA's dimensionality reduction and SMOTE's synthetic sample creation produces a potent combination that enhances the training dataset and maximizes feature representation, resulting in improved online content moderation techniques. We achieved an accuracy of 81.80 %, and a Macro-F1 score of 81.53 % on the FBHM dataset which represents an 18 % improvement in accuracy over the base model. These results provide significant novel insights into this important field of study by demonstrating the potential of adversarial approaches in creating reliable models for automated hate speech identification that can help create a safer online environment and can significantly reduce the emotional burden on human content moderators by handling the contents quickly and accurately. This study highlights the mutually beneficial effect of combining SMOTE and incremental PCA, demonstrating how they improve the model's ability to correct class imbalance and boost performance. The source code and dataset are publicly available on GitHub to facilitate reproducibility and further research. Link to the code and dataset below:</div><div><span><span>https://github.com/ludivintchokote/HatePostDetection</span><svg><path></path></svg></span></div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100647"},"PeriodicalIF":0.0,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143855951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prediction of foreign currency exchange rates using an attention-based long short-term memory network
Pub Date: 2025-04-11 | DOI: 10.1016/j.mlwa.2025.100648
Shahram Ghahremani, Uyen Trang Nguyen
We propose an attention-based LSTM model for predicting forex rates (ALFA). The prediction process consists of three stages. First, an LSTM model captures temporal dependencies within the forex time series. Next, an attention mechanism assigns different weights (importance scores) to the features of the LSTM model’s output. Finally, a fully connected layer generates predictions of forex rates. We conducted comprehensive experiments to evaluate and compare the performance of ALFA against several models used in previous work and against state-of-the-art deep learning models such as temporal convolutional networks (TCN) and Transformer. Experimental results show that ALFA outperforms the baseline models in most cases, across different currency pairs and feature sets, thanks to its attention mechanism that filters out irrelevant or redundant data to focus on important features. ALFA consistently ranks among the top three of the seven models evaluated and ranks first in most cases. We validated the effectiveness of ALFA by applying it to actual trading scenarios using several currency pairs. In these evaluations, ALFA achieves estimated annual return rates comparable to those of professional traders.
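The three-stage design lends itself to a compact PyTorch sketch. The hidden size and the additive time-step attention below are assumptions, since the abstract does not specify the exact attention form.

```python
import torch
import torch.nn as nn

class ALFA(nn.Module):
    """Sketch of the three-stage design: LSTM -> attention over outputs -> FC head."""
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)     # scores each time step's output
        self.head = nn.Linear(hidden, 1)      # predicts the next exchange rate

    def forward(self, x):                     # x: (batch, window, n_features)
        out, _ = self.lstm(x)                 # (batch, window, hidden)
        weights = torch.softmax(self.score(out), dim=1)  # importance per time step
        context = (weights * out).sum(dim=1)  # attention-weighted summary
        return self.head(context).squeeze(-1)

model = ALFA(n_features=5)
rates = model(torch.randn(32, 30, 5))         # 32 windows of 30 steps, 5 features
print(rates.shape)                            # torch.Size([32])
```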
{"title":"Prediction of foreign currency exchange rates using an attention-based long short-term memory network","authors":"Shahram Ghahremani, Uyen Trang Nguyen","doi":"10.1016/j.mlwa.2025.100648","DOIUrl":"10.1016/j.mlwa.2025.100648","url":null,"abstract":"<div><div>We propose an <u>a</u>ttention-based <u>L</u>STM model for predicting <u>f</u>orex r<u>a</u>tes (ALFA). The prediction process consists of three stages. First, an LSTM model captures temporal dependencies within the forex time series. Next, an attention mechanism assigns different weights (importance scores) to the features of the LSTM model’s output. Finally, a fully connected layer generates predictions of forex rates. We conducted comprehensive experiments to evaluate and compare the performance of ALFA against several models used in previous work and against state-of-the-art deep learning models such as temporal convolutional networks (TCN) and Transformer. Experimental results show that ALFA outperforms the baseline models in most cases, across different currency pairs and feature sets, thanks to its attention mechanism that filters out irrelevant or redundant data to focus on important features. ALFA consistently ranks among the top three of the seven models evaluated and ranks first in most cases. We validated the effectiveness of ALFA by applying it to actual trading scenarios using several currency pairs. In these evaluations, ALFA achieves estimated annual return rates comparable to those of professional traders.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100648"},"PeriodicalIF":0.0,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143825949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing translation for low-resource languages: Efficient fine-tuning with custom prompt engineering in large language models
Pub Date: 2025-04-09 | DOI: 10.1016/j.mlwa.2025.100649
Pitso Walter Khoboko, Vukosi Marivate, Joseph Sefara
Training large language models (LLMs) can be prohibitively expensive. However, the emergence of new Parameter-Efficient Fine-Tuning (PEFT) strategies provides a cost-effective approach to unlocking the potential of LLMs across a variety of natural language processing (NLP) tasks. In this study, we selected the Mistral 7B language model as our primary LLM due to its superior performance, which surpasses that of LLAMA 2 13B across multiple benchmarks. By leveraging PEFT methods, we aimed to significantly reduce the cost of fine-tuning while maintaining high levels of performance.
Despite their advancements, LLMs often struggle with translation tasks for low-resource languages, particularly morphologically rich African languages. To address this, we employed customized prompt engineering techniques to enhance LLM translation capabilities for these languages.
Our experimentation focused on fine-tuning the Mistral 7B model to identify the best-performing ensemble using a custom prompt strategy. The results from the fine-tuned Mistral 7B model were compared against several models: Serengeti, Gemma, Google Translate, and No Language Left Behind (NLLB). Specifically, Serengeti and Gemma were fine-tuned using the same custom prompt strategy as the Mistral model, while Google Translate and NLLB, which are pre-trained to handle English-to-Zulu and English-to-Xhosa translations, were evaluated directly on the test dataset. This comparative analysis allowed us to assess the efficacy of the fine-tuned Mistral 7B model against both custom-tuned and pre-trained translation models.
LLMs have traditionally struggled to produce high-quality translations, especially for low-resource languages. Our experiments revealed that the key to improving translation performance lies in using the correct prompt during fine-tuning. We used the Mistral 7B model to develop a custom prompt that significantly enhanced translation quality for English-to-Zulu and English-to-Xhosa language pairs. After fine-tuning the Mistral 7B model for 30 GPU days, we compared its performance to the No Language Left Behind (NLLB) model and the Google Translate API on the same test dataset. While NLLB achieved the highest scores across BLEU, G-Eval (cosine similarity), and ChrF++ (F1-score), our results demonstrated that Mistral 7B, with the custom prompt, still performed competitively.
Additionally, we showed that our prompt template can improve the translation accuracy of other models, such as Gemma and Serengeti, when applied to high-quality bilingual datasets. This demonstrates that our custom prompt strategy adapts across model architectures and bilingual settings and is highly effective in accelerating learning for low-resource language translation.
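A hedged sketch of what such parameter-efficient fine-tuning with a custom translation prompt can look like is given below, using LoRA adapters from the peft library. The prompt wording, LoRA hyperparameters, and the toy English-Zulu pair are illustrative, not the paper's.

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical prompt template; the paper's actual custom prompt is not reproduced here.
PROMPT = (
    "You are a professional translator for South African languages.\n"
    "Translate the following English sentence into {lang}.\n"
    "English: {src}\n{lang}: "
)

base = "mistralai/Mistral-7B-v0.1"
tok = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# LoRA trains small low-rank adapters on the attention projections instead of
# all 7B weights, which is what makes the fine-tuning cost manageable.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()       # typically well under 1% of all parameters

def format_example(ex):
    # Prompt plus reference translation, so the model learns to complete the prompt.
    text = PROMPT.format(lang="Zulu", src=ex["en"]) + ex["zu"] + tok.eos_token
    return tok(text, truncation=True, max_length=512)

pairs = Dataset.from_dict({"en": ["Hello, how are you?"], "zu": ["Sawubona, unjani?"]})
train = pairs.map(format_example)        # ready for a standard causal-LM Trainer
```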
{"title":"Optimizing translation for low-resource languages: Efficient fine-tuning with custom prompt engineering in large language models","authors":"Pitso Walter Khoboko , Vukosi Marivate , Joseph Sefara","doi":"10.1016/j.mlwa.2025.100649","DOIUrl":"10.1016/j.mlwa.2025.100649","url":null,"abstract":"<div><div>Training large language models (LLMs) can be prohibitively expensive. However, the emergence of new Parameter-Efficient Fine-Tuning (PEFT) strategies provides a cost-effective approach to unlocking the potential of LLMs across a variety of natural language processing (NLP) tasks. In this study, we selected the Mistral 7B language model as our primary LLM due to its superior performance, which surpasses that of LLAMA 2 13B across multiple benchmarks. By leveraging PEFT methods, we aimed to significantly reduce the cost of fine-tuning while maintaining high levels of performance.</div><div>Despite their advancements, LLMs often struggle with translation tasks for low-resource languages, particularly morphologically rich African languages. To address this, we employed customized prompt engineering techniques to enhance LLM translation capabilities for these languages.</div><div>Our experimentation focused on fine-tuning the Mistral 7B model to identify the best-performing ensemble using a custom prompt strategy. The results obtained from the fine-tuned Mistral 7B model were compared against several models: Serengeti, Gemma, Google Translate, and No Language Left Behind (NLLB). Specifically, Serengeti and Gemma were fine-tuned using the same custom prompt strategy as the Mistral model, while Google Translate and NLLB Gemma, which are pre-trained to handle English-to-Zulu and English-to-Xhosa translations, were evaluated directly on the test data set. This comparative analysis allowed us to assess the efficacy of the fine-tuned Mistral 7B model against both custom-tuned and pre-trained translation models.</div><div>LLMs have traditionally struggled to produce high-quality translations, especially for low-resource languages. Our experiments revealed that the key to improving translation performance lies in using the correct prompt during fine-tuning. We used the Mistral 7B model to develop a custom prompt that significantly enhanced translation quality for English-to-Zulu and English-to-Xhosa language pairs. After fine-tuning the Mistral 7B model for 30 GPU days, we compared its performance to the No Language Left Behind (NLLB) model and Google Translator API on the same test dataset. While NLLB achieved the highest scores across BLEU, G-Eval (cosine similarity), and Chrf++ (F1-score), our results demonstrated that Mistral 7B, with the custom prompt, still performed competitively.</div><div>Additionally, we showed that our prompt template can improve the translation accuracy of other models, such as Gemma and Serengeti, when applied to high-quality bilingual datasets. 
This demonstrates that our custom prompt strategy is adaptable across different model architectures, bilingual settings, and is highly effective in accelerating learning for low-resource language translation.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100649"},"PeriodicalIF":0.0,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143815723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ASKSQL: Enabling cost-effective natural language to SQL conversion for enhanced analytics and search
Pub Date: 2025-04-09 | DOI: 10.1016/j.mlwa.2025.100641
Arpit Bajgoti, Rishik Gupta, Rinky Dwivedi
Natural Language to SQL (NL2SQL) for database query and search has been a significant research focus in recent years. However, existing methods have predominantly concentrated on SQL query generation, overlooking critical aspects such as enterprise cost, latency, and the overall analytical search experience. This paper presents an end-to-end NL2SQL pipeline named ASKSQL that integrates optimized and adaptable query recommendation, an entity-swapping module, and skeleton-based caching to enhance the search experience. The pipeline also incorporates an intelligent schema selector for efficiently handling large-schema entity selection and a fast, scalable adapter-based query generator. The proposed pipeline minimizes Large Language Model (LLM) costs by finding search patterns in previously requested or generated queries, and it can be tuned to adapt to trends and common patterns observed in daily search analytics. Experimental results demonstrate an average increase in accuracy of 5.83% and an overall decrease in latency of 32.6% as the usage count of the search pipeline increases, highlighting its effectiveness in improving the NL2SQL search experience.
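The skeleton-based caching and entity swapping can be sketched as below: literal values in the question are masked so that questions differing only in entities hit the same cached SQL template. The regex-based entity spotting, the <ENT> placeholder, and the stubbed generator are hypothetical simplifications of the pipeline described in the paper.

```python
import hashlib
import re

CACHE: dict[str, str] = {}

def expensive_llm_nl2sql(skeleton: str) -> str:
    # Stub standing in for the adapter-based query generator; it would return
    # SQL with <ENT> slots for the masked literals.
    return "SELECT * FROM orders WHERE customer = <ENT> LIMIT <ENT>;"

def skeletonize(question: str) -> tuple[str, list[str]]:
    # Naive entity spotting: quoted strings and numbers become <ENT> slots.
    entities = re.findall(r"'[^']*'|\d+", question)
    skeleton = re.sub(r"'[^']*'|\d+", "<ENT>", question)
    return skeleton, entities

def ask(question: str) -> str:
    skeleton, entities = skeletonize(question)
    key = hashlib.sha1(skeleton.lower().encode()).hexdigest()
    if key not in CACHE:                      # cache miss: call the LLM once
        CACHE[key] = expensive_llm_nl2sql(skeleton)
    sql = CACHE[key]
    for ent in entities:                      # entity swapping on the cached template
        sql = sql.replace("<ENT>", ent, 1)
    return sql

print(ask("Show orders for 'Acme' limit 10"))
print(ask("Show orders for 'Globex' limit 5"))  # served from the skeleton cache
```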
{"title":"ASKSQL: Enabling cost-effective natural language to SQL conversion for enhanced analytics and search","authors":"Arpit Bajgoti, Rishik Gupta, Rinky Dwivedi","doi":"10.1016/j.mlwa.2025.100641","DOIUrl":"10.1016/j.mlwa.2025.100641","url":null,"abstract":"<div><div>Natural Language to SQL (NL2SQL) for database query and search has been a significant research focus in recent years. However, existing methods have predominantly concentrated on SQL query generation, overlooking critical aspects such as enterprise cost, latency, and the overall analytical search experience. This paper presents an end-to-end NL2SQL pipeline named ASKSQL that integrates optimized and adaptable query recommendation, entity-swapping module, and skeleton-based caching to enhance the search experience. The pipeline also incorporates an intelligent schema selector for efficiently handling large schema entity selection and a fast and scalable adapter-based query generator. The proposed pipeline emphasizes minimizing Large Language Model (LLM) costs by finding search patterns in previously requested or generated queries. The pipeline can also be tuned to adapt to trends and common patterns observed from the daily search analytics. Experimental results demonstrate an average increase in accuracy by 5.83% and an overall decrease in latency by 32.6% as the usage count of this search pipeline increases highlighting its effectiveness in improving the NL2SQL search experience.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100641"},"PeriodicalIF":0.0,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143815722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shifted Hexpo activation function: An improved vanishing gradient mitigation activation function for disease classification
Pub Date: 2025-04-08 | DOI: 10.1016/j.mlwa.2025.100651
Joseph Otoo, Suleman Nasiru, Irene Dekomwine Angbing
Activation functions (AFs) in deep learning significantly impact model performance. In this study, we propose Shifted Hexpo (SHexpo), an improved variant of the Hexpo AF designed to address limitations such as vanishing gradients and parameter sensitivity. SHexpo introduces a shifting parameter, enhancing its adaptability and performance across diverse data distributions. Using ResNet-101, DenseNet-169, and 5- and 10-layer lightweight Convolutional Neural Networks (CNNs) trained on the SIPaKMeD dataset for cervical cancer classification, we compared SHexpo against Hexpo, ReLU, Swish, Mish, GELU, and PReLU under four pre-processing settings: zero-mean centering, normalization, their combination, and ImageNet weights. Our results demonstrate that SHexpo achieves higher classification accuracy and better gradient stability than Hexpo while performing competitively with state-of-the-art AFs. Our findings indicate that SHexpo can be effectively integrated into both lightweight and deep architectures. Additionally, Grad-CAM visualizations highlight SHexpo's capability to enhance interpretability by localizing the image regions most relevant to model predictions. These results demonstrate SHexpo's potential for medical image analysis in low-resource settings.
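For reference, the original Hexpo activation is f(x) = -a(e^(-x/b) - 1) for x >= 0 and c(e^(x/d) - 1) for x < 0. The sketch below adds a learnable horizontal shift to that form, but the exact placement and parameterization of the paper's shifting parameter is an assumption here, not the authors' definition.

```python
import torch
import torch.nn as nn

class SHexpo(nn.Module):
    """Sketch built on the published Hexpo form
        f(x) = -a*(exp(-x/b) - 1)  for x >= 0
        f(x) =  c*(exp(x/d) - 1)   for x <  0
    with an added horizontal shift s (hypothetical placement)."""
    def __init__(self, a=1.0, b=1.0, c=1.0, d=1.0, s=0.5):
        super().__init__()
        self.a, self.b, self.c, self.d = a, b, c, d
        self.s = nn.Parameter(torch.tensor(s))   # learnable shift parameter

    def forward(self, x):
        z = x + self.s                            # shift the input before Hexpo
        pos = -self.a * (torch.exp(-z / self.b) - 1)
        neg = self.c * (torch.exp(z / self.d) - 1)
        return torch.where(z >= 0, pos, neg)

act = SHexpo()
print(act(torch.linspace(-3, 3, 7)))
```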
{"title":"Shifted Hexpo activation function: An improved vanishing gradient mitigation activation function for disease classification","authors":"Joseph Otoo , Suleman Nasiru , Irene Dekomwine Angbing","doi":"10.1016/j.mlwa.2025.100651","DOIUrl":"10.1016/j.mlwa.2025.100651","url":null,"abstract":"<div><div>Activation functions (AFs) in deep learning significantly impacts model performance. In this study, we proposed Shifted Hexpo (SHexpo), an improved variant of the Hexpo AF, designed to address limitations such as vanishing gradients and parameter sensitivity. SHexpo introduces a shifting parameter, enhancing its adaptability and performance across diverse data distributions. Using ResNet 101, DenseNet 169, 5 and 10-layer lightweight Convolutional Neural Network (CNN) trained on the SIPaKMeD dataset for cervical cancer classification, we compared SHexpo against Hexpo, ReLU, Swish, Mish, GELU and PReLU under four pre-processing techniques: zero-mean centering, normalization, their combination and ImageNet weights. Our results demonstrate that SHexpo achieves higher classification accuracy and better gradient stability than Hexpo while performing competitively with state-of-the-art AFs. Our findings indicate that SHexpo can be effectively integrated into both lightweight and deep architectures. Additionally, Grad-CAM visualizations highlight SHexpo’s capability to enhance interpretability by localizing the most relevant image regions contributing to model predictions. These results demonstrate SHexpo’s potentials for medical image analysis in low-resource settings.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100651"},"PeriodicalIF":0.0,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143799222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TransFusion: Generating long, high fidelity time series using diffusion models with transformers
Pub Date: 2025-04-08 | DOI: 10.1016/j.mlwa.2025.100652
Md Fahim Sikder, Resmi Ramachandranpillai, Fredrik Heintz
The generation of high-quality, long-sequence time-series data is essential due to its wide range of applications. In the past, standalone Recurrent and Convolutional Neural Network-based Generative Adversarial Networks (GANs) were used to synthesize time-series data. However, they are inadequate for generating long time-series sequences due to architectural limitations such as difficulty capturing long-range dependencies, limited temporal coherence, and scalability challenges. Furthermore, GANs are well known for their training instability and mode collapse problems. To address this, we propose TransFusion, a diffusion- and transformer-based generative model for high-quality, long-sequence time-series data. We extended the sequence length to 384, surpassing the previous limit, and successfully generated high-quality synthetic data. We also introduce two evaluation metrics to assess the quality of the synthetic data and its predictive characteristics. TransFusion is evaluated using a diverse set of visual and empirical metrics and consistently outperforms the previous state-of-the-art by a significant margin.
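A minimal DDPM-style training step with a transformer denoiser over length-384 windows might look like the sketch below. The noise schedule, model dimensions, and learned timestep embedding are generic defaults, not TransFusion's actual architecture.

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Transformer that predicts the noise added to a time-series window."""
    def __init__(self, channels=5, d_model=128, steps=1000):
        super().__init__()
        self.inp = nn.Linear(channels, d_model)
        self.t_emb = nn.Embedding(steps, d_model)        # learned timestep embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.out = nn.Linear(d_model, channels)

    def forward(self, x, t):                             # x: (batch, length, channels)
        h = self.inp(x) + self.t_emb(t)[:, None, :]      # broadcast over time steps
        return self.out(self.encoder(h))

T = 1000
betas = torch.linspace(1e-4, 0.02, T)                    # standard DDPM schedule
alpha_bar = torch.cumprod(1 - betas, dim=0)

model = Denoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

x0 = torch.randn(16, 384, 5)                             # stand-in batch, length 384
t = torch.randint(0, T, (16,))
noise = torch.randn_like(x0)
ab = alpha_bar[t][:, None, None]
xt = ab.sqrt() * x0 + (1 - ab).sqrt() * noise            # forward process q(x_t | x_0)

loss = nn.functional.mse_loss(model(xt, t), noise)       # denoiser predicts the noise
loss.backward()
opt.step()
```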
{"title":"TransFusion: Generating long, high fidelity time series using diffusion models with transformers","authors":"Md Fahim Sikder , Resmi Ramachandranpillai , Fredrik Heintz","doi":"10.1016/j.mlwa.2025.100652","DOIUrl":"10.1016/j.mlwa.2025.100652","url":null,"abstract":"<div><div>The generation of high-quality, long-sequenced time-series data is essential due to its wide range of applications. In the past, standalone Recurrent and Convolutional Neural Network-based Generative Adversarial Networks (GAN) were used to synthesize time-series data. However, they are inadequate for generating long sequences of time-series data due to limitations in the architecture, such as difficulties in capturing long-range dependencies, limited temporal coherence, and scalability challenges. Furthermore, GANs are well known for their training instability and mode collapse problem. To address this, we propose <em>TransFusion</em>, a diffusion, and transformers-based generative model to generate high-quality long-sequence time-series data. We extended the sequence length to 384, surpassing the previous limit, and successfully generated high-quality synthetic data. Also, we introduce two evaluation metrics to evaluate the quality of the synthetic data as well as its predictive characteristics. <em>TransFusion</em> is evaluated using a diverse set of visual and empirical metrics, consistently outperforming the previous state-of-the-art by a significant margin.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100652"},"PeriodicalIF":0.0,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143829232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
From pixels to letters: A high-accuracy CPU-real-time American Sign Language detection pipeline
Pub Date: 2025-04-08 | DOI: 10.1016/j.mlwa.2025.100650
Jonas Rheiner, Daniel Kerger, Matthias Drüppel
We introduce a CPU-real-time American Sign Language (ASL) recognition system designed to bridge communication barriers between the deaf community and the broader public. Our multi-step pipeline includes preprocessing, a hand detection stage, and a classification model using a MobileNetV3 convolutional neural network backbone followed by a classification head. We train and evaluate our model using a combined dataset of 252k labeled images from two distinct ASL datasets, which increases generalization on unseen data and strengthens our evaluation. We employ a two-step training: the backbone is initialized through transfer learning and frozen for the initial training of the head; a second training phase with a lower learning rate and unfrozen weights then yields exceptional test accuracies of 99.98% and >99.93% on the two datasets, setting new benchmarks for ASL detection. With a CPU inference time under 500 ms, the system ensures real-time performance on affordable hardware. We propose a straightforward method to determine the amount of data needed for validation and testing and to quantify the remaining statistical error: we calculate accuracy as a function of validation set size and thus ensure sufficient data is allocated for evaluation. Model interpretability is enhanced using Gradient-weighted Class Activation Mapping (Grad-CAM), which provides visual explanations by highlighting key image regions influencing predictions. This transparency fosters trust and improves user understanding of the system's decisions. Our system sets new benchmarks in ASL gesture recognition by closing the accuracy gap of state-of-the-art solutions, while offering broad applicability through CPU-real-time inference and interpretable predictions.
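The two-step training schedule can be sketched with torchvision's MobileNetV3 as below; the class count and learning rates are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
from torchvision import models

# Phase 1: frozen ImageNet backbone, train only the new classification head.
num_classes = 29   # assumption: 26 ASL letters plus extra tokens such as space/delete
model = models.mobilenet_v3_large(weights=models.MobileNet_V3_Large_Weights.DEFAULT)
in_feats = model.classifier[-1].in_features
model.classifier[-1] = nn.Linear(in_feats, num_classes)

for p in model.features.parameters():
    p.requires_grad = False                              # freeze the backbone
opt = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)

# ... train the head to convergence here ...

# Phase 2: unfreeze all weights and fine-tune with a much lower learning rate.
for p in model.parameters():
    p.requires_grad = True
opt = torch.optim.Adam(model.parameters(), lr=1e-5)
```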
{"title":"From pixels to letters: A high-accuracy CPU-real-time American Sign Language detection pipeline","authors":"Jonas Rheiner , Daniel Kerger , Matthias Drüppel","doi":"10.1016/j.mlwa.2025.100650","DOIUrl":"10.1016/j.mlwa.2025.100650","url":null,"abstract":"<div><div>We introduce a CPU-real-time American Sign Language (ASL) recognition system designed to bridge communication barriers between the deaf community and the broader public. Our multi-step pipeline includes preprocessing, a hand detection stage, and a classification model using a MobileNetV3 convolutional neural network backbone followed by a classification head. We train and evaluate our model using a combined dataset of 252k labeled images from two distinct ASL datasets. This increases generalization on unseen data and strengthens our evaluation. We employ a two-step training: The backbone is initialized through transfer learning and frozen for the initial training of the head. A second training phase with lower learning rate and unfrozen weights yields an exceptional test accuracy of 99.98% and <span><math><mo>></mo></math></span>99.93% on the two datasets - setting new benchmarks for ASL detection. With an CPU-inference time under 500 ms, it ensures real-time performance on affordable hardware. We propose a straightforward method to determine the amount of data needed for validation and testing and to quantify the remaining statistical error. For this we calculate accuracy as a function of validation set size, and thus ensure sufficient data is allocated for evaluation. Model interpretability is enhanced using Gradient-weighted Class Activation Mapping (Grad-CAM), which provides visual explanations by highlighting key image regions influencing predictions. This transparency fosters trust and improves user understanding of the system’s decisions. Our system sets new benchmarks in ASL gesture recognition by closing the accuracy gap of state-of-the-art solutions, while offering broad applicability through CPU-real-time inference and interpretability of our predictions.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100650"},"PeriodicalIF":0.0,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143829233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MLAL: Multiple Prompt Learning and Generation of Auxiliary Labeled Utterances for Emotion Recognition in Conversations
Pub Date: 2025-03-27 | DOI: 10.1016/j.mlwa.2025.100643
Zhinan Gou, Yuxin Chen, Yuchen Long, Mengyao Jia, Zhili Liu, Jun Zhu
Emotion Recognition in Conversations (ERC) is one of the most prominent research directions in Natural Language Processing (NLP). It aims to accurately identify the emotional states expressed in conversations and is widely applied in psychology, education, and healthcare. However, ERC poses significant challenges due to factors such as conversational context, speaker experience, and subtle differences between similar emotion labels. Existing research primarily pursues effective sequence and graph structures to model utterances and interactions, yet these methods lack a comprehensive understanding of conversational context and a precise distinction between similar emotions. To address these limitations, we propose a novel framework combining Multiple Prompt Learning and Generation of Auxiliary Labeled Utterances (MLAL). First, a global prompt is constructed to facilitate understanding of the conversational context; specifically, utterances from the same speaker are identified and processed interactively. Simultaneously, to account for the influence of speaker experience, an experience prompt is built by retrieving and interacting with highly similar historical utterances from each speaker. In addition, we generate refined auxiliary labeled utterances through a label paraphrasing mechanism to distinguish between similar emotions. Experimental results show that our approach outperforms current state-of-the-art techniques on three datasets.
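A hedged sketch of the experience-prompt retrieval is shown below, using TF-IDF cosine similarity as a stand-in for whatever utterance encoder the paper actually employs; the speaker history and prompt wording are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy per-speaker history of past utterances.
history = {
    "A": ["I just feel so alone lately.", "The new job is going great!",
          "Why does this always happen to me?"],
}

def experience_prompt(speaker: str, utterance: str, k: int = 2) -> str:
    """Retrieve the speaker's k most similar past utterances and prepend them."""
    past = history.get(speaker, [])
    if not past:
        return utterance
    vec = TfidfVectorizer().fit(past + [utterance])
    sims = cosine_similarity(vec.transform([utterance]), vec.transform(past))[0]
    top = [past[i] for i in sims.argsort()[::-1][:k]]
    return (f"{speaker} previously said: " + " | ".join(top)
            + f"\nNow classify the emotion of: {utterance}")

print(experience_prompt("A", "I feel like nothing ever works out for me."))
```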
{"title":"MLAL: Multiple Prompt Learning and Generation of Auxiliary Labeled Utterances for Emotion Recognition in Conversations","authors":"Zhinan Gou , Yuxin Chen , Yuchen Long , Mengyao Jia , Zhili Liu , Jun Zhu","doi":"10.1016/j.mlwa.2025.100643","DOIUrl":"10.1016/j.mlwa.2025.100643","url":null,"abstract":"<div><div>Emotion Recognition in Conversations (ERC) is one of the most prominent research directions in the field of Natural Language Processing (NLP). It aims to accurately identify the emotional state expressed in conversations and is widely applied in psychology, education, and healthcare. However, ERC poses significant challenges due to various factors, such as conversational context, the experience of speaker, and subtle differences between similar emotion labels. Existing research primarily strives for effective sequence and graph structure to model utterance and interaction. Moreover, these methods lack comprehensive understanding of conversational contexts and precise distinction between similar emotions. To address the limitation, in this study, we propose a novel framework combining Multiple Prompt Learning and Generation of Auxiliary Labeled Utterances (MLAL). Firstly, a global prompt is constructed to facilitate the understanding of the conversational context. Specifically, utterances originating from the same speaker are identified and interactively processed. Simultaneously, taking into account the influence of speaker experience, an experience prompt is designed by retrieving and interacting with the historical utterances of speakers that display high similarity. Besides, we generate refined auxiliary labeled utterances by means of the label paraphrasing mechanism to distinguish between similar emotions. Results from experiments show that our proposed approach performs better on three datasets than the state-of-the-art techniques currently in use.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100643"},"PeriodicalIF":0.0,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143738843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emotional reactions towards vaccination during the emergence of the Omicron variant: Insights from twitter analysis in South Africa
Pub Date: 2025-03-27 | DOI: 10.1016/j.mlwa.2025.100644
Blessing Ogbuokiri, Ali Ahmadi, Nidhi Tripathi, Laleh Seyyed-Kalantari, Woldergebriel Assefa Woldegerima, Bruce Mellado, Jiahong Wu, James Orbinski, Ali Asgary, Jude Dzevela Kong
The emergence of the Omicron variant triggered intense emotional reactions toward vaccination in South Africa, particularly evident on platforms like Twitter. These emotions have the potential to significantly influence vaccine confidence and uptake, posing a challenge for public health efforts. However, existing research lacks a detailed understanding of how emotional dynamics during variant-specific outbreaks, such as Omicron, impact vaccination rates, especially at a province level. This gap limits the ability of policymakers to design targeted interventions. Our study addresses this problem by analyzing emotional reactions to vaccination during the Omicron outbreak using geotagged Twitter data and the Text2emotion pre-trained model. We validated the model by hand-labeling a random 10% of tweets and comparing results with BERT-labeled tweets, finding no significant differences (p < 0.001 for hand-labeled, p = 0.002 for BERT). Using statistical methods such as χ², Mann–Whitney U, Granger causality, and Jaccard similarity, we identified a strong association between emotional intensities in vaccine-related posts and vaccination rates during the Omicron period (p < 0.04) in specific provinces. Additionally, Latent Dirichlet Allocation (LDA) was employed for topic modeling, revealing variations in emotional reactions across topics and provinces before and during the Omicron variant. Our findings provide actionable insights for health policy-making by highlighting the role of emotional dynamics in vaccine acceptance and offering a province-level analysis of Twitter discussions. This study demonstrates the potential of social media data to understand public sentiment during disease outbreaks and serves as a valuable reference for future academic research.
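A small sketch of the measurement pipeline is given below: Text2emotion scores per tweet, aggregated into a daily fear series, followed by a Granger-causality test against vaccination counts. The tweets and both time series are synthetic placeholders, not the study's data.

```python
import numpy as np
import pandas as pd
import text2emotion as te
from statsmodels.tsa.stattools import grangercausalitytests

# Text2emotion returns Happy/Angry/Surprise/Sad/Fear scores for each text.
tweets = pd.DataFrame({
    "date": pd.to_datetime(["2021-11-26", "2021-11-26", "2021-11-27"]),
    "text": ["Terrified of this new variant...",
             "Finally got my booster, feeling hopeful!",
             "Queues at the clinic were endless today."],
})
emotions = tweets["text"].apply(te.get_emotion).apply(pd.Series)
daily_fear = emotions["Fear"].groupby(tweets["date"]).mean()
print(daily_fear)

# Granger causality: does a daily fear series help predict vaccination counts?
# Both series are synthetic, with a built-in one-day lag from fear to uptake.
rng = np.random.default_rng(0)
n = 60
fear = rng.random(n)
vacc = 1000 + 50 * np.roll(fear, 1) + rng.normal(0, 5, n)
grangercausalitytests(np.column_stack([vacc, fear]), maxlag=3)
```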
{"title":"Emotional reactions towards vaccination during the emergence of the Omicron variant: Insights from twitter analysis in South Africa","authors":"Blessing Ogbuokiri , Ali Ahmadi , Nidhi Tripathi , Laleh Seyyed-Kalantari , Woldergebriel Assefa Woldegerima , Bruce Mellado , Jiahong Wu , James Orbinski , Ali Asgary , Jude Dzevela Kong","doi":"10.1016/j.mlwa.2025.100644","DOIUrl":"10.1016/j.mlwa.2025.100644","url":null,"abstract":"<div><div>The emergence of the Omicron variant triggered intense emotional reactions toward vaccination in South Africa, particularly evident on platforms like Twitter. These emotions have the potential to significantly influence vaccine confidence and uptake, posing a challenge for public health efforts. However, existing research lacks a detailed understanding of how emotional dynamics during variant-specific outbreaks, such as Omicron, impact vaccination rates, especially at a province level. This gap limits the ability of policymakers to design targeted interventions. Our study addresses this problem by analyzing emotional reactions to vaccination during the Omicron outbreak using geotagged Twitter data and the Text2emotion pre-trained model. We validated the model by hand-labeling a random 10% of tweets and comparing results with BERT-labeled tweets, finding no significant differences (<span><math><mrow><mi>p</mi><mo><</mo><mn>0</mn><mo>.</mo><mn>001</mn></mrow></math></span> for hand-labeled, <span><math><mrow><mi>p</mi><mo>=</mo><mn>0</mn><mo>.</mo><mn>002</mn></mrow></math></span> for BERT). Using statistical methods such as <span><math><msup><mrow><mi>χ</mi></mrow><mrow><mn>2</mn></mrow></msup></math></span>, Mann–Whitney U, Granger causality, and Jaccard similarity, we identified a strong association between emotional intensities in vaccine-related posts and vaccination rates during the Omicron period (<span><math><mrow><mi>p</mi><mo><</mo><mn>0</mn><mo>.</mo><mn>04</mn></mrow></math></span>) in specific provinces. Additionally, Latent Dirichlet Allocation (LDA) was employed for topic modeling, revealing variations in emotional reactions across topics and provinces before and during the Omicron variant. Our findings provide actionable insights for health policy-making by highlighting the role of emotional dynamics in vaccine acceptance and offering a province-level analysis of Twitter discussions. This study demonstrates the potential of social media data to understand public sentiment during disease outbreaks and serves as a valuable reference for future academic research.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100644"},"PeriodicalIF":0.0,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143785816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}