Pub Date : 2026-06-01Epub Date: 2026-02-05DOI: 10.1016/j.mlwa.2026.100852
Khafiizh Hastuti, Erwin Yudi Hidayat, Abu Salam, Usman Sudibyo
Fine-grained recognition of cultural artifacts remains challenging because of the scarcity of annotated data, subtle intra-class differences, and heterogeneous imaging conditions. This study addresses these issues through a domain-specific deep learning pipeline, demonstrated on Indonesian keris classification across three tasks: pamor (27 classes), dhapur (42), and tangguh (5). The pipeline integrates background homogenization, orientation normalization, and YOLOv8-based blade cropping with mask-aware augmentation restricted to the blade regions. For classification, we propose KerisRDNet, which extends InceptionResNetV2 with Inception-Residual-Dilated (IRD) blocks and squeeze-and-excitation to model the elongated geometries and subtle forging motifs. Experiments show that baseline networks collapse under fine-grained settings, with macro-F1 near zero, whereas the proposed approach achieves 0.268 (pamor), 0.276 (dhapur), and 0.635 (tangguh) with Top-3 accuracy above 0.5 and AUC up to 0.853. Across three stratified resamplings, paired non-parametric tests (Wilcoxon signed-rank) indicated directionally consistent improvements; given the small number of repetitions (n = 3), these results are interpreted conservatively. These results demonstrate that keris recognition is practically viable as a decision-support tool for cultural heritage curation, while also offering a transferable workflow for low-data fine-grained recognition tasks.
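For readers unfamiliar with the headline metrics, macro-F1 and Top-k accuracy can be computed as below. This is an illustrative stdlib sketch over generic predictions, not the authors' evaluation code:

```python
def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores (macro-F1)."""
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

def top_k_accuracy(y_true, class_scores, k=3):
    """Fraction of samples whose true label is among the k highest-scored classes."""
    hits = 0
    for t, scores in zip(y_true, class_scores):
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += t in top
    return hits / len(y_true)
```

Macro-F1 weights every class equally, which is why rare-class failures drag baselines toward zero on the 27-class pamor and 42-class dhapur tasks.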
{"title":"KerisRDNet: Mask-aware augmentation and residual dilated networks for cultural heritage blade classification","authors":"Khafiizh Hastuti, Erwin Yudi Hidayat, Abu Salam, Usman Sudibyo","doi":"10.1016/j.mlwa.2026.100852","DOIUrl":"10.1016/j.mlwa.2026.100852","url":null,"abstract":"<div><div>Fine-grained recognition of cultural artifacts remains challenging because of the scarcity of annotated data, subtle intra-class differences, and heterogeneous imaging conditions. This study addresses these issues through a domain-specific deep learning pipeline, demonstrated on Indonesian keris classification across three tasks: <em>pamor</em> (27 classes), <em>dhapur</em> (42), and <em>tangguh</em> (5). The pipeline integrates background homogenization, orientation normalization, and YOLOv8-based blade cropping with mask-aware augmentation restricted to the blade regions. For classification, we propose KerisRDNet, which extends InceptionResNetV2 with Inception-Residual-Dilated (IRD) blocks and squeeze-and-excitation to model the elongated geometries and subtle forging motifs. Experiments show that baseline networks collapse under fine-grained settings, with macro-F1 near zero, whereas the proposed approach achieves 0.268 (<em>pamor</em>), 0.276 (<em>dhapur</em>), and 0.635 (<em>tangguh</em>) with Top-3 accuracy above 0.5 and AUC up to 0.853. Across three stratified resamplings, paired non-parametric tests (Wilcoxon signed-rank) indicated directionally consistent improvements; given the small number of repetitions (<span><math><mrow><mi>n</mi><mo>=</mo><mn>3</mn></mrow></math></span>), these results are interpreted conservatively. 
These results demonstrate the feasibility of practically viable keris recognition as a decision-support tool for cultural heritage curation, while also offering a transferable workflow for low-data fine-grained recognition tasks.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"24 ","pages":"Article 100852"},"PeriodicalIF":4.9,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146161644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The rise of online rental platforms has led to an overwhelming amount of user-generated content, making it difficult for prospective consumers to discern which reviews are helpful. Existing approaches often rely on raw helpfulness votes, which are sparse, subjective, and temporally inconsistent. In addition, labeled datasets for rental-review helpfulness prediction are scarce. This paper introduces a novel dataset of apartment reviews collected from an online website and proposes an intelligent machine learning framework to predict the helpfulness of rental reviews. To address the challenge of obtaining reliable labels from sparse and subjective user votes, a scoring-based labeling strategy is developed that uses helpful vote count and timeliness. A diverse set of features, including TF–IDF vectors, sentiment polarity, rating deviation, and review length, is used to capture both textual and behavioral aspects of the reviews. Multiple classifiers, including Logistic Regression, Naive Bayes, and XGBoost, are systematically evaluated under 5-fold cross-validation, along with rule-based and deep learning models.
Experimental results show that XGBoost consistently achieves the best overall performance, with an accuracy of 0.71 and ROC-AUC of 0.75 when leveraging all features. This research makes three key contributions: (i) the first large-scale dataset for rental reviews, (ii) an automatic annotation technique that clusters reviews using a score derived from user votes and time since posting, and (iii) a comprehensive evaluation pipeline spanning rule-based, traditional, and deep learning classifiers. Together, these advances establish a foundation for intelligent rental review helpfulness estimation, with broader implications for e-commerce, hospitality, and user-generated content analysis.
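The scoring-based labeling strategy can be pictured as follows; the exponential recency decay, the half_life_days value, and the threshold are hypothetical choices for illustration, not the paper's exact formula:

```python
import math

def helpfulness_score(votes, days_since_posted, half_life_days=180.0):
    """Combine helpful-vote count with timeliness via an assumed exponential decay."""
    recency = math.exp(-math.log(2) * days_since_posted / half_life_days)
    return votes * recency

def label_reviews(reviews, threshold=1.0):
    """reviews: iterable of (votes, days_since_posted); True = labeled helpful."""
    return [helpfulness_score(v, d) >= threshold for v, d in reviews]
```

Reviews with many recent helpful votes score high; old or unvoted reviews fall below the threshold and are labeled unhelpful.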
{"title":"Towards an intelligent review helpfulness estimation: A novel dataset and machine learning framework","authors":"Rakibul Hassan, Shubhashish Kar, Jorge Fonseca Cacho, Shaikh Arifuzzaman","doi":"10.1016/j.mlwa.2026.100849","DOIUrl":"10.1016/j.mlwa.2026.100849","url":null,"abstract":"<div><div>The rise of online rental platforms has led to an overwhelming amount of user-generated content, making it difficult for prospective consumers to discern which reviews are helpful. Existing approaches often rely on raw helpfulness votes, which are sparse, subjective, and temporally inconsistent. Also, there is lack of labeled dataset in the field of rental review usefulness prediction. This paper introduces a novel dataset of apartment reviews collected from online website and proposes an intelligent machine learning framework to predict the helpfulness of rental reviews. To address the challenge of obtaining reliable labels from sparse and subjective user votes, a scoring-based labeling strategy is developed that uses helpful vote count and timeliness. A diverse set of features including TF–IDF vectors, sentiment polarity, rating deviation, and review length are used to capture both textual and behavioral aspects of the reviews. Multiple classifiers, including Logistic Regression, Naive Bayes, and XGBoost, are systematically evaluated under 5-fold cross-validation, along with a rule-based and deep learning models.</div><div>Experimental results show that XGBoost consistently achieves the best overall performance with an accuracy of 0.71 and ROC-AUC of 0.75 when leveraging all features. This research makes three key contributions: (i) the first large-scale dataset for rental review, (ii) auto annotation technique that uses clustering approach with score from user votes and time since posted, and (iii) comprehensive evaluation pipeline spanning rule-based, traditional, and deep learning classifiers. 
Together, these advances establish a foundation for intelligent rental review helpfulness estimation, with broader implications for e-commerce, hospitality, and user-generated content analysis.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100849"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-03-01Epub Date: 2026-01-19DOI: 10.1016/j.mlwa.2026.100843
Mekhla Sarkar , Yen-Chu Huang , Tsong-Hai Lee , Jiann-Der Lee , Prasan Kumar Sahoo
Intracranial arterial stenosis (ICAS) is a leading cause of cerebrovascular accidents, and accurate morphological assessment of intracranial arteries is critical for diagnosis and treatment planning. Complex vascular structures, imaging noise, and variability in time-of-flight magnetic resonance angiography (TOF-MRA) images make manual delineation challenging, motivating the use of deep learning (DL) for automatic segmentation of the intracranial arteries. DL-based automatic segmentation offers a promising solution by providing consistent and noise-reduced vessel delineation. However, selecting an optimal segmentation architecture remains challenging due to the diversity of network designs and encoder backbones. Therefore, this study presents a systematic benchmarking of five widely used DL segmentation architectures (UNet, LinkNet, Feature Pyramid Networks (FPN), Pyramid Scene Parsing Network (PSPNet), and DeepLabV3+), each combined with nine backbone networks, yielding 45 model variants, including previously unexplored configurations for intracranial artery segmentation in TOF-MRA. Models were trained and cross-validated on four datasets (in-house, CereVessMRA, IXI, and ADAM) and evaluated on a held-out independent test set. Performance metrics included Intersection over Union (IoU), Dice Similarity Coefficient (DSC), and a Stability Score that combines the coefficients of variation of IoU and DSC to quantify segmentation consistency and reproducibility. Experimental results demonstrated that the highest DSC scores were achieved with UNet–SE-ResNeXt50, LinkNet–SE-ResNeXt50, FPN–DenseNet169, and FPN–SENet154. The most stable configurations were LinkNet–EfficientNetB6, LinkNet–SENet154, UNet–DenseNet169, and UNet–EfficientNetB6. Conversely, DeepLabV3+ and PSPNet variants consistently underperformed.
These findings provide actionable guidance for selecting backbone–segmentation pairs and highlight trade-offs between accuracy, robustness, and reproducibility for complex intracranial artery TOF-MRA segmentation tasks.
{"title":"Analysis of major segmentation models for intracranial artery time-of-flight magnetic resonance angiography images","authors":"Mekhla Sarkar , Yen-Chu Huang , Tsong-Hai Lee , Jiann-Der Lee , Prasan Kumar Sahoo","doi":"10.1016/j.mlwa.2026.100843","DOIUrl":"10.1016/j.mlwa.2026.100843","url":null,"abstract":"<div><div>Intracranial arterial stenosis (ICAS) is a leading cause of cerebrovascular accidents, and accurate morphological assessment of intracranial arteries is critical for diagnosis and treatment planning. Complex vascular structures, imaging noise, and variability in time-of-flight magnetic resonance angiography (TOF-MRA) images are challenging issues for the manual delineation that motivates the use of deep learning (DL) for automatic segmentation of the intracranial arteries. DL based automatic segmentation offers a promising solution by providing consistent and noise-reduced vessel delineation. However, selecting an optimal segmentation architecture remains challenging due to the diversity of network designs and encoder backbones. Therefore, this study presents a systematic benchmarking of five widely used DL segmentation architectures, UNet, LinkNet, Feature Pyramid Networks (FPN), Pyramid Scene Parsing Network (PSPNet), and DeepLabV3+, each combined with nine backbone networks, yielding 45 model variants, including previously unexplored configurations for intracranial artery segmentation in TOF-MRA. Models were trained and cross-validated on four datasets: in-house, CereVessMRA, IXI and ADAM, and evaluated on held-out independent test set. Performance metrics included Intersection over Union (IoU), Dice Similarity Coefficient (DSC), and a Stability Score, combining the coefficient of variation of IoU and DSC to quantify segmentation consistency and reproducibility. Experimental results demonstrated highest DSC score was achieved with UNet–SE-ResNeXt50, LinkNet-SE-ResNeXt50, FPN-DenseNet169, FPN-SENet154. 
The most stable configurations were LinkNet–EfficientNetB6, LinkNet–SENet154, UNet–DenseNet169, and UNet–EfficientNetB6. Conversely, DeepLabV3+ and PSPNet variants consistently underperformed. These findings provide actionable guidance for selecting backbone–segmentation pairs and highlight trade-offs between accuracy, robustness, and reproducibility for complex intracranial artery TOF-MRA segmentation tasks.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100843"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146077164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-03-01Epub Date: 2025-12-13DOI: 10.1016/j.mlwa.2025.100812
Mohammadhossein Homaei , Mehran Tarif , Pablo García Rodríguez , Mar Ávila , Andrés Caro
Machine learning (ML) models are often used to predict demand in digital twins (DTs) of water distribution systems (WDS). However, most models do not provide uncertainty estimates, which limits risk evaluation. In this work, we introduce the first systematic framework for hierarchical uncertainty transfer in regional water networks; until now, no such method existed for DTs of regional water systems. We propose Adaptive Multi-Village Conformal Prediction (AMV-CP), a method that retains theoretical guarantees while allowing transfer of uncertainty information between villages that are similar in structure but different in operation. The main ideas are: (i) village-adaptive conformity scores that capture local patterns, (ii) a meta-learning algorithm that reduces calibration cost by 88.6%, and (iii) regime-aware calibration that maintains 94.2% coverage across seasonal changes. We use eight years of data from six villages with 6174 users in one regional network. The results show a theoretical basis for cross-village transfer and 95.1% empirical coverage (against a 95% target), with real-time throughput of 120 predictions per second. Early multi-step tests also show 93.7% coverage for 24-hour horizons, with controlled trade-offs. This framework is the first systematic method for controlled uncertainty transfer in infrastructure DTs, with theoretical guarantees under ϕ-mixing and practical deployment. Our multi-village tests demonstrate the value of meta-learning for uncertainty estimation and establish a base method that can be applied to other hierarchical infrastructure systems. The system is validated in a Mediterranean rural network, but generalization to other climates, urban settings, and cascading systems needs further empirical study.
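The coverage guarantee in conformal prediction comes from a calibration-set quantile. A minimal split-conformal sketch (the generic building block, not AMV-CP's village-adaptive scores) looks like this:

```python
import math

def conformal_radius(calib_errors, alpha=0.05):
    """Finite-sample-corrected (1 - alpha) quantile of calibration errors.
    A point forecast yhat then gets the interval [yhat - r, yhat + r]."""
    n = len(calib_errors)
    k = math.ceil((n + 1) * (1 - alpha))
    return sorted(calib_errors)[min(k, n) - 1]
```

Under exchangeability this interval covers the truth with probability at least 1 − alpha; AMV-CP's contribution is making such guarantees transferable across villages under ϕ-mixing rather than i.i.d. assumptions.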
{"title":"Adaptive multi-domain uncertainty quantification for digital twin water forecasting","authors":"Mohammadhossein Homaei , Mehran Tarif , Pablo García Rodríguez , Mar Ávila , Andrés Caro","doi":"10.1016/j.mlwa.2025.100812","DOIUrl":"10.1016/j.mlwa.2025.100812","url":null,"abstract":"<div><div>Machine learning (ML) models are often used to predict demand in digital twins (DTs) of water distribution systems (WDS). However, most models do not provide uncertainty estimation, and this makes risk evaluation limited. In this work, we introduce the first systematic framework for hierarchical uncertainty transfer in regional water networks, because until now no method existed for DT of regional water systems. We propose Adaptive Multi-Village Conformal Prediction (AMV-CP), a method that keeps theoretical guarantees and also allows transfer of uncertainty information between villages that are similar in structure but different in operation. The main ideas are: (i) village-adaptive conformity scores that capture local patterns, (ii) a meta-learning algorithm that reduces calibration cost by 88.6%, and (iii) regime-aware calibration that keeps 94.2% coverage when seasons change. We use eight years of data from six villages with 6174 users in one regional network. The results show a theoretical basis for cross-village transfer and 95.1% empirical coverage (target was 95%), with real-time speed of 120 predictions per second. Early multi-step tests also show 93.7% coverage for 24-hour horizons, with controlled trade-offs. This framework is the first systematic method for controlled uncertainty transfer in infrastructure DTs, with theoretical guarantees under <span><math><mi>ϕ</mi></math></span>-mixing and practical deployment. Our multi-village tests demonstrate the value of meta-learning for uncertainty estimation and make a base method that can be used in other hierarchical infrastructure systems. 
The system is validated in a Mediterranean rural network, but generalization to other climates, urban settings, and cascading systems needs further empirical study.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100812"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145797251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-03-01Epub Date: 2025-12-16DOI: 10.1016/j.mlwa.2025.100823
Nouf AlShenaifi, Nourah Alangari
Despite the growing importance of multimodal signals on social media, Arabic stance detection has remained largely text-only, overlooking the visual context that often accompanies user posts. To bridge this gap, we present MAWQIF-MM, the first publicly available Arabic multimodal stance detection corpus of tweet–image pairs annotated with three stance labels: Favor, Against, and Neutral. Building on this resource, we propose a novel attention-based cross-modal fusion model that jointly encodes text and images. Textual content is processed using AraBERT v2, a transformer-based language model optimized for Arabic, while visual features are extracted using BLIP with a ViT-B backbone, a state-of-the-art vision-language model. These two modalities are integrated via multi-head cross-attention to capture cross-modal interactions. Experimental results demonstrate the effectiveness of our approach: on a held-out test set, the model achieves 88% accuracy, outperforming a text-only AraBERT baseline by 12 percentage points and an image-only BLIP baseline by 4 points. To further probe large vision–language models (VLMs) in low-resource settings, we benchmark Gemini 2.5 Flash and GPT-4o under zero-shot and few-shot prompting. While these models show promising generalization, they struggle with nuanced stances without fine-tuning, underscoring the value of domain-specific supervised training.
{"title":"Beyond text: Multimodal stance detection in Arabic tweets","authors":"Nouf AlShenaifi, Nourah Alangari","doi":"10.1016/j.mlwa.2025.100823","DOIUrl":"10.1016/j.mlwa.2025.100823","url":null,"abstract":"<div><div>Despite the growing importance of multimodal signals on social media, Arabic stance detection has remained largely text-only, overlooking the visual context that often accompanies user posts. To bridge this gap, we present MAWQIF-MM, the first publicly available Arabic multimodal stance detection corpus of tweet–image pairs annotated with three stance labels: Favor, Against, and Neutral. Building on this resource, we propose a novel attention-based cross-modal fusion model that jointly encodes text and images. Textual content is processed using AraBERT v2, a transformer-based language model optimized for Arabic, while visual features are extracted using BLIP with a ViT-B backbone, a state-of-the-art vision-language model. These two modalities are integrated via multi-head cross-attention to capture cross-modal interactions. Experimental results demonstrate the effectiveness of our approach: on a held-out test set, the model achieves 88% accuracy, outperforming a text-only AraBERT baseline by 12 percentage points and an image-only BLIP baseline by 4 points. To further probe large vision–language models (VLMs) in low-resource settings, we benchmark Gemini 2.5 Flash and GPT-4o under zero-shot and few-shot prompting. 
While these models show promising generalization, they struggle with nuanced stances without fine-tuning, underscoring the value of domain-specific supervised training.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100823"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145797250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-03-01Epub Date: 2026-01-18DOI: 10.1016/j.mlwa.2026.100845
Obaida AlHousrya, Aseel Bennagi, Petru A. Cotfas, Daniel T. Cotfas
Driver fatigue remains a critical factor in road accidents, particularly in long-duration or cognitively demanding driving scenarios. This study presents a comprehensive, low-cost, real-time system for monitoring driver health and electric vehicle status through physiological signal analysis. By integrating heart rate, eye movement, and breathing rate sensors, both simulated and real, this hybrid framework detects signs of fatigue using machine learning classifiers trained on publicly available datasets including OpenDriver, DriveDB, MAUS, YawDD, TinyML, and the Driver Respiration Dataset. The system architecture combines Arduino-based hardware, cloud integration via Microsoft Azure, and advanced classification and anomaly detection algorithms such as Random Forest and Isolation Forest. Evaluation across diverse datasets revealed robust fatigue detection capabilities, with OpenDriver achieving 97.6% cross-validation accuracy and 95.8% F1-score, while image- and respiration-based models complemented the electrocardiogram-based analysis. These results demonstrate the feasibility of affordable, multimodal health monitoring in EVs, offering a scalable and deployable solution for enhancing road safety.
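As a simpler stand-in for the anomaly-detection stage (the paper uses Isolation Forest), a per-signal z-score flagger conveys the idea in a few lines:

```python
def zscore_anomalies(values, threshold=3.0):
    """Flag readings more than `threshold` standard deviations from the mean."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    if std == 0:
        return [False] * len(values)
    return [abs(v - mean) / std > threshold for v in values]
```

A heart-rate spike far outside the driver's recent baseline gets flagged; Isolation Forest generalizes this to multivariate feature spaces without assuming a Gaussian baseline.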
{"title":"A hybrid machine learning and IoT system for driver fatigue monitoring in connected electric vehicles","authors":"Obaida AlHousrya, Aseel Bennagi, Petru A. Cotfas, Daniel T. Cotfas","doi":"10.1016/j.mlwa.2026.100845","DOIUrl":"10.1016/j.mlwa.2026.100845","url":null,"abstract":"<div><div>Driver fatigue remains a critical factor in road accidents, particularly in long duration or cognitively demanding driving scenarios. This study presents a comprehensive, low cost, and real time system for monitoring driver health and electric vehicle status through physiological signal analysis. By integrating heart rate, eye movement, and breathing rate sensors, both simulated and real, this hybrid framework detects signs of fatigue using machine learning classifiers trained on publicly available datasets including OpenDriver, DriveDB, MAUS, YawDD, TinyML, and the Driver Respiration Dataset. The system architecture combines Arduino based hardware, cloud integration via Microsoft Azure, and advanced classification and anomaly detection algorithms such as Random Forest and Isolation Forest. Evaluation across diverse datasets revealed robust fatigue detection capabilities, with OpenDriver achieving 97.6% cross validation accuracy and 95.8% F1-score, while image and respiration-based models complemented the electrocardiogram-based analysis. 
These results demonstrate the feasibility of affordable, multimodal health monitoring in EVs, offering a scalable and deployable solution for enhancing road safety.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100845"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146037558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seismic ground-motion simulations provide high-fidelity predictions but are computationally prohibitive for large-scale scenario analyses. Surrogate models based on Multi-Layer Perceptrons (MLPs) or Fourier Neural Operators (FNOs) have been studied, yet each has limitations: MLPs fail to capture spatial correlations, while FNOs incur high costs from repeated Fourier transforms on full-resolution grids. To overcome these issues, we propose a surrogate model based on the MLP-Mixer architecture that operates on a patch grid, enabling efficient extraction of global spatial correlations. In addition, we introduce a multi-stream design with source and geology inputs fused through a learnable element-wise multi-modal mixer, allowing period-dependent, data-driven fusion of modalities. Experiments on Nankai Trough simulations demonstrate that the proposed method, referred to as Multi-MLP-Mixer, achieves accuracy comparable to state-of-the-art surrogate models while reducing training and inference time, thereby balancing predictive performance with computational efficiency.
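The token-mixing step that lets an MLP-Mixer capture global spatial correlations across patches can be sketched in NumPy; the ReLU stands in for the GELU of the original Mixer, and the complementary channel-mixing MLP is omitted:

```python
import numpy as np

def token_mix(x, w1, w2):
    """Token-mixing MLP: acts across the patch axis, shared over channels.
    x: (patches, channels); w1: (patches, hidden); w2: (hidden, patches)."""
    h = np.maximum(0.0, w1.T @ x)   # (hidden, channels), ReLU nonlinearity
    return x + (w2.T @ h)           # residual connection, (patches, channels)
```

Because the MLP spans the entire patch axis at once, every patch can influence every other in a single layer, without the per-layer FFTs that make FNOs costly on full-resolution grids.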
{"title":"Multi-MLP-Mixer based surrogate model for seismic ground-motion with spatial source and geological parameters","authors":"Hirotaka Hachiya , Yuto Kuroki , Asako Iwaki , Takahiro Maeda , Naonori Ueda , Hiroyuki Fujiwara","doi":"10.1016/j.mlwa.2026.100855","DOIUrl":"10.1016/j.mlwa.2026.100855","url":null,"abstract":"<div><div>Seismic ground-motion simulations provide high-fidelity predictions but are computationally prohibitive for large-scale scenario analyses. Surrogate models based on Multi-Layer Perceptrons (MLPs) or Fourier Neural Operators (FNOs) have been studied, yet each has limitations: MLPs fail to capture spatial correlations, while FNOs incur high costs from repeated Fourier transforms on full-resolution grids. To overcome these issues, we propose a surrogate model based on the MLP-Mixer architecture that operates on a patch grid, enabling efficient extraction of global spatial correlations. In addition, we introduce a multi-stream design with source and geology inputs fused through a learnable element-wise multi-modal mixer, allowing period-dependent, data-driven fusion of modalities. 
Experiments on Nankai Trough simulations demonstrate that the proposed method, referred to as Multi-MLP-Mixer, achieves accuracy comparable to state-of-the-art surrogate models while reducing training and inference time, thereby balancing predictive performance with computational efficiency.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100855"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146187667","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-03-01Epub Date: 2025-11-29DOI: 10.1016/j.mlwa.2025.100798
Temitope Olubanjo Kehinde , Azeez A. Oyedele , Morenikeji Kabirat Kareem , Joseph Akpan , Oludolapo A. Olanrewaju
This study presents an integrated Data Envelopment Analysis (DEA) and ensemble learning framework optimized with the Golden Jackal Optimization (GJO) algorithm to evaluate and predict the efficiency of United States information technology firms. Both Constant Returns to Scale and Variable Returns to Scale models were applied to measure firm efficiency and compute scale efficiency, providing a clearer distinction between managerial and scale-related effects. Using data from 3940 firms over the period 2013 to 2023, a robustness test introducing ±20% random noise to a 10% random sample confirmed that the CCR model achieved stronger stability, with a correlation coefficient of 0.795 compared to 0.773 for the BCC model. Consequently, the CCR results were adopted as the basis for predictive modeling. DEA efficiency scores were predicted using six ensemble learners, including XGBoost, Gradient Boosting Regressor, AdaBoost, Extra Trees Regressor, Random Forest, and LightGBM, with GJO employed for hyperparameter tuning. The Gradient Boosting Regressor optimized with GJO achieved the best predictive performance, accurately reproducing the observed efficiency scores. SHAP and feature importance analyses revealed that Total Equity, Operating Income, and Total Assets were the most influential determinants of efficiency. This research contributes a scalable and interpretable approach to efficiency prediction, offering actionable insights for managers, investors, and policymakers in volatile financial markets.
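In the degenerate single-input, single-output case, CRS (CCR) efficiency reduces to each firm's output/input ratio normalized by the sample's best ratio; the general multi-input, multi-output CCR model instead solves a linear program per firm:

```python
def crs_efficiency(inputs, outputs):
    """CRS efficiency for one input and one output per firm: each firm's
    output/input ratio relative to the best ratio observed in the sample."""
    ratios = [o / i for i, o in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]
```

A firm scoring 1.0 lies on the constant-returns frontier; a score of 0.5 means it produces half the output per unit input of the best performer.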
{"title":"Explainable DEA–ensemble approach with golden jackal optimization: efficiency evaluation and prediction for United States information technology firms","authors":"Temitope Olubanjo Kehinde , Azeez A. Oyedele , Morenikeji Kabirat Kareem , Joseph Akpan , Oludolapo A. Olanrewaju","doi":"10.1016/j.mlwa.2025.100798","DOIUrl":"10.1016/j.mlwa.2025.100798","url":null,"abstract":"<div><div>This study presents an integrated Data Envelopment Analysis (DEA) and ensemble learning framework optimized with the Golden Jackal Optimization (GJO) algorithm to evaluate and predict the efficiency of United States information technology firms. Both Constant Returns to Scale and Variable Returns to Scale models were applied to measure firm efficiency and compute scale efficiency, providing a clearer distinction between managerial and scale-related effects. Using data from 3940 firms over the period 2013 to 2023, a robustness test introducing ±20% random noise to a 10% random sample confirmed that the CCR model achieved stronger stability, with a correlation coefficient of 0.795 compared to 0.773 for the BCC model. Consequently, the CCR results were adopted as the basis for predictive modeling. DEA efficiency scores were predicted using six ensemble learners, including XGBoost, Gradient Boosting Regressor, AdaBoost, Extra Trees Regressor, Random Forest, and LightGBM, with GJO employed for hyperparameter tuning. The Gradient Boosting Regressor optimized with GJO achieved the best predictive performance, accurately reproducing the observed efficiency scores. SHAP and feature importance analyses revealed that Total Equity, Operating Income, and Total Assets were the most influential determinants of efficiency. 
This research contributes a scalable and interpretable approach to efficiency prediction, offering actionable insights for managers, investors, and policymakers in volatile financial markets.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100798"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-03-01Epub Date: 2025-11-26DOI: 10.1016/j.mlwa.2025.100803
Xin Yu Huang , Venkat Margapuri
Stress is a widespread psychological concern that often manifests alongside conditions such as anxiety and depression. Traditional self-report tools like the Perceived Stress Scale (PSS-10) may not fully capture an individual’s stress experience. This study explores whether integrating multimodal biometric data through video, audio, and transcriptions can enhance stress detection by providing a more comprehensive and interpretive point of view. Participants completed the PSS-10 while being recorded, and emotional features were extracted using machine learning models across the three biometric modalities. Results revealed weak correlations among the modalities, indicating that each captures distinct aspects of stress. Notably, the combined biometric score demonstrated greater sensitivity than the PSS-10 alone, suggesting that multimodal models may detect stress-related states that self-reports overlook. These findings support the development of more comprehensive stress assessment tools, although they are not intended to replace professional clinical evaluation.
{"title":"Exploring multimodal, non-invasive stress assessment through audio-visual and textual cues integrated with psychometric survey data","authors":"Xin Yu Huang , Venkat Margapuri","doi":"10.1016/j.mlwa.2025.100803","DOIUrl":"10.1016/j.mlwa.2025.100803","url":null,"abstract":"<div><div>Stress is a widespread psychological concern that often manifests alongside conditions such as anxiety and depression. Traditional self-report tools like the Perceived Stress Scale (PSS-10) may not fully capture an individual’s stress experience. This study explores whether integrating multimodal biometric data through video, audio, and transcriptions can enhance stress detection by providing a more comprehensive and interpretive point of view. Participants completed the PSS-10 while being recorded, and emotional features were extracted using machine learning models across the three biometric modalities. Results revealed weak correlations among the modalities, indicating that each captures distinct aspects of stress. Notably, the combined biometric score demonstrated greater sensitivity than the PSS-10 alone, suggesting that multimodal models may detect stress-related states that self-reports overlook. These findings support the development of more comprehensive stress assessment tools, although they are not intended to replace professional clinical evaluation.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100803"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145624849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2026-03-01Epub Date: 2025-12-08DOI: 10.1016/j.mlwa.2025.100814
Ethar Alzaid , George Wright , Mark Eastwood , Piotr Keller , Fayyaz Minhas
Survival prediction from medical data is often constrained by scarce labels, limiting the effectiveness of fully supervised models. In addition, most existing approaches produce deterministic risk scores without conveying reliability, which hinders interpretability and clinical trustworthiness. To address these challenges, we introduce T-SURE, a transductive survival ranking and risk-stratification framework that learns jointly from labeled and unlabeled patients to reduce dependence on large annotated cohorts. It also estimates a rejection score that identifies high-uncertainty cases, enabling selective abstention when confidence is low. T-SURE generates a single risk score that enables (1) patient ranking based on survival risk, (2) automatic assignment to risk groups, and (3) optional rejection of uncertain predictions. We extensively evaluated the model on pan-cancer datasets from The Cancer Genome Atlas (TCGA), using gene expression profiles, whole slide images, pathology reports, and clinical information. The model outperformed existing approaches in both ranking and risk stratification, especially in the limited labeled-data regime. It also showed consistent improvements in performance as uncertain samples were rejected, while maintaining statistically significant stratification across datasets. T-SURE integrates as a reliable component within computational pathology pipelines by guiding risk-specific therapeutic and monitoring decisions and flagging ambiguous or rare cases via a high rejection score for further investigation. To support reproducibility, the full implementation of T-SURE is publicly available at: (Anonymized).
{"title":"Automatic discovery of robust risk groups from limited survival data across biomedical modalities","authors":"Ethar Alzaid , George Wright , Mark Eastwood , Piotr Keller , Fayyaz Minhas","doi":"10.1016/j.mlwa.2025.100814","DOIUrl":"10.1016/j.mlwa.2025.100814","url":null,"abstract":"<div><div>Survival prediction from medical data is often constrained by scarce labels, limiting the effectiveness of fully supervised models. In addition, most existing approaches produce deterministic risk scores without conveying reliability, which hinders interpretability and clinical trustworthiness. To address these challenges, we introduce T-SURE, a transductive survival ranking and risk-stratification framework that learns jointly from labeled and unlabeled patients to reduce dependence on large annotated cohorts. It also estimates a rejection score that identifies high-uncertainty cases, enabling selective abstention when confidence is low. T-SURE generates a single risk score that enables (1) patient ranking based on survival risk, (2) automatic assignment to risk groups, and (3) optional rejection of uncertain predictions. We extensively evaluated the model on pan-cancer datasets from The Cancer Genome Atlas (TCGA), using gene expression profiles, whole slide images, pathology reports, and clinical information. The model outperformed existing approaches in both ranking and risk stratification, especially in the limited labeled-data regime. It also showed consistent improvements in performance as uncertain samples were rejected, while maintaining statistically significant stratification across datasets. T-SURE integrates as a reliable component within computational pathology pipelines by guiding risk-specific therapeutic and monitoring decisions and flagging ambiguous or rare cases via a high rejection score for further investigation. 
To support reproducibility, the full implementation of T-SURE is publicly available at: (Anonymized).</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"23 ","pages":"Article 100814"},"PeriodicalIF":4.9,"publicationDate":"2026-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145748776","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}