Evaluation of Google Translate for Mandarin Chinese translation using sentiment and semantic analysis
Xuechun Wang, Rodney Beard, Rohitash Chandra
Natural Language Processing Journal, Volume 13, Article 100188
Pub Date: 2025-12-01 | DOI: 10.1016/j.nlp.2025.100188

Machine translation using large language models (LLMs) is having a significant global impact, making communication easier. Mandarin Chinese is the official language of government and media in China. In this study, we provide an automated assessment of the translation quality of Google Translate against human expert translations using sentiment and semantic analysis. To demonstrate our framework, we select the classic early twentieth-century novel 'The True Story of Ah Q' together with selected Mandarin Chinese to English translations. We use Google Translate to translate the text into English and then conduct chapter-wise sentiment and semantic analysis to compare the extracted sentiments across the different translations. Our results indicate that the precision of Google Translate differs from human expert translations in terms of both semantic and sentiment analysis. We find that Google Translate is unable to translate some Chinese words and phrases, such as traditional Chinese idiomatic expressions. The mistranslations may be due to a lack of contextual and historical knowledge of China.
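The chapter-wise comparison the study describes can be sketched in a few lines. This is a minimal illustration using a tiny hand-made polarity lexicon in place of the study's sentiment model (not specified here); the lexicon and the chapter texts are invented placeholders.

```python
# Minimal sketch: score each chapter of two translations with a toy
# polarity lexicon, then measure the per-chapter sentiment gap.
# Lexicon and texts are illustrative, not the paper's data or model.

POS = {"good", "happy", "proud", "glorious", "victory"}
NEG = {"beaten", "mocked", "shame", "poor", "bad"}

def sentiment_score(text: str) -> float:
    """Polarity in [-1, 1]: (pos - neg) / total matched tokens."""
    tokens = [t.strip(".,;:!?") for t in text.lower().split()]
    pos = sum(t in POS for t in tokens)
    neg = sum(t in NEG for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

def chapterwise_gaps(machine: list[str], human: list[str]) -> list[float]:
    """Absolute per-chapter sentiment gap between two translations."""
    return [abs(sentiment_score(m) - sentiment_score(h))
            for m, h in zip(machine, human)]

google = ["Ah Q felt proud of his glorious victory.",
          "He was beaten and mocked in the town."]
expert = ["Ah Q was happy; it seemed a good victory.",
          "Beaten and mocked, he felt deep shame."]
gaps = chapterwise_gaps(google, expert)
```

A real pipeline would substitute a trained sentiment model for the lexicon, but the chapter-level aggregation and gap comparison follow the same shape.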
Bridging gaps in natural language processing for Yorùbá: A systematic review of a decade of progress and prospects
Toheeb Aduramomi Jimoh, Tabea De Wille, Nikola S. Nikolov
Natural Language Processing Journal, Volume 13, Article 100194
Pub Date: 2025-11-09 | DOI: 10.1016/j.nlp.2025.100194

Natural Language Processing (NLP) is becoming a dominant subset of artificial intelligence as the need to help machines understand human language becomes indispensable. NLP applications are ubiquitous, partly due to the myriad datasets churned out daily through media like social networking sites. However, this growing development has not been evident in most African languages due to persisting resource limitations, among other issues. The Yorùbá language, a tonal and morphologically rich African language, suffers a similar fate, resulting in limited NLP usage. To encourage further research towards improving this situation, this systematic literature review comprehensively analyses studies addressing NLP development for Yorùbá, identifying challenges, resources, techniques, and applications. A well-defined search string from a structured protocol was employed to search, select, and analyse 105 primary studies between 2014 and 2024 from reputable databases. The review highlights the scarcity of annotated corpora, the limited availability of pre-trained language models (PLMs), and linguistic challenges like tonal complexity and diacritic dependency as significant obstacles. It also reveals the prominent techniques, including rule-based methods, statistical methods, deep learning, and transfer learning, implemented alongside datasets such as Yorùbá speech corpora. The findings reveal a growing body of multilingual and monolingual resources, even though the field is constrained by socio-cultural factors such as code-switching and the abandonment of the language in digital usage. This review synthesises existing research, providing a foundation for advancing NLP for Yorùbá and African languages generally. It aims to guide future research by identifying gaps and opportunities, thereby contributing to the broader inclusion of Yorùbá and other under-resourced African languages in global NLP advancements.
Llama3SP: A resource-efficient large language model for agile story point estimation
Juan Camilo Sepúlveda Montoya, Nicole Tatiana Ríos Gómez, José A. Jaramillo Villegas
Natural Language Processing Journal, Volume 13, Article 100189
Pub Date: 2025-11-08 | DOI: 10.1016/j.nlp.2025.100189

Effort estimation remains a major challenge in Agile software development. Inaccurate story point forecasts can lead to budget overruns, schedule delays, and diminished stakeholder trust. Widely used approaches, such as story point estimation, are helpful for planning but rely heavily on subjective human judgment, making them prone to inconsistency and bias. Prior efforts applying machine learning and natural language processing (e.g., Deep-SE, GPT2SP) to automate story point prediction have achieved only limited success, often suffering from accuracy issues, poor cross-project adaptability, and high computational costs. To address these challenges, we introduce Llama3SP, which fine-tunes Meta's LLaMA 3.2 language model using QLoRA, a resource-efficient adaptation technique. This combination enables training a high-performance model on standard GPUs without sacrificing prediction quality. Experiments show that Llama3SP provides precise and consistent story point estimates, outperforming or matching previous models like GPT2SP and other comparably sized alternatives, all while operating under significantly lower hardware constraints. These findings highlight how combining advanced NLP models with efficient training techniques can make accurate effort estimation more accessible and practical for agile teams.
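A back-of-the-envelope calculation shows why low-rank adaptation (the "LoRA" in QLoRA) is resource-efficient: instead of updating a full d_out × d_in weight matrix, it trains two small factors B (d_out × r) and A (r × d_in). The dimensions below are illustrative, not LLaMA 3.2's actual shapes.

```python
# Parameter-count sketch of low-rank adaptation. QLoRA additionally
# quantizes the frozen base weights to 4 bits; only the adapter factors
# counted here are trained. Sizes are illustrative assumptions.

def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable values when updating the whole weight matrix."""
    return d_out * d_in

def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable values for the low-rank factors B and A."""
    return d_out * rank + rank * d_in

d = 4096           # a common transformer hidden size
rank = 16          # a typical LoRA rank
full = full_update_params(d, d)        # 16,777,216 trainable values
lora = lora_update_params(d, d, rank)  # 131,072 trainable values
fraction = lora / full                 # under 1% of the full update
```

Training well under 1% of the per-layer parameters, on top of a 4-bit-quantized frozen backbone, is what brings fine-tuning within reach of a single standard GPU.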
A systematic review of figurative language detection: Methods, challenges, and multilingual perspectives
Zouheir Banou, Sanaa El Filali, El Habib Benlahmar, Fatima-Zahra Alaoui, Laila El Jiani, Hasnae Sakhi
Natural Language Processing Journal, Volume 13, Article 100192
Pub Date: 2025-11-04 | DOI: 10.1016/j.nlp.2025.100192

Figurative language detection has emerged as a critical task in natural language processing (NLP), enabling machines to comprehend non-literal expressions such as metaphor, irony, and sarcasm. This study presents a systematic literature review with a multilevel analytical framework, examining figurative language across lexical, syntactic, semantic, discourse, and pragmatic levels. We investigate the interplay between feature engineering, model architectures, and annotation strategies across different languages, analyzing datasets, linguistic resources, and evaluation metrics. Special attention is given to morphologically rich and low-resource languages, where deep learning dominates but rule-based and hybrid approaches remain relevant. Our findings indicate that deep learning models, especially transformer-based architectures such as BERT and RoBERTa, consistently outperform other approaches, particularly in semantic and discourse-level tasks, due to their ability to capture context-rich and abstract patterns. However, these models often lack interpretability, raising concerns about transparency. Additional challenges include inconsistencies in annotation practices, class imbalance between figurative and literal instances, and limited data coverage for under-resourced languages. The absence of standardized evaluation metrics further complicates cross-study comparison, especially when diverse figurative language styles are involved. By structuring our analysis through linguistic and computational dimensions, this review aims to facilitate the development of more robust, inclusive, and explainable figurative language detection systems.
Research on the methodology of personalized recommender systems based on multimodal knowledge graphs
Shaowu Bao, Jiajia Wang
Natural Language Processing Journal, Volume 13, Article 100193
Pub Date: 2025-11-01 | DOI: 10.1016/j.nlp.2025.100193

The exponential increase in learning materials has occasioned a greater need for personalized learning experiences, yet conventional unimodal recommender systems are not effective in addressing students' diversified demands. In this research, we propose a personalized recommendation system backed by a multimodal knowledge graph that consolidates text, image, and video knowledge to improve accuracy, interpretability, and adaptability. The system uses different algorithms for entity and relationship extraction and includes a graph attention module with hierarchical subgraphs to build a semantic network among "Knowledge Points–Students–Resources." A dual-path embedding module that fuses Node2vec for structural semantics with an LSTM for temporal learning behavior provides explainable recommendations using path confidence. Experimental results demonstrate that the model comprehensively outperforms traditional methods and recently introduced comparison models: entity alignment accuracy (Hits@10 = 62.7%) improved by 13.4% over traditional Node2vec, 6.8% over KGAT, and 4.2% over M3KGR; cross-modal similarity (0.76) increased by 11.8% over traditional Node2vec and 5.6% over M3KGR. Learning engagement (effective duration 65%, completion rate 78%) and knowledge acquisition efficiency (coverage 67%, cycle reduction 30%) are significantly optimized, improving by 8.3%-11.4% and 8.1%-20% over M3KGR, respectively. The model achieved an explainability score of 4.3 (a 34.4%-104.8% improvement over traditional methods, 22.9% over KGAT, and 13.2% over M3KGR) with a response time of 98 ms (40.6% lower than KGAT and 25.8% lower than M3KGR). This indicates that a multimodal knowledge graph significantly improves recommendation performance through structured semantics and dynamic fusion, providing a new path for personalized education.
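The dual-path fusion and path-confidence ideas can be sketched in miniature. The vectors below stand in for a Node2vec structural embedding and an LSTM hidden state, and the product rule for path confidence is one simple convention, not necessarily the paper's exact formula.

```python
# Toy sketch of the dual-path embedding fusion and the path-confidence
# score used for explainable recommendations. All numbers are invented.

def fuse(structural: list[float], temporal: list[float]) -> list[float]:
    """Feature fusion by concatenating the two embedding paths."""
    return structural + temporal

def path_confidence(edge_confidences: list[float]) -> float:
    """Confidence of a Student -> Knowledge Point -> Resource path,
    taken here as the product of per-edge confidences."""
    c = 1.0
    for e in edge_confidences:
        c *= e
    return c

joint = fuse([0.2, 0.9], [0.5])     # 3-dimensional fused embedding
conf = path_confidence([0.9, 0.8])  # ~0.72
```

Surfacing the path and its confidence alongside each recommendation is what makes the output explainable: a student can see which knowledge point linked them to a resource and how strong each link was.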
A multi-class cyberbullying classification on image and text in code-mixed Bangla-English social media content
Animesh Chandra Roy, Tanvir Mahmud, Tahlil Abrar
Natural Language Processing Journal, Volume 13, Article 100191
Pub Date: 2025-10-30 | DOI: 10.1016/j.nlp.2025.100191

Social media platforms such as Facebook, Instagram, and Twitter are widely used: users frequently share their daily lives through pictures, posts, and videos that gain significant popularity. However, social media posts often receive a mix of reactions, ranging from positive to negative, and in some instances negative comments escalate into cyberbullying. Numerous studies have addressed this issue by focusing on cyberbullying classification, primarily through binary classification using multimodal data or targeting either text or image data. This study investigates multi-class image classification into No-bullying, Religious, Sexual, and Others categories using the pre-trained deep learning model MobileNetV2, achieving an F1-score of 0.86. For categorizing hate comments, we consider multiple classes: Not Hate, Slang, Sexual, Racial, and Religious-related content. Extensive experiments were conducted on a novel Bengali-English code-mixed dataset, utilizing a combination of advanced transformer models, traditional machine learning techniques, and deep learning approaches to detect multiple hate comment labels. Bangla BERT achieved the highest F1-score of 0.79, followed closely by SVM at 0.78 and BiLSTM with attention at 0.73. These findings underscore the effectiveness of these models in capturing the complexities of code-mixed Bengali-English, offering valuable insights into cyberbullying detection in diverse linguistic contexts. This research contributes essential strategies for improving online safety and fostering respectful digital interactions.
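At deployment time the two single-modality predictions must be combined into one moderation decision. The OR-fusion rule below is an illustration of one simple policy; the study evaluates the image and text models separately, and the label sets are taken from its class descriptions.

```python
# Simple decision-level fusion of the image and text classifiers'
# predicted labels. The OR rule is an illustrative policy, not the
# paper's evaluation protocol.

BULLYING_IMAGE_LABELS = {"Religious", "Sexual", "Others"}
BULLYING_TEXT_LABELS = {"Slang", "Sexual", "Racial", "Religious"}

def flag_post(image_label: str, text_label: str) -> bool:
    """Flag a post if either modality predicts a bullying class."""
    return (image_label in BULLYING_IMAGE_LABELS
            or text_label in BULLYING_TEXT_LABELS)
```

An OR rule favours recall (fewer missed bullying posts) at the cost of precision; a production system might instead weight the two modalities by their validation F1-scores.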
BERT-KAN: Enhancing bilingual sentiment analysis in Bangladeshi e-commerce through fine-tuned large language models
Mohammad Rifat Ahmmad Rashid, Aritra Das, Kazi Ferdous Hasan, Md. Rakibul Hasan, Mithila Sultana, Mahamudul Hasan, Raihan Ul Islam, Rashedul Amin Tuhin, M. Saddam Hossain Khan
Natural Language Processing Journal, Volume 13, Article 100190
Pub Date: 2025-10-29 | DOI: 10.1016/j.nlp.2025.100190

Sentiment analysis of code-mixed reviews poses unique challenges due to linguistic variability and contextual ambiguity, particularly in multilingual e-commerce environments. In this paper, we introduce BERT-KAN, a novel hybrid architecture that enhances bilingual sentiment analysis in Bangladeshi e-commerce by integrating the deep contextual representations of Bidirectional Encoder Representations from Transformers (BERT) with a Kolmogorov-Arnold Network (KAN) layer. The KAN component employs a polynomial expansion to capture complex non-linear relationships within code-mixed Bengali-English text, while an innovative polynomial attention mechanism further refines feature extraction. Extensive ablation studies were conducted on two base models, bert-base-multilingual-uncased and BanglaBERT, using polynomial degrees of 2 and 3. Notably, the best configuration for bert-base-multilingual-uncased (employing KAN, polynomial attention, and feature fusion with polynomial degree 2) achieved a precision of 95.3%, recall of 97.0%, and an F1-score of 96.1%. Comparable performance was observed for polynomial degree 3 (precision 96.2%, recall 95.8%, F1-score 96.0%), while cross-validation experiments yielded average accuracies exceeding 90% across multiple folds. Detailed error analyses, supported by confusion matrices and sample predictions, as well as discussions of computational requirements and deployment challenges, further validate the robustness of our approach.
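The "polynomial degree 2 vs. 3" ablation refers to how far each feature is expanded into its powers. A minimal sketch of that expansion, which is the basis a KAN-style layer learns weights over (illustrative; the paper's actual layer and attention mechanism are more involved):

```python
# Expand each feature with its powers up to `degree`, giving a later
# linear layer access to non-linear terms. Inputs are placeholders.

def poly_expand(features: list[float], degree: int = 2) -> list[float]:
    """Map [x1, x2, ...] to [x1, x1^2, ..., x2, x2^2, ...]."""
    return [x ** d for x in features for d in range(1, degree + 1)]

expanded = poly_expand([2.0, 3.0], degree=2)  # [2.0, 4.0, 3.0, 9.0]
```

Raising the degree from 2 to 3 multiplies the expanded feature width by 1.5x, which is why the paper's two configurations trade a small amount of recall and precision against each other rather than one strictly dominating.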
Pub Date : 2025-10-16DOI: 10.1016/j.nlp.2025.100185
Tasnim Ahmed, Salimur Choudhury
Mathematical optimization drives decisions across domains such as supply chains, energy grids, and financial systems, among others. Linear programming (LP), a tool for optimizing objectives under constraints, requires domain expertise to translate real-world problems into executable models. We explore automating this translation using Large Language Models (LLMs), generating solver-ready code from textual descriptions to reduce reliance on specialized knowledge. We propose OPT2CODE, a Retrieval-Augmented Generation (RAG) framework that utilizes compact LLMs to transform problem descriptions into optimization solver executable code. OPT2CODE utilizes code documentation for document retrieval and incorporates multiple LLM-as-a-Judge components to improve baseline performance. In addition, OPT2CODE is solver flexible and LLM flexible, and it can generate code for a broad range of mathematical optimization problems such as linear, integer linear, and mixed-integer linear, across different solvers as long as the corresponding solver documentation is available. We show empirical results on two datasets, NL4Opt and EOR, and across two solvers, Gurobi and FICO Xpress, using Llama-3.1-8B and Qwen-2.5-Coder-7B. OPT2CODE consistently improves code generation accuracy, reaching up to on NL4Opt with FICO Xpress and on EOR with Gurobi. Finally, our energy analysis shows that these improvements come at reasonable computational cost: OPT2CODE consumes 2,732.91 joules/sample (Llama-3.1-8B) and 1,759.95 joules/sample (Qwen-2.5-Coder-7B).
{"title":"OPT2CODE: A retrieval-augmented framework for solving linear programming problems","authors":"Tasnim Ahmed, Salimur Choudhury","doi":"10.1016/j.nlp.2025.100185","DOIUrl":"10.1016/j.nlp.2025.100185","url":null,"abstract":"<div><div>Mathematical optimization drives decisions across domains such as supply chains, energy grids, and financial systems, among others. Linear programming (LP), a tool for optimizing objectives under constraints, requires domain expertise to translate real-world problems into executable models. We explore automating this translation using Large Language Models (LLMs), generating solver-ready code from textual descriptions to reduce reliance on specialized knowledge. We propose OPT2CODE, a Retrieval-Augmented Generation (RAG) framework that utilizes compact LLMs to transform problem descriptions into optimization solver executable code. OPT2CODE utilizes code documentation for document retrieval and incorporates multiple LLM-as-a-Judge components to improve baseline performance. In addition, OPT2CODE is solver flexible and LLM flexible, and it can generate code for a broad range of mathematical optimization problems such as linear, integer linear, and mixed-integer linear, across different solvers as long as the corresponding solver documentation is available. We show empirical results on two datasets, NL4Opt and EOR, and across two solvers, Gurobi and FICO Xpress, using Llama-3.1-8B and Qwen-2.5-Coder-7B. OPT2CODE consistently improves code generation accuracy, reaching up to <span><math><mrow><mn>67.13</mn><mspace></mspace><mo>%</mo></mrow></math></span> on NL4Opt with FICO Xpress and <span><math><mrow><mn>80.00</mn><mspace></mspace><mo>%</mo></mrow></math></span> on EOR with Gurobi. 
Finally, our energy analysis shows that these improvements come at reasonable computational cost: OPT2CODE consumes 2,732.91 joules/sample (Llama-3.1-8B) and 1,759.95 joules/sample (Qwen-2.5-Coder-7B).</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"13 ","pages":"Article 100185"},"PeriodicalIF":0.0,"publicationDate":"2025-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145519631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
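The documentation-retrieval step that the abstract describes can be sketched in a few lines: score solver-documentation chunks against a problem description and keep the best matches for the generation prompt. This is a minimal illustration only, assuming a simple bag-of-words overlap scorer; the chunk texts and function names are invented and do not come from the paper.

```python
# Minimal sketch of the retrieval step in a RAG pipeline like OPT2CODE:
# rank solver-documentation chunks by token overlap with the problem
# description. Chunk contents and scoring scheme are illustrative only.
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase word counts, used as a bag-of-words representation."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, doc_chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks with the highest token overlap with the query."""
    q = tokenize(query)
    return sorted(
        doc_chunks,
        key=lambda chunk: sum((q & tokenize(chunk)).values()),
        reverse=True,
    )[:k]


# Invented documentation snippets standing in for real solver docs.
chunks = [
    "addVar(lb, ub, vtype) adds a decision variable to the model",
    "addConstr(expr) adds a linear constraint to the model",
    "setObjective(expr, sense) sets the linear objective function",
]
query = "maximize an objective subject to linear constraints on variables"
top = retrieve(query, chunks, k=2)
```

In a full pipeline the retrieved chunks would be concatenated into the LLM prompt alongside the problem description, and the generated code would then be executed and checked by the LLM-as-a-Judge components.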
Pub Date : 2025-10-13 DOI: 10.1016/j.nlp.2025.100187
Kathleen P. Mealey, Jonathan A. Karr Jr, Priscila Saboia Moreira, Paul R. Brenner, Charles F. Vardeman II
Deriving operational intelligence from organizational data repositories is a key challenge due to the tension between data-confidentiality and data-integration objectives, as well as the limitations of Natural Language Processing (NLP) tools relative to the specific knowledge structure of domains such as operations and maintenance. In this work, we discuss Knowledge Graph construction and break down the Knowledge Extraction process into its Named Entity Recognition, Coreference Resolution, Named Entity Linking, and Relation Extraction functional components. We then evaluate sixteen NLP tools in concert with, or in comparison to, the rapidly advancing capabilities of Large Language Models (LLMs). We focus on the operational and maintenance intelligence use case for trusted applications in the aircraft industry. A baseline dataset is derived from a rich public-domain US Federal Aviation Administration dataset focused on equipment failures or maintenance requirements. We assess the zero-shot performance of NLP and LLM tools that can be operated within a controlled, confidential environment (no data is sent to third parties). Based on our observation of significant performance limitations, we discuss the challenges related to trusted NLP and LLM tools as well as their Technical Readiness Level for wider use in mission-critical industries such as aviation. We conclude with recommendations to enhance trust and provide our open-source curated dataset to support further baseline testing and evaluation.
{"title":"Trusted knowledge extraction for operations and maintenance intelligence","authors":"Kathleen P. Mealey, Jonathan A. Karr Jr, Priscila Saboia Moreira, Paul R. Brenner, Charles F. Vardeman II","doi":"10.1016/j.nlp.2025.100187","DOIUrl":"10.1016/j.nlp.2025.100187","url":null,"abstract":"<div><div>Deriving operational intelligence from organizational data repositories is a key challenge due to the dichotomy of data confidentiality vs data integration objectives, as well as the limitations of Natural Language Processing (NLP) tools relative to the specific knowledge structure of domains such as operations and maintenance. In this work, we discuss Knowledge Graph construction and break down the Knowledge Extraction process into its Named Entity Recognition, Coreference Resolution, Named Entity Linking, and Relation Extraction functional components. We then evaluate sixteen NLP tools in concert with or in comparison to the rapidly advancing capabilities of Large Language Models (LLMs). We focus on the operational and maintenance intelligence use case for trusted applications in the aircraft industry. A baseline dataset is derived from a rich public domain US Federal Aviation Administration dataset focused on equipment failures or maintenance requirements. We assess the zero-shot performance of NLP and LLM tools that can be operated within a controlled, confidential environment (no data is sent to third parties). Based on our observation of significant performance limitations, we discuss the challenges related to trusted NLP and LLM tools as well as their Technical Readiness Level for wider use in mission-critical industries such as aviation. 
We conclude with recommendations to enhance trust and provide our open-source curated dataset to support further baseline testing and evaluation.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"13 ","pages":"Article 100187"},"PeriodicalIF":0.0,"publicationDate":"2025-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145519628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
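The Relation Extraction component named in this abstract can be illustrated with a toy pattern matcher that turns maintenance-log sentences into (subject, relation, object) triples for a Knowledge Graph. The patterns and log lines below are invented examples, not drawn from the FAA dataset, and a real pipeline would use the NLP or LLM tools the paper evaluates rather than regexes.

```python
# Toy illustration of Relation Extraction for Knowledge Graph
# construction: pattern-match maintenance-log sentences into
# (subject, relation, object) triples. Patterns and logs are invented.
import re

PATTERNS = [
    (re.compile(r"(?P<s>[\w ]+?) failed due to (?P<o>[\w ]+)"), "FAILED_DUE_TO"),
    (re.compile(r"(?P<s>[\w ]+?) requires (?P<o>[\w ]+)"), "REQUIRES"),
]


def extract_triples(sentence: str) -> list[tuple[str, str, str]]:
    """Return all (subject, relation, object) triples matched in a sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        for m in pattern.finditer(sentence.lower()):
            triples.append((m.group("s").strip(), relation, m.group("o").strip()))
    return triples


logs = [
    "Hydraulic pump failed due to seal wear",
    "Left engine requires oil filter replacement",
]
graph = [t for line in logs for t in extract_triples(line)]
```

The other pipeline stages (Named Entity Recognition, Coreference Resolution, Named Entity Linking) would normalize the subjects and objects before the triples are loaded into the graph.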
Pub Date : 2025-10-06 DOI: 10.1016/j.nlp.2025.100182
Hoda Helmy , Ahmed Ibrahim , Maryam Arabi , Aamenah Sattar , Ahmed Serag
Unplanned readmissions to Intensive Care Units (ICUs) are associated with increased mortality, higher healthcare costs, and significant strain on limited medical resources. Accurate prediction of readmissions can improve patient outcomes and optimize resource allocation. This study investigates the use of large language models (LLMs) for ICU readmission prediction through both classification and explanation tasks. We compare a general-purpose model (Gemma2B) and a medical-domain model (Apollo2B), both open-source and fine-tuned for this task. The models were evaluated on their ability to classify readmission cases and generate clinically meaningful justifications. Gemma2B outperformed Apollo2B, achieving an AUC of 0.9, along with strong performance in explanatory outputs. Its ability to produce accurate, context-aware explanations without hallucinations underscores the value of fine-tuned general-purpose models in healthcare settings. These findings highlight the promise of interpretable LLMs in critical care and support their integration into clinical workflows to enhance patient safety and reduce the burden of unplanned ICU readmissions.
{"title":"Leveraging large language models to predict unplanned ICU readmissions from electronic health records","authors":"Hoda Helmy , Ahmed Ibrahim , Maryam Arabi , Aamenah Sattar , Ahmed Serag","doi":"10.1016/j.nlp.2025.100182","DOIUrl":"10.1016/j.nlp.2025.100182","url":null,"abstract":"<div><div>Unplanned readmissions to Intensive Care Units (ICUs) are associated with increased mortality, higher healthcare costs, and significant strain on limited medical resources. Accurate prediction of readmissions can improve patient outcomes and optimize resource allocation. This study investigates the use of large language models (LLMs) for ICU readmission prediction through both classification and explanation tasks. We compare a general-purpose model (Gemma2B) and a medical-domain model (Apollo2B), both open-source and fine-tuned for this task. The models were evaluated on their ability to classify readmission cases and generate clinically meaningful justifications. Gemma2B outperformed Apollo2B, achieving an AUC of 0.9, along with strong performance in explanatory outputs. Its ability to produce accurate, context-aware explanations without hallucinations underscores the value of fine-tuned general-purpose models in healthcare settings. 
These findings highlight the promise of interpretable LLMs in critical care and support their integration into clinical workflows to enhance patient safety and reduce the burden of unplanned ICU readmissions.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"13 ","pages":"Article 100182"},"PeriodicalIF":0.0,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145363596","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
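Integrating an LLM's prediction and justification into a clinical workflow, as this abstract proposes, requires validating the model output before it is trusted. The sketch below shows one way to do that; the JSON schema, field names, and stubbed response are assumptions for illustration, as the paper does not specify its output format.

```python
# Hypothetical validation of an LLM readmission prediction plus rationale
# before clinical use. The response schema and stubbed output are
# illustrative assumptions, not the paper's actual format.
import json


def parse_prediction(raw_response: str) -> dict:
    """Parse a response of the form {"readmission": "yes"|"no",
    "rationale": "..."}; raise on anything else so malformed or
    incomplete output is never silently accepted."""
    data = json.loads(raw_response)
    label = str(data.get("readmission", "")).lower()
    if label not in {"yes", "no"}:
        raise ValueError(f"unexpected label: {label!r}")
    rationale = str(data.get("rationale", "")).strip()
    if not rationale:
        raise ValueError("missing rationale")
    return {"readmission": label == "yes", "rationale": rationale}


# Stubbed output standing in for a fine-tuned model's response.
stub = ('{"readmission": "yes", "rationale": "Prolonged ventilation and '
        'unresolved sepsis markers at discharge."}')
result = parse_prediction(stub)
```

Rejecting responses that lack a rationale mirrors the paper's emphasis on explanations: a prediction without a clinically meaningful justification should not reach the workflow.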