
Natural Language Processing Journal: Latest Publications

Demystifying chatgpt: How it masters genre recognition
Pub Date : 2026-01-25 DOI: 10.1016/j.nlp.2026.100198
Subham Raj , Sriparna Saha , Brijraj Singh , Niranjan Pedanekar
The emergence of ChatGPT has drawn considerable attention in the NLP community for its impressive performance across a wide range of language tasks. However, its effectiveness in multi-label movie genre prediction remains underexplored. This study evaluates the genre prediction capabilities of multiple Large Language Models (LLMs), including ChatGPT, using the MovieLens-100K dataset comprising 1682 movies spanning 18 genres. We investigate zero-shot and few-shot prompting strategies based on movie trailer transcripts and subtitles, where each movie may belong to multiple genres. Our results show that ChatGPT consistently outperforms earlier LLM baselines under both zero-shot and few-shot settings, while instruction fine-tuning further improves recall and overall predictive coverage. To explore multimodal extensions, we augment textual prompts with visual cues extracted from movie posters using a Vision-Language Model (VLM). While the incorporation of visual information yields selective, genre-dependent benefits–particularly improving recall for visually distinctive genres–the overall gains in aggregate performance metrics remain limited. Overall, our findings highlight the robustness of prompt-based and fine-tuned LLMs for genre prediction, and suggest that multimodal information can provide complementary signals in specific cases, motivating future work on tighter task-aligned vision-language integration.
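The zero-shot prompting strategy described in the abstract can be illustrated with a minimal sketch. The prompt wording, the reduced genre list, and the `parse_genres` helper are hypothetical choices for illustration, not the authors' actual protocol.

```python
# Hypothetical sketch of zero-shot multi-label genre prompting.
# The template and parser are illustrative assumptions, not the
# exact prompt used in the paper.

GENRES = ["Action", "Comedy", "Drama", "Horror", "Romance"]  # subset of the 18 MovieLens genres

def build_zero_shot_prompt(transcript: str) -> str:
    """Build a prompt asking an LLM to assign one or more genres."""
    genre_list = ", ".join(GENRES)
    return (
        f"Given the movie trailer transcript below, list every genre that "
        f"applies, chosen from: {genre_list}.\n"
        f"Answer with a comma-separated list only.\n\n"
        f"Transcript: {transcript}"
    )

def parse_genres(response: str) -> list[str]:
    """Parse a comma-separated LLM response into known genre labels."""
    labels = [g.strip().title() for g in response.split(",")]
    return [g for g in labels if g in GENRES]

prompt = build_zero_shot_prompt("Two detectives chase a serial killer...")
predicted = parse_genres("horror, drama")  # mock model response
```

A few-shot variant would simply prepend labeled transcript/genre pairs to the same template.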
Natural Language Processing Journal, Volume 14, Article 100198.
Citations: 0
Fuse-MD: A culturally-aware multimodal model for detecting misogyny memes
Pub Date : 2026-01-08 DOI: 10.1016/j.nlp.2026.100197
Rahul Ponnusamy , Saranya Rajiakodi , Bhuvaneswari Sivagnanam , Anshid Kizhakkeparambil , Dhruv Sharma , Paul Buitelaar , Bharathi Raja Chakravarthi
Warning: This study involves an analysis of meme content that may contain offensive or upsetting material, used only for research and illustrative purposes. The data presented does not represent the views or convictions of the authors or their associated organizations. The emergence of social media has transformed global communication, enabling cultural expression and the exchange of ideas across platforms such as Facebook, X, and Instagram. This openness has facilitated the dissemination of harmful content, including misogyny, which surprisingly appears in the form of memes, widely circulated online images embedded in text that convey humor or commentary. Misogynistic memes often mock or criticize women’s experiences, perpetuate negative stereotypes, reinforce gender discrimination, and promote violence. Detecting such content is particularly challenging in low-resource languages such as Tamil and Malayalam, due to the limited availability of linguistic resources and tools. This study introduces the Misogyny Detection Meme Dataset (MDMD), the first multimodal dataset specifically curated for detecting misogyny memes in Tamil and Malayalam. We further conducted a shared task using MDMD and established baselines for both languages. This study proposes the Fusion-based Multimodal Framework for Misogyny Meme Detection (Fuse-MD), which employs a transfer learning approach to identify misogynistic memes in low-resource languages. Comparative analysis shows that element-wise fusion performs best for Tamil, while gated fusion is optimal for Malayalam, highlighting the role of cultural and linguistic factors in fusion design. We also introduce a “Threshold Optimization Technique via Macro-F1 Calibration,” which calibrates prediction thresholds on the development set and helps balance the accuracy loss caused by quantization. 
Experimental results demonstrate that Fuse-MD outperforms the shared task baseline, providing a robust framework for detecting misogyny memes in low-resource languages and laying the foundation for future research.
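The "Threshold Optimization Technique via Macro-F1 Calibration" mentioned above can be sketched as a simple threshold sweep on a development set; the global-threshold grid search below is an illustrative assumption, and the paper's exact calibration procedure may differ.

```python
# Illustrative sketch of threshold calibration by macro-F1 on a dev set.

def f1(tp: int, fp: int, fn: int) -> float:
    """Binary F1 from counts; 0 when undefined."""
    return 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

def macro_f1(probs, labels, threshold):
    """Macro-averaged F1 over classes at a given decision threshold."""
    n_classes = len(labels[0])
    scores = []
    for c in range(n_classes):
        tp = fp = fn = 0
        for p, y in zip(probs, labels):
            pred = p[c] >= threshold
            if pred and y[c]:
                tp += 1
            elif pred and not y[c]:
                fp += 1
            elif not pred and y[c]:
                fn += 1
        scores.append(f1(tp, fp, fn))
    return sum(scores) / n_classes

def calibrate_threshold(probs, labels, grid=None):
    """Pick the threshold on the development set that maximizes macro-F1."""
    grid = grid or [i / 20 for i in range(1, 20)]  # 0.05 .. 0.95
    return max(grid, key=lambda t: macro_f1(probs, labels, t))

# Toy development set: 2 samples, 2 classes
dev_probs = [[0.9, 0.2], [0.4, 0.7]]
dev_labels = [[1, 0], [1, 1]]
best = calibrate_threshold(dev_probs, dev_labels)
```

Calibrating per-class thresholds instead of one global value is a natural extension of the same loop.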
Natural Language Processing Journal, Volume 14, Article 100197.
Citations: 0
Uzbek language morphology analyser
Pub Date : 2025-12-14 DOI: 10.1016/j.nlp.2025.100195
Nikita Murzintcev , Shakhlo Shukurlaevna Yuldasheva
Although Uzbek is the official language of Uzbekistan, it remains one of the low-resource languages. In this paper, we propose a morphology analyser based on the Hunspell library. Such a tool cannot be developed without a detailed description of the Uzbek language, so the main part of the paper is dedicated to studying Uzbek morphology in a way sufficient for building morphology-analysis and spell-checking software. A special emphasis is placed on listing all possible word forms that can be produced with inflectional affixes, accounting for irregular forms, and phonetic assimilation. The orthography of the language is provided in accordance with the latest reforms in the script system and supplemented with transliteration rules and recommendations for text normalization for computer processing. The produced lemmatizer is compatible with a wide range of existing software. The results of this project include a dictionary of Uzbek lemmas annotated with parts of speech tags.
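Hunspell-based analysers of this kind are driven by an affix file (`.aff`) and a dictionary file (`.dic`). The tiny fragment below illustrates the rule format with a single Uzbek plural suffix (-lar); the rule set and the naive expansion helper are hypothetical, not the paper's actual resources.

```python
# Illustrative fragment of Hunspell affix data, not the paper's rule set.
# A .aff file declares affix rules; flag "P" here attaches the Uzbek
# plural suffix -lar (SFX line: flag, strip-chars, append, condition).
AFF = """\
SFX P Y 1
SFX P 0 lar .
"""

# The .dic file lists lemmas with their applicable flags.
DIC = """\
2
kitob/P
maktab/P
"""

def expand(dic: str) -> dict[str, str]:
    """Naively expand plural forms so each surface form maps to its lemma."""
    forms = {}
    for line in dic.splitlines()[1:]:          # skip the entry count
        lemma, _, flags = line.partition("/")
        forms[lemma] = lemma
        if "P" in flags:
            forms[lemma + "lar"] = lemma       # apply the SFX rule: append -lar
    return forms

lemmas = expand(DIC)
```

A real deployment would hand the `.aff`/`.dic` pair to the Hunspell library itself, which also handles irregular forms and conditions on the suffix rules.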
Natural Language Processing Journal, Volume 14, Article 100195.
Citations: 0
Evaluation of google translate for Mandarin Chinese translation using sentiment and semantic analysis
Pub Date : 2025-12-01 DOI: 10.1016/j.nlp.2025.100188
Xuechun Wang , Rodney Beard , Rohitash Chandra
Machine translation using large language models (LLMs) is having a significant global impact, making communication easier. Mandarin Chinese is the official language used for communication by the government and media in China. In this study, we provide an automated assessment of the translation quality of Google Translate with human experts using sentiment and semantic analysis. In order to demonstrate our framework, we select the classic early twentieth-century novel ’The True Story of Ah Q’ with selected Mandarin Chinese to English translations. We use Google Translate to translate the given text into English and then conduct a chapter-wise sentiment analysis and semantic analysis to compare the extracted sentiments across the different translations. Our results indicate that the precision of Google Translate differs both in terms of semantic and sentiment analysis when compared to human expert translations. We find that Google Translate is unable to translate some of the specific words or phrases in Chinese, such as Chinese traditional idiomatic expressions. The mistranslations may be due to a lack of contextual significance and historical knowledge of China.
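The chapter-wise sentiment comparison can be sketched as scoring each chapter of two translations and differencing the scores. The tiny polarity lexicon and bag-of-words scorer below are illustrative assumptions, not the study's actual sentiment model.

```python
# Illustrative chapter-wise sentiment comparison between two translations.
# The lexicon and scoring rule are hypothetical, for demonstration only.

LEXICON = {"good": 1, "happy": 1, "victory": 1,
           "bad": -1, "beaten": -1, "miserable": -1}

def polarity(text: str) -> int:
    """Sum word polarities from the lexicon (crude bag-of-words score)."""
    return sum(LEXICON.get(w.strip(".,").lower(), 0) for w in text.split())

def compare(chapters_a, chapters_b):
    """Per-chapter polarity difference between two translations."""
    return [polarity(a) - polarity(b) for a, b in zip(chapters_a, chapters_b)]

machine = ["Ah Q felt it was a victory.", "He was beaten and miserable."]
human = ["Ah Q felt happy, a victory.", "He was beaten."]
diffs = compare(machine, human)
```

Large per-chapter differences flag passages, such as idiomatic expressions, where the two translations diverge in extracted sentiment.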
Natural Language Processing Journal, Volume 13, Article 100188.
Citations: 0
Bridging gaps in natural language processing for Yorùbá: A systematic review of a decade of progress and prospects
Pub Date : 2025-11-09 DOI: 10.1016/j.nlp.2025.100194
Toheeb Aduramomi Jimoh, Tabea De Wille, Nikola S. Nikolov
Natural Language Processing (NLP) is becoming a dominant subset of artificial intelligence as the need to help machines understand human language becomes indispensable. Several NLP applications are ubiquitous, partly due to the myriad datasets being churned out daily through mediums like social networking sites. However, the growing development has not been evident in most African languages due to the persisting resource limitations, among other issues. Yorùbá language, a tonal and morphologically rich African language, suffers a similar fate, resulting in limited NLP usage. To encourage further research towards improving this situation, this systematic literature review aims to comprehensively analyse studies addressing NLP development for Yorùbá, identifying challenges, resources, techniques, and applications. A well-defined search string from a structured protocol was employed to search, select, and analyse 105 primary studies between 2014 and 2024 from reputable databases. The review highlights the scarcity of annotated corpora, the limited availability of pre-trained language models (PLMs), and linguistic challenges like tonal complexity and diacritic dependency as significant obstacles. It also revealed the prominent techniques, including rule-based methods, statistical methods, deep learning, and transfer learning, which were implemented alongside datasets of Yorùbá speech corpora, among others. The findings reveal a growing body of multilingual and monolingual resources, even though the field is constrained by socio-cultural factors such as code-switching and the desertion of language for digital usage. This review synthesises existing research, providing a foundation for advancing NLP for Yorùbá and in African languages generally. It aims to guide future research by identifying gaps and opportunities, thereby contributing to the broader inclusion of Yorùbá and other under-resourced African languages in global NLP advancements.
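The diacritic dependency the review identifies can be made concrete: Yorùbá tone marks and under-dots are Unicode combining characters, so stripping them (a common but lossy normalization) conflates tonally distinct words. The snippet below is a generic illustration of that preprocessing concern, not a tool from the reviewed studies.

```python
# Sketch of diacritic stripping for Yorùbá text; lossy by design,
# shown only to illustrate the tonal-ambiguity problem.
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove combining marks (tone marks, under-dots), losing tonal information."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

stripped = strip_diacritics("Yorùbá")
```

Because distinct lemmas can collapse to the same stripped form, diacritic restoration is itself an NLP task for the language.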
Natural Language Processing Journal, Volume 13, Article 100194.
Citations: 0
Llama3SP: A resource-Efficient large language model for agile story point estimation
Pub Date : 2025-11-08 DOI: 10.1016/j.nlp.2025.100189
Juan Camilo Sepúlveda Montoya , Nicole Tatiana Ríos Gómez , José A. Jaramillo Villegas
Effort estimation remains a major challenge in Agile software development. Inaccurate story point forecasts can lead to budget overruns, schedule delays, and diminished stakeholder trust. Widely used approaches, such as story point estimation, are helpful for planning but rely heavily on subjective human judgment, making them prone to inconsistency and bias. Prior efforts applying machine learning and natural language processing (e.g. Deep-SE, GPT2SP) to automate story point prediction have achieved only limited success, often suffering from accuracy issues, poor cross-project adaptability, and high computational costs. To address these challenges, we introduce Llama3SP, which fine-tunes Meta’s LLaMA 3.2 language model using QLoRA, a resource-efficient adaptation technique. This combination enables training of a high-performance model on standard GPUs without sacrificing prediction quality. Experiments show that Llama3SP provides precise and consistent story point estimates, outperforming or matching previous models like GPT2SP and other comparably sized alternatives, all while operating under significantly lower hardware constraints. These findings highlight how combining advanced NLP models with efficient training techniques can make accurate effort estimation more accessible and practical for agile teams.
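Story points in agile practice are usually drawn from a Fibonacci-like scale, so one natural post-processing step for any numeric estimator is to snap a raw model output to the nearest valid point. This step is an illustrative assumption about deployment, not a procedure described in the paper.

```python
# Hypothetical post-processing for a story-point estimator: snap a raw
# regression output to the agile Fibonacci scale. Illustrative only.

FIB_POINTS = [1, 2, 3, 5, 8, 13, 21]

def snap_to_scale(raw: float) -> int:
    """Round a raw estimate to the nearest point on the Fibonacci scale."""
    return min(FIB_POINTS, key=lambda p: abs(p - raw))

estimate = snap_to_scale(4.2)
```

Snapping keeps model outputs directly usable in planning sessions, where off-scale values like 4.2 have no meaning.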
Natural Language Processing Journal, Volume 13, Article 100189.
Citations: 0
A systematic review of figurative language detection: Methods, challenges, and multilingual perspectives
Pub Date : 2025-11-04 DOI: 10.1016/j.nlp.2025.100192
Zouheir Banou, Sanaa El Filali, El Habib Benlahmar, Fatima-Zahra Alaoui, Laila El Jiani, Hasnae Sakhi
Figurative language detection has emerged as a critical task in natural language processing (NLP), enabling machines to comprehend non-literal expressions such as metaphor, irony, and sarcasm. This study presents a systematic literature review with a multilevel analytical framework, examining figurative language across lexical, syntactic, semantic, discourse, and pragmatic levels. We investigate the interplay between feature engineering, model architectures, and annotation strategies across different languages, analyzing datasets, linguistic resources, and evaluation metrics. Special attention is given to morphologically rich and low-resource languages, where deep learning dominates but rule-based and hybrid approaches remain relevant. Our findings indicate that deep learning models–especially transformer-based architectures like BERT and RoBERTa–consistently outperform other approaches, particularly in semantic and discourse-level tasks, due to their ability to capture context-rich and abstract patterns. However, these models often lack interpretability, raising concerns about transparency. Additional challenges include inconsistencies in annotation practices, class imbalance between figurative and literal instances, and limited data coverage for under-resourced languages. The absence of standardized evaluation metrics further complicates cross-study comparison, especially when diverse figurative language styles are involved. By structuring our analysis through linguistic and computational dimensions, this review aims to facilitate the development of more robust, inclusive, and explainable figurative language detection systems.
Natural Language Processing Journal, Volume 13, Article 100192.
Citations: 0
Research on the methodology of personalized recommender systems based on multimodal knowledge graphs
Pub Date : 2025-11-01 DOI: 10.1016/j.nlp.2025.100193
Shaowu Bao , Jiajia Wang
The exponential increase in learning materials has occasioned a greater need for personalized learning experiences, yet conventional unimodal recommender systems are not effective in addressing students' diversified demands. In this research, a personalized recommendation system is advocated, backed by a multimodal knowledge graph that consolidates text, image, and video knowledge to improve accuracy, interpretability, and adaptability. The system uses different algorithms for relationship and entity extraction and includes a graph attention module with hierarchical subgraphs to build a semantic network among "Knowledge Points–Students–Resources." A dual-path embedding module that fuses Node2vec for structural semantics with LSTM for learning temporal behavior provides explainable recommendations using path confidence. Experimental results demonstrate that the model's core performance comprehensively outperforms traditional methods and newly introduced comparison models: entity alignment accuracy (Hits@10 = 62.7 %) improved by 13.4 % over traditional Node2vec, 6.8 % over KGAT, and 4.2 % over M3KGR; cross-modal similarity (0.76) increased by 11.8 % over traditional Node2vec and 5.6 % over M3KGR. Learning engagement (effective duration 65 %, completion rate 78 %) and knowledge acquisition efficiency (coverage 67 %, cycle reduction 30 %) are significantly optimized, improving by 8.3 %–11.4 % and 8.1 %–20 % over M3KGR respectively. The model achieved an explainability score of 4.3 (a 34.4 %–104.8 % improvement over traditional methods, 22.9 % over KGAT, and 13.2 % over M3KGR), with a response time of 98 ms (40.6 % lower than KGAT and 25.8 % lower than M3KGR). This indicates that the multimodal knowledge graph significantly improves recommendation performance through structured semantics and dynamic fusion, providing a new path for personalized education.
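The dual-path design described above, Node2vec structural vectors fused with LSTM behavioral vectors and recommendation paths scored by confidence, can be caricatured in a few lines. The concatenation fusion and product-of-edge-weights confidence below are simplifying assumptions for illustration, not the paper's exact operators:

```python
def fuse_embeddings(structural, temporal):
    """Dual-path fusion by simple concatenation of a Node2vec-style
    structural vector and an LSTM-style temporal vector."""
    return list(structural) + list(temporal)

def path_confidence(edge_weights):
    """Score a 'Knowledge Point -> Student -> Resource' path as the
    product of its edge weights, one common proxy for path confidence."""
    conf = 1.0
    for w in edge_weights:
        conf *= w
    return conf

fused = fuse_embeddings([0.2, 0.7], [0.5])  # 3-dim dual-path representation
score = path_confidence([0.5, 0.8])         # confidence of a 2-edge path
```

In this reading, an explainable recommendation is a high-confidence path through the graph, with the fused vector feeding the downstream ranking model.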
{"title":"Research on the methodology of personalized recommender systems based on multimodal knowledge graphs","authors":"Shaowu Bao ,&nbsp;Jiajia Wang","doi":"10.1016/j.nlp.2025.100193","DOIUrl":"10.1016/j.nlp.2025.100193","url":null,"abstract":"<div><div>The exponential increase in learning materials has occasioned a greater need for personalized learning experiences, yet conventional unimodal recommender systems are not effective in addressing students' diversified demands.In this research, a personalized recommendation system is advocated, backed by a multimodal knowledge graph that consolidates text, image, and video knowledge to improve accuracy, interpretability, and adaptability.The system uses different algorithms for relationship and entity extractions and includes a graph attention module with hierarchical subgraphs to build a semantic network among \"Knowledge Points–Students–Resources.\" A dual-path embedding module that fuses Node2vec for structural semantics with LSTM for learning temporal behavior provides explainable recommendations using path confidence.Experimental results demonstrate that the model's core performance comprehensively outperforms traditional methods and newly introduced comparison models: Entity alignment accuracy (Hits@10=62.7 %) improved by 13.4 % over traditional Node2vec, 6.8 % over KGAT, and 4.2 % over M3KGR; cross-modal similarity (0.76) increased by 11.8 % over traditional Node2vec and 5.6 % over M3KGR. 
Learning engagement (effective duration 65 %, completion rate 78 %) and knowledge acquisition efficiency (coverage 67 %, cycle reduction 30 %) are significantly optimized, improving by 8.3 %-11.4 % and 8.1 %-20 % over M3KGR respectively; Achieved an explainability score of 4.3 (34.4 %-104.8 % improvement over traditional methods, 22.9 % improvement over KGAT, and 13.2 % improvement over M3KGR), with a response time of 98 ms (40.6 % reduction compared to KGAT and 25.8 % reduction compared to M3KGR).This indicates that multimodal knowledge graph significantly improves recommendation performance through structured semantics and dynamic fusion,providing a new path for personalized education.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"13 ","pages":"Article 100193"},"PeriodicalIF":0.0,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145519629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A multi-class cyberbullying classification on image and text in code-mixed Bangla-English social media content
Pub Date : 2025-10-30 DOI: 10.1016/j.nlp.2025.100191
Animesh Chandra Roy , Tanvir Mahmud , Tahlil Abrar
Social media platforms like Facebook, Instagram, and Twitter are widely used; users frequently share their daily lives by uploading pictures, posts, and videos, which gain significant popularity. However, social media posts often receive a mix of reactions, ranging from positive to negative, and in some instances, negative comments escalate into cyberbullying. Numerous studies have addressed this issue by focusing on cyberbullying classification, primarily through binary classification using multimodal data or by targeting either text or image data alone. This study investigates the identification of multi-class images (No-bullying, Religious, Sexual, and Others) using the pre-trained deep learning model MobileNetV2 to detect multiple image labels, achieving an F1-score of 0.86. For categorizing hate comments, we consider multiple classes, including Not Hate, Slang, Sexual, Racial, and Religious-related content. Extensive experiments were conducted on a novel Bengali-English code-mixed dataset, utilizing a combination of advanced transformer models, traditional machine learning techniques, and deep learning approaches to detect multiple hate comment labels. Bangla BERT achieved the highest F1-score of 0.79, followed closely by SVM at 0.78 and BiLSTM with attention at 0.73. These findings underscore the effectiveness of these models in capturing the complexities of code-mixed Bengali-English, offering valuable insights into cyberbullying detection in diverse linguistic contexts. This research contributes essential strategies for improving online safety and fostering respectful digital interactions.
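For the hate-comment branch, the five classes named in the abstract (Not Hate, Slang, Sexual, Racial, Religious) imply a multi-label decision over per-class scores. A hedged sketch of the standard sigmoid-plus-threshold rule; the 0.5 threshold and the logit values are assumptions for illustration, not values from the paper:

```python
import math

CLASSES = ["Not Hate", "Slang", "Sexual", "Racial", "Religious"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_labels(logits, threshold=0.5):
    """Return every class whose sigmoid probability clears the threshold,
    i.e. the standard multi-label decision rule."""
    return [c for c, z in zip(CLASSES, logits) if sigmoid(z) >= threshold]

# Hypothetical per-class logits for one comment.
predicted = predict_labels([2.0, -1.0, 0.1, -3.0, 1.5])
```

The same rule applies to the image branch: a MobileNetV2-style backbone with a sigmoid output per genre of bullying would be thresholded the same way.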
{"title":"A multi-class cyberbullying classification on image and text in code-mixed Bangla-English social media content","authors":"Animesh Chandra Roy ,&nbsp;Tanvir Mahmud ,&nbsp;Tahlil Abrar","doi":"10.1016/j.nlp.2025.100191","DOIUrl":"10.1016/j.nlp.2025.100191","url":null,"abstract":"<div><div>Social media platforms like Facebook, Instagram, and Twitter are widely used; users frequently share their daily lives by uploading pictures, posts, and videos, which gain significant popularity. However, social media posts often receive a mix of reactions, ranging from positive to negative, and in some instances, negative comments escalate into cyberbullying. Numerous studies have addressed this issue by focusing on cyberbullying classification, primarily through binary classification using multimodal data or targeting either text or image data. This study investigates the identification of multi-class images like No-bullying, Religious, Sexual, and Others using the deep learning pre-trained model MobileNetV2 to detect multiple image labels and achieved an F1-score of 0.86. For categorizing hate comments, we consider multiple classes, including Not Hate, Slang, Sexual, Racial, and Religious-related content. Extensive experiments were conducted on a novel Bengali-English code-mixed dataset, utilizing a combination of advanced transformer models, traditional machine learning techniques, and deep learning approaches to detect multiple hate comment labels. Bangla BERT achieved the highest F1-score of 0.79, followed closely by SVM at 0.78 and BiLSTM with attention at 0.73. These findings underscore the effectiveness of these models in capturing the complexities of code-mixed Bengali-English, offering valuable insights into cyberbullying detection in diverse linguistic contexts. 
This research contributes essential strategies for improving online safety and fostering respectful digital interactions.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"13 ","pages":"Article 100191"},"PeriodicalIF":0.0,"publicationDate":"2025-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145466720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
BERT-KAN: Enhancing bilingual sentiment analysis in Bangladeshi e-commerce through fine-tuned large language models
Pub Date : 2025-10-29 DOI: 10.1016/j.nlp.2025.100190
Mohammad Rifat Ahmmad Rashid, Aritra Das, Kazi Ferdous Hasan, Md. Rakibul Hasan, Mithila Sultana, Mahamudul Hasan, Raihan Ul Islam, Rashedul Amin Tuhin, M. Saddam Hossain Khan
Sentiment analysis of code-mixed reviews poses unique challenges due to linguistic variability and contextual ambiguity, particularly in multilingual e-commerce environments. In this paper, we introduce BERT-KAN, a novel hybrid architecture that enhances bilingual sentiment analysis in Bangladeshi e-commerce by integrating the deep contextual representations of Bidirectional Encoder Representations from Transformers (BERT) with a Kolmogorov-Arnold Network (KAN) layer. The KAN component employs a polynomial expansion to capture complex non-linear relationships within code-mixed Bengali-English text, while an innovative polynomial attention mechanism further refines feature extraction. Extensive ablation studies were conducted on two base models, bert-base-multilingual-uncased and BanglaBERT, using polynomial degrees of 2 and 3. Notably, the best configuration for bert-base-multilingual-uncased (employing KAN, polynomial attention, and feature fusion with polynomial degree 2) achieved a precision of 95.3 %, recall of 97.0 %, and an F1-score of 96.1 %. Comparable performance was observed for polynomial degree 3 (precision 96.2 %, recall 95.8 %, and F1-score 96.0 %), while cross-validation experiments yielded average accuracies exceeding 90 % across multiple folds. Detailed error analyses, supported by confusion matrices and sample predictions, as well as discussions on computational requirements and deployment challenges, further validate the robustness of our approach.
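The polynomial expansion at the heart of the KAN layer can be sketched as elementwise powers of the BERT feature vector up to the chosen degree (degrees 2 and 3 in the ablations). This is a deliberate simplification: it omits the learned coefficients and the polynomial attention mechanism, so it illustrates the feature expansion only, not the full layer:

```python
def polynomial_features(x, degree=2):
    """Expand a feature vector with elementwise powers up to `degree`
    (simplified: no learned coefficients or attention weights)."""
    expanded = list(x)
    for d in range(2, degree + 1):
        expanded.extend(v ** d for v in x)
    return expanded

# Degree 2 doubles the feature count; degree 3 triples it.
feats2 = polynomial_features([2.0, 3.0], degree=2)
feats3 = polynomial_features([2.0, 3.0], degree=3)
```

The expanded vector would then be projected back down by a learned linear layer, which is where the expressive non-linear fit comes from.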
{"title":"BERT-KAN: Enhancing bilingual sentiment analysis in bangladeshi E-commerce through fine-tuned large language models","authors":"Mohammad Rifat Ahmmad Rashid,&nbsp;Aritra Das,&nbsp;Kazi Ferdous Hasan,&nbsp;Md. Rakibul Hasan,&nbsp;Mithila Sultana,&nbsp;Mahamudul Hasan,&nbsp;Raihan Ul Islam,&nbsp;Rashedul Amin Tuhin,&nbsp;M. Saddam Hossain Khan","doi":"10.1016/j.nlp.2025.100190","DOIUrl":"10.1016/j.nlp.2025.100190","url":null,"abstract":"<div><div>Sentiment analysis of code-mixed reviews poses unique challenges due to linguistic variability and contex- tual ambiguity, particularly in multilingual e-commerce environments. In this paper, we introduce BERT- KAN, a novel hybrid architecture that enhances bilingual sentiment analysis in Bangladeshi e-commerce by integrating the deep contextual representations of Bidirectional Encoder Representations from Transform- ers(BERT) with a Kolmogorov-Arnold Network (KAN) layer. The KAN component employs a polynomial expansion to capture complex non-linear relationships within code-mixed Bengali-English text, while an innovative polynomial attention mechanism further refines feature extraction. Extensive ablation studies were conducted on two base models—bert-base-multilingual-uncased and BanglaBERT—using polynomial degrees of 2 and 3. Notably, the best configuration for bert-base-multilingual-uncased (employing KAN, polynomial attention, and feature fusion with polynomial degree 2) achieved a precision of 95.3 %, recall of 97.0 %, and an F1-score of 96.1 %. Comparable performance was observed for polynomial degree 3 (precision 96.2 %, recall 95.8 %, and F1-score 96.0 %), while cross-validation experiments yielded average accuracies exceeding 90 % across multiple folds. 
Detailed error analyses, supported by confusion matrices and sam- ple predictions, as well as discussions on computational requirements and deployment challenges, further validate the robustness of our approach.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"13 ","pages":"Article 100190"},"PeriodicalIF":0.0,"publicationDate":"2025-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145466719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0