Native vs Non-Native Language Prompting: A Comparative Analysis

Mohamed Bayan Kmainasi, Rakif Khan, Ali Ezzat Shahroor, Boushra Bendou, Maram Hasanain, Firoj Alam
{"title":"Native vs Non-Native Language Prompting: A Comparative Analysis","authors":"Mohamed Bayan Kmainasi, Rakif Khan, Ali Ezzat Shahroor, Boushra Bendou, Maram Hasanain, Firoj Alam","doi":"arxiv-2409.07054","DOIUrl":null,"url":null,"abstract":"Large language models (LLMs) have shown remarkable abilities in different\nfields, including standard Natural Language Processing (NLP) tasks. To elicit\nknowledge from LLMs, prompts play a key role, consisting of natural language\ninstructions. Most open and closed source LLMs are trained on available labeled\nand unlabeled resources--digital content such as text, images, audio, and\nvideos. Hence, these models have better knowledge for high-resourced languages\nbut struggle with low-resourced languages. Since prompts play a crucial role in\nunderstanding their capabilities, the language used for prompts remains an\nimportant research question. Although there has been significant research in\nthis area, it is still limited, and less has been explored for medium to\nlow-resourced languages. In this study, we investigate different prompting\nstrategies (native vs. non-native) on 11 different NLP tasks associated with 12\ndifferent Arabic datasets (9.7K data points). In total, we conducted 197\nexperiments involving 3 LLMs, 12 datasets, and 3 prompting strategies. Our\nfindings suggest that, on average, the non-native prompt performs the best,\nfollowed by mixed and native prompts.","PeriodicalId":501030,"journal":{"name":"arXiv - CS - Computation and Language","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Computation and Language","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07054","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Large language models (LLMs) have shown remarkable abilities across different fields, including standard Natural Language Processing (NLP) tasks. Prompts, consisting of natural language instructions, play a key role in eliciting knowledge from LLMs. Most open and closed source LLMs are trained on available labeled and unlabeled resources: digital content such as text, images, audio, and video. Hence, these models have better knowledge of high-resource languages but struggle with low-resource languages. Since prompts play a crucial role in eliciting these capabilities, the language used for prompting remains an important research question. Although there has been significant research in this area, it is still limited, and little has been explored for medium- to low-resource languages. In this study, we investigate different prompting strategies (native vs. non-native) on 11 different NLP tasks associated with 12 different Arabic datasets (9.7K data points). In total, we conducted 197 experiments involving 3 LLMs, 12 datasets, and 3 prompting strategies. Our findings suggest that, on average, the non-native prompt performs best, followed by the mixed and native prompts.
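
To make the three prompting strategies concrete, the sketch below shows how native (Arabic), non-native (English), and mixed prompts might be constructed for a single Arabic sentiment-classification example and sent to a chat model. It is a minimal illustration only: the prompt templates, labels, example text, model name, and the OpenAI-style API are assumptions for demonstration, not the exact setup used in the paper.

```python
# Minimal sketch of the three prompting strategies compared in the paper
# (native, non-native, mixed), applied to one Arabic sentiment example.
# The templates, labels, and model name are illustrative assumptions, not
# the paper's exact prompts. Assumes the OpenAI Python SDK v1
# (`pip install openai`) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

ARABIC_TEXT = "الخدمة في هذا المطعم ممتازة جداً"  # "The service at this restaurant is excellent."

PROMPTS = {
    # Non-native: instruction written entirely in English.
    "non_native": (
        "Classify the sentiment of the following Arabic text as "
        "Positive, Negative, or Neutral. Return only the label.\n\n"
        f"Text: {ARABIC_TEXT}"
    ),
    # Native: instruction written entirely in Arabic.
    "native": (
        "صنّف المشاعر في النص التالي إلى: إيجابي أو سلبي أو محايد. "
        "أعد التصنيف فقط.\n\n"
        f"النص: {ARABIC_TEXT}"
    ),
    # Mixed: English instruction with Arabic label names (one plausible mix).
    "mixed": (
        "Classify the sentiment of the following Arabic text as "
        "إيجابي, سلبي, or محايد. Return only the label.\n\n"
        f"Text: {ARABIC_TEXT}"
    ),
}


def classify(prompt: str, model: str = "gpt-4o-mini") -> str:
    """Send a single zero-shot prompt and return the model's predicted label."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep outputs as deterministic as possible for evaluation
    )
    return response.choices[0].message.content.strip()


if __name__ == "__main__":
    for strategy, prompt in PROMPTS.items():
        print(strategy, "->", classify(prompt))
```

In the full study, prompts like these would be instantiated per task and dataset and the predicted labels scored against gold annotations, which is how the native, non-native, and mixed strategies can be compared on average performance.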