A seq2seq Neural Network based Conversational Agent for Gulf Arabic Dialect

Tahani Alshareef, M. Siddiqui
Published in: 2020 21st International Arab Conference on Information Technology (ACIT)
Publication date: 2020-11-28
DOI: 10.1109/ACIT50332.2020.9300059 (https://doi.org/10.1109/ACIT50332.2020.9300059)
Citations: 5

Abstract

A Conversational Agent (CA), or dialogue system, is a computer system that can respond to humans automatically using natural language. CAs offer instant responses and can concurrently assist a potentially unlimited number of users. The modeling of CAs in Arabic has so far received less attention than in other languages because of the complexity of the Arabic language, the existence of several dialects, and a lack of data resources. The literature indicates that modeling a CA in Arabic is mostly done using pattern matching and information retrieval, employing classification approaches with a closed-domain data source. There is extremely limited research so far on modeling an open-domain CA in an Arabic dialect. This research utilized a deep-learning architecture, known as the Seq2Seq neural network, to build a CA in the Gulf Arabic dialect. We formulated the CA problem as a machine translation problem and, therefore, built our corpus from tweets, in the post-reply format, to train and evaluate the model. We investigated the effects of pretrained embeddings on the performance of the CA. For our evaluation, the Bilingual Evaluation Understudy (BLEU) score and human evaluators were used. The performance of the model was found to be comparable with existing deep learning models that have been trained on much larger corpora and in other languages. Our results present a promising first step towards building an open-domain CA in the Gulf Arabic dialect.
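The abstract names BLEU as the automatic metric but does not give its configuration. As a rough illustration only, the sketch below scores a generated reply against a single reference reply using clipped n-gram precision and a brevity penalty. The function names, the choice of `max_n=2`, the lack of smoothing, and the English placeholder post-reply pair are all assumptions made for this example; none of them come from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(reference, candidate, max_n=2):
    """Sentence-level BLEU against one reference, without smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)


# Hypothetical post-reply pair (English placeholders, not from the paper's corpus).
pairs = [("how are you today", "i am fine thank you")]
generated = "i am fine thanks".split()
score = bleu(pairs[0][1].split(), generated)
print(f"BLEU-2: {score:.4f}")
```

In the post-reply framing, the post plays the role of the source sentence and the reply plays the role of the target, so the reference here is the human reply to the same post. Open-domain replies can be valid without overlapping the reference, which is one reason a study might pair BLEU with human evaluators, as the abstract reports.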