使用机器学习的基于检索的端到端泰米尔语封闭域会话代理

Kumaran Kugathasan, Uthayasanker Thayasivam
{"title":"使用机器学习的基于检索的端到端泰米尔语封闭域会话代理","authors":"Kumaran Kugathasan, Uthayasanker Thayasivam","doi":"10.1145/3508230.3508251","DOIUrl":null,"url":null,"abstract":"Businesses around the world have started to adopt text-based conversational agents to provide a great customer experience as an alternative to minimize expensive customer service agents. Coming up with a conversational agent is comparatively easier for businesses that serve customers who speak high resourced languages like English since there are enough and more paid as well as open-source chatbot frameworks available. But for a low resource language like Tamil, there is no such framework support. The approaches proposed in researches for building high resource language chatbots are not suitable for Tamil due to the lack of many language-related resources. This paper proposes a new approach for building a Tamil language conversational agent using the dataset scraped from the FAQ corpus and expanding it more to capture the morphological richness and high inflexional nature of the Tamil language. Each question is mapped to intent and a multiclass intent classifier was built to identify the intent of the user. CNN based classifier performed best with 98.72% accuracy.","PeriodicalId":252146,"journal":{"name":"Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Retrieval-based End-to-End Tamil language Conversational Agent for Closed Domain using Machine Learning\",\"authors\":\"Kumaran Kugathasan, Uthayasanker Thayasivam\",\"doi\":\"10.1145/3508230.3508251\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Businesses around the world have started to adopt text-based conversational agents to provide a great customer experience as an alternative to minimize expensive customer service agents. Coming up with a conversational agent is comparatively easier for businesses that serve customers who speak high resourced languages like English since there are enough and more paid as well as open-source chatbot frameworks available. But for a low resource language like Tamil, there is no such framework support. The approaches proposed in researches for building high resource language chatbots are not suitable for Tamil due to the lack of many language-related resources. This paper proposes a new approach for building a Tamil language conversational agent using the dataset scraped from the FAQ corpus and expanding it more to capture the morphological richness and high inflexional nature of the Tamil language. Each question is mapped to intent and a multiclass intent classifier was built to identify the intent of the user. CNN based classifier performed best with 98.72% accuracy.\",\"PeriodicalId\":252146,\"journal\":{\"name\":\"Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3508230.3508251\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3508230.3508251","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

世界各地的企业已经开始采用基于文本的会话代理来提供出色的客户体验,作为最小化昂贵的客户服务代理的替代方案。对于那些为讲英语等资源丰富的语言的客户提供服务的企业来说,想出一个对话代理相对容易,因为有足够多的、更多的付费和开源聊天机器人框架可用。但是对于像泰米尔语这样的低资源语言,没有这样的框架支持。由于缺乏大量的语言相关资源,研究中提出的构建高资源语言聊天机器人的方法并不适合泰米尔语。本文提出了一种利用从FAQ语料库中抓取的数据集构建泰米尔语会话代理的新方法,并对其进行扩展,以捕获泰米尔语的形态丰富性和高度非弹性特性。每个问题都映射到意图,并建立了一个多类意图分类器来识别用户的意图。基于CNN的分类器表现最好,准确率为98.72%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Retrieval-based End-to-End Tamil language Conversational Agent for Closed Domain using Machine Learning
Businesses around the world have started to adopt text-based conversational agents to provide a great customer experience as an alternative to minimize expensive customer service agents. Coming up with a conversational agent is comparatively easier for businesses that serve customers who speak high resourced languages like English since there are enough and more paid as well as open-source chatbot frameworks available. But for a low resource language like Tamil, there is no such framework support. The approaches proposed in researches for building high resource language chatbots are not suitable for Tamil due to the lack of many language-related resources. This paper proposes a new approach for building a Tamil language conversational agent using the dataset scraped from the FAQ corpus and expanding it more to capture the morphological richness and high inflexional nature of the Tamil language. Each question is mapped to intent and a multiclass intent classifier was built to identify the intent of the user. CNN based classifier performed best with 98.72% accuracy.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Pandemic rumor identification on social networking sites: A case study of COVID-19 Research on Domain Emotion Dictionary Construction Method based on Improved SO-PMI Algorithm Topic Segmentation for Interview Dialogue System Method of Graphical User Interface Adaptation Using Reinforcement Learning and Automated Testing Prediction of Number of Likes and Retweets based on the Features of Tweet Text and Images
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1