Retrieval-based End-to-End Tamil language Conversational Agent for Closed Domain using Machine Learning
Kumaran Kugathasan, Uthayasanker Thayasivam
Proceedings of the 2021 5th International Conference on Natural Language Processing and Information Retrieval
Published: 2021-12-17
DOI: 10.1145/3508230.3508251 (https://doi.org/10.1145/3508230.3508251)
Citations: 0
Abstract
Businesses around the world have started to adopt text-based conversational agents to improve the customer experience while reducing their reliance on expensive human customer service agents. Building a conversational agent is comparatively easy for businesses whose customers speak high-resource languages such as English, since many paid and open-source chatbot frameworks are available. For a low-resource language like Tamil, there is no such framework support. The approaches proposed in the research literature for building high-resource-language chatbots are not suitable for Tamil because many of the language-related resources they depend on are lacking. This paper proposes a new approach for building a Tamil-language conversational agent using a dataset scraped from an FAQ corpus and expanded to capture the morphological richness and highly inflectional nature of the Tamil language. Each question is mapped to an intent, and a multiclass intent classifier is built to identify the intent of the user. A CNN-based classifier performed best, with 98.72% accuracy.
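
The retrieval flow described in the abstract (user question, then intent, then a stored FAQ answer) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper reports a CNN-based classifier, whereas the sketch below substitutes a simple character n-gram overlap score so that it runs with the Python standard library alone. The FAQ entries, the intent names, the fallback message, and the 0.3 threshold are all hypothetical, and English placeholder questions stand in for Tamil ones.

```python
from collections import Counter

# Hypothetical FAQ data: intent -> (example questions, stored answer).
# English placeholders stand in for the Tamil questions used in the paper.
FAQ = {
    "opening_hours": (
        ["What time do you open?", "When are you open?"],
        "We are open from 9am to 5pm.",
    ),
    "reset_password": (
        ["How do I reset my password?", "I forgot my password"],
        "Use the 'Forgot password' link on the login page.",
    ),
}

FALLBACK = "Sorry, I did not understand the question."


def char_ngrams(text, n=3):
    """Return a bag of character n-grams of a lowercased, space-normalised string."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def overlap(a, b):
    """Dice-style overlap between two n-gram bags, in the range [0, 1]."""
    shared = sum((a & b).values())
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 0.0


def classify_intent(question):
    """Stand-in for the intent classifier: pick the intent whose closest
    example question has the highest n-gram overlap with the input."""
    query = char_ngrams(question)
    best_intent, best_score = None, 0.0
    for intent, (examples, _answer) in FAQ.items():
        score = max(overlap(query, char_ngrams(example)) for example in examples)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent, best_score


def respond(question, threshold=0.3):
    """Retrieve the stored answer for the predicted intent, or fall back
    when no intent scores at or above the (hypothetical) threshold."""
    intent, score = classify_intent(question)
    if intent is None or score < threshold:
        return FALLBACK
    return FAQ[intent][1]


if __name__ == "__main__":
    print(respond("when do you open"))
    print(respond("forgot password"))
```

Character n-grams are used here only because they tolerate small surface variations between inflected word forms, which loosely echoes the abstract's concern with morphological richness. In the setting the paper describes, `classify_intent` would be replaced by the trained multiclass classifier while the lookup step in `respond` would keep the same shape.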