A Natural Language Understanding Model COVID-19 based for chatbots

2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE)
Pub Date: 2021-10-25
DOI: 10.1109/BIBE52308.2021.9635248

Valmir Ferreira dos Santos Junior, João Araújo Castelo Branco, Marcos Antonio de Oliveira, T. C. D. Silva, L. A. Cruz, R. P. Magalhães

Citations: 3

Abstract

Chatbots are increasingly used as interfaces to services. Making this experience more human-like requires the chatbot to understand natural language and to express itself in natural language. A crucial step toward this goal is labeling the data with intents and entities. The labeled data can then be used to train a Natural Language Understanding (NLU) component, which interprets a text by extracting the intents and entities it contains. Labeling the data manually is onerous and impractical because of the high volume of data. Thus, an unsupervised machine learning technique such as data clustering is usually used to find patterns in the data and thereby label it. For this task, it is essential to have an effective vector embedding representation of texts that captures semantic information and helps the machine understand the context, intent, and other nuances of the entire text. In this paper, we perform an extensive evaluation of different text embedding models for clustering, labeling, and training an NLU model, using the text of patient attendances (service interactions) from the Coronavirus Platform Service of Ceará, Brazil. We also show how different text embeddings produce different clusterings, thus capturing different patient intents.
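The pipeline in the abstract has three steps: embed the utterances, cluster the embeddings to surface candidate intents, and train an NLU component on the resulting labels. The following is a minimal sketch of that flow, not the authors' implementation, under loudly stated assumptions: a toy L2-normalised bag-of-words vector stands in for the embedding models the paper evaluates (the abstract does not name them), plain k-means stands in for the unspecified clustering algorithm, a nearest-centroid classifier stands in for the NLU model, and the utterances and intent names (`report_symptoms`, `ask_testing`) are hypothetical, not taken from the Ceará data.

```python
import math
from collections import Counter

# Hypothetical toy utterances (NOT from the Ceara dataset, which is in
# Portuguese); they only make the pipeline concrete.
UTTERANCES = [
    "fever and cough since monday",
    "strong fever and dry cough",
    "where to book a covid test",
    "book covid test near me",
]


def build_vocab(texts):
    """Sorted set of whitespace tokens seen in the training texts."""
    return sorted({tok for t in texts for tok in t.lower().split()})


def embed(text, vocab):
    """Assumed stand-in for a real text embedding model: an L2-normalised
    bag-of-words vector. Tokens outside the vocabulary are ignored."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(vectors, k, max_iter=50):
    """Plain k-means with deterministic farthest-first seeding:
    start at the first point, then add the point farthest from all seeds."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(dist2(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: dist2(v, centroids[c])) for v in vectors]
        if new == labels:  # assignments are stable: converged
            break
        labels = new
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean(members)
    return labels


def train_nlu(vectors, intents):
    """Toy NLU intent classifier: one centroid per intent label."""
    return {name: mean([v for v, i in zip(vectors, intents) if i == name])
            for name in set(intents)}


def predict(model, text, vocab):
    """Return the intent whose centroid is nearest to the embedded text."""
    vec = embed(text, vocab)
    return min(model, key=lambda name: dist2(vec, model[name]))


vocab = build_vocab(UTTERANCES)
vectors = [embed(t, vocab) for t in UTTERANCES]  # step 1: embed
clusters = kmeans(vectors, k=2)                  # step 2: cluster
# Step 3: a human inspects each cluster and names it (hypothetical names).
INTENT_NAMES = {0: "report_symptoms", 1: "ask_testing"}
intents = [INTENT_NAMES[c] for c in clusters]
model = train_nlu(vectors, intents)              # step 4: train the NLU model

print(clusters)                                               # [0, 0, 1, 1]
print(predict(model, "cough and fever", vocab))               # report_symptoms
print(predict(model, "is there a covid test near me", vocab))  # ask_testing
```

Replacing `embed` with a different embedding model changes the distances between utterances and therefore the clusters, which is the effect the abstract reports; the toy vectors here only illustrate the mechanics.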
