A Natural Language Understanding Model COVID-19 based for chatbots
Valmir Ferreira dos Santos Junior, João Araújo Castelo Branco, Marcos Antonio de Oliveira, T. C. D. Silva, L. A. Cruz, R. P. Magalhães
2021 IEEE 21st International Conference on Bioinformatics and Bioengineering (BIBE), 2021-10-25
DOI: 10.1109/BIBE52308.2021.9635248 (https://doi.org/10.1109/BIBE52308.2021.9635248)
Citation count: 3
Abstract
Chatbots are increasingly used as an interface to services. Making this experience more humanized requires the chatbot to understand natural language and to express itself in natural language. One crucial step toward this is labeling the data with intentions and entities. The labeled data can then be used to train a Natural Language Understanding (NLU) component, which interprets a text by extracting the intentions and entities present in it. Manually labeling the data is an onerous and impracticable process due to the high volume of data. Thus, an unsupervised machine learning technique, such as data clustering, is usually used to find patterns in the data and thereby label them. For this task, it is essential to have an effective vector embedding representation of texts that captures the semantic information and helps the machine understand the context, intent, and other nuances of the entire text. In this paper, we perform an extensive evaluation of different text embedding models for clustering, labeling, and training an NLU model using the text of attendances from the Coronavirus Platform Service of Ceará, Brazil. We also show how different text embeddings result in different clusterings, thus capturing different intentions of patients.
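
The embed, cluster, and label steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The bag-of-words `embed` function is a toy stand-in for the text embedding models the paper evaluates, the plain k-means routine stands in for whichever clustering method was used, and the utterances, function names, and `intent_N` pseudo-labels are all hypothetical.

```python
import math
from collections import Counter


def embed(texts):
    """Toy bag-of-words embedding, L2-normalized.

    Assumption: a stand-in for a real text embedding model.
    """
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = float(count)
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector farthest from its nearest existing centroid.
        farthest = max(
            vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)
        )
        centroids.append(farthest)
    assignments = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


def pseudo_label(texts, k):
    """Attach a hypothetical intent label to each text from its cluster id."""
    assignments = kmeans(embed(texts), k)
    return [(text, f"intent_{a}") for text, a in zip(texts, assignments)]


# Hypothetical utterances, not taken from the paper's dataset.
utterances = [
    "i have fever and cough",
    "fever and cough since monday",
    "where can i get a test",
    "where is the test center",
]
labeled = pseudo_label(utterances, k=2)
for text, intent in labeled:
    print(intent, "|", text)
```

In a pipeline of this kind, the resulting (text, intent) pairs would serve as training data for the NLU component. Swapping `embed` for a different embedding model can change which texts end up grouped together, which is the effect the abstract describes.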