Contextualised Word Embeddings Based on Transfer Learning to Dialogue Response Generation: a Proposal and Comparisons

Thomaz Calasans, Anna Helena Reali Costa, Eduardo R. Hruschka

2021 International Symposium on Electrical, Electronics and Information Engineering. Published 2021-02-19. DOI: 10.1145/3459104.3459169 (https://doi.org/10.1145/3459104.3459169)
Contextualised word embeddings have recently become essential elements of Natural Language Processing (NLP) systems, since these embedding models encode not only words but also their contexts, producing context-specific representations. Pre-trained models such as BERT, GPT, and derived architectures are increasingly present in NLP task benchmarks. Several comparative analyses of such models have been performed, but so far none compares the most recent architectures on a dialogue-generation dataset using multiple metrics relevant to the task. In this paper, we propose an encoder-decoder system that uses transfer learning with pre-trained word embeddings, and we systematically compare several pre-trained contextualised word-embedding architectures on the DSTC-7 dataset, using metrics based on mutual information, dialogue length, and variety of answers. We use the word embeddings as the first layer of the encoder, which encodes the texts into a latent space. As a decoder, we use an LSTM layer with byte-pair-encoding (BPE) tokenisation, in line with recently published state-of-the-art dialogue systems. All networks are trained for the same number of epochs, with the same optimisers and learning rates. Regarding dialogue quality, our results show that no single technique is superior on all metrics. However, there are relevant differences in the computational cost of encoding the data.
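The abstract describes the architecture precisely enough for an illustrative sketch: pre-trained contextualised embeddings as the encoder's first layer, an LSTM decoder on top. The code below is a minimal sketch, assuming a frozen Hugging Face BERT encoder and a single-layer LSTM decoder; the checkpoint name, the mean-pooling step, and the reuse of the encoder's embedding table on the decoder side are assumptions made for brevity (the paper's decoder uses BPE tokenisation, whereas BERT ships WordPiece), not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a frozen pre-trained encoder
# providing contextualised embeddings, feeding an LSTM decoder that
# produces per-token logits for response generation.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class Seq2SeqResponder(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", hidden=768):
        super().__init__()
        # Pre-trained contextualised embeddings as the first (encoder) layer.
        self.tokenizer = AutoTokenizer.from_pretrained(encoder_name)
        self.encoder = AutoModel.from_pretrained(encoder_name)
        for p in self.encoder.parameters():  # transfer learning: encoder frozen
            p.requires_grad = False
        # Illustrative simplification: BERT's WordPiece vocabulary stands in
        # for the BPE tokenisation the paper uses on the decoder side.
        vocab = self.tokenizer.vocab_size
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, src_texts, tgt_ids):
        enc_in = self.tokenizer(src_texts, return_tensors="pt",
                                padding=True, truncation=True)
        # Context-specific vectors for every input token (the latent space).
        latent = self.encoder(**enc_in).last_hidden_state
        # Condition the decoder on the mean-pooled latent context.
        h0 = latent.mean(dim=1, keepdim=True).transpose(0, 1).contiguous()
        c0 = torch.zeros_like(h0)
        tgt_emb = self.encoder.embeddings.word_embeddings(tgt_ids)
        dec_out, _ = self.decoder(tgt_emb, (h0, c0))
        return self.out(dec_out)  # per-token logits over the vocabulary
```

Swapping encoder_name for another pre-trained checkpoint while holding the decoder, optimiser, learning rate, and epoch budget fixed mirrors the controlled comparison protocol the abstract describes.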
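Of the three metric families named, "variety of answers" is commonly operationalised as distinct-n: the ratio of unique n-grams to total n-grams across generated responses. The abstract does not give the exact formulation, so the function below is a standard version offered as an assumption, not the paper's evaluation script.

```python
# Minimal sketch (an assumption, not the paper's exact metric): distinct-n,
# a common proxy for variety of answers in dialogue evaluation.
from collections import Counter

def distinct_n(responses, n=2):
    ngrams = Counter()
    total = 0
    for resp in responses:
        tokens = resp.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
            total += 1
    return len(ngrams) / total if total else 0.0

# Example: a repetitive system scores lower than a varied one.
print(distinct_n(["i do not know", "i do not know"]))    # 0.5
print(distinct_n(["i do not know", "maybe on friday"]))  # 1.0
```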