A Multi-layer Bidirectional Transformer Encoder for Pre-trained Word Embedding: A Survey of BERT

Rohit Kumar Kaliyar
{"title":"A Multi-layer Bidirectional Transformer Encoder for Pre-trained Word Embedding: A Survey of BERT","authors":"Rohit Kumar Kaliyar","doi":"10.1109/Confluence47617.2020.9058044","DOIUrl":null,"url":null,"abstract":"Language modeling is the task of assigning a probability distribution over sequences of words that matches the distribution of a language. A language model is required to represent the text to a form understandable from the machine point of view. A language model is capable to predict the probability of a word occurring in the context-related text. Although it sounds formidable, in the existing research, most of the language models are based on unidirectional training. In this paper, we have investigated a bi-directional training model-BERT (Bidirectional Encoder Representations from Transformers). BERT builds on top of the bidirectional idea as compared to other word embedding models (like Elmo). It practices the comparatively new transformer encoder-based architecture to compute word embedding. In this paper, it has been described that how this model is to be producing or achieving state-of-the-art results on various NLP tasks. BERT has the capability to train the model in bi-directional over a large corpus. All the existing methods are based on unidirectional training (either the left or the right). This bi-directionality of the language model helps to obtain better results in the context-related classification tasks in which the word(s) was used as input vectors. Additionally, BERT is outlined to do multi-task learning using context-related datasets. It can perform different NLP tasks simultaneously. This survey focuses on the detailed representation of the BERT- based technique for word embedding, its architecture, and the importance of this model for pre-training purposes using a large corpus.","PeriodicalId":180005,"journal":{"name":"2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/Confluence47617.2020.9058044","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16

Abstract

Language modeling is the task of assigning a probability distribution over sequences of words that matches the distribution of a language. A language model is required to represent text in a form understandable from the machine's point of view, and it can predict the probability of a word occurring in context-related text. Although it sounds formidable, most language models in the existing research are based on unidirectional training. In this paper, we investigate a bidirectional training model, BERT (Bidirectional Encoder Representations from Transformers). BERT builds on the bidirectional idea, in contrast to other word embedding models (such as ELMo), and uses the comparatively new Transformer encoder-based architecture to compute word embeddings. This paper describes how the model achieves state-of-the-art results on various NLP tasks. BERT can train a model bidirectionally over a large corpus, whereas existing methods are based on unidirectional training (either left-to-right or right-to-left). This bidirectionality of the language model helps obtain better results on context-related classification tasks in which the words are used as input vectors. Additionally, BERT is designed for multi-task learning on context-related datasets: it can perform different NLP tasks simultaneously. This survey focuses on a detailed presentation of the BERT-based technique for word embedding, its architecture, and the importance of this model for pre-training over a large corpus.
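To make the contrast between unidirectional and bidirectional training concrete, the following schematic objectives (our own illustration, not reproduced from the surveyed paper) can be stated: a conventional left-to-right language model factorizes the sequence probability using only the preceding words, whereas BERT's masked-language-model pre-training predicts a hidden token from both its left and right context.

```latex
% Unidirectional (left-to-right) language model: each word is conditioned
% only on the words that precede it.
P(w_1, \dots, w_n) \;=\; \prod_{i=1}^{n} P\big(w_i \mid w_1, \dots, w_{i-1}\big)

% BERT's masked-language-model objective (schematic): a masked position i
% is predicted from the full bidirectional context.
P\big(w_i \mid w_1, \dots, w_{i-1}, w_{i+1}, \dots, w_n\big)
```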
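As a practical illustration of computing contextual word embeddings with a pre-trained multi-layer bidirectional Transformer encoder, the minimal sketch below uses the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint; the library, checkpoint name, and example sentence are our assumptions and are not taken from the surveyed paper.

```python
# Minimal sketch: contextual word embeddings from a pre-trained BERT encoder.
# Assumes the Hugging Face `transformers` and `torch` packages are installed;
# the checkpoint name and example sentence are illustrative choices.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The bank raised its interest rates."
inputs = tokenizer(sentence, return_tensors="pt")  # adds [CLS]/[SEP], WordPiece ids

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch, num_wordpiece_tokens, hidden_size);
# each token vector is computed from both left and right context by the
# multi-layer bidirectional Transformer encoder (hidden_size = 768 for bert-base).
embeddings = outputs.last_hidden_state
print(embeddings.shape)
```

Because the encoder attends over the whole sentence, the vector for "bank" here differs from the vector the same word would receive in, say, "the river bank", which is the context sensitivity the abstract attributes to bidirectional training.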