A Multi-layer Bidirectional Transformer Encoder for Pre-trained Word Embedding: A Survey of BERT

Rohit Kumar Kaliyar
{"title":"A Multi-layer Bidirectional Transformer Encoder for Pre-trained Word Embedding: A Survey of BERT","authors":"Rohit Kumar Kaliyar","doi":"10.1109/Confluence47617.2020.9058044","DOIUrl":null,"url":null,"abstract":"Language modeling is the task of assigning a probability distribution over sequences of words that matches the distribution of a language. A language model is required to represent the text to a form understandable from the machine point of view. A language model is capable to predict the probability of a word occurring in the context-related text. Although it sounds formidable, in the existing research, most of the language models are based on unidirectional training. In this paper, we have investigated a bi-directional training model-BERT (Bidirectional Encoder Representations from Transformers). BERT builds on top of the bidirectional idea as compared to other word embedding models (like Elmo). It practices the comparatively new transformer encoder-based architecture to compute word embedding. In this paper, it has been described that how this model is to be producing or achieving state-of-the-art results on various NLP tasks. BERT has the capability to train the model in bi-directional over a large corpus. All the existing methods are based on unidirectional training (either the left or the right). This bi-directionality of the language model helps to obtain better results in the context-related classification tasks in which the word(s) was used as input vectors. Additionally, BERT is outlined to do multi-task learning using context-related datasets. It can perform different NLP tasks simultaneously. This survey focuses on the detailed representation of the BERT- based technique for word embedding, its architecture, and the importance of this model for pre-training purposes using a large corpus.","PeriodicalId":180005,"journal":{"name":"2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/Confluence47617.2020.9058044","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16

Abstract

Language modeling is the task of assigning a probability distribution over sequences of words that matches the distribution of a language. A language model is required to represent text in a form understandable from the machine's point of view, and it can predict the probability of a word occurring in context-related text. Although it sounds formidable, most language models in the existing research are based on unidirectional training. In this paper, we investigate a bidirectional training model, BERT (Bidirectional Encoder Representations from Transformers). BERT builds on the bidirectional idea, in contrast to other word embedding models (such as ELMo), and uses the comparatively new Transformer encoder-based architecture to compute word embeddings. This paper describes how the model achieves state-of-the-art results on various NLP tasks. BERT can train a model bidirectionally over a large corpus, whereas existing methods are based on unidirectional training (either left-to-right or right-to-left). This bidirectionality of the language model helps obtain better results on context-related classification tasks in which the words are used as input vectors. Additionally, BERT is designed for multi-task learning on context-related datasets: it can perform different NLP tasks simultaneously. This survey focuses on a detailed presentation of the BERT-based technique for word embedding, its architecture, and the importance of this model for pre-training over a large corpus.
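To make the contrast between unidirectional and bidirectional training concrete, the following schematic objectives (our own illustration, not reproduced from the surveyed paper) can be stated: a conventional left-to-right language model factorizes the sequence probability using only the preceding words, whereas BERT's masked-language-model pre-training predicts a hidden token from both its left and right context.

```latex
% Unidirectional (left-to-right) language model: each word is conditioned
% only on the words that precede it.
P(w_1, \dots, w_n) \;=\; \prod_{i=1}^{n} P\big(w_i \mid w_1, \dots, w_{i-1}\big)

% BERT's masked-language-model objective (schematic): a masked position i
% is predicted from the full bidirectional context.
P\big(w_i \mid w_1, \dots, w_{i-1}, w_{i+1}, \dots, w_n\big)
```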
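As a practical illustration of computing contextual word embeddings with a pre-trained multi-layer bidirectional Transformer encoder, the minimal sketch below uses the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint; the library, checkpoint name, and example sentence are our assumptions and are not taken from the surveyed paper.

```python
# Minimal sketch: contextual word embeddings from a pre-trained BERT encoder.
# Assumes the Hugging Face `transformers` and `torch` packages are installed;
# the checkpoint name and example sentence are illustrative choices.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The bank raised its interest rates."
inputs = tokenizer(sentence, return_tensors="pt")  # adds [CLS]/[SEP], WordPiece ids

with torch.no_grad():
    outputs = model(**inputs)

# last_hidden_state has shape (batch, num_wordpiece_tokens, hidden_size);
# each token vector is computed from both left and right context by the
# multi-layer bidirectional Transformer encoder (hidden_size = 768 for bert-base).
embeddings = outputs.last_hidden_state
print(embeddings.shape)
```

Because the encoder attends over the whole sentence, the vector for "bank" here differs from the vector the same word would receive in, say, "the river bank", which is the context sensitivity the abstract attributes to bidirectional training.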