Stacked Language Models for an Optimized Next Word Generation

E. O. Aliyu, E. Kotzé
{"title":"Stacked Language Models for an Optimized Next Word Generation","authors":"E. O. Aliyu, E. Kotzé","doi":"10.23919/IST-Africa56635.2022.9845545","DOIUrl":null,"url":null,"abstract":"Next word prediction task is the application of a language model in natural language generation that deals with generating words by repeatedly sampling the next word conditioned on the previous choices. This paper proposes a stacked language model for optimized next word generation using three models. In stage I, the meaning of a word is captured through learn embedding and the structure of the text sequence is encoded using a stacked Long Short Term Memory (LSTM). In stage II, a Bidirectional Long Short Term Memory (Bi-LSTM) stacking on top of the unidirectional LSTM encodes the structure of the text sequences, while in stage III, a two-layer Gated Recurrent Unit (GRU) is used to capture text sequences of data. The proposed system was implemented using Python 3.7, Tensorflow 2.6.0 with Keras and a Nvidia Graphical Processing Unit (GPU). The proposed deep learning models were trained using the Pride and Prejudice corpus from the Project Gutenberg library of ebooks. The evaluation was performed by predicting the next 3 words after considering 10 sets of text sequences. From the experiment carried out, the accuracy of the two-layer LSTM model measured 83%, the accuracy of the Bi-LSTM stacking on unidirectional LSTM model measured 79%, and the accuracy of the two-layer GRU model measured 81%. Regarding predictions, the two-layer LSTM predicted the 10 sequences correctly, the Bi-LSTM stacking on unidirectional LSTM predicted 8 sequences correctly and the two-layer GRU predicted 7 sequences correctly.","PeriodicalId":142887,"journal":{"name":"2022 IST-Africa Conference (IST-Africa)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IST-Africa Conference (IST-Africa)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/IST-Africa56635.2022.9845545","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

The next word prediction task is an application of a language model in natural language generation that generates text by repeatedly sampling the next word conditioned on the previous choices. This paper proposes a stacked language model for optimized next word generation using three models. In stage I, the meaning of a word is captured through a learned embedding and the structure of the text sequence is encoded using a stacked Long Short-Term Memory (LSTM). In stage II, a Bidirectional Long Short-Term Memory (Bi-LSTM) stacked on top of the unidirectional LSTM encodes the structure of the text sequences, while in stage III, a two-layer Gated Recurrent Unit (GRU) is used to capture the text sequences. The proposed system was implemented using Python 3.7, TensorFlow 2.6.0 with Keras, and an Nvidia Graphics Processing Unit (GPU). The proposed deep learning models were trained on the Pride and Prejudice corpus from the Project Gutenberg library of ebooks. Evaluation was performed by predicting the next 3 words for 10 sets of text sequences. In the experiments, the two-layer LSTM model achieved an accuracy of 83%, the Bi-LSTM stacked on the unidirectional LSTM achieved 79%, and the two-layer GRU achieved 81%. Regarding predictions, the two-layer LSTM predicted all 10 sequences correctly, the Bi-LSTM stacked on the unidirectional LSTM predicted 8 sequences correctly, and the two-layer GRU predicted 7 sequences correctly.
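The abstract only names the three architectures, so the following is a minimal Keras sketch of what such stacked models and the repeated next-word sampling loop might look like, not the authors' code. It assumes TensorFlow 2.6 with Keras as stated in the abstract; the vocabulary size, sequence length, embedding size, hidden units, and the helper names (`two_layer_lstm`, `predict_next_words`, etc.) are illustrative assumptions, not values reported in the paper.

```python
# Hypothetical sketch of the three stacked architectures described in the abstract.
# All hyperparameters below are illustrative, not taken from the paper.
import tensorflow as tf
from tensorflow.keras import layers, models

vocab_size, seq_len, embed_dim, units = 5000, 10, 100, 128  # assumed values

def two_layer_lstm():
    # Stage I: learned word embedding followed by a stacked (two-layer) LSTM.
    return models.Sequential([
        layers.Embedding(vocab_size, embed_dim, input_length=seq_len),
        layers.LSTM(units, return_sequences=True),
        layers.LSTM(units),
        layers.Dense(vocab_size, activation="softmax"),
    ])

def bilstm_on_lstm():
    # Stage II: a Bi-LSTM stacked on top of a unidirectional LSTM.
    return models.Sequential([
        layers.Embedding(vocab_size, embed_dim, input_length=seq_len),
        layers.LSTM(units, return_sequences=True),
        layers.Bidirectional(layers.LSTM(units)),
        layers.Dense(vocab_size, activation="softmax"),
    ])

def two_layer_gru():
    # Stage III: a two-layer GRU over the same embedded text sequences.
    return models.Sequential([
        layers.Embedding(vocab_size, embed_dim, input_length=seq_len),
        layers.GRU(units, return_sequences=True),
        layers.GRU(units),
        layers.Dense(vocab_size, activation="softmax"),
    ])

def predict_next_words(model, tokenizer, seed_text, n_words=3):
    # Next-word generation as described: repeatedly pick the most likely next
    # word conditioned on the words chosen so far and append it to the text.
    for _ in range(n_words):
        token_ids = tokenizer.texts_to_sequences([seed_text])[0]
        token_ids = tf.keras.preprocessing.sequence.pad_sequences(
            [token_ids], maxlen=seq_len)
        next_id = int(tf.argmax(model.predict(token_ids, verbose=0)[0]))
        next_word = tokenizer.index_word.get(next_id, "")
        seed_text = f"{seed_text} {next_word}".strip()
    return seed_text
```

In an evaluation like the one reported (predicting the next 3 words for 10 seed sequences), each of the three models would be trained on tokenized sequences from the Pride and Prejudice corpus and then passed through `predict_next_words` with `n_words=3`.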