Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing

M. Alawad, G. Tourassi
{"title":"Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing","authors":"M. Alawad, G. Tourassi","doi":"10.1109/ISVLSI.2019.00033","DOIUrl":null,"url":null,"abstract":"Deep learning (DL) has been used for many natural language processing (NLP) tasks due to its superior performance as compared to traditional machine learning approaches. In DL models for NLP, words are represented using word embeddings, which capture both semantic and syntactic information in text. However, 90-95% of the DL trainable parameters are associated with the word embeddings, resulting in a large storage or memory footprint. Therefore, reducing the number of word embedding parameters is critical, especially with the increase of vocabulary size. In this work, we propose a novel approximate word embeddings approach for convolutional neural networks (CNNs) used for text classification tasks. The proposed approach significantly reduces the number of model trainable parameters without noticeably sacrificing in computing performance accuracy. Compared to other techniques, our proposed word embeddings technique does not require modifications to the DL model architecture. We evaluate the performance of the the proposed word embeddings on three classification tasks using two datasets, composed of Yelp and Amazon reviews. The results show that the proposed method can reduce the number of word embeddings parameters by 98% and 99% for the Yelp and Amazon datasets respectively, with no drop in computing accuracy.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"9 1","pages":"134-139"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI.2019.00033","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Deep learning (DL) has been used for many natural language processing (NLP) tasks due to its superior performance compared to traditional machine learning approaches. In DL models for NLP, words are represented by word embeddings, which capture both semantic and syntactic information in text. However, 90-95% of the DL trainable parameters are associated with the word embeddings, resulting in a large storage or memory footprint. Therefore, reducing the number of word embedding parameters is critical, especially as vocabulary size grows. In this work, we propose a novel approximate word embedding approach for convolutional neural networks (CNNs) used for text classification tasks. The proposed approach significantly reduces the number of trainable model parameters without noticeably sacrificing accuracy. Compared to other techniques, our word embedding technique requires no modifications to the DL model architecture. We evaluate the proposed word embeddings on three classification tasks using two datasets composed of Yelp and Amazon reviews. The results show that the proposed method reduces the number of word embedding parameters by 98% and 99% for the Yelp and Amazon datasets, respectively, with no drop in classification accuracy.
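To make the scale of the parameter savings concrete, the sketch below illustrates one generic way to approximate a word embedding table with far fewer parameters: the hashing trick, where many vocabulary words share rows of a small table. This is not the paper's quality-controlled method (the abstract does not specify the approximation scheme); it is a minimal illustration of the same parameter-reduction idea, and all sizes and names (VOCAB_SIZE, NUM_BUCKETS, embed) are hypothetical.

```python
# Minimal sketch, assuming a hashing-trick approximation of the embedding
# table. The paper's actual quality-controlled approach may differ.
import numpy as np

VOCAB_SIZE = 100_000   # hypothetical vocabulary size
EMBED_DIM = 300        # hypothetical embedding dimension
NUM_BUCKETS = 2_000    # shared rows: 2% of the full table's rows

# Full table:   VOCAB_SIZE * EMBED_DIM = 30,000,000 trainable parameters.
# Hashed table: NUM_BUCKETS * EMBED_DIM =   600,000 parameters (98% fewer).
rng = np.random.default_rng(0)
hashed_table = rng.normal(scale=0.1, size=(NUM_BUCKETS, EMBED_DIM))

def embed(word_ids: np.ndarray) -> np.ndarray:
    """Look up approximate embeddings by hashing word ids into shared buckets."""
    bucket_ids = word_ids % NUM_BUCKETS  # simple modulo hash
    return hashed_table[bucket_ids]

# Example: embed a short "sentence" of word ids for a CNN text classifier.
sentence = np.array([17, 42_315, 99_999, 5])
vectors = embed(sentence)
print(vectors.shape)  # (4, 300)
```

Note that a scheme like this leaves the rest of the CNN untouched, since the lookup still returns one EMBED_DIM-sized vector per token, which is consistent with the abstract's claim that the approach requires no changes to the DL model architecture.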