Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing

M. Alawad, G. Tourassi
{"title":"Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing","authors":"M. Alawad, G. Tourassi","doi":"10.1109/ISVLSI.2019.00033","DOIUrl":null,"url":null,"abstract":"Deep learning (DL) has been used for many natural language processing (NLP) tasks due to its superior performance as compared to traditional machine learning approaches. In DL models for NLP, words are represented using word embeddings, which capture both semantic and syntactic information in text. However, 90-95% of the DL trainable parameters are associated with the word embeddings, resulting in a large storage or memory footprint. Therefore, reducing the number of word embedding parameters is critical, especially with the increase of vocabulary size. In this work, we propose a novel approximate word embeddings approach for convolutional neural networks (CNNs) used for text classification tasks. The proposed approach significantly reduces the number of model trainable parameters without noticeably sacrificing in computing performance accuracy. Compared to other techniques, our proposed word embeddings technique does not require modifications to the DL model architecture. We evaluate the performance of the the proposed word embeddings on three classification tasks using two datasets, composed of Yelp and Amazon reviews. The results show that the proposed method can reduce the number of word embeddings parameters by 98% and 99% for the Yelp and Amazon datasets respectively, with no drop in computing accuracy.","PeriodicalId":6703,"journal":{"name":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","volume":"9 1","pages":"134-139"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI.2019.00033","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Deep learning (DL) has been used for many natural language processing (NLP) tasks due to its superior performance compared to traditional machine learning approaches. In DL models for NLP, words are represented by word embeddings, which capture both semantic and syntactic information in text. However, 90-95% of the DL trainable parameters are associated with the word embeddings, resulting in a large storage or memory footprint. Therefore, reducing the number of word embedding parameters is critical, especially as vocabulary size grows. In this work, we propose a novel approximate word embedding approach for convolutional neural networks (CNNs) used for text classification tasks. The proposed approach significantly reduces the number of trainable model parameters without noticeably sacrificing accuracy. Compared to other techniques, our word embedding technique requires no modifications to the DL model architecture. We evaluate the proposed word embeddings on three classification tasks using two datasets composed of Yelp and Amazon reviews. The results show that the proposed method reduces the number of word embedding parameters by 98% and 99% for the Yelp and Amazon datasets, respectively, with no drop in classification accuracy.
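To make the scale of the parameter savings concrete, the sketch below illustrates one generic way to approximate a word embedding table with far fewer parameters: the hashing trick, where many vocabulary words share rows of a small table. This is not the paper's quality-controlled method (the abstract does not specify the approximation scheme); it is a minimal illustration of the same parameter-reduction idea, and all sizes and names (VOCAB_SIZE, NUM_BUCKETS, embed) are hypothetical.

```python
# Minimal sketch, assuming a hashing-trick approximation of the embedding
# table. The paper's actual quality-controlled approach may differ.
import numpy as np

VOCAB_SIZE = 100_000   # hypothetical vocabulary size
EMBED_DIM = 300        # hypothetical embedding dimension
NUM_BUCKETS = 2_000    # shared rows: 2% of the full table's rows

# Full table:   VOCAB_SIZE * EMBED_DIM = 30,000,000 trainable parameters.
# Hashed table: NUM_BUCKETS * EMBED_DIM =   600,000 parameters (98% fewer).
rng = np.random.default_rng(0)
hashed_table = rng.normal(scale=0.1, size=(NUM_BUCKETS, EMBED_DIM))

def embed(word_ids: np.ndarray) -> np.ndarray:
    """Look up approximate embeddings by hashing word ids into shared buckets."""
    bucket_ids = word_ids % NUM_BUCKETS  # simple modulo hash
    return hashed_table[bucket_ids]

# Example: embed a short "sentence" of word ids for a CNN text classifier.
sentence = np.array([17, 42_315, 99_999, 5])
vectors = embed(sentence)
print(vectors.shape)  # (4, 300)
```

Note that a scheme like this leaves the rest of the CNN untouched, since the lookup still returns one EMBED_DIM-sized vector per token, which is consistent with the abstract's claim that the approach requires no changes to the DL model architecture.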