Extremal GloVe: Theoretically Accurate Distributed Word Embedding by Tail Inference

Hao Wang
Proceedings of the 7th International Conference on Communication and Information Processing, 2021-12-16
DOI: 10.1145/3507971.3507972
Citations: 3

Abstract

Distributed word embeddings such as Word2Vec and GloVe have been widely adopted in industrial settings. Major technical applications of GloVe include recommender systems and natural language processing. The fundamental theory behind GloVe relies on the choice of a weighting function in its weighted least-squares formulation. That function raises the ratio of a word co-occurrence count to the maximum count to a power. However, the initial formulation of GloVe is not theoretically sound in two respects: both the choice of the weighting function and the choice of its power exponent are ad hoc. In this paper, we use the theory of extreme value analysis to propose a theoretically accurate version of GloVe. We obtain it by reformulating the weighted least-squares loss function as an expected loss function and by accurately choosing the power exponent. We demonstrate the competitiveness of our algorithm and show that the initial formulation of GloVe, with its suggested optimal parameter, can be viewed as a special case of our paradigm.
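For reference, the baseline being criticized is the original GloVe weighting function. It is defined as f(x) = (x / x_max)^α for x < x_max and 1 otherwise. It multiplies each squared residual (w_i · w̃_j + b_i + b̃_j − log X_ij)² in the least-squares objective. The original GloVe paper suggests x_max = 100 and α = 3/4.

The sketch below implements only that baseline in plain Python. It is not the extremal variant proposed here, because this abstract does not describe the tail-inference procedure. The function names `glove_weight` and `weighted_least_squares_loss` are hypothetical, and the toy counts and vectors are illustrative values, not data from the paper.

```python
import math
from typing import Dict, Sequence, Tuple


def glove_weight(x: float, x_max: float = 100.0, alpha: float = 0.75) -> float:
    """Original GloVe weighting: (x / x_max) ** alpha below the cutoff, 1.0 at or above it."""
    if x <= 0.0:
        # f(0) = 0, so pairs that never co-occur contribute nothing to the loss.
        return 0.0
    return (x / x_max) ** alpha if x < x_max else 1.0


def weighted_least_squares_loss(
    cooc: Dict[Tuple[int, int], float],
    w: Sequence[Sequence[float]],
    w_tilde: Sequence[Sequence[float]],
    b: Sequence[float],
    b_tilde: Sequence[float],
    x_max: float = 100.0,
    alpha: float = 0.75,
) -> float:
    """Sum of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij) ** 2 over nonzero X_ij."""
    total = 0.0
    for (i, j), x_ij in cooc.items():
        if x_ij <= 0.0:
            continue  # log X_ij is undefined; only nonzero counts enter the sum
        dot = sum(u * v for u, v in zip(w[i], w_tilde[j]))
        residual = dot + b[i] + b_tilde[j] - math.log(x_ij)
        total += glove_weight(x_ij, x_max, alpha) * residual ** 2
    return total


if __name__ == "__main__":
    # Hypothetical co-occurrence counts for a two-word vocabulary.
    cooc = {(0, 1): 50.0, (1, 0): 50.0, (0, 0): 200.0}
    w = [[0.1, 0.2], [0.3, -0.1]]
    w_tilde = [[0.2, 0.0], [-0.1, 0.4]]
    b, b_tilde = [0.0, 0.0], [0.0, 0.0]
    print(glove_weight(50.0))  # ≈ 0.5946, i.e. (50 / 100) ** 0.75
    print(weighted_least_squares_loss(cooc, w, w_tilde, b, b_tilde))
```

Exposing `x_max` and `alpha` as parameters makes explicit the two ad hoc choices that the abstract objects to. Counts at or above the cutoff all receive weight 1, so the cutoff truncates the influence of the heaviest co-occurrences. That upper tail of the count distribution is where extreme value analysis would apply.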