A log-linear weighting approach in the Word2vec space for spoken language understanding

2016 IEEE Spoken Language Technology Workshop (SLT). Pub. date: 2016-12-01. DOI: 10.1109/SLT.2016.7846289

Killian Janod, Mohamed Morchid, Richard Dufour, G. Linarès

Citations: 3

Abstract

This paper proposes an original method that integrates contextual information about words into Word2vec neural networks, which learn from words and their respective context windows. In the classical word embedding approach, context windows are represented as bags of words, i.e. every word in the context is treated equally. Our model instead uses a log-linear weighting approach that models the continuous context, taking into account the relative position of each word within the surrounding context of the target word. The quality improvements brought by this method are shown on the Semantic-Syntactic Word Relationship test and in a real application framework involving a theme identification task on human dialogues. The promising gains of 7 and 5 points obtained by our adapted Word2vec model for the Skip-gram and CBOW approaches, respectively, demonstrate that the proposed models are a step forward for word and document representation.
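The abstract does not give the exact weighting function, so the following is only a minimal sketch under a stated assumption: it takes one plausible log-linear form, log w(d) = -alpha * |d| + const (so closer context words get larger weights), and contrasts it with the uniform bag-of-words average when building a CBOW-style context vector. The names `position_weights`, `uniform_weights`, `context_vector`, the value `alpha = 0.5`, and the toy embeddings are hypothetical illustrations, not the authors' implementation; training (negative sampling, backpropagation) is omitted.

```python
import math
from typing import Dict, List, Sequence


def position_weights(window: int, alpha: float = 0.5) -> Dict[int, float]:
    """HYPOTHETICAL log-linear weights: log w(d) = -alpha * |d| + const.

    Offsets d run over -window..-1 and 1..window; weights are normalized to sum to 1.
    """
    offsets = [d for d in range(-window, window + 1) if d != 0]
    raw = {d: math.exp(-alpha * abs(d)) for d in offsets}
    total = sum(raw.values())
    return {d: w / total for d, w in raw.items()}


def uniform_weights(window: int) -> Dict[int, float]:
    """Classical bag-of-words context: every position in the window counts equally."""
    offsets = [d for d in range(-window, window + 1) if d != 0]
    return {d: 1.0 / len(offsets) for d in offsets}


def context_vector(
    tokens: Sequence[str],
    center: int,
    embeddings: Dict[str, List[float]],
    weights: Dict[int, float],
) -> List[float]:
    """Weighted average of the embeddings of the words around tokens[center]."""
    dim = len(next(iter(embeddings.values())))
    out = [0.0] * dim
    norm = 0.0
    for d, w in weights.items():
        i = center + d
        if 0 <= i < len(tokens):  # skip offsets that fall outside the sentence
            vec = embeddings[tokens[i]]
            for k in range(dim):
                out[k] += w * vec[k]
            norm += w
    # Renormalize so windows truncated at sentence edges still yield an average.
    return [x / norm for x in out] if norm > 0 else out


# Toy 2-d embeddings (hypothetical values, for illustration only).
emb = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "sat": [1.0, 1.0],
       "on": [0.5, 0.5], "mat": [0.0, 0.0]}
tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(context_vector(tokens, 2, emb, uniform_weights(2)))   # → [0.625, 0.375]
print(context_vector(tokens, 2, emb, position_weights(2)))  # nearer words dominate
```

The only change between the two calls is the weight table, which is the point the abstract makes: the bag-of-words model ignores where a context word sits, while a position-dependent scheme does not. In a Skip-gram setting, the same per-offset weight could, under the same assumption, scale the loss of each (center, context) pair instead of averaging inputs.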
