A log-linear weighting approach in the Word2vec space for spoken language understanding
Killian Janod, Mohamed Morchid, Richard Dufour, G. Linarès
2016 IEEE Spoken Language Technology Workshop (SLT), December 2016
DOI: 10.1109/SLT.2016.7846289
Citations: 3
Abstract
This paper proposes an original method that integrates contextual information of words into Word2vec neural networks, which learn from words and their respective context windows. In the classical word embedding approach, context windows are represented as bags-of-words, i.e. every word in the context is treated equally. Our model instead uses a log-linear weighting approach over the continuous context, taking into account the relative position of each word within the surrounding context of the target word. The quality improvements brought by this method are shown on the Semantic-Syntactic Word Relationship test and on a realistic application framework involving a theme identification task on human dialogues. The promising gains of 7 and 5 points obtained by our adapted Word2vec model for the Skip-gram and CBOW approaches respectively demonstrate that the proposed models are a step forward for word and document representation.
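The abstract does not give the exact weighting function, only that context words are weighted log-linearly according to their relative position rather than being treated as an unordered bag-of-words. The sketch below is a minimal illustration of that idea, assuming a simple logarithmic decay with the offset from the target word; the function, names, and normalization are illustrative assumptions, not the authors' implementation.

```python
import math

def log_linear_weights(window_size):
    """Hypothetical log-linear weighting of context positions.

    Words closer to the target word receive a larger weight. The exact
    function used in the paper is not given in the abstract, so a simple
    log-based decay over the offset is assumed here.
    """
    weights = {}
    for d in range(1, window_size + 1):
        # assumption: weight shrinks as the offset |d| from the target grows
        weights[d] = math.log(window_size - d + 2)
    # normalize so the weights over one side of the window sum to 1
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

def weighted_context(tokens, target_index, window_size):
    """Return (context_word, weight) pairs instead of a flat bag-of-words."""
    weights = log_linear_weights(window_size)
    context = []
    for d in range(1, window_size + 1):
        for idx in (target_index - d, target_index + d):
            if 0 <= idx < len(tokens):
                context.append((tokens[idx], weights[d]))
    return context

# Example: the weighted context of "embedding" in a short sentence
sentence = "word embedding models learn from context windows".split()
print(weighted_context(sentence, sentence.index("embedding"), window_size=3))
```

In a Skip-gram or CBOW training loop, these weights would scale each context word's contribution (e.g. its gradient or its share of the averaged input vector) instead of counting every context word equally.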