Intent detection using semantically enriched word embeddings

2016 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2016-12-01 | DOI: 10.1109/SLT.2016.7846297

Joo-Kyung Kim, Gökhan Tür, Asli Celikyilmaz, Bin Cao, Ye-Yi Wang


Citations: 87

Abstract

State-of-the-art targeted language understanding systems rely on deep learning methods that use one-hot word vectors or off-the-shelf word embeddings. Word embeddings can be enriched with information from semantic lexicons (such as WordNet and PPDB) to improve their semantic representation, but most previous research on word-embedding enrichment has focused on intrinsic word-level tasks such as word analogy and antonym detection. In this work, we enrich word embeddings so that semantically similar words move closer together and semantically dissimilar words move farther apart in the embedding space, with the goal of improving an extrinsic task, namely intent detection for spoken language understanding. We use several semantic lexicons, such as WordNet, PPDB, and Macmillan Dictionary, to enrich the word embeddings, and then use the enriched embeddings as the initial word representations for intent detection. Thus, we enrich the embeddings outside the neural network rather than learning them within the network, and build a bidirectional LSTM on top of the embeddings for intent detection. Our experiments on ATIS and a real log dataset from Microsoft Cortana show that word embeddings enriched with semantic lexicons can improve intent detection.
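The idea of moving lexicon-related words closer together or farther apart can be illustrated with a small sketch. This is not the paper's objective function, which the abstract does not spell out; it is a simple attract/repel update written only for illustration. The function names (`enrich`, `cosine`, `normalize`), the `rate` and `steps` parameters, the toy vectors, and the word pairs are all assumptions and do not come from WordNet, PPDB, or Macmillan Dictionary.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def enrich(emb, synonyms, antonyms, rate=0.1, steps=20):
    """Illustrative attract/repel update (an assumption, not the paper's method).

    emb: dict mapping word -> list of floats (left unmodified)
    synonyms, antonyms: lists of (word_a, word_b) pairs taken from a lexicon
    """
    out = {w: normalize(v) for w, v in emb.items()}
    for _ in range(steps):
        # sign = +1 pulls a pair together, sign = -1 pushes it apart
        for pairs, sign in ((synonyms, 1.0), (antonyms, -1.0)):
            for a, b in pairs:
                diff = [y - x for x, y in zip(out[a], out[b])]
                out[a] = [x + sign * rate * d for x, d in zip(out[a], diff)]
                out[b] = [y - sign * rate * d for y, d in zip(out[b], diff)]
        # keep every vector on the unit sphere so the updates stay bounded
        out = {w: normalize(v) for w, v in out.items()}
    return out

# Hypothetical toy embeddings and lexicon pairs, for illustration only.
emb = {
    "cheap": [1.0, 0.2, 0.0],
    "inexpensive": [0.5, 0.9, 0.1],
    "arrive": [0.1, 1.0, 0.3],
    "depart": [0.2, 0.9, 0.4],
}
synonyms = [("cheap", "inexpensive")]
antonyms = [("arrive", "depart")]

enriched = enrich(emb, synonyms, antonyms)
print("synonym similarity before/after:",
      cosine(emb["cheap"], emb["inexpensive"]),
      cosine(enriched["cheap"], enriched["inexpensive"]))
print("antonym similarity before/after:",
      cosine(emb["arrive"], emb["depart"]),
      cosine(enriched["arrive"], enriched["depart"]))
```

In a pipeline like the one the abstract describes, vectors adjusted offline in this manner would then serve as the initial word representations of the intent classifier, rather than being learned from scratch inside the network.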




Source journal

2016 IEEE Spoken Language Technology Workshop (SLT)

Self-citation rate

0.00%

Publication count