Personalized language modeling by crowd sourcing with social network data for voice access of cloud applications

2012 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2012-12-01. DOI: 10.1109/SLT.2012.6424220

Tsung-Hsien Wen, Hung-yi Lee, Tai-Yuan Chen, Lin-Shan Lee


Citations: 11

Abstract

Voice access to cloud applications via smartphones is attractive today, particularly because a smartphone is typically used by a single user, which makes personalized acoustic and language models feasible. In particular, social networks on the Internet hold huge quantities of text with known authors and known relationships among those authors. This makes it possible to train personalized language models, because it is reasonable to assume that users with those relationships share some common topics, wording habits, and linguistic patterns. In this paper, we propose an adaptation framework for building a robust personalized language model that incorporates the texts posted on social networks by the target user and by other users, while handling the linguistic mismatch across different users. Experiments on a Facebook dataset showed encouraging improvements in both model perplexity and recognition accuracy with the proposed approaches, which consider relationships among users, similarity based on latent topics, and random walk over a user graph.
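The abstract does not give the adaptation formulas, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a unigram model with additive smoothing, and it assumes that each user's text is weighted by a random walk with restart from the target user over the user graph. All function names, parameters (`restart`, `smoothing`), and the toy data are hypothetical.

```python
import math
from collections import Counter


def random_walk_weights(adjacency, target, restart=0.3, iterations=50):
    """Random walk with restart from `target` over a user graph.

    adjacency: dict mapping each user to a dict {neighbor: edge weight}.
    Every neighbor is assumed to appear as a key of `adjacency` as well.
    Returns a dict of per-user scores that sum to 1.
    """
    users = list(adjacency)
    scores = {u: (1.0 if u == target else 0.0) for u in users}
    for _ in range(iterations):
        # Restart mass always goes back to the target user.
        nxt = {u: (restart if u == target else 0.0) for u in users}
        for u in users:
            total = sum(adjacency[u].values())
            if total == 0:
                # A user with no outgoing edges returns its mass to the target.
                nxt[target] += (1 - restart) * scores[u]
                continue
            for v, w in adjacency[u].items():
                nxt[v] += (1 - restart) * scores[u] * w / total
        scores = nxt
    return scores


def personalized_unigram(corpora, weights, smoothing=0.1):
    """Unigram model from per-user token lists, with user-weighted counts.

    corpora: dict mapping each user to a list of tokens.
    weights: dict mapping each user to a non-negative weight.
    """
    counts = Counter()
    for user, tokens in corpora.items():
        for tok in tokens:
            counts[tok] += weights.get(user, 0.0)
    vocab = set(counts)
    total = sum(counts.values()) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}


def perplexity(model, tokens):
    """Perplexity of `tokens`; every token is assumed to be in the vocabulary."""
    log_sum = sum(math.log(model[t]) for t in tokens)
    return math.exp(-log_sum / len(tokens))


# Toy data (invented for illustration only).
adjacency = {
    "alice": {"bob": 1.0},
    "bob": {"alice": 1.0, "carol": 1.0},
    "carol": {"bob": 1.0},
}
corpora = {
    "alice": "play some jazz music".split(),
    "bob": "play rock music tonight".split(),
    "carol": "book a flight tonight".split(),
}
weights = random_walk_weights(adjacency, "alice")
model = personalized_unigram(corpora, weights)
print(perplexity(model, "play music".split()))
```

In this sketch, users closer to the target in the graph contribute more to the counts, so the target's own wording dominates while friends' texts fill in sparse regions. A realistic system would use higher-order n-grams and tune the weights on held-out data.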


