自然语言语料库的众包获取:方法与观察

2012 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2012-12-01 DOI:10.1109/SLT.2012.6424200

William Yang Wang, D. Bohus, Ece Kamar, E. Horvitz

{"title":"自然语言语料库的众包获取:方法与观察","authors":"William Yang Wang, D. Bohus, Ece Kamar, E. Horvitz","doi":"10.1109/SLT.2012.6424200","DOIUrl":null,"url":null,"abstract":"We study the opportunity for using crowdsourcing methods to acquire language corpora for use in natural language processing systems. Specifically, we empirically investigate three methods for eliciting natural language sentences that correspond to a given semantic form. The methods convey frame semantics to crowd workers by means of sentences, scenarios, and list-based descriptions. We discuss various performance measures of the crowdsourcing process, and analyze the semantic correctness, naturalness, and biases of the collected language. We highlight research challenges and directions in applying these methods to acquire corpora for natural language processing applications.","PeriodicalId":375378,"journal":{"name":"2012 IEEE Spoken Language Technology Workshop (SLT)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"58","resultStr":"{\"title\":\"Crowdsourcing the acquisition of natural language corpora: Methods and observations\",\"authors\":\"William Yang Wang, D. Bohus, Ece Kamar, E. Horvitz\",\"doi\":\"10.1109/SLT.2012.6424200\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study the opportunity for using crowdsourcing methods to acquire language corpora for use in natural language processing systems. Specifically, we empirically investigate three methods for eliciting natural language sentences that correspond to a given semantic form. The methods convey frame semantics to crowd workers by means of sentences, scenarios, and list-based descriptions. We discuss various performance measures of the crowdsourcing process, and analyze the semantic correctness, naturalness, and biases of the collected language. We highlight research challenges and directions in applying these methods to acquire corpora for natural language processing applications.\",\"PeriodicalId\":375378,\"journal\":{\"name\":\"2012 IEEE Spoken Language Technology Workshop (SLT)\",\"volume\":\"122 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"58\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE Spoken Language Technology Workshop (SLT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT.2012.6424200\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2012.6424200","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 58

摘要

我们研究了使用众包方法获取用于自然语言处理系统的语言语料库的机会。具体来说，我们实证研究了三种方法来引出对应于给定语义形式的自然语言句子。这些方法通过句子、场景和基于列表的描述向人群工作者传递框架语义。我们讨论了众包过程的各种性能度量，并分析了收集语言的语义正确性、自然性和偏差。我们强调了应用这些方法获取用于自然语言处理应用的语料库的研究挑战和方向。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Crowdsourcing the acquisition of natural language corpora: Methods and observations

We study the opportunity for using crowdsourcing methods to acquire language corpora for use in natural language processing systems. Specifically, we empirically investigate three methods for eliciting natural language sentences that correspond to a given semantic form. The methods convey frame semantics to crowd workers by means of sentences, scenarios, and list-based descriptions. We discuss various performance measures of the crowdsourcing process, and analyze the semantic correctness, naturalness, and biases of the collected language. We highlight research challenges and directions in applying these methods to acquire corpora for natural language processing applications.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2012 IEEE Spoken Language Technology Workshop (SLT)

自引率

0.00%

发文量