Adding linguistic information to parsed corpora

Linguistic Issues in Language Technology · Pub Date: 1900-01-01 · DOI: 10.33011/lilt.v18i.1435

S. Pintzuk


Citations: 0

Abstract



No matter how comprehensively corpus builders design their annotation schemes, users frequently find that information they need for their research is missing. In this methodological paper I describe and illustrate five methods of adding linguistic information to corpora that have been morphosyntactically annotated (i.e., parsed) in the style of the Penn treebanks. Some of these methods involve manual operations, some are executed by CorpusSearch functions, and some require a combination of manual and automated procedures. Which method is used depends almost entirely on the type of information to be added and on the goals of the user. Whatever the method, the main goal is to record additional information within the corpus that can be used for analysis and that is retained through further searches and data processing.
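The abstract does not spell out the five methods, so the following is only a rough illustration of the general idea of recording extra information inside a Penn-style bracketed tree, where it stays attached to the sentence through later searches. It is a minimal Python sketch, not CorpusSearch and not the author's procedure. The function names (`parse`, `to_string`, `add_coding`, `extend_labels`), the coding string `subj:pro`, and the `-ANIM` label extension are all hypothetical. It shows two ways of storing information in the tree: inserting a CODING-style node as the first daughter of each clause (loosely modeled on the CODING nodes that CorpusSearch coding queries produce), and appending an extension to a node label.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Tokens of a Penn-style bracketed tree: parentheses, or any run of
# non-space, non-parenthesis characters (labels and words).
TOKEN = re.compile(r"\(|\)|[^\s()]+")


@dataclass
class Node:
    label: str  # e.g. "IP-MAT", "NP-SBJ"; "" for the unlabeled outer wrapper
    children: list[Node | str] = field(default_factory=list)


def parse(text: str) -> Node:
    """Parse one bracketed tree such as '( (IP-MAT ...) (ID ...))'."""
    tokens = TOKEN.findall(text)
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        node = Node(label)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse_node())
            else:  # a word (leaf)
                node.children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ")"
        return node

    return parse_node()


def to_string(node: Node) -> str:
    """Serialize a tree back to single-line bracketed notation."""
    parts = [node.label] if node.label else []
    for child in node.children:
        parts.append(to_string(child) if isinstance(child, Node) else child)
    return "(" + " ".join(parts) + ")"


def add_coding(tree: Node, label_prefix: str, code: str) -> int:
    """Insert (CODING code) as the first daughter of every node whose
    label starts with label_prefix; return the number of insertions."""
    count = 0
    for child in tree.children:
        if isinstance(child, Node):
            count += add_coding(child, label_prefix, code)
    if tree.label.startswith(label_prefix):
        tree.children.insert(0, Node("CODING", [code]))
        count += 1
    return count


def extend_labels(tree: Node, label: str, suffix: str) -> int:
    """Append suffix to every node labeled exactly `label`; return the count."""
    count = 0
    if tree.label == label:
        tree.label = label + suffix
        count += 1
    for child in tree.children:
        if isinstance(child, Node):
            count += extend_labels(child, label, suffix)
    return count


if __name__ == "__main__":
    tree = parse("( (IP-MAT (NP-SBJ (PRO he)) (VBD came) (. .)) (ID sample.1))")
    add_coding(tree, "IP", "subj:pro")      # hypothetical coding string
    extend_labels(tree, "NP-SBJ", "-ANIM")  # hypothetical label extension
    print(to_string(tree))
    # ((IP-MAT (CODING subj:pro) (NP-SBJ-ANIM (PRO he)) (VBD came) (. .)) (ID sample.1))
```

Keeping the added information in the tree itself, rather than in a separate spreadsheet, is what allows it to be queried and carried along by any later processing of the corpus. A CODING node leaves the original labels untouched, while a label extension changes what exact-label queries will match, so the two choices trade off differently.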

