Construction and Dissemination of a Corpus of Spoken Interaction - Tools and Workflows in the FOLK project

J. Lang. Technol. Comput. Linguistics. Pub Date: 2016-07-01. DOI: 10.21248/jlcl.31.2016.205

Thomas C. Schmidt

Citations: 11

Abstract

This paper describes the workflow for constructing and disseminating FOLK (Forschungs- und Lehrkorpus Gesprochenes Deutsch – Research and Teaching Corpus of Spoken German), a large corpus of authentic spoken interaction data recorded on audio and video. Section 2 describes in detail the tools used in the individual steps of transcription, anonymization, orthographic normalization, lemmatization and POS tagging of the data, as well as some utilities used for corpus management. Section 3 deals with the DGD (Datenbank für Gesprochenes Deutsch – Database of Spoken German) as a tool for distributing completed data sets and making them available for qualitative and quantitative analysis. Section 4 sketches some plans for further development.

