Using the Europarl corpus for cross-linguistic research
Bruno Cartoni, S. Zufferey, T. Meyer
Belgian Journal of Linguistics, volume 27, issue 1, pp. 23-42, 2013
DOI: https://doi.org/10.1075/BJL.27.02CAR
Citation count: 32
Abstract
Europarl is a large multilingual corpus containing the minutes of the debates at the European Parliament. This article presents a method to extract different corpora from Europarl: monolingual and multilingual comparable corpora, as well as parallel corpora. Using state-of-the-art measures of homogeneity, we show that these corpora are very similar. In addition, we argue that they present many advantages for research in various fields of linguistics and translation studies, and we also discuss some of their limitations. We conclude by reviewing a number of previous studies that made use of these corpora, emphasizing in each case the possibilities offered by Europarl.
About the journal:
The Belgian Journal of Linguistics is the annual publication of the Linguistic Society of Belgium (LSB) and includes selected contributions from the international meetings organized by the LSB. Its volumes are topical and address a wide range of subjects in different fields of linguistics and neighboring disciplines (e.g. translation, poetics, political discourse). The BJL transcends its local basis, not only through the international orientation of its active advisory board, but also by inviting international scholars both to act as guest editors and to contribute original papers. Articles go through a rigorous external review process to maintain the journal's high-quality content.