Corpus-based typology: applications, challenges and some solutions
Natalia Levshina
Linguistic Typology, vol. 26, no. 1, pp. 129-160 (2022). Published online 2021-03-30; issue date 2022-05-25.
DOI: 10.1515/lingty-2020-0118 (https://doi.org/10.1515/lingty-2020-0118)
Citations: 12
Abstract
Over the last few years, the number of corpora that can be used for language comparison has dramatically increased. The corpora are so diverse in their structure, size and annotation style that a novice might not know where to start. The present paper charts this new and changing territory, providing a few landmarks, warning signs and safe paths. Although no corpus at present can replace the traditional type of typological data based on language description in reference grammars, corpora can help with diverse tasks, being particularly well suited for investigating probabilistic and gradient properties of languages and for discovering and interpreting cross-linguistic generalizations based on processing and communicative mechanisms. At the same time, the use of corpora for typological purposes has not only advantages and opportunities, but also numerous challenges. This paper also contains an empirical case study addressing two pertinent problems: the role of text types in language comparison and the problem of the word as a comparative concept.
About the journal:
Linguistic Typology provides a forum for all work of relevance to the study of language typology and cross-linguistic variation. It welcomes work taking a typological perspective on all domains of the structure of spoken and signed languages, including historical change, language processing, and sociolinguistics. Diverse descriptive and theoretical frameworks are welcomed so long as they have a clear bearing on the study of cross-linguistic variation. We welcome cross-disciplinary approaches to the study of linguistic diversity, as well as work dealing with just one or a few languages, as long as it is typologically informed and typologically and theoretically relevant, and contains new empirical evidence.