Tools of extraction and procedures of preparation of linguistic data for statistic analysis in the historical corpus «Manuscript»
V. Baranov, R. Gnutikov
Historical research in the context of data science: Information resources, analytical methods and digital technologies. Published 2020-12-15.
DOI: 10.29003/M1797.978-5-317-06529-4/113-119 (https://doi.org/10.29003/M1797.978-5-317-06529-4/113-119)
Abstract
This paper considers the task of automatically reducing text forms with variable graphics and orthography to one and only one lemma in the corpus “Manuscript” (manuscripts.ru), which comprises exact transcriptions of medieval Slavonic manuscripts. Such a reduction is necessary for correct statistical analysis of the corpus's linguistic data. Several methods and procedures are proposed for comparing the text forms with the normalized forms available in the corpus's electronic dictionary.
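To illustrate the general idea of matching variable text forms against normalized dictionary forms, here is a minimal Python sketch. It is not the authors' procedure: the letter-variant table, the digraph rule, the toy dictionary, and the function names `normalize` and `lemma_for` are all assumptions made for this example. The sketch removes combining marks, folds a few graphical variants to a base letter, and returns a lemma only when the dictionary offers exactly one candidate.

```python
import unicodedata

# Hypothetical table of graphical variants (an assumption for illustration,
# not the corpus's actual normalization rules).
VARIANTS = {
    "ѡ": "о",
    "ѻ": "о",
    "і": "и",
    "ї": "и",
    "ѹ": "у",
    "ꙋ": "у",
}
# Hypothetical digraph rule: the two-letter spelling "оу" is folded to "у".
DIGRAPHS = {"оу": "у"}

# Toy dictionary (invented entries): normalized form -> candidate lemmas.
DICTIONARY = {
    "духъ": ["духъ"],
    "духа": ["духъ"],
    "отьць": ["отьць"],
    "отьца": ["отьць"],
}


def normalize(form: str) -> str:
    """Reduce a transcribed text form to a simplified spelling."""
    lowered = form.lower()
    # Drop nonspacing marks (accents, titlo) without decomposing letters,
    # so precomposed letters such as "й" are left untouched.
    stripped = "".join(
        ch for ch in lowered if unicodedata.category(ch) != "Mn"
    )
    # Fold digraphs first, then single-letter variants.
    for digraph, base in DIGRAPHS.items():
        stripped = stripped.replace(digraph, base)
    return "".join(VARIANTS.get(ch, ch) for ch in stripped)


def lemma_for(form: str):
    """Return the lemma if exactly one candidate exists, else None."""
    candidates = DICTIONARY.get(normalize(form), [])
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    for text_form in ["доухъ", "дѹхъ", "Ѡтьца", "слово"]:
        print(text_form, "->", lemma_for(text_form))
```

Returning `None` for forms with zero or several candidates keeps ambiguous cases out of the counts, which is one simple way to respect the "one and only one lemma" requirement. A real pipeline would need a policy for resolving such cases rather than discarding them.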