Authors: V. Baranov, R. Gnutikov
Published in: Historical research in the context of data science: Information resources, analytical methods and digital technologies
Publication date: 2020-12-15
DOI: 10.29003/M1797.978-5-317-06529-4/113-119
Tools of extraction and procedures of preparation of linguistic data for statistic analysis in the historical corpus «Manuscript»
This paper considers the task of automatically reducing text forms with variable graphics and orthography in the corpus “Manuscript” (manuscripts.ru), which comprises exact transcriptions of medieval Slavonic manuscripts, to exactly one lemma each. Such a reduction is necessary for correct statistical analysis of the corpus's linguistic data. Several methods and procedures are proposed for matching the text forms against the normalized forms available in the corpus's electronic dictionary.