Outline of a Database of Symbols of Extended Cyrillic Alphabets of Modern Turkic Writing ExTL
Authors: A. A. Golubnichiy, A. D. Yablontseva
Journal: Automatic Documentation and Mathematical Linguistics
Published: 2023-03-03
DOI: 10.3103/S0005105522060036
URL: https://link.springer.com/article/10.3103/S0005105522060036
JCR category: Computer Science, Information Systems (Q4); impact factor: 0.5
Abstract: An algorithm is presented for creating a database of characters from the extended Cyrillic alphabets of modern Turkic writing (ExTL). The methods and technologies used to distort images while forming the data set are considered. The resulting character database contains 52 920 images covering the basic Cyrillic alphabet (33 characters, each in upper and lower case), 29 characters of the extended Cyrillic alphabet (each in upper and lower case), and 23 punctuation marks, generated with 14 types of distortion. A public interactive interface to the database, built with the R programming language and Shiny, is described.
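The composition figures in the abstract can be checked with a little arithmetic. The sketch below is only an illustration and is not taken from the paper: the constant and function names are hypothetical, "two registers" is read as upper and lower case, and the per-class figure assumes the images are spread evenly across character classes, which the abstract does not state.

```python
# Figures quoted in the abstract.
BASIC_CYRILLIC = 33       # basic Cyrillic letters
EXTENDED_CYRILLIC = 29    # extended Cyrillic letters
CASES = 2                 # "two registers", read here as upper and lower case
PUNCTUATION = 23          # punctuation marks (no case distinction)
TOTAL_IMAGES = 52_920     # total images reported for the database


def count_classes() -> int:
    """Number of distinct character classes implied by the abstract."""
    return (BASIC_CYRILLIC + EXTENDED_CYRILLIC) * CASES + PUNCTUATION


def images_per_class(total: int = TOTAL_IMAGES) -> float:
    """Average images per class.

    Assumption: images are evenly distributed across classes;
    the abstract does not confirm this.
    """
    return total / count_classes()


if __name__ == "__main__":
    print("character classes:", count_classes())
    print("average images per class:", images_per_class())
```

Under these assumptions the counts are consistent: 66 basic plus 58 extended letter forms plus 23 punctuation marks gives 147 classes, and 52 920 divides evenly by 147. How the 14 distortion types combine with other sources of variation to produce the per-class total is not specified in the abstract.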
About the journal:
Automatic Documentation and Mathematical Linguistics is an international peer-reviewed journal that covers all aspects of the automation of information processes and systems, as well as algorithms and methods for automatic language analysis. The emphasis is on practical applications of new technologies and techniques for information analysis and processing.