Creation of a Russian Stop Word List
V. A. Yatsko
Automatic Documentation and Mathematical Linguistics, vol. 56, no. 3, pp. 138-144
Published: 2022-08-15 (journal article)
DOI: 10.3103/S0005105522030049
https://link.springer.com/article/10.3103/S0005105522030049
Citations: 0
Abstract
This article describes three identifying characteristics of stop words (statistical, semantic, and morphological) and, on that basis, postulates new principles for compiling stop word lists. The principles are demonstrated by compiling a general-purpose Russian stop word list from an analysis of existing sources and of frequency distributions in the Russian National Corpus. The resulting list contains 535 stop words.
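To make the idea of combining criteria concrete, here is a minimal Python sketch. It is not the article's procedure: it ranks lemmas by frequency in a small part-of-speech-tagged sample (standing in for the statistical criterion) and keeps only lemmas tagged as function-word classes (a rough stand-in for the morphological criterion). The semantic criterion is not modeled at all. The function name `stop_word_candidates`, the tag labels in `FUNCTION_POS`, and the toy sample are hypothetical; a real run would need lemmatized, tagged data such as frequency lists derived from the Russian National Corpus.

```python
from collections import Counter

# Hypothetical tag labels treated as function-word classes:
# preposition, conjunction, particle, pronoun. These are illustrative
# assumptions, not the morphological criteria defined in the article.
FUNCTION_POS = {"PREP", "CONJ", "PRCL", "NPRO"}


def stop_word_candidates(tagged_tokens, top_n, allowed_pos=FUNCTION_POS):
    """Return up to top_n lemmas, most frequent first, whose POS is in allowed_pos.

    tagged_tokens: iterable of (lemma, pos) pairs from a lemmatized, tagged sample.
    """
    freq = Counter()
    pos_of = {}
    for lemma, pos in tagged_tokens:
        lemma = lemma.lower()
        freq[lemma] += 1
        # Simplification: keep the first tag seen; homographs are not resolved.
        pos_of.setdefault(lemma, pos)
    # Statistical criterion: rank by corpus frequency
    # (ties keep the order in which lemmas were first encountered).
    ranked = [lemma for lemma, _ in freq.most_common()]
    # Morphological criterion: keep only function-word classes.
    return [lemma for lemma in ranked if pos_of[lemma] in allowed_pos][:top_n]


# Toy hypothetical sample of (lemma, tag) pairs.
sample = [
    ("и", "CONJ"), ("кошка", "NOUN"), ("в", "PREP"), ("дом", "NOUN"),
    ("и", "CONJ"), ("в", "PREP"), ("сад", "NOUN"), ("и", "CONJ"),
    ("он", "NPRO"), ("не", "PRCL"), ("спать", "VERB"),
]

print(stop_word_candidates(sample, top_n=3))  # ['и', 'в', 'он']
```

Filtering by part of speech after ranking keeps the two criteria separable. A frequent content word such as "кошка" in a larger sample would still be excluded, which mirrors the intuition that high frequency alone does not make a word a stop word.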
About the journal:
Automatic Documentation and Mathematical Linguistics is an international peer-reviewed journal that covers all aspects of the automation of information processes and systems, as well as algorithms and methods for automatic language analysis. The emphasis is on practical applications of new technologies and techniques for information analysis and processing.