Title: Approximation of the Meaning for Thematic Subject Headings by Simple Interpretable Representations
Authors: R. V. Sulzhenko, B. V. Dobrov
Journal: Lobachevskii Journal of Mathematics
Published: 2024-07-19
DOI: https://doi.org/10.1134/s1995080224600778
Citation count: 0
Abstract
The paper studies methods for approximating user-labeled topics by simple representations in a text classification problem. It is assumed that in real information systems the meaning of a thematic category can be approximated by a fairly simple, interpretable expression. An algorithm is considered that constructs a representation of a text topic in the form of a Boolean formula, which is in effect a query to a full-text information system. The algorithm is based on an optimized selection of logical predicates over words and thesaurus terms. The algorithm is compared with modern machine learning techniques on real collections with noisy expert markup. The described method can be used for text classification, expert evaluation of the content of a subject heading, assessment of how complex a topic's description is, and correction of the markup.
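To make the idea of a topic represented as a Boolean query more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the simplest possible predicate (a word occurs in a document), builds only a disjunction of such predicates, and selects them greedily by F1 against the labels. The function names `f1_score` and `greedy_disjunction`, the toy documents, and the "energy" labels are all hypothetical; thesaurus terms and richer predicates mentioned in the abstract are not modeled.

```python
from typing import List, Set, Tuple


def f1_score(predicted: Set[int], relevant: Set[int]) -> float:
    """F1 of a retrieved document set against the labeled document set."""
    true_pos = len(predicted & relevant)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(relevant)
    return 2 * precision * recall / (precision + recall)


def greedy_disjunction(
    docs: List[Set[str]], relevant: Set[int], max_terms: int = 3
) -> Tuple[List[str], float]:
    """Greedily build `w1 OR w2 OR ...` that maximizes F1 on the labels.

    Assumption: each predicate is "word occurs in document"; this is an
    illustrative simplification, not the method from the paper.
    """
    vocabulary = sorted(set().union(*docs))
    # Posting lists: word -> ids of the documents that contain it.
    postings = {w: {i for i, d in enumerate(docs) if w in d} for w in vocabulary}

    chosen: List[str] = []
    matched: Set[int] = set()
    best = 0.0
    for _ in range(max_terms):
        candidate, candidate_score = None, best
        for w in vocabulary:
            if w in chosen:
                continue
            score = f1_score(matched | postings[w], relevant)
            if score > candidate_score:  # strict: ties keep the earlier word
                candidate, candidate_score = w, score
        if candidate is None:  # no word improves the formula any further
            break
        chosen.append(candidate)
        matched |= postings[candidate]
        best = candidate_score
    return chosen, best


# Hypothetical toy collection: each document is a bag of words.
docs = [
    {"oil", "price", "market"},
    {"gas", "pipeline", "export"},
    {"football", "match", "goal"},
    {"oil", "pipeline", "leak"},
    {"market", "stock", "bank"},
]
relevant = {0, 1, 3}  # hypothetical expert labels for an "energy" topic

formula, score = greedy_disjunction(docs, relevant)
print(" OR ".join(formula), round(score, 3))
```

The resulting formula is short enough for an expert to read and edit directly, which is the property the abstract emphasizes; the number of terms needed to reach a given F1 also gives a rough indication of how complex the topic's description is.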
About the journal:
Lobachevskii Journal of Mathematics is an international peer-reviewed journal published in collaboration with the Russian Academy of Sciences and Kazan Federal University. The journal covers mathematical topics associated with the name of the famous Russian mathematician Nikolai Lobachevsky (Lobachevskii). It publishes research articles on geometry and topology, algebra, complex analysis, functional analysis, differential equations and mathematical physics, probability theory and stochastic processes, computational mathematics, mathematical modeling, numerical methods and program complexes, computer science, optimal control, and theory of algorithms, as well as applied mathematics. The journal welcomes English-language manuscripts from all countries.