Investigating the Capabilities of Recurrent Neural Networks for Solving the Problem of Classifying Poorly Structured Information on the Example of Bibliographic Data
Authors: E. N. Petrov, E. M. Portnov
Journal: Russian Microelectronics
Publication date: 2024-02-15
DOI: 10.1134/s1063739723070120 (https://doi.org/10.1134/s1063739723070120)
Citations: 0
Abstract
As information technology develops, new areas of automatic data processing become feasible, including the processing of bibliographic data. When records are collected from different sources, they are structured nonuniformly and contain formatting mistakes, so transferring them into a summary table by hand takes considerable time and effort and is prone to human error. Automatic bibliographic data processing is therefore relevant and in demand. This paper investigates the capabilities of recurrent neural networks (RNNs) for classifying poorly structured bibliographic information. It is shown that, to use an RNN, the collected bibliographic data must be converted from their natural representation to a feature-based one, i.e., presented as a set of features. Selecting such a feature set is a separate, complex problem. The developed RNN structure is implemented in Python. To evaluate the performance of the developed software module, a test set was formed from the publication list of the Institute of Systems and Software Engineers and Information Technology of the National Research University of Electronic Technology (MIET), covering the past five years. The RNN attains an accuracy of 86%, which is 11% higher than the result obtained using a feed-forward neural network. The developed feature set and RNN structure allow bibliographic data to be processed automatically, with mandatory subsequent correction of the results by an operator.
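The feature-based representation described in the abstract can be illustrated with a minimal sketch. The abstract gives neither the paper's feature set nor its network structure, so everything below is an assumption made for illustration: the six per-token features, the field labels in `LABELS`, the helper names `tokenize` and `token_features`, and the small untrained Elman-style network `TinyRNN`. The sketch also assumes that classification assigns a field label to each token of a record, which the abstract does not confirm.

```python
import math
import random
import re

# Hypothetical field labels; the paper's actual classes are not stated in the abstract.
LABELS = ["author", "title", "year", "other"]


def tokenize(record: str) -> list[str]:
    """Split a record into initials ("E."), words/numbers, and punctuation marks."""
    return re.findall(r"[A-Z]\.|\w+|[^\w\s]", record)


def token_features(token: str) -> list[float]:
    """Map one token to a hypothetical feature vector (not the paper's feature set)."""
    return [
        1.0 if token[:1].isupper() else 0.0,                      # starts with a capital
        1.0 if re.fullmatch(r"[A-Z]\.", token) else 0.0,          # looks like an initial
        1.0 if re.fullmatch(r"(19|20)\d{2}", token) else 0.0,     # looks like a year
        1.0 if token.isdigit() else 0.0,                          # purely numeric
        1.0 if token in {",", ".", ":", ";", "-", "/"} else 0.0,  # punctuation mark
        min(len(token), 20) / 20.0,                               # normalized length
    ]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNN:
    """Elman-style RNN with a softmax output per token; weights are random (untrained)."""

    def __init__(self, n_in: int, n_hidden: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> list[list[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_xh = init(n_hidden, n_in)       # input -> hidden
        self.w_hh = init(n_hidden, n_hidden)   # hidden -> hidden (recurrence)
        self.w_hy = init(n_classes, n_hidden)  # hidden -> class scores
        self.n_hidden = n_hidden

    def forward(self, xs: list[list[float]]) -> list[list[float]]:
        h = [0.0] * self.n_hidden
        outputs = []
        for x in xs:
            # The new hidden state depends on the current features and the previous state.
            h = [math.tanh(dot(wx, x) + dot(wh, h))
                 for wx, wh in zip(self.w_xh, self.w_hh)]
            outputs.append(softmax([dot(w, h) for w in self.w_hy]))
        return outputs


record = "Petrov E. N., Portnov E. M. Sample title. 2023."  # made-up record
tokens = tokenize(record)
features = [token_features(t) for t in tokens]
model = TinyRNN(n_in=len(features[0]), n_hidden=8, n_classes=len(LABELS))
probs = model.forward(features)  # one probability distribution over LABELS per token
```

Because the weights are untrained, the resulting probabilities carry no meaning; the sketch only shows how a record might move from text to a feature sequence and how a recurrent state would carry context from earlier tokens to later ones.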
About the journal:
Russian Microelectronics covers physical, technological, and some VLSI and ULSI circuit-technical aspects of microelectronics and nanoelectronics; it informs the reader of new trends in submicron optical, x-ray, electron, and ion-beam lithography technology; dry processing techniques, etching, and doping; and deposition and planarization technology. Significant space is devoted to problems arising in the application of proton, electron, and ion beams, plasma, etc. Consideration is given to new equipment, including cluster tools and in situ control, and to submicron CMOS, bipolar, and BiCMOS technologies. The journal publishes papers addressing problems of molecular beam epitaxy and related processes; heterojunction devices and integrated circuits; the technology and devices of nanoelectronics; and the fabrication of nanometer-scale devices, including new device structures, quantum-effect devices, and superconducting devices. The reader will also find papers on the diagnostics of surfaces and microelectronic structures and on the modeling of technological processes and devices in micro- and nanoelectronics, including nanotransistors and solid-state qubits.