Investigating the Capabilities of Recurrent Neural Networks for Solving the Problem of Classifying Poorly Structured Information on the Example of Bibliographic Data

Q4 Engineering · Russian Microelectronics · Pub Date: 2024-02-15 · DOI: 10.1134/s1063739723070120
E. N. Petrov, E. M. Portnov
Citations: 0

Abstract

As information technology develops, automatic processing is reaching new kinds of data, including bibliographic data. When records are collected from different sources, are nonuniformly structured, and contain formatting mistakes, transferring them to a summary table takes considerable time and effort, and the result is prone to human error. Automatic bibliographic data processing is therefore relevant and in demand. This paper investigates the capabilities of recurrent neural networks (RNNs) for classifying poorly structured bibliographic information. It is shown that, in order to use an RNN, the collected bibliographic data must be converted from their natural presentation to a feature-based one, i.e., presented as a set of features. Selecting such a set of features is a separate, complex problem. The developed RNN structure is implemented in Python. To evaluate the performance of the developed software module, a test set was formed from the publications list of the Institute of Systems and Software Engineers and Information Technology of the National Research University of Electronic Technology (MIET), covering the past five years. An accuracy of 86% is attained, which is 11% higher than the result obtained using a feed-forward neural network. The developed feature set and RNN structure allow automated bibliographic data processing, followed by mandatory correction of the results by an operator.
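The idea of presenting a record as a set of features and passing it through a recurrent network can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: the abstract does not give the feature set, the class labels, or the network structure, so the names `LABELS`, `tokenize`, `token_features`, and `TinyRNN`, the six features, and the framing of the task as per-token field labelling are all hypothetical. The network is a plain Elman-style recurrence with random, untrained weights, so its predictions carry no meaning; the sketch only shows how data flows from text to features to per-token class probabilities.

```python
import math
import random
import re

# Hypothetical field labels; the abstract does not list the paper's classes.
LABELS = ["AUTHOR", "TITLE", "YEAR", "OTHER"]


def tokenize(record):
    """Split a record into initials ("E."), words/numbers, and punctuation marks."""
    return re.findall(r"[A-Z]\.|\w+|[^\w\s]", record)


def token_features(token):
    """Map one token to a fixed-length numeric vector (illustrative features only)."""
    return [
        1.0 if token[:1].isupper() else 0.0,                    # starts with a capital
        1.0 if token.isdigit() else 0.0,                        # purely numeric
        1.0 if re.fullmatch(r"(19|20)\d{2}", token) else 0.0,   # looks like a year
        1.0 if re.fullmatch(r"[A-Z]\.", token) else 0.0,        # initial such as "E."
        1.0 if re.fullmatch(r"[^\w\s]", token) else 0.0,        # single punctuation mark
        min(len(token), 20) / 20.0,                             # length scaled to [0, 1]
    ]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNN:
    """Elman-style recurrent layer with a softmax output at every time step."""

    def __init__(self, n_in, n_hidden, n_classes, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        self.W_xh = mat(n_hidden, n_in)      # input -> hidden
        self.W_hh = mat(n_hidden, n_hidden)  # hidden -> hidden (the recurrence)
        self.W_hy = mat(n_classes, n_hidden) # hidden -> class scores

    def forward(self, sequence):
        h = [0.0] * self.n_hidden
        outputs = []
        for x in sequence:
            # The new hidden state depends on the current features and the previous state.
            h = [math.tanh(dot(wx, x) + dot(wh, h))
                 for wx, wh in zip(self.W_xh, self.W_hh)]
            outputs.append(softmax([dot(w, h) for w in self.W_hy]))
        return outputs


# Untrained demo: the printed labels are arbitrary because the weights are random.
tokens = tokenize("Petrov E. N. Title, 2024")
rnn = TinyRNN(n_in=6, n_hidden=8, n_classes=len(LABELS))
for tok, p in zip(tokens, rnn.forward([token_features(t) for t in tokens])):
    print(tok, LABELS[p.index(max(p))])
```

Because the hidden state is carried from token to token, the score for a token can depend on what preceded it in the record. A feed-forward network applied to each feature vector in isolation cannot do this.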

Source journal
Russian Microelectronics (Materials Science: Materials Chemistry)
CiteScore: 0.70
Self-citation rate: 0.00%
Articles published: 43
Journal description: Russian Microelectronics covers physical, technological, and some VLSI and ULSI circuit-technical aspects of microelectronics and nanoelectronics; it informs the reader of new trends in submicron optical, x-ray, electron, and ion-beam lithography technology; dry processing techniques, etching, and doping; and deposition and planarization technology. Significant space is devoted to problems arising in the application of proton, electron, and ion beams, plasma, etc. Consideration is given to new equipment, including cluster tools and in situ control, and to submicron CMOS, bipolar, and BiCMOS technologies. The journal publishes papers addressing problems of molecular beam epitaxy and related processes; heterojunction devices and integrated circuits; the technology and devices of nanoelectronics; and the fabrication of nanometer-scale devices, including new device structures, quantum-effect devices, and superconducting devices. The reader will find papers on the diagnostics of surfaces and microelectronic structures and on the modeling of technological processes and devices in micro- and nanoelectronics, including nanotransistors and solid-state qubits.