I. Bolodurina, Yu. P. Ivanova, L. M. Antsiferova, V. D. Blinov
Journal: Bulletin of the South Ural State University. Ser. Computer Technologies, Automatic Control & Radioelectronics
Publication date: 2020-12-01
DOI: 10.14529/ctcr200413 (https://doi.org/10.14529/ctcr200413)
DEVELOPMENT OF A MODIFIED WINNOWING METHOD FOR AGGREGATING BIBLIOGRAPHIC INFORMATION DATA FROM CITATION SYSTEMS UNDER THE CONDITIONS OF INCOMPLETE INFORMATION
The transition to electronic presentation of bibliographic information about scientific works has increased interest in scientometric research. At the same time, existing scientometric methods are criticized by scientists, because an incomplete bibliographic base and the tools for assessing it do not allow the contribution of a scientific work to be assessed accurately. As a rule, scientometric assessments rely on the data of a single citation system, which does not contain complete information about all of the authors' publications indexed in other citation systems, and this limits their quality.
Aim. This study aims to develop an adaptive approach to forming aggregated bibliographic data for a scientific organization under conditions of incomplete information from the citation systems RSCI (Russian Science Citation Index), Google Scholar (“Google Academy”) and Scopus.
Methods. The aggregated list of publications used to analyze scientometric indicators was built with the Winnowing method, the Levenshtein algorithm, the shingle method and the Jaro–Winkler method. In the experimental study, the effectiveness of these methods for aggregating information from citation systems was assessed by accuracy (precision), completeness (recall) and F-measure.
Results. Experiments on test data, namely the lists of publications by authors from Orenburg State University in RSCI, Google Scholar and Scopus, showed that the Winnowing method produced the most accurate lists of publications by the F-measure criterion. To improve the performance of this algorithm, a two-stage optimization of the aggregation process was carried out, which reduced the running time of the algorithm when generating a list of bibliographic descriptions.
Conclusion. The proposed approach to forming aggregated bibliographic data for a scientific organization under conditions of incomplete information from the citation systems Russian Science Citation Index, Google Scholar and Scopus increases productivity in compiling the list of authors' publications and shows good efficiency in determining the scientometric characteristics of authors.
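To make the comparison described in the abstract more concrete, the following is a minimal Python sketch of textbook winnowing fingerprints applied to publication titles, together with the precision, recall and F-measure used to score an aggregated list. It is not the authors' modified method and does not include their two-stage optimization. The k-gram length `k`, the window size `w`, the MD5-based hash, the Jaccard comparison of fingerprint sets and all function names are assumptions chosen for illustration.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and drop everything except letters and digits (Unicode-aware)."""
    return re.sub(r"[\W_]+", "", text.lower())


def kgram_hashes(text: str, k: int) -> list[int]:
    """Hash every k-gram; MD5 is used only to get a hash that is stable across runs."""
    return [
        int.from_bytes(hashlib.md5(text[i:i + k].encode("utf-8")).digest()[:4], "big")
        for i in range(len(text) - k + 1)
    ]


def winnow(text: str, k: int = 5, w: int = 4) -> set[int]:
    """Return the set of minimum k-gram hashes over all windows of w consecutive hashes.

    Positions are ignored here, so only the hash values serve as fingerprints.
    """
    hashes = kgram_hashes(normalize(text), k)
    if not hashes:
        return set()
    if len(hashes) < w:
        # Too short for a full window: fall back to the single smallest hash.
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Jaccard similarity of the two fingerprint sets (assumed comparison rule)."""
    fa, fb = winnow(a, k, w), winnow(b, k, w)
    if not fa and not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


def precision_recall_f1(predicted: set, actual: set) -> tuple[float, float, float]:
    """Score a predicted set of matched records against a hand-labelled reference set."""
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

In such a sketch, two records from different citation systems would be treated as the same publication when `similarity` exceeds a chosen threshold; the threshold, like `k` and `w`, would have to be tuned on labelled data.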