DEVELOPMENT OF A MODIFIED WINNOWING METHOD FOR AGGREGATING BIBLIOGRAPHIC INFORMATION DATA FROM CITATION SYSTEMS UNDER THE CONDITIONS OF INCOMPLETE INFORMATION
I. Bolodurina, Yu. P. Ivanova, L. M. Antsiferova, V. D. Blinov
Bulletin of the South Ural State University. Ser. Computer Technologies, Automatic Control & Radioelectronics, December 2020. DOI: 10.14529/ctcr200413
Abstract
The move to electronic presentation of bibliographic information about scientific works has increased interest in scientometric research. At the same time, scientists criticize existing scientometric methods, because an incomplete bibliographic base and the tools used to assess it do not allow an accurate assessment of the contribution of scientific work. As a rule, the quality problem of scientometric assessments arises from relying on the data of a single citation system, which lacks complete information about the authors' publications indexed in other citation systems.
Aim. This study aims to develop an adaptive approach for forming aggregated bibliographic data of a scientific organization under conditions of incomplete information from the RSCI, Google Scholar and Scopus citation systems.
Methods. The aggregated list of publications used to analyze scientometric indicators was determined with the Winnowing method, the Levenshtein algorithm, the shingle method and the Jaro–Winkler method. In the experimental study, the effectiveness of these methods for aggregating information from citation systems was assessed by precision, recall and F-measure.
Results. Experiments on test data drawn from the publication lists of Orenburg State University authors in the RSCI, Google Scholar and Scopus citation systems showed that the Winnowing method produced the most accurate publication lists by the F-measure criterion. To improve the performance of this algorithm, a two-stage optimization of the aggregation process was carried out, which reduced the algorithm's running time when generating a list of bibliographic descriptions.
Conclusion. The proposed approach for forming aggregated bibliographic data of a scientific organization under conditions of incomplete information from the Russian Science Citation Index, Google Scholar and Scopus citation systems increases productivity when compiling lists of authors' publications and is effective in determining authors' scientometric characteristics.
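The abstract names Winnowing as the best-performing matcher and F-measure as the evaluation criterion, but does not give the modified algorithm, its parameters or the two-stage optimization. The following is only a minimal sketch of the classic Winnowing fingerprinting scheme (Schleimer, Wilkerson and Aiken) applied to deduplicating publication titles gathered from several citation systems, with precision, recall and F-measure computed against a reference list. It is not the authors' modified method. Matching on titles alone, the k-gram length `K = 5`, window `W = 4`, Jaccard threshold `0.6`, CRC32 hashing, and all function names and sample titles are hypothetical choices made for illustration.

```python
import re
import zlib

# Hypothetical parameters chosen for illustration only (not from the paper).
K = 5            # k-gram length in characters
W = 4            # winnowing window size, in k-grams
THRESHOLD = 0.6  # Jaccard overlap at which two records count as duplicates


def normalize(text: str) -> str:
    """Lower-case and drop every non-alphanumeric character (spaces included)
    so case, punctuation and spacing differences are ignored."""
    return re.sub(r"[\W_]+", "", text.lower())


def kgram_hashes(text: str, k: int = K) -> list[int]:
    """CRC32 of every k-character substring (deterministic across runs,
    unlike Python's built-in hash() on str)."""
    if len(text) < k:
        return [zlib.crc32(text.encode("utf-8"))] if text else []
    return [zlib.crc32(text[i:i + k].encode("utf-8"))
            for i in range(len(text) - k + 1)]


def winnow(hashes: list[int], w: int = W) -> frozenset[int]:
    """Select the minimum hash of every window of w consecutive k-gram hashes,
    breaking ties by the rightmost occurrence."""
    if not hashes:
        return frozenset()
    w = min(w, len(hashes))
    selected: set[tuple[int, int]] = set()  # (hash, position) pairs
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Index of the rightmost occurrence of the minimum within the window.
        offset = w - 1 - window[::-1].index(smallest)
        selected.add((smallest, start + offset))
    # Positions are recorded during selection but dropped for set comparison.
    return frozenset(h for h, _ in selected)


def fingerprint(record: str) -> frozenset[int]:
    return winnow(kgram_hashes(normalize(record)))


def jaccard(a: frozenset[int], b: frozenset[int]) -> float:
    """Share of fingerprints two records have in common (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(a: str, b: str) -> float:
    return jaccard(fingerprint(a), fingerprint(b))


def aggregate(*sources: list[str], threshold: float = THRESHOLD) -> list[str]:
    """Merge publication lists in the given order, keeping the first copy of
    each record and skipping later records that overlap a kept one."""
    merged: list[str] = []
    kept: list[frozenset[int]] = []
    for source in sources:
        for record in source:
            fp = fingerprint(record)
            if all(jaccard(fp, other) < threshold for other in kept):
                merged.append(record)
                kept.append(fp)
    return merged


def precision_recall_f1(found: set[str], relevant: set[str]) -> tuple[float, float, float]:
    """Precision, recall and F-measure of an aggregated list against a reference list."""
    tp = len(found & relevant)
    precision = tp / len(found) if found else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical exports of the same paper from three citation systems.
    rsci = ["Aggregating bibliographic records from citation systems"]
    scholar = ["AGGREGATING BIBLIOGRAPHIC RECORDS FROM CITATION SYSTEMS.",
               "Shingle methods for near-duplicate detection"]
    scopus = ["Aggregating bibliographic records, from citation systems"]
    merged = aggregate(rsci, scholar, scopus)
    print(len(merged))  # 2
    print(precision_recall_f1(set(merged), {rsci[0], scholar[1]}))  # (1.0, 1.0, 1.0)
```

Keeping the rightmost minimum on ties follows the original Winnowing description, which guarantees that any shared substring of at least `W + K - 1` characters yields at least one common fingerprint. A practical pipeline would likely compare more than titles (for example year, authors and venue), which this sketch does not attempt.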