Authors: Vyacheslav Zavalin, Shawne D. Miksa
Journal: Electron. Libr.
Publication date: 2021-08-09
DOI: 10.1108/el-11-2020-0316 (https://doi.org/10.1108/el-11-2020-0316)
Citation count: 2
Collecting and evaluating large volumes of bibliographic metadata aggregated in the WorldCat database: a proposed methodology to overcome challenges
Purpose
This paper discusses the challenges encountered in collecting, cleaning and analyzing a large data set of bibliographic metadata records in Machine-Readable Cataloging (MARC 21) format. Possible solutions are presented.
Design/methodology/approach
This mixed-methods study relied on content analysis and social network analysis. The study examined subject representation in MARC 21 metadata records created in 2020 in WorldCat, the largest international database of “big smart data.” The methodological challenges encountered, and the solutions to them, are examined.
Findings
In this general review paper with a focus on methodological issues, the discussion of challenges is followed by a discussion of the solutions developed and tested as part of this study. Data collection, processing, analysis and visualization are addressed separately. The paper presents lessons learned and conclusions about the challenges and solutions involved in designing a large-scale study that evaluates MARC 21 bibliographic metadata from WorldCat, and it offers overall recommendations for the design and implementation of future research.
Originality/value
No previous publications address the challenges and solutions of collecting and analyzing WorldCat’s “big smart data” in the form of MARC 21 data. This is the first study to use a large data set to systematically examine MARC 21 library metadata records created after the most recent addition of new fields and subfields to the MARC 21 Bibliographic Format standard in 2019 based on Resource Description and Access (RDA) rules. It is also the first to focus its analyses on the networks formed by subject terms shared by MARC 21 bibliographic records in a data set extracted from the heterogeneous centralized database WorldCat.
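To make the idea of a network formed by shared subject terms concrete, the following is a minimal sketch and not the authors' implementation. It assumes each record has already been reduced to a set of subject terms (for example, terms taken from MARC 21 6XX fields). The record IDs, the terms and the function name `shared_subject_network` are hypothetical, and weighting an edge by the number of shared terms is an illustrative choice, not a detail stated in the abstract.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: record ID -> set of subject terms.
# These values are invented for illustration only.
records = {
    "rec1": {"Metadata", "Cataloging"},
    "rec2": {"Cataloging", "Libraries"},
    "rec3": {"Metadata", "Cataloging", "Libraries"},
    "rec4": {"Astronomy"},
}


def shared_subject_network(records):
    """Return {(rec_a, rec_b): number of shared subject terms} for linked pairs."""
    # Invert the mapping: subject term -> set of records that carry it.
    term_to_records = defaultdict(set)
    for rec_id, terms in records.items():
        for term in terms:
            term_to_records[term].add(rec_id)

    # Each pair of records listed under the same term gains one unit of weight.
    edges = defaultdict(int)
    for rec_ids in term_to_records.values():
        for a, b in combinations(sorted(rec_ids), 2):
            edges[(a, b)] += 1
    return dict(edges)


network = shared_subject_network(records)
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

Inverting the mapping first means that only records which actually share a term are paired, which avoids comparing every record with every other record. Records with no shared terms, such as `rec4` here, simply do not appear in the edge list.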