MAchine Readable Cataloging to MAchine Understandable Data with Distributed Big Data Management

Journal of Library Metadata (JCR Q2, Social Sciences) | Pub Date: 2018-01-02 | DOI: 10.1080/19386389.2018.1461177

K. Sharma, U. Marjit, U. Biswas


Citations: 2

Abstract

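In recent years, the library domain has adopted semantic web technologies to produce data-centric information that machines can process directly. Several efforts have converted data from MAchine-Readable Cataloging (MARC) formats into the Resource Description Framework (RDF). Storing library data in RDF makes resources easier to interlink and reuse on the web, and its rich semantics let machines interpret library resources meaningfully. Existing approaches rely on a single-node environment and fail when confronted with a large volume of input data. Some bibliographic records in MARC 21 format are so large that traditional data-management tools cannot process them, and they require more storage space. Such data call for systems that can perform tasks in parallel. In this article, we propose a distributed approach to converting legacy library data into RDF using Apache Spark and Hadoop. We describe the conversion of MARC 21 bibliographic data into RDF and present preliminary results on processing speed and storage. The approach improves the conversion process in terms of both processing time and storage size.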
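The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: a per-record MARC-to-RDF mapping applied independently to partitions of the input. It is not the authors' pipeline. It uses only the Python standard library, with a thread pool standing in for a cluster; the simplified record layout, the `FIELD_TO_PREDICATE` table, the `example.org` namespace, and all function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mapping from (MARC 21 tag, subfield code) to a Dublin Core
# element. The choice of predicates is an assumption, not taken from the paper.
FIELD_TO_PREDICATE = {
    ("245", "a"): "http://purl.org/dc/elements/1.1/title",
    ("100", "a"): "http://purl.org/dc/elements/1.1/creator",
    ("260", "c"): "http://purl.org/dc/elements/1.1/date",
}

BASE_URI = "http://example.org/record/"  # placeholder namespace (assumption)


def escape_literal(value):
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(record):
    """Convert one simplified record into a list of N-Triples lines."""
    subject = f"<{BASE_URI}{record['id']}>"
    lines = []
    for key, predicate in FIELD_TO_PREDICATE.items():
        for value in record["fields"].get(key, []):
            lines.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return lines


def partition(records, n):
    """Split the records into n round-robin partitions."""
    return [records[i::n] for i in range(n)]


def convert_partition(part):
    """Convert every record in one partition; partitions share no state."""
    return [line for record in part for line in record_to_ntriples(record)]


def convert_all(records, workers=2):
    """Convert partitions concurrently and return the triples in sorted order."""
    parts = partition(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(convert_partition, parts)
    return sorted(line for chunk in chunks for line in chunk)


if __name__ == "__main__":
    sample = [
        {"id": "1", "fields": {("245", "a"): ["Moby Dick"],
                               ("100", "a"): ["Melville, Herman"]}},
        {"id": "2", "fields": {("245", "a"): ['A "quoted" title']}},
    ]
    for triple in convert_all(sample):
        print(triple)
```

Because `record_to_ntriples` depends only on its own input record, the same function could in principle be handed to a distributed map operation, which is what makes this kind of conversion a natural fit for a framework such as Apache Spark.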




Source journal

Journal of Library Metadata Social Sciences-Library and Information Sciences

CiteScore

2.00

Self-citation rate

0.00%

Articles published