MAchine Readable Cataloging to MAchine Understandable Data with Distributed Big Data Management

Journal of Library Metadata (JCR Q2, Social Sciences), pp. 13-29. Pub Date: 2018-01-02. DOI: 10.1080/19386389.2018.1461177
K. Sharma, U. Marjit, U. Biswas
Citations: 2

Abstract

In recent years, the library domain has adopted semantic web technologies to produce data-centric information that machines can process directly. Several efforts have addressed the transition of data from MAchine-Readable Cataloging (MARC) formats to the Resource Description Framework (RDF). Storing library data in RDF makes resources easier to interlink and reuse on the web, and its rich semantics let machines interpret library resources meaningfully. Existing approaches rely on a single-node environment and fail when faced with a large volume of input data. Some bibliographic records in MARC 21 format are so large that traditional data-management tools cannot cope with processing them, and they require more storage. Such data call for systems that can perform tasks in parallel. In this article, we propose a distributed approach to converting legacy library data into RDF using Apache Spark and Hadoop. We describe the process of converting data from the MARC 21 Format for Bibliographic Data into RDF and present preliminary results on processing speed and storage. The conversion process improves in both processing time and storage size.
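The abstract does not spell out how a MARC record becomes RDF, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a heavily simplified record layout (one subfield dictionary per tag, no repeated fields), a hypothetical base URI, and an illustrative three-entry mapping from MARC subfields to Dublin Core elements; all names here (`BASE_URI`, `FIELD_MAP`, `record_to_ntriples`) are invented for illustration.

```python
from typing import Dict, List

# Hypothetical namespace for record URIs; not taken from the article.
BASE_URI = "http://example.org/record/"
DC = "http://purl.org/dc/elements/1.1/"

# Assumed, illustrative mapping from (MARC tag, subfield code) to a property.
FIELD_MAP = {
    ("245", "a"): DC + "title",      # title statement
    ("100", "a"): DC + "creator",    # main entry, personal name
    ("260", "b"): DC + "publisher",  # name of publisher
}


def escape_literal(value: str) -> str:
    """Escape characters that are special inside an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(record_id: str, fields: Dict[str, Dict[str, str]]) -> List[str]:
    """Return one N-Triples line per mapped subfield; unmapped subfields are skipped."""
    subject = f"<{BASE_URI}{record_id}>"
    lines = []
    for tag, subfields in fields.items():
        for code, value in subfields.items():
            predicate = FIELD_MAP.get((tag, code))
            if predicate is not None:
                lines.append(f'{subject} <{predicate}> "{escape_literal(value)}" .')
    return lines


if __name__ == "__main__":
    sample = {"245": {"a": "An example title"}, "100": {"a": "Sharma, K."}}
    for line in record_to_ntriples("001", sample):
        print(line)
```

Because the function handles one record at a time and keeps no shared state, a distributed job could in principle apply it to each record independently (for example through a per-record `flatMap` over a Spark RDD), which is what allows the conversion to run in parallel across nodes. How the article actually parses MARC 21 and which vocabulary it maps to is not stated in the abstract.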
Source journal: Journal of Library Metadata (Social Sciences, Library and Information Sciences)
CiteScore: 2.00
Self-citation rate: 0.00%
Articles published: 13