The OpenCitations Index

arXiv - CS - Digital Libraries. Pub Date: 2024-08-05. DOI: arxiv-2408.02321

Ivan Heibi, Arianna Moretti, Silvio Peroni, Marta Soricetti



Abstract

This article presents the OpenCitations Index, a collection of open citation data maintained by OpenCitations, an independent, not-for-profit infrastructure organisation for open scholarship dedicated to publishing open bibliographic and citation data using Semantic Web and Linked Open Data technologies. The collection comprises citation data harvested from multiple sources. Different sources may provide citation data for bibliographic entities represented with different identifiers, and so may describe the same citation more than once; to address this, a deduplication mechanism has been implemented. It ensures that each citation integrated into the OpenCitations Index is uniquely identified, even when different identifiers are used. The mechanism follows a specific workflow, which comprises preprocessing of the original source data, management of the provided bibliographic metadata, and generation of new citation data to be integrated into the OpenCitations Index. The process relies on another data collection, OpenCitations Meta, and on a new globally persistent identifier, the OMID (OpenCitations Meta Identifier). As of July 2024, the OpenCitations Index stores over 2 billion unique citation links, harvested from Crossref, the National Institutes of Health Open Citation Collection (NIH-OCC), DataCite, OpenAIRE, and the Japan Link Center (JaLC). The OpenCitations Index can be systematically accessed and queried through several services, including a SPARQL endpoint, REST APIs, and web interfaces. Additionally, dataset dumps are available for free download and reuse (under a CC0 waiver) in various formats (CSV, N-Triples, and Scholix), including provenance and change tracking information.
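The deduplication idea described in the abstract can be illustrated with a minimal sketch. It is not the OpenCitations implementation: the lookup table `ID_TO_OMID`, the function `deduplicate`, and all identifier values are hypothetical, and the sketch assumes every external identifier has already been resolved to an OMID (the role the abstract assigns to OpenCitations Meta).

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical stand-in for OpenCitations Meta: each external identifier
# (DOI, PMID, ...) maps to one OMID per bibliographic entity.
# All values below are made up for illustration.
ID_TO_OMID: Dict[str, str] = {
    "doi:10.1000/a": "omid:br/1",
    "pmid:111": "omid:br/1",      # same entity as doi:10.1000/a
    "doi:10.1000/b": "omid:br/2",
    "pmid:222": "omid:br/2",      # same entity as doi:10.1000/b
}

# A raw citation as harvested: (source name, citing id, cited id).
RawCitation = Tuple[str, str, str]


def deduplicate(citations: Iterable[RawCitation]) -> Dict[Tuple[str, str], Set[str]]:
    """Collapse citations that denote the same OMID-to-OMID link.

    Returns a mapping from (citing OMID, cited OMID) to the set of
    sources that reported that link.
    """
    unique: Dict[Tuple[str, str], Set[str]] = {}
    for source, citing_id, cited_id in citations:
        # Assumption: both identifiers are present in the lookup table.
        key = (ID_TO_OMID[citing_id], ID_TO_OMID[cited_id])
        unique.setdefault(key, set()).add(source)
    return unique


if __name__ == "__main__":
    raw = [
        ("crossref", "doi:10.1000/a", "doi:10.1000/b"),
        ("nih-occ", "pmid:111", "pmid:222"),  # same link, different identifiers
    ]
    for (citing, cited), sources in deduplicate(raw).items():
        print(citing, "->", cited, sorted(sources))
```

Keying each citation on the pair of OMIDs rather than on the source identifiers is what lets the two records above collapse into one link, while the set of sources keeps a simple record of where the link was reported.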


