The OpenCitations Index
Ivan Heibi, Arianna Moretti, Silvio Peroni, Marta Soricetti
arXiv - CS - Digital Libraries, 2024-08-05 (arXiv:2408.02321)
Abstract
This article presents the OpenCitations Index, a collection of open citation
data maintained by OpenCitations, an independent, not-for-profit infrastructure
organisation for open scholarship dedicated to publishing open bibliographic
and citation data using Semantic Web and Linked Open Data technologies. The
collection involves citation data harvested from multiple sources. To address
the possibility of different sources providing citation data for bibliographic
entities represented with different identifiers, and therefore potentially
representing the same citation, a deduplication mechanism has been implemented.
This ensures that each citation integrated into the OpenCitations Index is
uniquely identified, even when different identifiers are used. This mechanism
follows a specific workflow, which encompasses preprocessing of the original
source data, management of the provided bibliographic metadata, and the
generation of new citation data to be integrated into the OpenCitations Index.
The process relies on another data collection, OpenCitations Meta, and on the
use of a new globally persistent identifier, the OMID (OpenCitations Meta
Identifier). As of July 2024, the OpenCitations Index stores over 2 billion
unique citation links, harvested from Crossref, the National Institutes of
Health Open Citation Collection (NIH-OCC), DataCite, OpenAIRE, and the Japan
Link Center (JaLC). The OpenCitations Index can be systematically accessed and
queried through several services, including a SPARQL endpoint, REST APIs, and
web interfaces.
Additionally, dataset dumps are available for free download and reuse (under a
CC0 waiver) in various formats (CSV, N-Triples, and Scholix), including
provenance and change tracking information.
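
The deduplication idea described in the abstract can be illustrated with a minimal sketch. It is not the OpenCitations implementation. It assumes a small in-memory table (`ID_TO_OMID`, hypothetical) standing in for OpenCitations Meta, made-up identifier and OMID values, and a hypothetical `deduplicate` helper. Citations reported by different sources with different identifiers collapse into one record once both endpoints are mapped to their OMIDs.

```python
from collections import defaultdict

# Hypothetical lookup table standing in for OpenCitations Meta: every
# external identifier of an entity maps to that entity's single OMID.
# All identifier and OMID values here are invented for illustration.
ID_TO_OMID = {
    "doi:10.1000/a": "omid:br/1",
    "pmid:111": "omid:br/1",       # same article as doi:10.1000/a
    "doi:10.1000/b": "omid:br/2",
    "pmid:222": "omid:br/2",       # same article as doi:10.1000/b
    "doi:10.1000/c": "omid:br/3",
}


def deduplicate(citations):
    """Collapse (source, citing_id, cited_id) records into unique OMID pairs.

    Returns a dict mapping (citing_omid, cited_omid) to the set of sources
    that reported that citation.
    """
    unique = defaultdict(set)
    for source, citing_id, cited_id in citations:
        citing = ID_TO_OMID.get(citing_id)
        cited = ID_TO_OMID.get(cited_id)
        if citing is None or cited is None:
            # Simplification of this sketch: unresolved identifiers are skipped.
            continue
        unique[(citing, cited)].add(source)
    return dict(unique)


records = [
    ("crossref", "doi:10.1000/a", "doi:10.1000/b"),
    ("nih-occ", "pmid:111", "pmid:222"),    # same citation, different IDs
    ("datacite", "doi:10.1000/a", "doi:10.1000/c"),
]

result = deduplicate(records)
for pair, sources in sorted(result.items()):
    print(pair, sorted(sources))
```

Keying each citation on the pair of OMIDs rather than on source-specific identifiers is what lets the Crossref and NIH-OCC records above merge into one entry, while the set of sources keeps track of where each citation was reported.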