The OpenCitations Index

Ivan Heibi, Arianna Moretti, Silvio Peroni, Marta Soricetti
arXiv - CS - Digital Libraries. Published 2024-08-05. https://doi.org/arxiv-2408.02321

Abstract

This article presents the OpenCitations Index, a collection of open citation data maintained by OpenCitations, an independent, not-for-profit infrastructure organisation for open scholarship dedicated to publishing open bibliographic and citation data using Semantic Web and Linked Open Data technologies. The collection comprises citation data harvested from multiple sources. Different sources may provide citation data for bibliographic entities represented with different identifiers, so the same citation can appear more than once; a deduplication mechanism has been implemented to address this. It ensures that each citation integrated into the OpenCitations Index is identified uniquely, even when different identifiers are used. The mechanism follows a specific workflow, which encompasses preprocessing of the original source data, management of the provided bibliographic metadata, and generation of new citation data to be integrated into the OpenCitations Index. The process relies on another data collection, OpenCitations Meta, and on the use of a new globally persistent identifier, the OMID (OpenCitations Meta Identifier). As of July 2024, the OpenCitations Index stores over 2 billion unique citation links, harvested from Crossref, the National Institutes of Health Open Citation Collection (NIH-OCC), DataCite, OpenAIRE, and the Japan Link Center (JaLC). The OpenCitations Index can be systematically accessed and queried through several services, including a SPARQL endpoint, REST APIs, and web interfaces. Additionally, dataset dumps are available for free download and reuse (under a CC0 waiver) in various formats (CSV, N-Triples, and Scholix), including provenance and change tracking information.
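
The deduplication idea described in the abstract can be illustrated with a minimal sketch. It is not the OpenCitations implementation: the lookup table `ID_TO_OMID`, the function `deduplicate`, and every identifier and OMID value below are hypothetical, chosen only to show how resolving source identifiers (such as DOIs and PMIDs) to a single OMID lets the same citation from two sources collapse into one record.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical identifier-to-OMID lookup, standing in for OpenCitations Meta.
# All identifiers and OMID strings here are invented for illustration.
ID_TO_OMID: Dict[str, str] = {
    "doi:10.1000/a": "omid:br/1",
    "pmid:111": "omid:br/1",  # same work as doi:10.1000/a
    "doi:10.1000/b": "omid:br/2",
    "pmid:222": "omid:br/2",  # same work as doi:10.1000/b
}

# A citation is a (citing entity, cited entity) pair.
Citation = Tuple[str, str]


def deduplicate(
    citations: Iterable[Citation], id_to_omid: Dict[str, str]
) -> List[Citation]:
    """Resolve both ends of each citation to OMIDs and keep one copy per pair.

    Identifiers missing from the lookup raise KeyError in this sketch.
    """
    seen: Set[Citation] = set()
    unique: List[Citation] = []
    for citing, cited in citations:
        key = (id_to_omid[citing], id_to_omid[cited])
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


SOURCE_CITATIONS: List[Citation] = [
    ("doi:10.1000/a", "doi:10.1000/b"),  # link as a DOI-based source reports it
    ("pmid:111", "pmid:222"),            # same link as a PMID-based source reports it
    ("doi:10.1000/b", "pmid:111"),       # a different link (reverse direction)
]

UNIQUE = deduplicate(SOURCE_CITATIONS, ID_TO_OMID)
print(UNIQUE)
```

The first two input pairs use different identifier schemes but resolve to the same OMID pair, so only one of them survives; the third points in the opposite direction and is kept as a distinct citation.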