{"title":"De-duplicating the OpenAIRE Scholarly Communication Big Graph","authors":"Claudio Atzori, P. Manghi, A. Bardi","doi":"10.1109/eScience.2018.00104","DOIUrl":null,"url":null,"abstract":"The OpenAIRE infrastructure populates a scholarly communication big graph interlinking metadata objects of publications, datasets, software, organizations, funders, and projects. In order to de-duplicate this graph, OpenAIRE has developed GDup, an integrated, scalable, general-purpose system for entity deduplication over big information graphs. GDup offers functionalities to realize a fully-fledged entity deduplication workflow over a generic input graph, inclusive of Ground Truth support, end-user feedback, and strategies for identifying and merging duplicates to obtain an output disambiguated graph.","PeriodicalId":6476,"journal":{"name":"2018 IEEE 14th International Conference on e-Science (e-Science)","volume":"25 1","pages":"372-373"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 14th International Conference on e-Science (e-Science)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/eScience.2018.00104","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
The OpenAIRE infrastructure populates a scholarly communication big graph interlinking metadata objects of publications, datasets, software, organizations, funders, and projects. In order to de-duplicate this graph, OpenAIRE has developed GDup, an integrated, scalable, general-purpose system for entity deduplication over big information graphs. GDup offers functionalities to realize a fully-fledged entity deduplication workflow over a generic input graph, inclusive of Ground Truth support, end-user feedback, and strategies for identifying and merging duplicates to obtain an output disambiguated graph.