Pub Date : 2022-08-27 DOI: 10.1007/978-3-031-16802-4_19
Himarsha R. Jayanetti, Kritika Garg, Sawood Alam, Michael L. Nelson, Michele C. Weigle
{"title":"Robots Still Outnumber Humans in Web Archives, But Less Than Before","authors":"Himarsha R. Jayanetti, Kritika Garg, Sawood Alam, Michael L. Nelson, Michele C. Weigle","doi":"10.1007/978-3-031-16802-4_19","DOIUrl":"https://doi.org/10.1007/978-3-031-16802-4_19","url":null,"abstract":"","PeriodicalId":213862,"journal":{"name":"International Conference on Theory and Practice of Digital Libraries","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129627555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-08-01 DOI: 10.48550/arXiv.2208.00665
Patrick Hochstenbach, H. Sompel, M. V. Sande, R. Dedecker, R. Verborgh
Linkages between research outputs are crucial in the scholarly knowledge graph. They include online citations, but also links between versions that differ according to various dimensions and links to resources that were used to arrive at research results. In current scholarly communication systems this information is only made available post factum and is obtained via elaborate batch processing. In this paper we report on work aimed at making linkages available in real-time, in which an alternative, decentralised scholarly communication network is considered that consists of interacting data nodes that host artifacts and service nodes that add value to artifacts. The first result of this work, the “Event Notifications in Value-Adding Networks” specification, details interoperability requirements for the exchange of real-time life-cycle information pertaining to artifacts using Linked Data Notifications. In an experiment, we applied our specification to one particular use-case: distributing Scholix data-literature links to a network of Belgian institutional repositories by a national service node. The results of our experiment confirm the potential of our approach and provide a framework to create a network of interacting nodes implementing the core scholarly functions (certification, awareness and archiving) in a decentralized and decoupled way.
{"title":"Event Notifications in Value-Adding Networks","authors":"Patrick Hochstenbach, H. Sompel, M. V. Sande, R. Dedecker, R. Verborgh","doi":"10.48550/arXiv.2208.00665","DOIUrl":"https://doi.org/10.48550/arXiv.2208.00665","url":null,"abstract":"Linkages between research outputs are crucial in the scholarly knowledge graph. They include online citations, but also links between versions that differ according to various dimensions and links to resources that were used to arrive at research results. In current scholarly communication systems this information is only made available post factum and is obtained via elaborate batch processing. In this paper we report on work aimed at making linkages available in real-time, in which an alternative, decentralised scholarly communication network is considered that consists of interacting data nodes that host artifacts and service nodes that add value to artifacts. The first result of this work, the “Event Notifications in Value-Adding Networks” specification, details interoperability requirements for the exchange of real-time life-cycle information pertaining to artifacts using Linked Data Notifications. In an experiment, we applied our specification to one particular use-case: distributing Scholix data-literature links to a network of Belgian institutional repositories by a national service node. The results of our experiment confirm the potential of our approach and provide a framework to create a network of interacting nodes implementing the core scholarly functions (certification, awareness and archiving) in a decentralized and decoupled way.","PeriodicalId":213862,"journal":{"name":"International Conference on Theory and Practice of Digital Libraries","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133984661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-07-26 DOI: 10.5281/zenodo.6906884
A. Mannocci, Miriam Baglioni, P. Manghi
Scholarly repositories are the cornerstone of modern open science, and their availability is vital for enacting its practices. To this end, scholarly registries such as FAIRsharing, re3data, OpenDOAR and ROAR give them presence and visibility across different research communities, disciplines, and applications by assigning an identifier and persisting their profiles with summary metadata. Alas, like any other resource available on the Web, scholarly repositories, be they tailored for literature, software or data, are quite dynamic and can be frequently changed, moved, merged or discontinued. Therefore, their references are prone to link rot over time, and their availability often boils down to whether the homepage URLs indicated in authoritative repository profiles within scholarly registries respond or not. For this study, we harvested the content of four prominent scholarly registries and resolved over 13 thousand unique repository URLs. By performing a quantitative analysis on such an extensive collection of repositories, this paper aims to provide a global snapshot of their availability, which, bewilderingly, is far from granted.
{"title":"\"Knock knock! Who's there?\" A study on scholarly repositories' availability","authors":"A. Mannocci, Miriam Baglioni, P. Manghi","doi":"10.5281/zenodo.6906884","DOIUrl":"https://doi.org/10.5281/zenodo.6906884","url":null,"abstract":"Scholarly repositories are the cornerstone of modern open science, and their availability is vital for enacting its practices. To this end, scholarly registries such as FAIRsharing, re3data, OpenDOAR and ROAR give them presence and visibility across different research communities, disciplines, and applications by assigning an identifier and persisting their profiles with summary metadata. Alas, like any other resource available on the Web, scholarly repositories, be they tailored for literature, software or data, are quite dynamic and can be frequently changed, moved, merged or discontinued. Therefore, their references are prone to link rot over time, and their availability often boils down to whether the homepage URLs indicated in authoritative repository profiles within scholarly registries respond or not. For this study, we harvested the content of four prominent scholarly registries and resolved over 13 thousand unique repository URLs. By performing a quantitative analysis on such an extensive collection of repositories, this paper aims to provide a global snapshot of their availability, which, bewilderingly, is far from granted.","PeriodicalId":213862,"journal":{"name":"International Conference on Theory and Practice of Digital Libraries","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116262176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-07-25 DOI: 10.48550/arXiv.2207.12018
J. Kikkawa, Masao Takaku, F. Yoshikane
Digital Object Identifiers (DOIs) are regarded as persistent; however, they are sometimes deleted. Deleted DOIs are an important issue not only for persistent access to scholarly content but also for bibliometrics, because they may cause problems in correctly identifying scholarly articles. However, little is known about how many DOIs are deleted and what causes the deletions. We identified deleted DOIs by comparing the datasets of all Crossref DOIs on two different dates, investigated the number of deleted DOIs in the scholarly content along with the corresponding document types, and analyzed the factors that cause deleted DOIs. Using the proposed method, 708,282 deleted DOIs were identified. The majority corresponded to individual scholarly articles such as journal articles, proceedings articles, and book chapters. There were cases of many DOIs assigned to the same content, e.g., retracted journal articles and abstracts of international conferences. We show the publishers and academic societies that are most common among deleted DOIs. In addition, the top cases of single scholarly content with a large number of deleted DOIs were revealed. The findings of this study are useful for citation analysis and altmetrics, as well as for avoiding deleted DOIs.
{"title":"Analysis of the deletions of DOIs: What factors undermine their persistence and to what extent?","authors":"J. Kikkawa, Masao Takaku, F. Yoshikane","doi":"10.48550/arXiv.2207.12018","DOIUrl":"https://doi.org/10.48550/arXiv.2207.12018","url":null,"abstract":"Digital Object Identifiers (DOIs) are regarded as persistent; however, they are sometimes deleted. Deleted DOIs are an important issue not only for persistent access to scholarly content but also for bibliometrics, because they may cause problems in correctly identifying scholarly articles. However, little is known about how many DOIs are deleted and what causes the deletions. We identified deleted DOIs by comparing the datasets of all Crossref DOIs on two different dates, investigated the number of deleted DOIs in the scholarly content along with the corresponding document types, and analyzed the factors that cause deleted DOIs. Using the proposed method, 708,282 deleted DOIs were identified. The majority corresponded to individual scholarly articles such as journal articles, proceedings articles, and book chapters. There were cases of many DOIs assigned to the same content, e.g., retracted journal articles and abstracts of international conferences. We show the publishers and academic societies that are most common among deleted DOIs. In addition, the top cases of single scholarly content with a large number of deleted DOIs were revealed. The findings of this study are useful for citation analysis and altmetrics, as well as for avoiding deleted DOIs.","PeriodicalId":213862,"journal":{"name":"International Conference on Theory and Practice of Digital Libraries","volume":"241 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115601288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
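The snapshot comparison described in the abstract above (identifying deleted DOIs by comparing all Crossref DOIs on two different dates) can be pictured as a set difference. The following is a minimal sketch, not the authors' implementation: the function names, the lowercase normalization step, and the toy DOIs are all assumptions introduced for illustration.

```python
def normalize_doi(doi: str) -> str:
    """Lowercase and trim a DOI so that case differences do not hide a match.

    Assumption: DOIs are compared case-insensitively; the source does not
    describe the normalization actually used.
    """
    return doi.strip().lower()


def find_deleted_dois(earlier, later):
    """Return DOIs present in the earlier snapshot but absent from the later one."""
    earlier_set = {normalize_doi(d) for d in earlier}
    later_set = {normalize_doi(d) for d in later}
    return earlier_set - later_set


# Hypothetical toy snapshots; real snapshots would hold millions of records.
snapshot_a = ["10.1000/a1", "10.1000/B2", "10.1000/c3"]
snapshot_b = ["10.1000/a1", "10.1000/c3", "10.1000/d4"]

deleted = find_deleted_dois(snapshot_a, snapshot_b)
print(sorted(deleted))
```

A DOI that appears only in the later snapshot (here `10.1000/d4`) is newly registered rather than deleted, so it is not part of the result.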
Pub Date : 2022-07-11 DOI: 10.48550/arXiv.2207.04772
Zeyd Boukhers, N. Bahubali
As the number of authors is increasing exponentially over the years, the number of authors sharing the same names is increasing proportionally. This makes it challenging to assign newly published papers to the correct authors. Therefore, Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries. This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities by leveraging their co-authors and domain of research. To this end, we use a collection from the DBLP repository that contains more than 5 million bibliographic records authored by around 2.6 million co-authors. Our approach first groups authors who share the same last names and same first name initials. The author within each group is identified by capturing the relation with his/her co-authors and area of research, which is represented by the titles of the validated publications of the corresponding author. To this end, we train a neural network model that learns from the representations of the co-authors and titles. We validated the effectiveness of our approach by conducting extensive experiments on a large dataset.
{"title":"Whois? Deep Author Name Disambiguation using Bibliographic Data","authors":"Zeyd Boukhers, N. Bahubali","doi":"10.48550/arXiv.2207.04772","DOIUrl":"https://doi.org/10.48550/arXiv.2207.04772","url":null,"abstract":"As the number of authors is increasing exponentially over the years, the number of authors sharing the same names is increasing proportionally. This makes it challenging to assign newly published papers to the correct authors. Therefore, Author Name Ambiguity (ANA) is considered a critical open problem in digital libraries. This paper proposes an Author Name Disambiguation (AND) approach that links author names to their real-world entities by leveraging their co-authors and domain of research. To this end, we use a collection from the DBLP repository that contains more than 5 million bibliographic records authored by around 2.6 million co-authors. Our approach first groups authors who share the same last names and same first name initials. The author within each group is identified by capturing the relation with his/her co-authors and area of research, which is represented by the titles of the validated publications of the corresponding author. To this end, we train a neural network model that learns from the representations of the co-authors and titles. We validated the effectiveness of our approach by conducting extensive experiments on a large dataset.","PeriodicalId":213862,"journal":{"name":"International Conference on Theory and Practice of Digital Libraries","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121646978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
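The first step named in the abstract above, grouping authors who share the same last name and first-name initial, can be illustrated with a small blocking routine. This is only a sketch under stated assumptions: names are taken to be in "First Last" order, and `blocking_key`, `group_by_key`, and the sample names are hypothetical. The neural model that separates authors within each group is not shown.

```python
from collections import defaultdict


def blocking_key(full_name: str) -> str:
    """Build a key from the last name and the first-name initial.

    Assumption: names are written as "First [Middle] Last"; other orders
    would need a different parser.
    """
    parts = full_name.replace(".", " ").split()
    if not parts:
        return ""
    last = parts[-1].lower()
    initial = parts[0][0].lower() if len(parts) > 1 else ""
    return f"{last} {initial}".strip()


def group_by_key(names):
    """Collect names that share a blocking key into candidate groups."""
    groups = defaultdict(list)
    for name in names:
        groups[blocking_key(name)].append(name)
    return dict(groups)


# Hypothetical author names, for illustration only.
names = ["John Smith", "J. Smith", "Jane Smith", "Maria Garcia", "M. Garcia"]
groups = group_by_key(names)
for key, members in groups.items():
    print(key, members)
```

Note that "John Smith" and "Jane Smith" land in the same group even though they may be different people; this is exactly the ambiguity that the co-author and title signals are meant to resolve afterwards.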
Pub Date : 2022-06-08 DOI: 10.1007/978-3-031-16802-4_36
Giuseppe Grieco, Ivan Heibi, Arcangelo Massari, A. Moretti, S. Peroni
{"title":"Enabling Portability and Reusability of Open Science Infrastructures","authors":"Giuseppe Grieco, Ivan Heibi, Arcangelo Massari, A. Moretti, S. Peroni","doi":"10.1007/978-3-031-16802-4_36","DOIUrl":"https://doi.org/10.1007/978-3-031-16802-4_36","url":null,"abstract":"","PeriodicalId":213862,"journal":{"name":"International Conference on Theory and Practice of Digital Libraries","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114350035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-05-29 DOI: 10.1007/978-3-031-16802-4_42
A. Cioffi, S. Peroni
{"title":"Structured references from PDF articles: assessing the tools for bibliographic reference extraction and parsing","authors":"A. Cioffi, S. Peroni","doi":"10.1007/978-3-031-16802-4_42","DOIUrl":"https://doi.org/10.1007/978-3-031-16802-4_42","url":null,"abstract":"","PeriodicalId":213862,"journal":{"name":"International Conference on Theory and Practice of Digital Libraries","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125276044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-05-26 DOI: 10.48550/arXiv.2205.13419
Erika Alves dos Santos, S. Peroni, M. L. Mucheroni
Current citation practices observed in articles are very noisy, confusing, and not standardised at all, making the identification of the cited works problematic for humans and any reference extraction software. In this work, we investigate such citation practices for referencing different types of entities and, in particular, aim to understand what the most used metadata in bibliographic references are. We identified 36 different types of cited entities (the most cited ones were articles, books, and proceedings papers) within the 34,140 bibliographic references extracted from a large set of journal articles of 27 different subject areas. The analysis of such bibliographic references, grouped by the particular type of cited entities, enabled us to highlight the most used metadata for defining bibliographic references across the subject areas. However, we also noticed that, in some cases, bibliographic references did not provide the essential elements to easily identify the work they refer to.
{"title":"The way we cite: common metadata used across disciplines for defining bibliographic references","authors":"Erika Alves dos Santos, S. Peroni, M. L. Mucheroni","doi":"10.48550/arXiv.2205.13419","DOIUrl":"https://doi.org/10.48550/arXiv.2205.13419","url":null,"abstract":"Current citation practices observed in articles are very noisy, confusing, and not standardised at all, making the identification of the cited works problematic for humans and any reference extraction software. In this work, we investigate such citation practices for referencing different types of entities and, in particular, aim to understand what the most used metadata in bibliographic references are. We identified 36 different types of cited entities (the most cited ones were articles, books, and proceedings papers) within the 34,140 bibliographic references extracted from a large set of journal articles of 27 different subject areas. The analysis of such bibliographic references, grouped by the particular type of cited entities, enabled us to highlight the most used metadata for defining bibliographic references across the subject areas. However, we also noticed that, in some cases, bibliographic references did not provide the essential elements to easily identify the work they refer to.","PeriodicalId":213862,"journal":{"name":"International Conference on Theory and Practice of Digital Libraries","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131236751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-08-12 DOI: 10.1007/978-3-030-86324-1_9
Mohamed Aturban, Michael L. Nelson, Michele C. Weigle
{"title":"Where Did the Web Archive Go?","authors":"Mohamed Aturban, Michael L. Nelson, Michele C. Weigle","doi":"10.1007/978-3-030-86324-1_9","DOIUrl":"https://doi.org/10.1007/978-3-030-86324-1_9","url":null,"abstract":"","PeriodicalId":213862,"journal":{"name":"International Conference on Theory and Practice of Digital Libraries","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123749759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-07-08 DOI: 10.1007/978-3-030-86324-1_22
A. Oelen, M. Stocker, S. Auer
{"title":"SmartReviews: Towards Human- and Machine-actionable Reviews","authors":"A. Oelen, M. Stocker, S. Auer","doi":"10.1007/978-3-030-86324-1_22","DOIUrl":"https://doi.org/10.1007/978-3-030-86324-1_22","url":null,"abstract":"","PeriodicalId":213862,"journal":{"name":"International Conference on Theory and Practice of Digital Libraries","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134403174","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}