Muthu Kumar Chandrasekaran, Kokil Jaidka, Philipp Mayr
The large scale of scholarly publications poses a challenge for scholars in information-seeking and sensemaking. Bibliometric, information retrieval (IR), text mining and NLP techniques could help in these activities, but are not yet widely used in digital libraries. This workshop is intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometric and recommendation techniques which can advance the state-of-the-art in scholarly document understanding, analysis and retrieval at scale.
{"title":"Joint workshop on bibliometric-enhanced information retrieval and natural language processing for digital libraries (BIRNDL 2016)","authors":"Muthu Kumar Chandrasekaran, Kokil Jaidka, Philipp Mayr","doi":"10.1145/3077136.3084370","DOIUrl":"https://doi.org/10.1145/3077136.3084370","url":null,"abstract":"The large scale of scholarly publications poses a challenge for scholars in information-seeking and sensemaking. Bibliometric, information retrieval (IR), text mining and NLP techniques could help in these activities, but are not yet widely used in digital libraries. This workshop is intended to stimulate IR researchers and digital library professionals to elaborate on new approaches in natural language processing, information retrieval, scientometric and recommendation techniques which can advance the state-of-the-art in scholarly document understanding, analysis and retrieval at scale.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-06-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125482070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andrew N. Jackson, Jimmy J. Lin, Ian Milligan, Nick Ruest
Web archiving initiatives around the world capture ephemeral web content to preserve our collective digital memory. In this paper, we describe initial experiences in providing an exploratory search interface to web archives for humanities scholars and social scientists. We describe our initial implementation and discuss our findings in terms of desiderata for such a system. It is clear that the standard organization of a search engine results page (SERP), consisting of an ordered list of hits, is inadequate to support the needs of scholars. Shneiderman's mantra for visual information seeking (“overview first, zoom and filter, then details-on-demand”) provides a nice organizing principle for interface design, to which we propose an addendum: “Make everything transparent”. We elaborate on this by highlighting the importance of the temporal dimension of web pages as well as issues surrounding metadata and veracity.
{"title":"Desiderata for exploratory search interfaces to Web archives in support of scholarly activities","authors":"Andrew N. Jackson, Jimmy J. Lin, Ian Milligan, Nick Ruest","doi":"10.1145/2910896.2910912","DOIUrl":"https://doi.org/10.1145/2910896.2910912","url":null,"abstract":"Web archiving initiatives around the world capture ephemeral web content to preserve our collective digital memory. In this paper, we describe initial experiences in providing an exploratory search interface to web archives for humanities scholars and social scientists. We describe our initial implementation and discuss our findings in terms of desiderata for such a system. It is clear that the standard organization of a search engine results page (SERP), consisting of an ordered list of hits, is inadequate to support the needs of scholars. Shneiderman's mantra for visual information seeking (“overview first, zoom and filter, then details-on-demand”) provides a nice organizing principle for interface design, to which we propose an addendum: “Make everything transparent”. We elaborate on this by highlighting the importance of the temporal dimension of web pages as well as issues surrounding metadata and veracity.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115424240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joffrey Decourselle, F. Duchateau, Trond Aalberg, Naimdjon Takhirov, Nicolas Lumineau
The transformation of legacy MARC catalogs to FRBR catalogs (FRBRization) is a complex and important challenge for libraries. Although many FRBRization tools have provided experimental validation, it is difficult to evaluate and compare these systems on a fair basis due to a lack of common datasets. This poster presents two public datasets (T42 and BIB-RCAT) intended to support the validation of the FRBRization process.
{"title":"Open datasets for evaluating the interpretation of bibliographic records","authors":"Joffrey Decourselle, F. Duchateau, Trond Aalberg, Naimdjon Takhirov, Nicolas Lumineau","doi":"10.1145/2910896.2925457","DOIUrl":"https://doi.org/10.1145/2910896.2925457","url":null,"abstract":"The transformation of legacy MARC catalogs to FRBR catalogs (FRBRization) is a complex and important challenge for libraries. Although many FRBRization tools have provided experimental validation, it is difficult to evaluate and compare these systems on a fair basis due to a lack of common datasets. This poster presents two public datasets (T42 and BIB-RCAT) intended to support the validation of the FRBRization process.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125965272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BIBSURF is a system demonstrating search, ranking and filtering of bibliographic RDF data that is organized in the form of entities representing intellectual endeavor at different levels of abstraction: item, manifestation, expression, work.
{"title":"BIBSURF — Discover bibliographic entities by searching for units of interest, ranking and filtering","authors":"Trond Aalberg, Tanja Mercun, M. Zumer","doi":"10.1145/2910896.2925434","DOIUrl":"https://doi.org/10.1145/2910896.2925434","url":null,"abstract":"BIBSURF is a system demonstrating search, ranking and filtering of bibliographic RDF data that is organized in form of entities representing intellectual endeavor at different levels of abstraction: item, manifestation, expression, work.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128967511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
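The four FRBR levels named above can be made concrete with a small in-memory model. The sketch below is not BIBSURF's implementation, which works over RDF data. It assumes a hypothetical, simplified Python model with invented class names (`Work`, `Expression`, `Manifestation`, `Item`) and a hypothetical `search_works` function. That function matches a query against manifestation titles, optionally filters by expression language, and ranks the enclosing works by their number of matching manifestations.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Hypothetical, simplified FRBR Group 1 model (not BIBSURF's RDF schema):
# a Work is realized by Expressions, embodied in Manifestations, exemplified by Items.
@dataclass
class Item:
    identifier: str

@dataclass
class Manifestation:
    title: str
    items: list[Item] = field(default_factory=list)

@dataclass
class Expression:
    language: str
    manifestations: list[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    title: str
    expressions: list[Expression] = field(default_factory=list)


def search_works(works, query, language=None):
    """Match `query` against manifestation titles, optionally keep only expressions
    in `language`, and rank the enclosing works by their number of matching manifestations."""
    q = query.lower()
    results = []
    for work in works:
        hits = 0
        for expr in work.expressions:
            if language is not None and expr.language != language:
                continue  # filtering happens at the expression level
            hits += sum(1 for m in expr.manifestations if q in m.title.lower())
        if hits:
            results.append((work, hits))
    # Rank: more matching manifestations first; ties broken by work title.
    results.sort(key=lambda pair: (-pair[1], pair[0].title))
    return results


# Made-up sample data for illustration only.
SAMPLE_WORKS = [
    Work("Hamlet", [
        Expression("en", [Manifestation("Hamlet", [Item("copy-1")]),
                          Manifestation("Hamlet, Prince of Denmark")]),
        Expression("no", [Manifestation("Hamlet")]),
    ]),
    Work("Ghosts", [
        Expression("no", [Manifestation("Gengangere")]),
        Expression("en", [Manifestation("Ghosts")]),
    ]),
]
```

For instance, `search_works(SAMPLE_WORKS, "hamlet", language="no")` keeps only the Norwegian expression and returns the Hamlet work with one matching manifestation. Grouping hits at the work level lets a user see one entry per work instead of one per edition. The ranking rule itself is an arbitrary placeholder.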
Large metadata aggregations provide access to documents held by multiple cultural heritage (CH) institutions. Because CH institutions encode their metadata using different schemas and follow different data standards, aggregators must process the received data before making it available through a unified portal. Staff members at the contributing CH institutions receive no feedback on the quality of either the data they provide or the data after processing. We are developing mechanisms that enable staff at the CH institutions to understand the effectiveness of their metadata, with the goal of improving the visibility of their items in large portals such as the Digital Public Library of America (DPLA). This poster will present a classification of the DPLA metadata application profile that highlights compliance levels, as well as a visualization framework for presenting the compliance of an institution's data with the DPLA data model.
{"title":"Visualizing published metadata in large aggregations","authors":"Unmil Karadkar, Geoffrey A. Potter, Shengwei Wang","doi":"10.1145/2910896.2925451","DOIUrl":"https://doi.org/10.1145/2910896.2925451","url":null,"abstract":"Large metadata aggregations provide access to documents held by multiple cultural heritage (CH) institutions. As CH institutions encode their metadata using different schemas and follow different data standards, aggregators must process the received data before making it available through a unified portal. Staff members at the contributing CH institutions don't receive feedback regarding the quality of the provided or the processed data. We are developing mechanisms that enable staff at the CH institutions to understand the effectiveness of their metadata with a goal of improving the visibility of their items in these large portals such as the Digital Public Library of America. This poster will present a classification of the DPLA metadata application profile highlighting compliance levels as well as a visualization framework for presenting the compliance of an institution's data with the DPLA data model.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121724294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
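To illustrate what record-level compliance levels might look like, here is a minimal sketch. It scores each record against a hypothetical profile split into required, recommended and optional fields, then averages the scores per institution. It finally prints text bars as a stand-in for a graphical visualization. The field names and their levels in `PROFILE` are invented for illustration. They are not the actual DPLA MAP classification presented in the poster, and the function names are likewise assumptions.

```python
# Hypothetical requirement levels for illustration; NOT the official DPLA MAP classification.
PROFILE = {
    "required": ("title", "rights", "is_shown_at"),
    "recommended": ("creator", "date", "subject", "description"),
    "optional": ("extent", "language"),
}


def is_filled(value):
    """True if a metadata value is present and non-blank (lists need one filled entry)."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, set)):
        return any(is_filled(v) for v in value)
    return True


def compliance(record, profile=PROFILE):
    """Fraction of the profile's fields filled in one record, per requirement level."""
    return {
        level: sum(is_filled(record.get(f)) for f in fields) / len(fields)
        for level, fields in profile.items()
    }


def institution_summary(records, profile=PROFILE):
    """Average per-level compliance over all records contributed by one institution."""
    totals = {level: 0.0 for level in profile}
    for record in records:
        for level, score in compliance(record, profile).items():
            totals[level] += score
    n = max(len(records), 1)  # avoid division by zero for an empty set
    return {level: total / n for level, total in totals.items()}


def render_bars(summary, width=20):
    """Plain-text bar chart standing in for a graphical visualization."""
    lines = []
    for level, score in summary.items():
        filled = round(score * width)
        lines.append(f"{level:<12}{'#' * filled}{'.' * (width - filled)} {score:.0%}")
    return "\n".join(lines)
```

Reporting a separate score per requirement level, instead of a single number, lets an institution see whether low visibility comes from missing required fields or only from sparse optional ones.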
With the increasing size of digital libraries (DLs), it has become a challenge to identify author names correctly and assign publications to them. The situation becomes more critical when different persons share the same name (homonym problem) or when an author's name is presented in several different ways (synonym problem). This paper focuses on homonym names in the computer science bibliography DBLP. The goal of this study is to implement and evaluate a method that uses co-authorship networks to disambiguate homonym names, especially common names. The results show that the implemented method performs well and can be used for author name disambiguation of sparse bibliographic records.
{"title":"Using co-authorship networks for author name disambiguation","authors":"Fakhri Momeni, Philipp Mayr","doi":"10.1145/2910896.2925461","DOIUrl":"https://doi.org/10.1145/2910896.2925461","url":null,"abstract":"With the increasing size of digital libraries (DLs) it has become a challenge to identify author names correctly and assign publications to them. The situation becomes more critical when different persons share the same name (homonym problem) or when the names of authors are presented in several different ways (synonym problem). This paper focuses on homonym names in the computer science bibliography DBLP. The goal of this study is to implement and evaluate a method which uses co-authorship networks in order to disambiguate homonym names, especially common names. The results show that the implemented method has a good performance and can be used for author name disambiguation of sparse bibliographic records.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127811813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
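One simple way to use co-authorship for homonym disambiguation is to link two records carrying the same name whenever they share a co-author. The connected components of those links are then treated as distinct persons. The sketch below implements that idea with union-find. It is a simplified stand-in, not the method evaluated on DBLP in the paper. The function name `disambiguate` and the `min_shared` threshold are assumptions.

```python
from itertools import combinations


def disambiguate(records, name, min_shared=1):
    """Cluster publication records that list `name` as an author.

    records: list of author-name lists (one per publication).
    Two records are linked when they share at least `min_shared` co-authors
    other than `name`; clusters are the connected components of those links.
    Returns a sorted list of clusters, each a list of record indices.
    """
    idx = [i for i, authors in enumerate(records) if name in authors]
    coauthors = {i: set(records[i]) - {name} for i in idx}
    parent = {i: i for i in idx}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link every pair of same-name records with enough co-authors in common.
    for a, b in combinations(idx, 2):
        if len(coauthors[a] & coauthors[b]) >= min_shared:
            union(a, b)

    clusters = {}
    for i in idx:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Because clusters are connected components, two records with no co-author in common can still end up together through an intermediate record. Raising `min_shared` trades that over-merging for more splitting, which hurts exactly the sparse records with few co-authors.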
Jacob Jett, Terhi Nurmikko-Fuller, Timothy W. Cole, Kevin R. Page, J. S. Downie
The HathiTrust Research Center (HTRC) is developing tools that will give scholars the ability to analyze the HathiTrust digital library's 14-million-volume corpus. A cornerstone of the HTRC's digital infrastructure is the workset, a kind of scholar-built research collection intended for use with the HTRC's analytics platform. Because more than 66% of the digital corpus is subject to copyright restrictions, scholarly users remain dependent upon the descriptive accounts provided by traditional metadata records to identify and gather bibliographic resources for analysis. This paper compares the MADSRDF/MODSRDF, Bibframe, schema.org, BIBO, and FaBiO ontologies by assessing their suitability for use by the HTRC to meet scholars' needs. These needs include distinguishing among multiple versions of the same work; representing the complex historical and physical relationships among those versions; and identifying and providing access to finer-grained bibliographic entities, e.g., poems, chapters, sections, and even smaller segments of content.
{"title":"Enhancing scholarly use of digital libraries: A comparative survey and review of bibliographic metadata ontologies","authors":"Jacob Jett, Terhi Nurmikko-Fuller, Timothy W. Cole, Kevin R. Page, J. S. Downie","doi":"10.1145/2910896.2910903","DOIUrl":"https://doi.org/10.1145/2910896.2910903","url":null,"abstract":"The HathiTrust Research Center (HTRC) is engaged in the development of tools that will give scholars the ability to analyze the HathiTrust digital library's 14 million volume corpus. A cornerstone of the HTRC's digital infrastructure is the workset - a kind of scholar-built research collection intended for use with the HTRC's analytics platform. Because more than 66% of the digital corpus is subject to copyright restrictions, scholarly users remain dependent upon the descriptive accounts provided by traditional metadata records in order to identify and gather together bibliographic resources for analysis. This paper compares the MADSRDF/MODSRDF, Bibframe, schema.org, BIBO, and FaBiO ontologies by assessing their suitability for employment by the HTRC to meet scholars' needs. These include distinguishing among multiple versions of the same work; representing the complex historical and physical relationships among those versions; and identifying and providing access to finer grained bibliographic entities, e.g., poems, chapters, sections, and even smaller segments of content.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116941128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summary form only given. In October 1991 the National Science Foundation (NSF) sponsored a workshop to examine the role of the Information Retrieval research community in the emerging environment of the Internet, high-performance text processing capabilities and ever-increasing volumes of digitized documents. Ed Fox, Michael Lesk and Michael McGill drafted a White Paper calling for a National Electronic Science, Engineering, and Technology Library. The term “Digital Library” was adopted for follow-up workshops whose goal was to identify research directions, leading to the National Science Foundation (NSF)/Defense Advanced Research Projects Agency (DARPA)/National Aeronautics and Space Administration (NASA) Research in Digital Libraries Initiative announced in late 1993. Now, in 2016, 25 years after the first workshop, 15 years after the Joint Conference on Digital Libraries was established, and after many initiatives and developments around the world, what is the state of Digital Libraries? What items should be in digital libraries, who should be their custodians, how can the items be organized to support knowledge discovery, and how can the contents be safeguarded and preserved? Ebla, Syria (2500 B.C.-2250 B.C.) constitutes the oldest organized library of tablets yet discovered. What will archeologists discover in the year 4400 about the world, politics, economies, technologies, science, climate, species, health, food, culture, art, entertainment and everyday life through the ages? The talk will examine what we can do to support innovative research, design and implementation of lasting, informative Digital Libraries that promote the global goals of knowledge discovery and international understanding, as well as personal needs to organize and selectively share important facts, creations, and memories.
{"title":"Future digital libraries: Research and responsibilities","authors":"M. Zemankova","doi":"10.1145/2910896.2926740","DOIUrl":"https://doi.org/10.1145/2910896.2926740","url":null,"abstract":"Summary form only given. In October 1991 the National Science Foundation (NSF) sponsored a workshop to examine the role of the Information Retrieval research community in the emerging environment of Internet, high performance text processing capabilities and ever-increasing volumes of digitized documents. Ed Fox, Michael Lesk and Michael McGill drafted a White Paper, calling for a National Electronic Science, Engineering, and Technology Library. The term “Digital Library” was adopted and for follow-up workshops with the goal to identify research directions, leading to National Science Foundation (NSF)/Defense Advanced Research Projects Agency (DARPA)/National Aeronautics and Space Administration (NASA) Research in Digital Libraries Initiative announced in late 1993. Now, in 2016, 25 years after the first workshop, 15 years after the Joint Conference on Digital Libraries has been established, and many initiatives and developments around the world, what is the state of Digital Libraries? What items should be in digital libraries, who should their custodians, how can the items be organized to support knowledge discovery, how can the contents be safeguarded and preserved? Ebla, Syria (2500 B.C.-2250 B.C.) constitutes the oldest organized library of tables yet discovered. What will the archeologists discover in year 4400 about the world, politics, economies, technologies, science, climate, species, health, food, culture, art, entertainment and everyday life through the ages? The talk will examine what we can do to support innovative research and design and implementation of lasting, informative Digital Libraries that will promote global goals of knowledge discovery and international understanding and personal needs to organize and selectively share important facts, creations, and memories.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117234665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Academics have relied heavily on search engines to identify and locate research manuscripts related to their research areas. Many early information retrieval systems and technologies were developed for librarians, to help them sift through books and proceedings; they were later followed by online academic search engines such as Google Scholar and Microsoft Academic Search. Despite their popularity among academics and importance to academia, the usage, query behaviors, and retrieval models of academic search engines have not been well studied. To this end, we study the distribution of queries received by an academic search engine. Furthermore, we delve deeper into academic search queries and classify them as navigational or informational. This work introduces a definition of navigational queries for academic search engines, under which a query is considered navigational if the user is searching for a specific paper or document. We describe multiple facets of navigational academic queries and introduce a machine learning approach with a set of features to identify such queries.
{"title":"Towards better understanding of academic search","authors":"Madian Khabsa, Zhaohui Wu, C. Lee Giles","doi":"10.1145/2910896.2910922","DOIUrl":"https://doi.org/10.1145/2910896.2910922","url":null,"abstract":"Academics have relied heavily on search engines to identify and locate research manuscripts that are related to their research areas. Many of the early information retrieval systems and technologies were developed while catering for librarians to help them sift through books and proceedings, followed by recent online academic search engines such as Google Scholar and Microsoft Academic Search. In spite of their popularity among academics and importance to academia, the usage, query behaviors, and retrieval models for academic search engines have not been well studied. To this end, we study the distribution of queries that are received by an academic search engine. Furthermore, we delve deeper into academic search queries and classify them into navigational and informational queries. This work introduces a definition for navigational queries in academic search engines under which a query is considered navigational if the user is searching for a specific paper or document. We describe multiple facets of navigational academic queries, and introduce a machine learning approach with a set of features to identify such queries.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129505952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
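Under the definition above, a query is navigational when it targets one specific paper. That suggests features measuring how closely the query matches a known title. The sketch below computes a few plausible query features, including the share of a known paper title that the query covers. The feature set, the function names and the 0.8 threshold in `looks_navigational` are hypothetical. They stand in for the trained classifier the paper describes and are not its actual features.

```python
import re


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def title_overlap(query, titles):
    """Largest fraction of a known title's tokens that also appear in the query."""
    q = set(tokens(query))
    best = 0.0
    for title in titles:
        t = set(tokens(title))
        if t:
            best = max(best, len(q & t) / len(t))
    return best


def query_features(query, titles):
    """Hypothetical feature vector for one query."""
    return {
        "num_tokens": len(tokens(query)),
        "has_quotes": '"' in query,
        "has_year": bool(re.search(r"\b(19|20)\d{2}\b", query)),
        "title_overlap": title_overlap(query, titles),
    }


def looks_navigational(query, titles, threshold=0.8):
    # Hypothetical rule standing in for a trained classifier over the features above.
    return query_features(query, titles)["title_overlap"] >= threshold
```

A coverage-of-title feature alone misfires on very short titles, where a generic topical query can cover the whole title. That is one reason to combine several features in a learned model rather than rely on a single threshold.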
Mining advisor-advisee relationships can benefit many interesting applications, such as advisor recommendation and protégé performance analysis. Based on the hypothesis that advisor-advisee relationships among researchers are hidden in scholarly big data, we propose a deep learning based advisor-advisee relationship identification method that considers personal properties and network characteristics with a stacked autoencoder model. To the best of our knowledge, this is the first time a deep learning model has been used to represent coauthor network features for relationship identification. Moreover, experiments demonstrate that the proposed method outperforms other state-of-the-art methods.
{"title":"Mining advisor-advisee relationships in scholarly big data: A deep learning approach","authors":"Wei Wang, Jiaying Liu, Shuo Yu, Chenxin Zhang, Zhenzhen Xu, Feng Xia","doi":"10.1145/2910896.2925435","DOIUrl":"https://doi.org/10.1145/2910896.2925435","url":null,"abstract":"Mining advisor-advisee relationships can benefit many interesting applications such as advisor recommendation and protege performance analysis. Based on the hypothesis that, advisor-advisee relationships among researchers are hidden in scholarly big data, we propose in this work a deep learning based advisor-advisee relationship identification method which considers the personal properties and network characteristics with a stacked autoencoder model. To the best of our knowledge, this is the first time that a deep learning model is utilized to represent coauthor network features for relationships identification. Moreover, experiments demonstrate that the proposed method has better performance compared with other state-of-the-art methods.","PeriodicalId":109613,"journal":{"name":"2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)","volume":"169 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128614833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
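A stacked autoencoder is commonly built by greedy layer-wise training. One autoencoder learns to reconstruct the input, its hidden codes become the input of the next, and the top layer serves as a learned representation. The NumPy sketch below shows that idea for feature vectors scaled to [0, 1], for example hypothetical per-coauthor-pair features such as collaboration span or number of shared papers. The layer sizes, learning rate and epoch count are arbitrary. The function names are assumptions, and this is not the authors' model or feature set.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, hidden, epochs=500, lr=0.5, seed=0):
    """Train one sigmoid autoencoder on X (values in [0, 1]) with full-batch gradient
    descent on mean squared reconstruction error; return the encoder weights (W, b)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0.0, 0.1, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0.0, 0.1, (hidden, d)), np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # encode
        R = sigmoid(H @ W2 + b2)          # decode (reconstruction)
        dZ2 = (R - X) * R * (1 - R) / n   # gradient at decoder pre-activation
        dZ1 = (dZ2 @ W2.T) * H * (1 - H)  # backpropagated to encoder pre-activation
        W2 -= lr * (H.T @ dZ2)
        b2 -= lr * dZ2.sum(axis=0)
        W1 -= lr * (X.T @ dZ1)
        b1 -= lr * dZ1.sum(axis=0)
    return W1, b1


def stacked_encode(X, layer_sizes, **train_kwargs):
    """Greedy layer-wise training: each autoencoder learns to reconstruct the
    previous layer's codes. Returns the top-layer codes and the encoder weights."""
    layers, H = [], X
    for size in layer_sizes:
        W, b = train_autoencoder(H, size, **train_kwargs)
        layers.append((W, b))
        H = sigmoid(H @ W + b)
    return H, layers
```

In a full pipeline the encoded vectors would feed a classifier that labels a coauthor pair as advisor-advisee or not. That supervised step, and any fine-tuning of the stacked layers, is omitted here.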