{"title":"Session details: Panels","authors":"A. Rauber, Hideo Joho","doi":"10.1145/3260519","DOIUrl":"https://doi.org/10.1145/3260519","url":null,"abstract":"","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123679416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A user often interacts with multiple applications while working on a task. User models can be developed separately within each application, but there is no easy way to build a more complete user model from the user's activity distributed across them. To address this issue, this research studies the importance of combining various implicit and explicit relevance feedback indicators in a multi-application environment. It allows applications that the user employs for different purposes to contribute user activity and its context, so that they mutually support the user with unified relevance feedback. Using data collected from a web browser, Microsoft Word, and Microsoft PowerPoint, combinations of implicit and semi-explicit relevance feedback were analyzed and compared with explicit user ratings. Our results are twofold. First, we demonstrate the aggregation of implicit and semi-explicit user interest data across multiple everyday applications using our Interest Profile Manager (IPM) framework. Second, our experimental results show that combining implicit feedback with semi-explicit feedback for page-level user interest estimation yields a significant improvement over content-based models.
{"title":"Unified Relevance Feedback for Multi-Application User Interest Modeling","authors":"S. Jayarathna, Atish Patra, F. Shipman","doi":"10.1145/2756406.2756914","DOIUrl":"https://doi.org/10.1145/2756406.2756914","url":null,"abstract":"A user often interacts with multiple applications while working on a task. User models can be developed individually at each of the individual applications, but there is no easy way to come up with a more complete user model based on the distributed activity of the user. To address this issue, this research studies the importance of combining various implicit and explicit relevance feedback indicators in a multi-application environment. It allows different applications used for different purposes by the user to contribute user activity and its context to mutually support users with unified relevance feedback. Using the data collected by the web browser, Microsoft Word and Microsoft PowerPoint, combinations of implicit relevance feedback with semi-explicit relevance feedback were analyzed and compared with explicit user ratings. Our results are two-fold: first we demonstrate the aggregation of implicit and semi-explicit user interest data across multiple everyday applications using our Interest Profile Manager (IPM) framework. Second, our experimental results show that incorporating implicit feedback with semi-explicit feedback for page-level user interest estimation resulted in a significant improvement over the content-based models.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124627428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Simon Barthel, S. Tönnies, B. Köhncke, Patrick Siehndel, Wolf-Tilo Balke
The most important goal for digital libraries is to ensure a high-quality search experience for all kinds of users. To attain this goal, it is necessary to have as much relevant metadata as possible at hand to assess the quality of publications. Recently, a new group of metrics has appeared that has the potential to raise the quality of publication metadata to the next level: altmetrics. These metrics try to reflect the impact of publications within the social web. However, it is still unclear whether and how altmetrics should be used to assess the quality of a publication, and how altmetrics relate to classical bibliographic metrics such as citations. To gain more insight into what kinds of concepts altmetrics reflect, we conducted an in-depth analysis of a real-world dataset crawled from the Public Library of Science (PLOS). In particular, we analyzed whether the common approach of treating users of the social web as one homogeneous group is sensible, or whether users need to be divided into diverse groups to obtain meaningful results.
{"title":"What does Twitter Measure?: Influence of Diverse User Groups in Altmetrics","authors":"Simon Barthel, S. Tönnies, B. Köhncke, Patrick Siehndel, Wolf-Tilo Balke","doi":"10.1145/2756406.2756913","DOIUrl":"https://doi.org/10.1145/2756406.2756913","url":null,"abstract":"The most important goal for digital libraries is to ensure high quality search experience for all kinds of users. To attain this goal, it is necessary to have as much relevant metadata as possible at hand to assess the quality of publications. Recently, a new group of metrics appeared, that has the potential to raise the quality of publication metadata to the next level -- the altmetrics. These metrics try to reflect the impact of publications within the social web. However, currently it is still unclear if and how altmetrics should be used to assess the quality of a publication and how altmetrics are related to classical bibliographical metrics (like e.g. citations). To gain more insights about what kind of concepts are reflected by altmetrics, we conducted an in-depth analysis on a real world dataset crawled from the Public Library of Science (PLOS). Especially, we analyzed if the common approach to regard the users in the social web as one homogeneous group is sensible or if users need to be divided into diverse groups in order to receive meaningful results.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117293884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper studies how users browse search results to find interesting novels in four search scenarios. In particular, it evaluates whether search result page (SERP) browsing patterns and effectiveness differ between an enriched catalog for finding fiction and a traditional public library catalog. Data were collected from 30 participants by eye-tracking and questionnaires. The results indicate that the enriched catalog helped users identify potentially clickable items on the results list sooner and more effectively than the traditional public library catalog did. This is likely due to the more informative metadata in the enriched catalog, such as snippets of content description in the result list items. The discussion includes a theoretical and empirical comparison of findings from studies on fiction and non-fiction searching.
{"title":"Result List Actions in Fiction Search","authors":"P. Vakkari, J. Pöntinen","doi":"10.1145/2756406.2756911","DOIUrl":"https://doi.org/10.1145/2756406.2756911","url":null,"abstract":"It is studied how users browse search results to find interesting novels for four search scenarios. It is evaluated in particular whether there are differences in search result page (SERP) browsing patterns and effectiveness between an enriched catalog for finding fiction compared to a traditional public library catalog. The data was collected from 30 participants by eye-tracking and questionnaires. The results indicate that the enriched catalog supported users to identify sooner and more effectively potentially clickable items on the results list compared to a traditional public library catalog. This is likely due to the more informative metadata in the enriched catalog like snippets of content description on the result list items. The discussion includes a theoretical and empirical comparison of findings in studies on fiction and non-fiction searching.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127000327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emotion annotations are important metadata for narrative texts in digital libraries. Such annotations are necessary for automatic text-to-speech conversion of narratives and affective education support and can be used as training data for machine learning algorithms to train automatic emotion detectors. However, obtaining high-quality emotion annotations is a challenging problem because it is usually expensive and time-consuming due to the subjectivity of emotion. Moreover, due to the multiplicity of "emotion", emotion annotations more naturally fit the paradigm of multi-label classification than that of multi-class classification since one instance (such as a sentence) may evoke a combination of multiple emotion categories. We thus investigated ways to obtain a set of high-quality emotion annotations ({instance, multi-emotion} paired data) from variable-quality crowdsourced annotations. A common quality control strategy for crowdsourced labeling tasks is to aggregate the responses provided by multiple annotators to produce a reliable annotation. Given that the categories of "emotion" have characteristics different from those of other kinds of labels, we propose incorporating domain-specific information of emotional consistencies across instances and contextual cues among emotion categories into the aggregation process. Experimental results demonstrate that, from a limited number of crowdsourced annotations, the proposed models enable gold standards to be more effectively estimated than the majority vote and the original domain-independent model.
{"title":"Multi-Emotion Estimation in Narratives from Crowdsourced Annotations","authors":"Lei Duan, S. Oyama, Haruhiko Sato, M. Kurihara","doi":"10.1145/2756406.2756910","DOIUrl":"https://doi.org/10.1145/2756406.2756910","url":null,"abstract":"Emotion annotations are important metadata for narrative texts in digital libraries. Such annotations are necessary for automatic text-to-speech conversion of narratives and affective education support and can be used as training data for machine learning algorithms to train automatic emotion detectors. However, obtaining high-quality emotion annotations is a challenging problem because it is usually expensive and time-consuming due to the subjectivity of emotion. Moreover, due to the multiplicity of \"emotion\", emotion annotations more naturally fit the paradigm of multi-label classification than that of multi-class classification since one instance (such as a sentence) may evoke a combination of multiple emotion categories. We thus investigated ways to obtain a set of high-quality emotion annotations ({instance, multi-emotion} paired data) from variable-quality crowdsourced annotations. A common quality control strategy for crowdsourced labeling tasks is to aggregate the responses provided by multiple annotators to produce a reliable annotation. Given that the categories of \"emotion\" have characteristics different from those of other kinds of labels, we propose incorporating domain-specific information of emotional consistencies across instances and contextual cues among emotion categories into the aggregation process. Experimental results demonstrate that, from a limited number of crowdsourced annotations, the proposed models enable gold standards to be more effectively estimated than the majority vote and the original domain-independent model.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126286390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
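The multi-emotion abstract above names majority vote as a baseline for aggregating crowdsourced annotations. A minimal sketch of that baseline in the multi-label setting might look as follows. The function name `majority_vote`, the `threshold` parameter, and the sample responses are illustrative assumptions, not taken from the paper, and the sketch does not implement the authors' proposed models.

```python
from collections import Counter


def majority_vote(annotations, threshold=0.5):
    """Aggregate multi-label annotations for one instance.

    `annotations` is a list of label sets, one per annotator (hypothetical
    input format). A label is kept when strictly more than `threshold` of
    the annotators selected it.
    """
    if not annotations:
        return set()
    # Count each label at most once per annotator.
    counts = Counter(label for labels in annotations for label in set(labels))
    n = len(annotations)
    return {label for label, count in counts.items() if count / n > threshold}


# Hypothetical crowd responses for a single sentence.
responses = [
    {"joy", "surprise"},
    {"joy"},
    {"joy", "fear"},
    {"surprise", "joy"},
    {"sadness"},
]
print(sorted(majority_vote(responses)))  # prints ['joy']
```

Because each label is voted on independently, the result can contain zero, one, or several emotions, which is what distinguishes the multi-label case from a single-winner multi-class vote.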
Mathematical information is increasingly available on websites and in repositories such as arXiv, Wikipedia, and a growing number of digital libraries. Mathematical formulae are highly structured and usually presented in layout-oriented formats such as PDF, LaTeX, and Presentation MathML. The differences in presentation between text and formulae challenge traditional text-based indexing and retrieval methods. To address this challenge, this paper proposes an upgraded Mathematical Information Retrieval (MIR) system, WikiMirs 3.0, based on the context, structure, and importance of formulae in a document. In WikiMirs 3.0, users can easily "cut" formulae and contexts from PDF documents as well as type in queries. Furthermore, a novel hybrid indexing and matching model is proposed to support both exact and fuzzy matching. The hybrid model takes both the context and the structure of formulae into consideration. In addition, the concept of formula importance within a document is introduced into the model for more reasonable ranking. Experimental results, compared with two classical MIR systems, demonstrate that the proposed system and its novel model provide higher accuracy and better ranking results on Wikipedia.
{"title":"WikiMirs 3.0: A Hybrid MIR System Based on the Context, Structure and Importance of Formulae in a Document","authors":"Yuehan Wang, Liangcai Gao, Simeng Wang, Zhi Tang, Xiaozhong Liu, Ke Yuan","doi":"10.1145/2756406.2756918","DOIUrl":"https://doi.org/10.1145/2756406.2756918","url":null,"abstract":"Nowadays, mathematical information is increasingly available in websites and repositories, such like ArXiv, Wikipedia and growing numbers of digital libraries. Mathematical formulae are highly structured and usually presented in layout presentations, such as PDF, LATEX and Presentation MathML. The differences of presentation between text and formulae challenge traditional text-based index and retrieval methods. To address the challenge, this paper proposes an upgraded Mathematical Information Retrieval (MIR) system, namely WikiMirs 3.0, based on the context, structure and importance of formulae in a document. In WikiMirs 3.0, users can easily \"cut\" formulae and contexts from PDF documents as well as type in queries. Furthermore, a novel hybrid indexing and matching model is proposed to support both exact and fuzzy matching. In the hybrid model, both context and structure information of formulae are taken into consideration. In addition, the concept of formula importance within a document is introduced into the model for more reasonable ranking. Experimental results, compared with two classical MIR systems, demonstrate that the proposed system along with the novel model provides higher accuracy and better ranking results over Wikipedia.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126086487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The idea of the Web started in 1989 with a proposal from Sir Tim Berners-Lee. The first US website was developed at SLAC in 1991. This early version of the Web and its subsequent updates until 1998 have been preserved by the SLAC archive and history office for many years. In this paper, we discuss the strategy and techniques used to reconstruct this early website and make it available through the Stanford Web Archive Portal.
{"title":"Reconstruction of the US First Website","authors":"A. Alsum","doi":"10.1145/2756406.2756954","DOIUrl":"https://doi.org/10.1145/2756406.2756954","url":null,"abstract":"The Web idea started on 1989 with a proposal from Sir Tim Berners-Lee. The first US website has been developed at SLAC on 1991. This early version of the Web and the subsequent updates until 1998 have been preserved by SLAC archive and history office for many years. In this paper, we discuss the strategy and techniques to reconstruct this early website and make it available through Stanford Web Archive Portal.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122141559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Readers of news articles are typically faced with the problem of getting a good understanding of a complex story covered in an article. However, as news articles mainly focus on current or recent events, they often do not provide sufficient information about the history of an event or topic, leaving the user alone in discovering and exploring other news articles that might be related to a given article. This is a time-consuming and non-trivial task, and the only help provided by some news outlets is a list of related articles or a few links within the article itself. What further complicates this task is that many of today's news stories cover a wide range of topics and events even within a single article, thus leaving the realm of traditional approaches that track a single topic or event over time. In this paper, we present a framework to link news articles based on temporal expressions that occur in the articles, following the idea "if an article refers to something in the past, then there should be an article about that something". Our approach aims to recover the chronology of one or more events and topics covered in an article, leading to an information network of articles that can be explored in a thematic and, in particular, chronological fashion. For this, we propose a measure for the relatedness of articles that is primarily based on temporal expressions in articles but also exploits other information such as persons mentioned and keywords. We provide a comprehensive evaluation that demonstrates the functionality of our framework using a multi-source corpus of recent German news articles.
{"title":"Time will Tell: Temporal Linking of News Stories","authors":"Thomas Bögel, Michael Gertz","doi":"10.1145/2756406.2756919","DOIUrl":"https://doi.org/10.1145/2756406.2756919","url":null,"abstract":"Readers of news articles are typically faced with the problem of getting a good understanding of a complex story covered in an article. However, as news articles mainly focus on current or recent events, they often do not provide sufficient information about the history of an event or topic, leaving the user alone in discovering and exploring other news articles that might be related to a given article. This is a time consuming and non-trivial task, and the only help provided by some news outlets is some list of related articles or a few links within an article itself. What further complicates this task is that many of today's news stories cover a wide range of topics and events even within a single article, thus leaving the realm of traditional approaches that track a single topic or event over time. In this paper, we present a framework to link news articles based on temporal expressions that occur in the articles, following the idea \"if an article refers to something in the past, then there should be an article about that something\". Our approach aims to recover the chronology of one or more events and topics covered in an article, leading to an information network of articles that can be explored in a thematic and particular chronological fashion. For this, we propose a measure for the relatedness of articles that is primarily based on temporal expressions in articles but also exploits other information such as persons mentioned and keywords. We provide a comprehensive evaluation that demonstrates the functionality of our framework using a multi-source corpus of recent German news articles.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128340544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
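The temporal-linking abstract above describes a relatedness measure that relies mainly on temporal expressions and additionally uses persons mentioned and keywords. The abstract does not give the formula, so the following is only a hypothetical sketch of one way such a measure could be shaped: a weighted sum of Jaccard overlaps. The weights, the dictionary keys, and the sample articles are all assumptions made for illustration, not the authors' method.

```python
from datetime import date


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def relatedness(art_a, art_b, w_time=0.6, w_person=0.2, w_keyword=0.2):
    """Hypothetical relatedness score in [0, 1].

    Each article is a dict with assumed keys "dates" (normalized temporal
    expressions), "persons", and "keywords". The temporal component gets
    the largest weight to mirror "primarily based on temporal expressions".
    """
    return (
        w_time * jaccard(art_a["dates"], art_b["dates"])
        + w_person * jaccard(art_a["persons"], art_b["persons"])
        + w_keyword * jaccard(art_a["keywords"], art_b["keywords"])
    )


# Made-up articles: article_a refers back to a date that article_b covers.
article_a = {
    "dates": {date(2015, 3, 1), date(2014, 11, 9)},
    "persons": {"Person A"},
    "keywords": {"summit", "budget"},
}
article_b = {
    "dates": {date(2014, 11, 9)},
    "persons": {"Person A", "Person B"},
    "keywords": {"budget"},
}
print(round(relatedness(article_a, article_b), 3))
```

Pairs whose score exceeds some chosen cutoff could then be connected as edges in the article network the abstract mentions; the cutoff would itself be a tuning choice.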
Terhi Nurmikko-Fuller, Kevin R. Page, P. Willcox, Jacob Jett, Christopher R. Maden, Timothy W. Cole, Colleen Fallaw, Megan Senseney, J. S. Downie
Bibliographic metadata standards are a longstanding mechanism for Digital Libraries to manage records and express relationships between them. As digital scholarship, particularly in the humanities, incorporates and manipulates these records in an increasingly direct manner, existing systems are proving insufficient for providing the underlying addressability and relational expressivity required to construct and interact with complex research collections. In this paper we describe motivations for these "worksets" and the technical requirements they raise. We survey the coverage of existing bibliographic ontologies in the context of meeting these scholarly needs, and finally provide an illustrated discussion of potential extensions that might fully realize a solution.
{"title":"Building Complex Research Collections in Digital Libraries: A Survey of Ontology Implications","authors":"Terhi Nurmikko-Fuller, Kevin R. Page, P. Willcox, Jacob Jett, Christopher R. Maden, Timothy W. Cole, Colleen Fallaw, Megan Senseney, J. S. Downie","doi":"10.1145/2756406.2756944","DOIUrl":"https://doi.org/10.1145/2756406.2756944","url":null,"abstract":"Bibliographic metadata standards are a longstanding mechanism for Digital Libraries to manage records and express relationships between them. As digital scholarship, particularly in the humanities, incorporates and manipulates these records in an increasingly direct manner, existing systems are proving insufficient for providing the underlying addressability and relational expressivity required to construct and interact with complex research collections. In this paper we describe motivations for these \"worksets\" and the technical requirements they raise. We survey the coverage of existing bibliographic ontologies in the context of meeting these scholarly needs, and finally provide an illustrated discussion of potential extensions that might fully realize a solution.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123740838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper gives a detailed description of a full-day digital data curation tutorial held at JCDL'15.
{"title":"Digital Data Curation Essentials for Data Scientists and Data Curators and Librarians","authors":"H. Tibbo, C. Hank","doi":"10.1145/2756406.2756928","DOIUrl":"https://doi.org/10.1145/2756406.2756928","url":null,"abstract":"This paper describes a detailed description of a full-day data digital curation tutorial held at JCDL'15.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"683 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134127862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}