Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries: Latest Articles

Big Data Text Summarization for Events: A Problem Based Learning Course
Pub Date : 2015-06-21 DOI: 10.1145/2756406.2756943
Tarek Kanan, Xuan Zhang, M. Magdy, E. Fox
Problem/project Based Learning (PBL) is a highly effective student-centered teaching method, where student teams learn by solving problems. This paper describes an instance of PBL applied to digital library education. We show the design, implementation, results, and partial evaluation of a Computational Linguistics course that provides students an opportunity to engage in active learning about adding value to digital libraries with large collections of text, i.e., one aspect of "big data." Students engage in PBL through the semester-long challenge of generating good English summaries of an event, given a large collection from our webpage archives. Six teams, each working with a different type of event and applying three different summarization methods, learned how to generate good summaries; these have fair precision relative to the Wikipedia page that describes their event.
Citations: 18
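The abstract above reports that the student summaries have "fair precision relative to the Wikipedia page" for each event, without saying how precision was computed. As a rough illustration only, the sketch below measures clipped unigram precision: the share of summary tokens that also appear in a reference text. The tokenizer, the function names, and the choice of unigram overlap are all assumptions made for this example, not the course's actual evaluation procedure.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_precision(summary, reference):
    """Fraction of summary tokens that also occur in the reference.

    Counts are clipped, so a word repeated in the summary is only
    credited as many times as it appears in the reference.
    """
    summary_counts = Counter(tokenize(summary))
    reference_counts = Counter(tokenize(reference))
    total = sum(summary_counts.values())
    if total == 0:
        return 0.0  # an empty summary matches nothing
    matched = sum(
        min(count, reference_counts[token])
        for token, count in summary_counts.items()
    )
    return matched / total
```

A call such as `unigram_precision(team_summary, wikipedia_text)` would return a value between 0 and 1; both argument names are placeholders for text supplied by the caller.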
Before the Repository: Defining the Preservation Threats to Research Data in the Lab
Pub Date : 2015-06-21 DOI: 10.1145/2756406.2756909
Stacy T. Kowalczyk
This paper describes the results of a large survey designed to quantify the risks and threats to the preservation of the research data in the lab and to determine the mitigating actions of researchers. A total of 724 National Science Foundation awardees completed this survey. Identifying risks and threats to digital preservation has been a significant research stream. Much of this work has been within the context of a preservation technology infrastructure such as data archives for a digital repository. This study looks at the risks and threats to research data prior to its inclusion in a preservation technology infrastructure. The greatest threat to preservation is human error, followed by equipment malfunction, obsolete software, and data corruption. Lost and mislabeled media are not components in the threat taxonomies developed for repositories; however, they do represent an important threat to research data in the lab. Researchers have recognized the need to mitigate the risks inherent in maintaining digital data by implementing data management in their lab environments and have taken their responsibility as data managers seriously; however, they would still prefer to have professional data management support.
Citations: 6
The HathiTrust Research Center: Providing analytic access to the HathiTrust Digital Library's 4.7 billion pages
Pub Date : 2015-06-21 DOI: 10.1145/2756406.2771494
J. S. Downie
This lecture provides an update on the recent developments and activities of the HathiTrust Research Center (HTRC). The HTRC is the research arm of the HathiTrust, an online repository dedicated to the provision of access to a comprehensive body of published works for scholarship and education. The HathiTrust is a partnership of over 100 major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future. Membership is open to institutions worldwide. Over 13.1 million volumes (4.7 billion pages) have been ingested into the HathiTrust digital archive from sources including Google Books, member university libraries, the Internet Archive, and numerous private collections. The HTRC is dedicated to facilitating scholarship by enabling analytic access to the corpus, developing research tools, fostering research projects and communities, and providing additional resources such as enhanced metadata and indices that will help scholars exploit the HathiTrust materials more easily. This talk will outline the mission, goals and structure of the HTRC. It will also provide an overview of recent work being conducted on a range of projects, partnerships and initiatives. Projects include the Workset Creation for Scholarly Analysis project (WCSA, funded by the Andrew W. Mellon Foundation) and the HathiTrust + Bookworm project (HT+BW, funded by the National Endowment for the Humanities). HTRC's involvement with the NOVEL(TM) text mining project and the Single Interface for Music Score Searching and Analysis (SIMSSA) project, both funded by the SSHRC Partnership Grant programme, will be introduced. The HTRC's new feature extraction and Data Capsule initiatives, part of its ongoing efforts to enable non-consumptive analyses of the approximately 8 million volumes under copyright restrictions, will also be discussed. The talk will conclude with some suggestions on how the non-consumptive research model might be improved upon and possibly extended beyond the HathiTrust context.
Citations: 1
Session details: Session 9 - Archiving, Repositories, and Content
Pub Date : 2015-06-21 DOI: 10.1145/3260517
Maureen Henninger
Citations: 0
Case Study of Waiting List on WPLC Digital Library
Pub Date : 2015-06-21 DOI: 10.1145/2756406.2756961
Wooseob Jeong, H. Han, Laura Ridenour
With the increasing popularity of e-books and audiobooks provided by public libraries in the U.S., the demand does not seem to be met with sufficient supply, as many popular titles require months of waiting time. In this study, we collected data from the Wisconsin Public Library Consortium's digital libraries service once a day for more than two months for selected popular titles. This data reflects the current supply and demand of popular titles in public libraries' digital library services. Based on our data analysis and observation, we suggest ways to achieve faster circulation, which ultimately allows for better services to library users.
Citations: 1
iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling
Pub Date : 2015-06-21 DOI: 10.1145/2756406.2756925
Gerhard Gossen, Elena Demidova, T. Risse
Researchers in the Digital Humanities and journalists need to monitor, collect and analyze fresh online content on demand regarding current events such as the Ebola outbreak or the Ukraine crisis. However, existing focused crawling approaches consider only topical aspects while ignoring temporal aspects, and therefore cannot achieve thematically coherent and fresh Web collections. Social Media in particular provide a rich source of fresh content, which is not used by state-of-the-art focused crawlers. In this paper we address the issue of enabling the collection of fresh and relevant Web and Social Web content for a topic of interest through seamless integration of Web and Social Media in a novel integrated focused crawler. The crawler collects Web and Social Media content in a single system and exploits the stream of fresh Social Media content for guiding the crawler.
Citations: 22
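The iCrawl abstract above describes a focused crawler that weighs freshness alongside topical relevance and lets fresh Social Media content guide the crawl. The abstract does not give the scoring function, so the sketch below is only one plausible reading: a crawl frontier kept as a priority queue, where each URL's priority blends a relevance score with an exponentially decaying freshness score. The class name `Frontier`, the weight, the half-life, and the decay formula are all hypothetical choices for illustration, not iCrawl's actual design.

```python
import heapq
import itertools


class Frontier:
    """Priority queue of URLs ordered by combined relevance and freshness.

    Hypothetical sketch: the linear blend and exponential decay are
    assumptions, not taken from the iCrawl paper.
    """

    def __init__(self, freshness_weight=0.5, half_life_s=3600.0):
        self.freshness_weight = freshness_weight
        self.half_life_s = half_life_s
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: insertion order
        self._seen = set()

    def score(self, relevance, age_s):
        """Blend topical relevance (0..1) with freshness (0..1)."""
        # Freshness halves every half_life_s seconds since the URL was seen.
        freshness = 0.5 ** (age_s / self.half_life_s)
        w = self.freshness_weight
        return (1 - w) * relevance + w * freshness

    def push(self, url, relevance, age_s):
        """Queue a URL once; repeated URLs are ignored."""
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        entry = (-self.score(relevance, age_s), next(self._counter), url)
        heapq.heappush(self._heap, entry)

    def pop(self):
        """Return the URL with the highest combined score."""
        return heapq.heappop(self._heap)[2]
```

Under these assumed parameters, a link posted seconds ago with moderate relevance outranks a highly relevant page that was discovered ten hours earlier, which is the kind of behavior the abstract attributes to using the Social Media stream as a guide.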
Automatic Classification of Research Documents using Textual Entailment
Pub Date : 2015-06-21 DOI: 10.1145/2756406.2756960
B. Ojokoh, O. Omisore, O. W. Samuel
Exploring the accumulative nature of Internet documents has become a rising issue that requires systematic ways to construct what we need from what we have. Manual and semi-manual document classification techniques have facilitated retrieval and maintenance of document repositories for easy access; however, they are customarily painstaking and labor-intensive. Herein, we propose a document classification model using automatic access of natural language meaning. The model is made up of application, business, and storage layers. The business layer, as a core component, automatically extracts sentences containing keywords from research documents and classifies them using the geometrical similarity of their sentential entailments.
Citations: 3
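The abstract above outlines a business layer that extracts sentences containing keywords and classifies documents by the "geometrical similarity" of those sentences. The abstract does not define the entailment model or the similarity measure, so the sketch below substitutes something much simpler: bag-of-words vectors compared by cosine similarity against one representative text per class. All function names, the sentence splitter, and the `class_profiles` structure are assumptions for illustration and should not be read as the authors' method.

```python
import math
import re
from collections import Counter


def keyword_sentences(document, keywords):
    """Return the sentences of `document` that contain any keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    wanted = {k.lower() for k in keywords}
    return [
        s for s in sentences
        if wanted & set(re.findall(r"[a-z]+", s.lower()))
    ]


def vectorize(text):
    """Bag-of-words term counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(document, keywords, class_profiles):
    """Pick the class whose profile text is most similar to the
    keyword-bearing sentences; return None if no sentence matches.

    class_profiles maps a label to a representative text (assumed input).
    """
    selected = keyword_sentences(document, keywords)
    if not selected:
        return None
    doc_vec = vectorize(" ".join(selected))
    return max(
        class_profiles,
        key=lambda label: cosine(doc_vec, vectorize(class_profiles[label])),
    )
```

Restricting the comparison to keyword-bearing sentences keeps unrelated sentences from diluting the document vector, which loosely mirrors the extraction step the abstract describes.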
Session details: Session 3 - Big Data, Big Resources
Pub Date : 2015-06-21 DOI: 10.1145/3260511
G. Newton
Citations: 0
Demystifying the Semantics of Relevant Objects in Scholarly Collections: A Probabilistic Approach
Pub Date : 2015-06-21 DOI: 10.1145/2756406.2756923
J. M. Pinto, Wolf-Tilo Balke
Efforts to make highly specialized knowledge accessible through scientific digital libraries need to go beyond mere bibliographic metadata, since here information search is mostly entity-centric. Previous work has recognized this trend and developed different methods to recognize and (to some degree even automatically) annotate several important types of entities: genes and proteins, chemical structures and molecules, or drug names, to name but a few. Moreover, such entities are often cross-referenced with entries in curated databases. However, several questions remain to be answered: Given a scientific discipline, what are the important entities? How can they be automatically identified? Are all of them really relevant, i.e., do all of them carry deeper semantics for assessing a publication? How can they be represented, described, and subsequently annotated? How can they be used for search tasks? In this work we focus on answering some of these questions. We claim that to bring the use of scientific digital libraries to the next level we must treat topic-specific entities as first-class citizens and deeply integrate their semantics into the search process. To support this we propose a novel probabilistic approach that not only successfully provides a solution to the integration problem, but also demonstrates how to leverage the knowledge encoded in entities and provides insights to explore the use of our approach in different scenarios. Finally, we show how our results can benefit information providers.
Citations: 6
Analyzing User Requests for Anime Recommendations
Pub Date : 2015-06-21 DOI: 10.1145/2756406.2756969
Jin Ha Lee, Yun-Jeong Shim, Jacob Jett
Anime is increasingly recognized as an important commercial product and cultural artifact. However, little is known regarding users' information needs and behavior related to anime. This study specifically attempts to improve our understanding of how people seek anime recommendations. We analyzed 546 user questions in natural language, collected from the Korean Q&A website Naver Knowledge-iN, where users ask for anime recommendations. The findings suggest the importance of establishing robust metadata for the seven commonly used features for anime recommenders (i.e., title, genre, artistic style, story, character description, series title, and mood) in digital libraries, as well as allowing users to specify known anime and series titles as examples for seeking similar items, or examples of the kinds of items to be excluded.
Citations: 7