2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL): Latest Papers

Low-cost semantic enhancement to digital library metadata and indexing: Simple yet effective strategies

2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)

Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910910

A. Hinze, D. Bainbridge, S. Cunningham, J. S. Downie

Most existing digital libraries use traditional lexically-based retrieval techniques. For established systems, completely replacing, or even significantly changing, the document retrieval mechanism (document analysis, indexing strategy, query processing and query interface) would require major technological effort and would most likely be disruptive. In this paper, we describe ways to use the results of semantic analysis and disambiguation while retaining an existing keyword-based search and lexicographic index. We engineer this so that the output of semantic analysis (performed off-line) is suitable for direct import into existing digital library metadata and index structures, and can thus be incorporated without architecture modifications.

Citations: 6

ArchiveSpark: Efficient Web archive access, extraction and derivation

2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)

Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910902

Helge Holzmann, V. Goel, Avishek Anand

Web archives are a valuable resource for researchers of various disciplines. However, to use them as a scholarly source, researchers require a tool that provides efficient access to Web archive data for extraction and derivation of smaller datasets. Besides efficient access we identify five other objectives based on practical researcher needs such as ease of use, extensibility and reusability. Towards these objectives we propose ArchiveSpark, a framework for efficient, distributed Web archive processing that builds a research corpus by working on existing and standardized data formats commonly held by Web archiving institutions. Performance optimizations in ArchiveSpark, facilitated by the use of a widely available metadata index, result in significant speed-ups of data processing. Our benchmarks show that ArchiveSpark is faster than alternative approaches without depending on any additional data stores while improving usability by seamlessly integrating queries and derivations with external tools.

Citations: 30

Semantic bookworm: Mining literary resources revisited

2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)

Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925444

A. Hinze, Michael Coleman, S. Cunningham, D. Bainbridge

In this paper, we describe Semantic Bookworm, a tool that supports scholarly text analysis. In contrast to the text-based Bookworm tool, the Semantic Bookworm identifies semantic concepts.

Citations: 3

Curve separation for line graphs in scholarly documents

2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)

Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925469

Sagnik Ray Choudhury, Shuting Wang, C. Lee Giles

Line graphs are abundant in scholarly papers. They are usually generated from a data table, but that data cannot be accessed. One important step in an automated data extraction pipeline is the curve separation problem: segmenting the pixels into separate curves. Previous work in this domain has focused on raster graphics extracted from scholarly PDFs, whereas most scholarly plots are embedded as vector graphics. We report a system to extract these plots as SVG images and show how that can improve both the accuracy (90%) and the scalability (5-8 seconds) of the curve separation problem.

Citations: 18

Quality assessment of Wikipedia articles without feature engineering

2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)

Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910917

Quang-Vinh Dang, C. Ignat

As Wikipedia became the largest human knowledge repository, quality measurement of its articles received a lot of attention during the last decade. Most research efforts focused on classifying the quality of Wikipedia articles using different feature sets. However, so far, no “golden feature set” has been proposed. In this paper, we present a novel approach for classifying Wikipedia articles by analysing their content rather than by considering a feature set. Our approach uses recent techniques in natural language processing and deep learning, and achieves results comparable to the state of the art.

Citations: 54

Games for crowdsourcing mobile content: An analysis of contribution patterns

2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)

Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925455

D. Goh, Ei Pa Pa Pe-Than, C. S. Lee

Crowdsourcing of mobile content through games is becoming a major way of populating information-rich online environments. A current research gap is that actual usage patterns of crowdsourcing games have been inadequately investigated. We address this gap by comparing content creation patterns in a game for crowdsourcing mobile content against a non-game version. Results show distinct differences in the types and distribution of content created.

Citations: 2

Investigating cluster stability when analyzing transaction logs

2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)

Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910923

D. Grech, Paul D. Clough

Data-driven approaches have become increasingly popular as a means for analyzing transaction logs from web search engines and digital libraries, for example using cluster analysis to identify common patterns of search and navigation behavior. However, steps must be taken to ensure that results are reliable and repeatable. Although clustering patterns of user interaction behavior have been previously explored, one aspect that has received less attention is cluster stability, which can be used to aid cluster validation. In this paper we compute stability based on the Jaccard coefficient to investigate cluster stability when using different subsets of transaction log data from WorldCat.org. Results provide insights into different types of search behaviors and highlight that clusters of varying degrees of stability will result from the clustering process. However, we show that additional investigation beyond the results of cluster stability is required to fully validate the resulting clusters.
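
As a rough illustration of the Jaccard-based stability idea mentioned in this abstract, the sketch below scores each cluster from one run by its best Jaccard match in a second run on a different data subset. This is an assumption about how such a comparison could be set up, not the authors' procedure; the function names and the toy session IDs are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; taken as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_stability(reference: list, resampled: list) -> list:
    """For each reference cluster, return its best Jaccard match among the resampled clusters."""
    return [
        max((jaccard(ref, other) for other in resampled), default=0.0)
        for ref in reference
    ]


# Hypothetical session IDs grouped into clusters on two subsets of a log.
reference = [{1, 2, 3, 4}, {5, 6, 7}]
resampled = [{1, 2, 3}, {4, 5, 6, 7}]
scores = cluster_stability(reference, resampled)
print(scores)
```

A score near 1 suggests the cluster reappears almost unchanged in the second run, while a low score suggests it dissolves when the data changes.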

Citations: 5

Content selection and curation for web archiving: The gatekeepers vs. the masses

2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)

Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910913

Ian Milligan, Nick Ruest, Jimmy J. Lin

Any preservation effort must begin with an assessment of what content to preserve, and web archiving is no different. There have historically been two answers to the question “what should we archive?” The Internet Archive's broad entire-web crawls have been supplemented by narrower domain- or topic-specific collections gathered by numerous libraries. We can characterize this as content selection and curation by “gatekeepers”. In contrast, we have witnessed the emergence of another approach driven by “the masses”: we can archive pages that are contained in social media streams such as Twitter. The interesting question, of course, is how these approaches differ. We provide an answer to this question in the context of a case study about the 2015 Canadian federal elections. Based on our analysis, we recommend a hybrid approach that combines an effort driven by social media and more traditional curatorial methods.

Citations: 25

Question identification and classification on an academic question answering site

2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)

Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925442

B. Ojokoh, Tobore Igbe, A. Araoye, Friday Ameh

Online communities such as wikis, blogs, forums, scientific communities and other social networking services have enabled new levels of interaction and interconnection among individuals, documents and data, and have become places for people to seek and share expertise. In this paper, we propose a systematic approach to the identification and classification of questions. The questions were first identified using the semantic occurrence of Part of Speech (POS) tags in English, after which they were classified based on the maximum probability value of Naïve Bayes classification. The model was validated and evaluated with experiments on some crawled web pages from ResearchGate.
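
To make the "maximum probability value of Naïve Bayes classification" step concrete, here is a minimal multinomial Naïve Bayes sketch with add-one smoothing that returns the label with the highest posterior. It is an illustration only: the bag-of-words features, the labels, and the toy sentences are assumptions, not the paper's POS-based features, categories, or data.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (tokens, label). Collect the counts needed for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the maximum posterior log-probability (add-one smoothing)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data; the labels are illustrative, not the paper's categories.
examples = [
    ("how do i cite a dataset".split(), "how-to"),
    ("how can i install this package".split(), "how-to"),
    ("what is a confidence interval".split(), "definition"),
    ("what is the meaning of recall".split(), "definition"),
]
model = train(examples)
label = predict(model, "what is precision".split())
print(label)
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.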

Citations: 10

The state of practice and use of digital collections: The digital public library of America as a platform for research

2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL)

Pub Date : 2016-06-19 DOI: 10.1145/2910896.2926741

R. Frick

Summary form only given. In 2016, the Digital Public Library of America (DPLA) is celebrating the third year of its cultural heritage metadata aggregator service. Since its launch, the DPLA collection has grown to represent over 13 million objects and over 1900 institutions, from small historical societies to large research libraries. With onramps, or hubs, in over 20 states, DPLA is well on its way to completing the coverage map by the end of 2017. As it continues to build this dataset, DPLA is taking the time to examine what lessons are to be learned from this unprecedented resource, as the organization's sustainability is directly tied to how the collection grows, how it measures use, and how it proves its value to the communities it serves. What does this collection data tell us about the state of bibliographic holdings information, and about the knowledge, skills and abilities of those who create records, not just for local use, but for use in other environments and contexts? How well does the metadata perform when it leaves its original context? Working with colleagues at Europeana, DPLA has begun investigating and addressing the problematic issues regarding access and reuse of digital objects in the collective by examining the current ways rights are expressed in the metadata and working towards standardization of this information. Ms. Frick will discuss DPLA's rights work, as well as other potential areas of research and DPLA's strategy for future growth.

Citations: 2
