
2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL): Latest Papers

Experimental evaluation of affective embodied agents in an information literacy game
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910897
Yanru Guo, D. Goh
Digital game-based learning (DGBL) has become increasingly popular. With elements such as narratives, rewards, quests, and interactivity, DGBL can actively engage learners and stimulate desired learning outcomes. To increase its appeal, affective embodied agents (EAs) have been incorporated into DGBL as learning companions or instructors. However, claims about the efficacy of affective EAs in DGBL have rarely been tested empirically. This study therefore investigates the influence of affective EAs on students' learning outcomes, motivation, perceived usefulness, and behavioral intention in an information literacy (IL) game. Eighty tertiary students were recruited and randomly assigned to one of two conditions, affective EA or no EA, in a between-subjects experiment with a pre-test and a post-test. Results showed that participants who interacted with the affective EA in the IL game benefited in terms of attention, confidence, satisfaction, intention to learn IL knowledge, and intention to recommend. However, there were no significant differences in learning outcome, relevance, or intention to play the game. The contributions and limitations of the study are also discussed.
Citations: 49
Exploiting network analysis to investigate topic dynamics in the digital library evaluation domain
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925464
Leonidas Papachristopoulos, Michalis Sfakakis, Nikos Kleidis, G. Tsakonas, C. Papatheodorou
The multidimensional nature of the digital library evaluation domain, together with the volume of scientific work published in the field, can hinder and disorient researchers who plan to focus on it. These communities need guidance to exploit the considerable amount of data and the diversity of methods effectively, to identify new research goals, and to plan future work. This poster investigates the core topics of the digital library evaluation field and their impact by applying topic modeling and network analysis to a corpus of JCDL, ECDL/TPDL, and ICADL conference proceedings from 2001 to 2013.
Citations: 1
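The poster above combines topic modeling with network analysis but does not spell out the pipeline. As a rough illustration of the network step only, here is a minimal Python sketch, assuming a topic model (for example LDA) has already produced per-paper topic proportions. The topic labels, proportions, the 0.2 threshold, and the use of weighted degree as an "impact" proxy are all hypothetical choices, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def topic_network(doc_topics, threshold=0.2):
    """Build a weighted topic co-occurrence network.

    doc_topics holds one dict per paper mapping topic label -> proportion,
    as a topic model such as LDA might output (values here are hypothetical).
    Two topics gain one unit of edge weight for every paper in which both
    reach `threshold`.
    """
    edges = Counter()
    for topics in doc_topics:
        strong = sorted(t for t, p in topics.items() if p >= threshold)
        for a, b in combinations(strong, 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Total edge weight per topic, one simple proxy for a topic's impact."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical per-paper topic proportions.
papers = [
    {"user studies": 0.5, "usability": 0.3, "metrics": 0.2},
    {"user studies": 0.6, "metrics": 0.4},
    {"usability": 0.7, "log analysis": 0.3},
]
edges = topic_network(papers)
degree = weighted_degree(edges)
print(edges[("metrics", "user studies")])  # the pair co-occurs in two papers
```

Building one such network per proceedings year and comparing the degree rankings is one way to trace how topics rise and fall over 2001-2013.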
MemGator — A portable concurrent memento aggregator: Cross-platform CLI and server binaries in Go
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925452
Sawood Alam, Michael L. Nelson
The Memento protocol makes it easy to build a uniform lookup service that aggregates the holdings of web archives. However, there is a lack of tools that use this capability in archiving applications and research projects. We created MemGator, an open-source, easy-to-use, portable, concurrent, cross-platform, and self-documented Memento aggregator CLI and server tool written in Go. MemGator implements all the basic features of a Memento aggregator (e.g., TimeMap and TimeGate) and allows various options to be customized, including which archives are aggregated. It is used heavily by tools and services such as Mink, WAIL, and OldWeb.today, as well as by archiving research projects, and has proved reliable even under extreme load.
Citations: 34
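The core idea of a Memento aggregator described above is to query many archives in parallel, merge their TimeMaps, and answer TimeGate-style "closest memento" lookups. MemGator itself is written in Go; the following is only a minimal Python sketch of that idea, not MemGator's code. It assumes TimeMaps in the `application/link-format` style used by Memento, and `fetch_timemap` is a hypothetical callable standing in for an HTTP GET; the archive names and URIs are made up.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

LINK_RE = re.compile(r"<([^>]+)>([^<]*)")  # one link-format entry: <uri> plus its params
PARAM_RE = re.compile(r'(\w+)="([^"]*)"')  # quoted params such as rel="..." and datetime="..."


def parse_timemap(body):
    """Return (datetime, URI-M) pairs for every memento listed in a TimeMap."""
    mementos = []
    for uri, params in LINK_RE.findall(body):
        attrs = dict(PARAM_RE.findall(params))
        # rel can combine values, e.g. "first memento"
        if "memento" in attrs.get("rel", "").split() and "datetime" in attrs:
            mementos.append((parsedate_to_datetime(attrs["datetime"]), uri))
    return mementos


def aggregate_timemaps(archives, fetch_timemap, max_workers=8):
    """Fetch every archive's TimeMap concurrently and merge mementos by datetime.

    fetch_timemap(archive) -> str is hypothetical (an HTTP GET in practice).
    An archive that fails contributes nothing instead of failing the lookup.
    """
    def safe_fetch(archive):
        try:
            return parse_timemap(fetch_timemap(archive))
        except Exception:
            return []

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_archive = list(pool.map(safe_fetch, archives))
    return sorted(m for mementos in per_archive for m in mementos)


def timegate(merged, accept_datetime):
    """Pick the memento closest in time to the requested datetime."""
    return min(merged, key=lambda m: abs(m[0] - accept_datetime))[1]


# Hypothetical TimeMaps; "archive-down" is missing and simulates a failing archive.
FAKE_TIMEMAPS = {
    "archive-a": (
        '<http://a.example/2010/http://example.com/>; rel="first memento"; '
        'datetime="Fri, 01 Jan 2010 00:00:00 GMT",\n'
        '<http://a.example/2014/http://example.com/>; rel="last memento"; '
        'datetime="Wed, 01 Jan 2014 00:00:00 GMT"'
    ),
    "archive-b": (
        '<http://example.com/>; rel="original",\n'
        '<http://b.example/2012/http://example.com/>; rel="memento"; '
        'datetime="Sun, 01 Jan 2012 00:00:00 GMT"'
    ),
}
merged = aggregate_timemaps(["archive-a", "archive-b", "archive-down"],
                            lambda a: FAKE_TIMEMAPS[a])
closest = timegate(merged, datetime(2012, 6, 1, tzinfo=timezone.utc))
```

Swallowing per-archive errors is a deliberate choice here: an aggregator that waits on, or fails because of, its slowest or broken archive defeats the purpose of concurrent lookup. A production tool would also need per-request timeouts.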
Avoiding the Drunkard's search: Investigating collection strategies for building a Twitter dataset
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925433
Claire Llewellyn, Laura Cram, A. Favero
We investigate methods for collecting data to build an archive of the Twitter debate surrounding the UK's membership of the EU. We use three strategies: gathering data using hashtags, extracting data from the random stream, and collecting from users known to be discussing the debate. We explore the various biases in the resulting datasets.
Citations: 4
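To make the comparison of the three collection strategies above concrete, here is a minimal Python sketch that applies each strategy to the same in-memory set of tweets and measures their overlap with the Jaccard index. The tweets, hashtags, user list, and sampling rate are hypothetical, and the seeded random sample only stands in for Twitter's random stream; this is not the authors' pipeline.

```python
import random
import re

HASHTAG_RE = re.compile(r"#(\w+)")


def by_hashtag(tweets, hashtags):
    """Strategy 1: keep tweets carrying at least one tracked hashtag (case-insensitive)."""
    wanted = {h.lower() for h in hashtags}
    return {t["id"] for t in tweets
            if wanted & {h.lower() for h in HASHTAG_RE.findall(t["text"])}}


def random_sample(tweets, rate, seed=0):
    """Strategy 2: a uniform random sample, standing in for the random stream."""
    rng = random.Random(seed)
    return {t["id"] for t in tweets if rng.random() < rate}


def by_user(tweets, users):
    """Strategy 3: keep every tweet from accounts known to discuss the debate."""
    return {t["id"] for t in tweets if t["user"] in users}


def jaccard(a, b):
    """Overlap between two collections: 1.0 means identical, 0.0 disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical tweets; tweet 4 is on topic but carries no hashtag.
tweets = [
    {"id": 1, "user": "alice", "text": "Vote #Remain today"},
    {"id": 2, "user": "bob", "text": "Interesting #EUref poll"},
    {"id": 3, "user": "carol", "text": "Lunch was great"},
    {"id": 4, "user": "alice", "text": "Still thinking about the referendum"},
]
tagged = by_hashtag(tweets, {"EUref", "Remain", "Leave"})  # {1, 2}
followed = by_user(tweets, {"alice"})                      # {1, 4}
sampled = random_sample(tweets, rate=0.5)
print(jaccard(tagged, followed))
```

Even this toy example shows one bias the abstract alludes to: the hashtag strategy misses on-topic tweets that carry no tag, while the user strategy is limited to whoever was put on the list.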
Inventor name disambiguation for a patent database using a random forest and DBSCAN
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925465
Kunho Kim, Madian Khabsa, C. Lee Giles
Inventor name disambiguation is the task of distinguishing each unique inventor from all other inventor records in a patent database. This task is essential for processing person-name queries to retrieve information about a specific inventor, e.g., a list of all of that inventor's patents. We apply earlier work on author name disambiguation to inventor name disambiguation. A random forest classifier is trained to decide whether each pair of inventor records refers to the same person. The DBSCAN algorithm is used to cluster inventor records, with its distance function derived from the random forest classifier. For scalability, blocking functions reduce the complexity of record matching and enable parallelization, since the blocks can be processed simultaneously. On the USPTO patent database, 12 million inventor records were disambiguated in 6.5 hours. Evaluation on the labeled datasets from the USPTO PatentsView competition shows that our algorithm outperforms all algorithms submitted to the competition.
Citations: 17
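The pipeline in the abstract above (blocking, a pairwise classifier, and DBSCAN over a classifier-derived distance) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `same_person_prob` is a hypothetical hand-weighted stand-in for a trained random forest's probability output, the blocking key (last name plus first initial) and `eps` are assumptions, and the distance is taken as 1 minus that probability.

```python
from collections import defaultdict


def same_person_prob(a, b):
    """Hypothetical stand-in for a trained random forest's P(same inventor).

    A real system would call the classifier on pairwise feature vectors;
    this hand-weighted score only mimics that interface.
    """
    score = 0.0
    if a["first"].lower() == b["first"].lower():
        score += 0.4
    if a["city"] == b["city"]:
        score += 0.3
    if a["coinventors"] & b["coinventors"]:
        score += 0.3
    return score


def dbscan(n, dist, eps, min_pts):
    """Plain DBSCAN over items 0..n-1 with an arbitrary distance; -1 marks noise."""
    def region(i):
        return [j for j in range(n) if j == i or dist(i, j) <= eps]

    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels


def disambiguate(records, eps=0.35):
    """Group record indices by person: block first, then cluster each block."""
    blocks = defaultdict(list)
    for idx, r in enumerate(records):
        # Blocking key (an assumption): last name plus first initial.
        blocks[(r["last"].lower(), r["first"][:1].lower())].append(idx)

    groups = []
    for members in blocks.values():  # blocks are independent, so this loop parallelizes
        def dist(i, j, members=members):
            return 1.0 - same_person_prob(records[members[i]], records[members[j]])

        labels = dbscan(len(members), dist, eps, min_pts=1)
        clusters = defaultdict(list)
        for pos, label in enumerate(labels):
            clusters[label].append(members[pos])
        groups.extend(clusters.values())
    return sorted(groups)


# Hypothetical inventor records.
records = [
    {"first": "John", "last": "Smith", "city": "Austin", "coinventors": {"Lee"}},
    {"first": "John", "last": "Smith", "city": "Austin", "coinventors": {"Park"}},
    {"first": "Jane", "last": "Smith", "city": "Boston", "coinventors": {"Lee"}},
    {"first": "Wei", "last": "Chen", "city": "Austin", "coinventors": set()},
]
groups = disambiguate(records)
```

With `min_pts=1` every record is a core point, so an unmatched record becomes its own singleton inventor rather than noise, which suits disambiguation.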
ACHS'16: First international workshop on accessing cultural heritage at scale
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2926733
Paul D. Clough, Paula Goodale, M. Agosti, S. Lawless
The workshop aims to bring together researchers and practitioners to review and discuss ways of providing effective access to large-scale collections of cultural heritage content. The scale, variety, and availability of cultural heritage content, combined with the diversity of user groups in background knowledge, specialist experience, and needs, challenge existing access methods. In particular, we consider going beyond keyword search in large-scale cultural heritage digital libraries to support exploration and discovery. The workshop will consider the opportunities and challenges presented by new and existing technologies, as well as the needs and experiences of diverse user communities. Our goal is to assess the current state of the art, identify opportunities, and establish future research priorities, informed by the combined knowledge and experience of academics and practitioners.
Citations: 1
Who are the rising stars in academia?
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925436
Jun Zhang, Zhaolong Ning, Xiaomei Bai, Wei Wang, Shuo Yu, Feng Xia
This paper proposes a novel method, ScholarRank, to evaluate the scientific impact of rising stars. ScholarRank integrates the merits of both statistical indicators and influence-calculation algorithms in heterogeneous academic networks. It considers three factors: authors' citation counts, the mutual influence among coauthors, and the mutual reinforcement among different entities in heterogeneous academic networks. Through experiments on real datasets, we demonstrate that ScholarRank efficiently identifies more top-ranked rising stars than other methods.
Citations: 22
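The abstract above mentions a "mutual reinforcement process" between entities in a heterogeneous academic network but does not give ScholarRank's update rules. The Python sketch below is therefore a generic author/paper mutual-reinforcement iteration, not ScholarRank itself. It only shows the shape of the idea: citation counts act as a prior, authors earn credit from their papers (split equally among coauthors), and papers are in turn boosted by their authors. The `alpha` mixing weight, iteration count, and example data are hypothetical.

```python
from collections import defaultdict


def author_credit(authorship, paper_score):
    """Each paper's score is split equally among its coauthors."""
    credit = defaultdict(float)
    for p, authors in authorship.items():
        for a in authors:
            credit[a] += paper_score[p] / len(authors)
    return credit


def mutual_reinforcement(authorship, citations, alpha=0.5, iterations=50):
    """Generic author/paper mutual reinforcement (illustrative, not ScholarRank).

    authorship: paper -> list of its authors (each paper needs at least one).
    citations:  paper -> citation count, used as the prior for paper scores.
    A paper's new score mixes its citation prior with the mean credit of its
    authors; paper scores are renormalized to sum to 1 after every round.
    """
    papers = list(authorship)
    total = sum(citations.get(p, 0) for p in papers) or 1
    prior = {p: citations.get(p, 0) / total for p in papers}
    paper_score = dict(prior)
    for _ in range(iterations):
        credit = author_credit(authorship, paper_score)
        updated = {
            p: alpha * prior[p]
            + (1 - alpha) * sum(credit[a] for a in authorship[p]) / len(authorship[p])
            for p in papers
        }
        norm = sum(updated.values()) or 1
        paper_score = {p: s / norm for p, s in updated.items()}
    return dict(author_credit(authorship, paper_score)), paper_score


# Hypothetical toy network: B coauthors the most-cited paper and solo-authors another.
authorship = {"p1": ["A", "B"], "p2": ["B"], "p3": ["C"]}
citations = {"p1": 10, "p2": 5, "p3": 1}
author_scores, paper_scores = mutual_reinforcement(authorship, citations)
```

Splitting credit among coauthors is one simple way to model the "mutual influence among coauthors" factor; the paper may weight coauthors differently.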
PDFFigures 2.0: Mining figures from research papers
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910904
Christopher Clark, S. Divvala
Figures and tables are key sources of information in many scholarly documents. However, current academic search engines do not make use of figures and tables when semantically parsing documents or presenting document summaries to users. To facilitate these applications, we develop an algorithm, called "PDFFigures 2.0," that extracts figures, tables, and captions from documents. Our approach analyzes the structure of individual pages by detecting captions, graphical elements, and chunks of body text, and then locates figures and tables by reasoning about the empty regions within that text. To evaluate our work, we introduce a new dataset of computer science papers, along with ground-truth labels for the locations of the figures, tables, and captions within them. Our algorithm achieves strong results on this dataset (94% precision at 90% recall), surpassing the previous state of the art. Further, we show how our framework was used to extract figures from a corpus of over one million papers, and how the resulting extractions were integrated into the user interface of a smart academic search engine, Semantic Scholar (www.semanticscholar.org). Finally, we present the results of an exploratory data analysis of the extracted figures, as well as an extension of our method to section title extraction. We release our dataset and code on our project webpage to enable future research (http://pdffigures2.allenai.org).
Citations: 116
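The abstract above says figures are located "by reasoning about the empty regions" left between detected captions and body text. The Python sketch below illustrates only the simplest form of that reasoning and is not PDFFigures 2.0's algorithm: it assumes boxes for captions and body text have already been detected, looks only at the band directly above a caption, and reuses the caption's horizontal extent. The `Box` type and the coordinates are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Box(NamedTuple):
    """Axis-aligned box in page coordinates; y grows downward."""
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps_horizontally(a: Box, b: Box) -> bool:
    return a.x0 < b.x1 and b.x0 < a.x1


def region_above_caption(caption: Box, body_text: List[Box],
                         page_top: float = 0.0) -> Optional[Box]:
    """Guess a figure region as the empty band between a caption and the text above it."""
    bottoms = [t.y1 for t in body_text
               if t.y1 <= caption.y0 and overlaps_horizontally(t, caption)]
    top = max(bottoms, default=page_top)
    if caption.y0 <= top:
        return None  # no empty space: the caption sits directly under text
    return Box(caption.x0, top, caption.x1, caption.y0)


# Hypothetical page: a paragraph, then empty space, then a caption, then more text.
caption = Box(72, 400, 540, 420)
body = [Box(72, 72, 540, 200), Box(72, 430, 540, 700)]
region = region_above_caption(caption, body)  # Box(x0=72, y0=200, x1=540, y1=400)
```

A fuller version would also consider the space below a caption (table captions often sit above the table), widen the region to cover detected graphical elements, and choose among several candidate regions.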
Evaluating cost of cloud execution in a data repository
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925454
Zhiwu Xie, Yinlin Chen, J. Speer, T. Walters
In this paper, we use a set of controlled experiments to benchmark the cost of executing typical repository functions in the cloud, such as ingestion, fixity checking, and heavy data processing. We focus on the repository service pattern in which content is explicitly stored away from where it is processed. We measured the processing speed and unit cost of each scenario using a large sensor dataset and Amazon Web Services (AWS). The initial results reveal three distinct cost patterns: 1) spending more buys up to proportionally faster service; 2) more money does not necessarily buy better performance; and 3) spending less can also be faster. Further investigation of these performance and cost patterns will help repositories form more effective operational strategies.
Citations: 4
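The three cost patterns in the abstract above follow from simple arithmetic on price and throughput: unit cost per GB equals the hourly price divided by the GB processed per hour. The Python sketch below makes that explicit. All prices and throughputs are hypothetical, not the paper's AWS measurements, and the model ignores billing granularity, storage, and data-transfer charges.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    price_per_hour: float          # USD per instance-hour (hypothetical)
    throughput_gb_per_hour: float  # measured processing speed (hypothetical)


def run_cost(s: Scenario, dataset_gb: float):
    """Return (wall-clock hours, total USD, USD per GB) for one scenario."""
    hours = dataset_gb / s.throughput_gb_per_hour
    total = s.price_per_hour * hours
    return hours, total, total / dataset_gb


scenarios = [
    Scenario("baseline", 0.10, 10.0),
    Scenario("2x price, 2x speed", 0.20, 20.0),  # pattern 1: same unit cost, faster
    Scenario("4x price, 2x speed", 0.40, 20.0),  # pattern 2: extra money, no extra speed
    Scenario("cheaper and faster", 0.08, 16.0),  # pattern 3: lower cost, faster
]
for s in scenarios:
    hours, total, per_gb = run_cost(s, dataset_gb=100.0)
    print(f"{s.name:20s} {hours:5.2f} h  ${total:5.2f}  ${per_gb:.4f}/GB")
```

Because unit cost depends only on the price-to-throughput ratio, benchmarking throughput per instance type is what reveals which pattern a given repository function falls into.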
Characterizing users tagging behavior in academic blogs
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925438
Lei Li, Chengzhi Zhang
With the growing popularity of academic social media, academic blogs have become one form of user-generated academic content that can be annotated with social tags to support users' information retrieval and organization. To improve existing social tagging systems so that they meet users' needs, users' tagging behavior must first be understood. However, there has been no research characterizing user tagging behavior on academic resources. In this paper, taking the tags of academic blogs as the research object, we analyze users' tagging behavior based on the characteristics of the tags themselves (tag-based features) and those related to blog content (content-based features). These characteristics can be used in academic tagging systems to promote the organization and propagation of academic knowledge.
Citations: 3
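The abstract above distinguishes tag-based features from content-based features but does not list them. The Python sketch below shows what a few such features might look like for a single blog post; the specific features (tag count, mean words per tag, share of tags whose words all appear in the post) are hypothetical examples, not the paper's feature set. It also assumes regex word tokenization, which only suits space-delimited languages; Chinese blog text would need a word segmenter first.

```python
import re


def words(text):
    """Lower-cased word tokens (space-delimited languages only)."""
    return re.findall(r"\w+", text.lower())


def tag_features(tags, title, body):
    """A few hypothetical tag-based and content-based features for one blog post."""
    content = set(words(title + " " + body))
    tag_words = [words(t) for t in tags]
    in_content = [tw for tw in tag_words if tw and all(w in content for w in tw)]
    n = len(tags)
    return {
        # tag-based features
        "num_tags": n,
        "mean_words_per_tag": sum(len(tw) for tw in tag_words) / n if n else 0.0,
        # content-based feature: share of tags whose words all occur in the post
        "share_in_content": len(in_content) / n if n else 0.0,
    }


# Hypothetical post: "PhD life" describes the post but never appears in its text.
features = tag_features(
    ["digital library", "metadata", "PhD life"],
    "Notes on metadata quality",
    "Our digital library project audited its records.",
)
```

Features like `share_in_content` separate tags that merely repeat the text from tags that add information the text lacks, which is one reason content-based features are worth computing alongside tag-based ones.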