
2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL): Latest Papers

Coagmento 2.0: A system for capturing individual and group information seeking behavior
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925447
M. Mitsui, C. Shah
In this demo, we present Coagmento 2.0, a Web-based, open-source platform that supports people working, individually or in groups, on projects that span multiple sessions and involve looking for, collecting, and synthesizing information. The system also provides a highly customizable platform for researchers who want to investigate individual and group information seeking behaviors in a lab or field setting. The demo shows not only the system's back-end components and front-end interaction elements, but also how Coagmento can easily be configured for user studies of information seeking and retrieval with digital libraries (including the Web).
Citations: 7
Personal video collection management behavior
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925440
S. Cunningham, D. Nichols, Judy Bowen
Video content typically consumes more storage space and bandwidth than other document types, yet users structure it with the same organisational tools they use for smaller and simpler items. We analyze the 'native' video management behavior expressed in 35 self-interviews and diary studies produced by New Zealand students to create a 'rich picture' of personal video collections. We find that personal collections can have diffuse boundaries and many different intended uses, and that these information management needs are difficult to fulfill with the students' homegrown video collection management strategies.
Citations: 1
Extracting academic genealogy trees from the networked digital library of theses and dissertations
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910916
W. Dores, Fabrício Benevenuto, Alberto H. F. Laender
Throughout history, many researchers have made remarkable contributions to science, not only by advancing knowledge but also by mentoring new scientists. Identifying and studying the formation of researchers over the years is currently a challenging task, because theses and dissertations are cataloged in a decentralized way across many local digital libraries. In this paper, we take a first step towards building a large repository that records the academic genealogy of researchers across fields and countries. We crawled data from the Networked Digital Library of Theses and Dissertations (NDLTD), developed a framework to extract academic genealogy trees from these data, and provide a series of analyses that describe the main properties of these trees. Our analysis yields interesting findings about the structure of academic formation, which highlight the importance of cataloging academic genealogy trees. We hope our initial framework will serve as the basis of a much larger crowdsourcing system.
Citations: 13
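The genealogy-tree extraction described above can be illustrated with a minimal sketch. It assumes each thesis record has already been reduced to a hypothetical (student, advisor) name pair; the real NDLTD metadata would first need advisor-field extraction and name disambiguation, which this sketch ignores. Roots are people with no recorded advisor, and each tree is summarized by its number of descendants and its depth (generations).

```python
from collections import defaultdict


def build_genealogy(records):
    """Build an advisor -> students adjacency map from (student, advisor) pairs.

    `records` is a hypothetical, already-cleaned list of name pairs.
    Returns the adjacency map and the sorted list of roots (no recorded advisor).
    """
    children = defaultdict(set)
    has_advisor = set()
    people = set()
    for student, advisor in records:
        children[advisor].add(student)
        has_advisor.add(student)
        people.update((student, advisor))
    roots = sorted(people - has_advisor)
    return children, roots


def tree_stats(root, children):
    """Return (number of descendants, depth) of the tree rooted at `root`.

    Iterative DFS; `seen` guards against cycles introduced by noisy records.
    """
    seen = {root}
    stack = [(root, 0)]
    descendants, depth = 0, 0
    while stack:
        node, level = stack.pop()
        depth = max(depth, level)
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                descendants += 1
                stack.append((child, level + 1))
    return descendants, depth
```

For example, the pairs B←A, C←A, D←B, E←D give a tree rooted at A with four descendants spanning three generations.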
A mathematical information retrieval system based on RankBoost
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925460
Ke Yuan, Liangcai Gao, Yuehan Wang, Xiaohan Yi, Zhi Tang
Mathematical Information Retrieval (MIR) systems are designed to help users find related formulae and better understand the formulae in scientific documents. However, nearly all ranking models in existing MIR systems are based on the tf-idf model, and little effort has been made to exploit features beyond the relevance between the query formula and related formulae. In this paper, we investigate a supervised ranking approach (RankBoost) in an MIR system that considers not only the relevance between a query formula and related formulae, but also features of the query formula itself and a rich set of features of the documents in which the related formulae appear. Experimental results show that our system outperforms state-of-the-art MIR systems.
Citations: 11
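To show how RankBoost combines several feature types into one ranking, here is a minimal sketch of the standard RankBoost formulation with binary threshold weak rankers. The feature vectors and the (lower, higher) preference-pair format are assumptions for illustration; the paper's actual features and weak learner are not described in the abstract. Each round picks the threshold ranker that best orders the currently weighted pairs, then increases the weight of pairs it mis-orders.

```python
import math


def rankboost(X, pairs, rounds=10):
    """Minimal RankBoost with binary threshold weak rankers h(x) = [x[f] > theta].

    X: list of feature vectors (e.g., hypothetical formula/document features).
    pairs: list of (lo, hi) index pairs meaning X[hi] should rank above X[lo].
    Returns a list of (alpha, feature, threshold) weak rankers.
    """
    D = [1.0 / len(pairs)] * len(pairs)  # distribution over preference pairs
    model = []
    for _ in range(rounds):
        best = None  # (|r|, r, feature, threshold)
        for f in range(len(X[0])):
            for theta in sorted({x[f] for x in X}):
                h = [1.0 if x[f] > theta else 0.0 for x in X]
                r = sum(d * (h[hi] - h[lo]) for d, (lo, hi) in zip(D, pairs))
                if best is None or abs(r) > best[0]:
                    best = (abs(r), r, f, theta)
        _, r, f, theta = best
        r = max(min(r, 1 - 1e-9), -1 + 1e-9)  # keep alpha finite on separable data
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        model.append((alpha, f, theta))
        # Mis-ordered pairs gain weight, correctly ordered pairs lose weight.
        h = [1.0 if x[f] > theta else 0.0 for x in X]
        D = [d * math.exp(alpha * (h[lo] - h[hi])) for d, (lo, hi) in zip(D, pairs)]
        z = sum(D)
        D = [d / z for d in D]
    return model


def score(model, x):
    """Final ranking function H(x) = sum_t alpha_t * h_t(x)."""
    return sum(alpha * (1.0 if x[f] > theta else 0.0) for alpha, f, theta in model)
```

Candidate formulae would then be sorted by `score` in descending order. A negative `alpha` simply inverts a weak ranker, so features that correlate negatively with relevance are still usable.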
Querylog-based assessment of retrievability bias in a large newspaper corpus
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910907
Myriam C. Traub, Thaer Samar, J. V. Ossenbruggen, Jiyin He, A. D. Vries, L. Hardman
Bias in document retrieval can directly affect information access in a digital library. In the worst case, systematic favoritism toward a certain type of document can render other parts of the collection invisible to users. This potential bias can be evaluated by measuring the retrievability of every document in a collection. Previous evaluations have been performed on TREC collections using simulated query sets. The question remains, however, how representative this approach is of more realistic settings. To address this question, we investigate the effectiveness of the retrievability measure on a large digitized newspaper corpus with two characteristics that distinguish our experiments from previous studies: (1) compared to TREC collections, our collection contains noise originating from OCR processing, historical spelling and use of language; and (2) instead of simulated queries, the collection comes with real user query logs, including click data. First, we assess the retrievability bias imposed on the newspaper collection by different IR models and confirm that the retrievability measure captures this bias in our setup. Second, we show how simulated queries differ from real user queries in term frequency and prevalence of named entities, and how this affects the retrievability results.
Citations: 23
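Retrievability is commonly defined as r(d) = Σ_q o_q · f(k_dq, c), where k_dq is the rank of document d for query q, f is 1 when k_dq ≤ c (a cumulative cutoff), and o_q weights each query; bias across a collection is often summarized with a Gini coefficient. The sketch below assumes that cumulative definition; the cutoff, the weighting, and the input format (a dict of ranked document-id lists per query) are illustrative assumptions, not necessarily the paper's exact setup. With a real query log, o_q could for instance be each query's frequency.

```python
def retrievability(ranked_lists, docs, c=10, weights=None):
    """Cumulative retrievability r(d) = sum_q o_q * [rank of d for q <= c].

    ranked_lists: dict query -> list of doc ids in rank order (output of an IR model).
    weights: optional dict query -> o_q; if None, every query has weight 1.0.
    """
    r = {d: 0.0 for d in docs}
    for q, ranking in ranked_lists.items():
        o_q = 1.0 if weights is None else weights.get(q, 0.0)
        for d in ranking[:c]:
            r[d] = r.get(d, 0.0) + o_q
    return r


def gini(values):
    """Gini coefficient of retrievability scores.

    0 means every document is equally retrievable; values near 1 mean a few
    documents absorb almost all retrievability (strong bias).
    """
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)
```

Comparing `gini(...)` across IR models, or across simulated versus logged query sets, gives a single number per configuration to contrast.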
Low-cost semantic enhancement to digital library metadata and indexing: Simple yet effective strategies
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910910
A. Hinze, D. Bainbridge, S. Cunningham, J. S. Downie
Most existing digital libraries use traditional lexically-based retrieval techniques. For established systems, completely replacing the document retrieval mechanism (document analysis, indexing strategy, query processing and query interface), or even making significant changes to it, would require major technological effort and would most likely be disruptive. In this paper, we describe ways to use the results of semantic analysis and disambiguation while retaining an existing keyword-based search and lexicographic index. We engineer this so that the output of semantic analysis (performed offline) can be imported directly into existing digital library metadata and index structures, and thus incorporated without architectural modifications.
Citations: 6
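One way to picture "semantic output imported into existing metadata and index structures" is a sketch in which offline disambiguation results are stored as an extra metadata field and indexed as ordinary terms, so the lexical index and keyword query processing stay exactly as they were. The `concepts` field and the `sense:...` token scheme are hypothetical illustrations, not the paper's actual encoding.

```python
from collections import defaultdict


def tokenize(text):
    """Plain lexical tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Lexical inverted index: term -> set of doc ids.

    Each doc is a dict with 'text' and an optional 'concepts' list. The
    'concepts' field stands in for hypothetical offline semantic-analysis
    output (disambiguated senses); it is indexed as ordinary terms, so the
    index structure itself is unchanged.
    """
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc["text"]):
            index[term].add(doc_id)
        for concept in doc.get("concepts", []):
            index[concept.lower()].add(doc_id)
    return index


def search(index, query):
    """Unchanged keyword search: boolean AND over query terms."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for t in terms[1:]:
        result &= index.get(t, set())
    return result
```

A plain query for "jaguar" still matches both documents, while a query containing a sense token narrows results to one meaning, without any change to the search code path.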
Routing memento requests using binary classifiers
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910899
Nicolas J. Bornand, Lyudmila Balakireva, H. Sompel
The Memento protocol provides a uniform approach to querying individual web archives. Soon after its emergence, Memento Aggregator infrastructure was introduced to support querying across multiple archives simultaneously. An Aggregator generates a response by issuing the respective Memento request against each of the distributed archives it covers. As the number of archives grows, it becomes increasingly challenging to deliver aggregate responses while keeping response times and computational costs under control. Ad hoc heuristic approaches have been introduced to address this challenge, and research has been conducted on optimizing query routing based on archive profiles. In this paper, we explore the use of binary, archive-specific classifiers, generated from the content cached by an Aggregator, to determine whether or not to query an archive for a given URI. Our results are readily applicable and can help significantly decrease both the number of requests and the overall response time without compromising recall. Among other findings, compared to a brute-force approach that queries all archives, the classifiers can reduce the average number of requests by 77% and the overall response time by 42%, while maintaining a recall of 0.847.
Citations: 21
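The routing idea, one binary classifier per archive deciding whether to send a request for a URI, can be sketched as follows. The abstract does not state which classifier or URI features the authors used, so this sketch substitutes a multinomial naive Bayes over hypothetical URI tokens (host labels, top-level domain, first path segment), trained on (URI, archive-had-memento) pairs of the kind an Aggregator cache could provide. `evaluate` reports the fraction of brute-force requests issued and the recall, the two quantities the abstract reports.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def uri_features(uri):
    """Hypothetical URI features: host labels, TLD marker, first path segment."""
    parts = urlsplit(uri)
    host = (parts.hostname or "").split(".")
    feats = ["host:" + h for h in host if h]
    if host and host[-1]:
        feats.append("tld:" + host[-1])
    segs = [s for s in parts.path.split("/") if s]
    if segs:
        feats.append("path0:" + segs[0])
    return feats


class ArchiveClassifier:
    """Multinomial naive Bayes: does one archive hold mementos for a URI?"""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing
        self.counts = {True: Counter(), False: Counter()}
        self.docs = Counter()
        self.vocab = set()

    def fit(self, examples):
        """examples: (uri, hit) pairs, e.g. taken from an Aggregator's cache."""
        for uri, hit in examples:
            feats = uri_features(uri)
            self.counts[hit].update(feats)
            self.docs[hit] += 1
            self.vocab.update(feats)
        return self

    def predict(self, uri):
        total = sum(self.docs.values())
        v = len(self.vocab)
        best, best_lp = False, -math.inf
        for label in (True, False):  # ties go to True, i.e. query the archive
            denom = sum(self.counts[label].values()) + self.alpha * v
            lp = math.log((self.docs[label] + self.alpha) / (total + 2 * self.alpha))
            for f in uri_features(uri):
                lp += math.log((self.counts[label][f] + self.alpha) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best


def route(uri, classifiers):
    """Archives worth querying for `uri`, in classifier-dict order."""
    return [name for name, clf in classifiers.items() if clf.predict(uri)]


def evaluate(test_set, classifiers):
    """test_set: (uri, set of archives that truly hold mementos) pairs.

    Returns (requests issued / brute-force requests, recall).
    """
    requests = found = relevant = 0
    for uri, truth in test_set:
        chosen = set(route(uri, classifiers))
        requests += len(chosen)
        found += len(chosen & truth)
        relevant += len(truth)
    recall = found / relevant if relevant else 1.0
    return requests / (len(test_set) * len(classifiers)), recall
```

Breaking ties toward "query" trades a few extra requests for recall, which matters because a skipped archive silently drops mementos from the aggregate response.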
Interplanetary Wayback: The permanent web archive
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925467
Sawood Alam, Mat Kelly, Michael L. Nelson
To facilitate permanence and collaboration in web archives, we built Interplanetary Wayback to disseminate the contents of WARC files into the IPFS network. IPFS is a peer-to-peer, content-addressable file system that inherently allows deduplication and facilitates opt-in replication. We split the header and payload of WARC response records before disseminating them into IPFS to leverage deduplication, build a CDXJ index, and recombine header and payload at replay time. From a 1.0 GB sample Archive-It collection of WARCs containing 21,994 mementos, we found that on average 570 files can be indexed and disseminated into IPFS per minute. We also found that in our naive prototype implementation, replay took 370 milliseconds per request on average.
Citations: 22
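Why splitting header and payload helps deduplication can be seen in a small sketch. It uses an in-memory store keyed by SHA-256 hex digests as a stand-in for IPFS (real IPFS returns multihash-based CIDs, not raw hex digests), a simplified SURT-style key, and hypothetical JSON field names in the CDXJ-style line; none of these are Interplanetary Wayback's actual formats. Two captures whose headers differ only in their Date field still share one stored payload.

```python
import hashlib
import json
from urllib.parse import urlsplit


class ContentStore:
    """In-memory stand-in for IPFS: content is addressed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}

    def add(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical content is stored once
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]


def surt_key(uri):
    """Simplified SURT-style key: reversed host labels, ')', then the path."""
    parts = urlsplit(uri)
    host = ",".join(reversed((parts.hostname or "").split(".")))
    return f"{host}){parts.path or '/'}"


def index_record(store, uri, timestamp, http_response: bytes):
    """Split an HTTP response (from a WARC response record) into header and
    payload, add both to the store, and return one CDXJ-style index line."""
    header, _, payload = http_response.partition(b"\r\n\r\n")
    block = {"header": store.add(header), "payload": store.add(payload)}  # hypothetical field names
    return f"{surt_key(uri)} {timestamp} {json.dumps(block, sort_keys=True)}"


def replay(store, cdxj_line):
    """Look up both parts by address and recombine them at replay time."""
    _, _, block = cdxj_line.split(" ", 2)  # key, timestamp, JSON block
    refs = json.loads(block)
    return store.get(refs["header"]) + b"\r\n\r\n" + store.get(refs["payload"])
```

Storing the whole record as one blob would make the two captures distinct objects; splitting lets the unchanged payload be stored once.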
Early prediction of scholar popularity
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910905
Masoumeh Nezhadbiglari, Marcos André Gonçalves, J. Almeida
Prediction of scholar popularity has become an important research topic for a number of reasons. In this paper, we tackle the problem of predicting the popularity trend of scholars, concentrating on making predictions as early and as accurately as possible. To perform the prediction task, we first extract the popularity trends of scholars from a training set. To that end, we apply a time series clustering algorithm called K-Spectral Clustering (K-SC) to identify the popularity trends as cluster centroids. We then predict trends for scholars in a test set by solving a classification problem. Specifically, we first compute a set of measures for each scholar based on the distance between the early points of that scholar's popularity curve and the identified centroids. We then combine these distance measures with a set of academic features (e.g., number of publications, number of venues, etc.) collected during the same monitoring period, and use them as input to a classification method. One aspect that distinguishes our method from other approaches is that the monitoring period, during which we gather information on each scholar's popularity and academic features, is determined on a per-scholar basis as part of our approach. Using total citation count as a measure of scientific popularity, we evaluate our solution on the popularity time series of more than 500,000 Computer Science scholars, gathered from Microsoft Azure Marketplace. The experimental results show that our prediction method outperforms alternative prediction methods. We also show how to apply our method jointly with regression models to improve the prediction of scholar popularity values (e.g., number of citations) at a given future time.
Citations: 17
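The early-distance features can be sketched with the distance that K-SC uses in its commonly published form: d(x, y) = min over shift q and scale a of ‖x − a·y_q‖ / ‖x‖, where for a fixed q the optimal scale is a = ⟨x, y_q⟩ / ‖y_q‖². This makes a scholar's curve close to a centroid with the same shape regardless of absolute citation counts. The sketch assumes centroids were already produced by a K-SC run on training data, and `nearest_trend` is a simplified stand-in for the paper's classifier, which also uses academic features.

```python
import math


def shift(y, q):
    """Shift series y by q steps (q > 0 delays it), padding with zeros."""
    n = len(y)
    if abs(q) >= n:
        return [0.0] * n
    if q >= 0:
        return [0.0] * q + list(y[:n - q])
    return list(y[-q:]) + [0.0] * (-q)


def ksc_distance(x, y, max_shift=0):
    """min over q in [-max_shift, max_shift] and scale a of ||x - a*y_q|| / ||x||.

    For a fixed shift q the optimal scale is a = <x, y_q> / ||y_q||^2.
    """
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0:
        raise ValueError("x must not be all zeros")
    best = math.inf
    for q in range(-max_shift, max_shift + 1):
        yq = shift(y, q)
        yy = sum(v * v for v in yq)
        a = sum(xi * yi for xi, yi in zip(x, yq)) / yy if yy else 0.0
        err = math.sqrt(sum((xi - a * yi) ** 2 for xi, yi in zip(x, yq)))
        best = min(best, err / norm_x)
    return best


def early_distances(x, centroids, m, max_shift=0):
    """Distances between the first m points of x and of each centroid."""
    return [ksc_distance(x[:m], c[:m], max_shift) for c in centroids]


def nearest_trend(x, centroids, m, max_shift=0):
    """Simplified stand-in for the classifier: index of the closest centroid."""
    d = early_distances(x, centroids, m, max_shift)
    return min(range(len(d)), key=d.__getitem__)
```

In the paper's setup, the list returned by `early_distances` would be concatenated with academic features and passed to a classifier; `m` plays the role of the per-scholar monitoring period.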
Can academic conferences promote research collaboration?
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925446
Xiaoyan Su, Wei Wang, Shuo Yu, Chenxin Zhang, T. M. Bekele, Feng Xia
Based on focal closure theory, this work investigates whether attending conferences fosters new scientific collaborations. Through an analysis of conference closure at the individual and community levels, we show that attending conferences can promote new scientific collaborations, and that conferences with more attendees and higher field ratings bring about more new scientific collaborations.
Citations: 10
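A simple way to quantify "conference closure" is sketched below: among pairs of researchers who attended the same conference but had not co-authored before it, what fraction co-author afterwards? This is an illustrative focal-closure-style rate under assumed inputs (attendee ids and sets of co-author pairs before and after the event), not necessarily the measure the authors used. Computing the same rate for pairs that did not co-attend would give a baseline for comparison.

```python
from itertools import combinations


def closure_rate(attendees, coauthors_before, coauthors_after):
    """Share of co-attending pairs, not collaborating before the conference,
    who co-author afterwards.

    attendees: iterable of researcher ids at one conference.
    coauthors_before / coauthors_after: sets of frozenset({a, b}) pairs.
    """
    candidates = [
        frozenset(p)
        for p in combinations(sorted(set(attendees)), 2)
        if frozenset(p) not in coauthors_before
    ]
    if not candidates:
        return 0.0
    closed = sum(1 for p in candidates if p in coauthors_after)
    return closed / len(candidates)
```

For four attendees where one pair already collaborated, five candidate pairs remain; if two of them co-author later, the rate is 0.4.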