
2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL): Latest Papers

How to identify specialized research communities related to a researcher's changing interests
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925450
Hamed Alhoori
Scholarly events and venues are increasing rapidly in number. This poses a challenge for researchers who seek to identify events and venues related to their work in order to draw more efficiently and comprehensively from published research and to share their own findings more effectively. Such efforts are hampered also by the fact that no rating system yet exists to assist researchers in culling the venues most relevant to their current readings and interests. This study describes a methodology we developed in response to this need, one that recommends scholarly venues related to researchers' specific interests according to personalized social web indicators. Our experiments applying our proposed rating and recommendation method show that it outperforms the baseline venue recommendations in terms of accuracy and ranking quality.
Citations: 6
Coagmento 2.0: A system for capturing individual and group information seeking behavior
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925447
M. Mitsui, C. Shah
In this demo, we present Coagmento 2.0, a Web-based, open-source platform that supports people working on individual or group projects that span multiple sessions and involve looking for, collecting, and synthesizing information. The system also provides a highly customizable platform for researchers who want to investigate individual and group information seeking behaviors in a lab or a field setting. The demo shows not only the back-end components and front-end interaction elements of the system, but also how one could easily configure Coagmento for user studies involving information seeking/retrieval with digital libraries (including the Web).
Citations: 7
Personal video collection management behavior
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925440
S. Cunningham, D. Nichols, Judy Bowen
Video content typically consumes more storage space and bandwidth than other document types, although users structure their content with the same organisational tools they use for smaller and simpler items. We analyze the 'native' video management behavior as expressed in 35 self-interviews and diary studies produced by New Zealand students, to create a 'rich picture' of personal video collections. We see that personal collections can have diffuse boundaries and many different intended uses, and that these information management needs are difficult to fulfill with users' homegrown video collection management strategies.
Citations: 1
A mathematical information retrieval system based on RankBoost
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925460
Ke Yuan, Liangcai Gao, Yuehan Wang, Xiaohan Yi, Zhi Tang
Mathematical Information Retrieval (MIR) systems are designed to help users find related formulae and better understand the formulae in scientific documents. However, nearly all the ranking models in existing MIR systems are based on the tf-idf model, and few efforts have been made to explore features beyond the relevance between the query formula and related formulae. In this paper, we investigate a supervised ranking approach (RankBoost) in an MIR system, and we consider not only the relevance between a query formula and related formulae, but also the features of the query formula itself and a rich set of features of the documents in which the related formulae appear. Experimental results show that our system achieves better performance than state-of-the-art MIR systems.
Citations: 11
Interplanetary Wayback: The permanent web archive
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925467
Sawood Alam, Mat Kelly, Michael L. Nelson
To facilitate permanence and collaboration in web archives, we built Interplanetary Wayback to disseminate the contents of WARC files into the IPFS network. IPFS is a peer-to-peer content-addressable file system that inherently allows deduplication and facilitates opt-in replication. We split the header and payload of WARC response records before disseminating them into IPFS to leverage the deduplication, build a CDXJ index, and combine them at the time of replay. From a 1.0 GB sample Archive-It collection of WARCs containing 21,994 mementos, we found that, on average, 570 files can be indexed and disseminated into IPFS per minute. We also found that in our naive prototype implementation, replay took 370 milliseconds per request on average.
Citations: 22
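The header/payload split described in the Interplanetary Wayback abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: `ContentStore` is a hypothetical in-memory stand-in for a content-addressable store (it does not call any IPFS API), the index is a plain dictionary rather than a real CDXJ file, and the function names are assumptions made for illustration. The point it shows is that captures whose headers differ but whose payloads are identical share a single stored payload block.

```python
import hashlib

SEPARATOR = b"\r\n\r\n"  # blank line between HTTP header and payload


class ContentStore:
    """Hypothetical in-memory content-addressable store (not the IPFS API)."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content, so identical content
        # is stored only once (deduplication).
        digest = hashlib.sha256(data).hexdigest()
        self.blocks[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self.blocks[digest]


def index_response(store, index, uri, timestamp, http_response: bytes):
    """Split one response into header and payload, store both, record the digests.

    Assumes the response contains the header/payload separator.
    """
    header, _, payload = http_response.partition(SEPARATOR)
    index[(uri, timestamp)] = {
        "header": store.put(header),
        "payload": store.put(payload),
    }


def replay(store, index, uri, timestamp) -> bytes:
    """Look up the digests in the index and recombine header and payload."""
    entry = index[(uri, timestamp)]
    return store.get(entry["header"]) + SEPARATOR + store.get(entry["payload"])
```

Indexing two captures of the same page with different headers leaves three blocks in the store (two headers, one shared payload), and `replay` returns each original response byte for byte.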
Routing memento requests using binary classifiers
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910899
Nicolas J. Bornand, Lyudmila Balakireva, H. Sompel
The Memento protocol provides a uniform approach to query individual web archives. Soon after its emergence, Memento Aggregator infrastructure was introduced that supports querying across multiple archives simultaneously. An Aggregator generates a response by issuing the respective Memento request against each of the distributed archives it covers. As the number of archives grows, it becomes increasingly challenging to deliver aggregate responses while keeping response times and computational costs under control. Ad-hoc heuristic approaches have been introduced to address this challenge, and research has been conducted aimed at optimizing query routing based on archive profiles. In this paper, we explore the use of binary, archive-specific classifiers, generated on the basis of the content cached by an Aggregator, to determine whether or not to query an archive for a given URI. Our results turn out to be readily applicable and can help to significantly decrease both the number of requests and the overall response times without compromising on recall. Among other findings, classifiers can reduce the average number of requests by 77% compared to a brute force approach on all archives, and the overall response time by 42%, while maintaining a recall of 0.847.
Citations: 21
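The routing idea in the abstract above (one binary classifier per archive, trained on what the Aggregator has cached, deciding whether to query that archive for a URI) can be sketched roughly as follows. The abstract does not say which features or learning algorithm the authors use, so everything here is an assumption for illustration: the single feature (the host's top-level domain), the majority-vote `TldClassifier`, and the names `route` and `evaluate`. The sketch also shows how request reduction and recall, the two quantities reported in the abstract, can be computed against a brute-force baseline that queries every archive.

```python
from collections import defaultdict


def tld(uri: str) -> str:
    """Hypothetical feature: the top-level domain of the URI's host."""
    host = uri.split("//", 1)[-1].split("/", 1)[0]
    return host.rsplit(".", 1)[-1]


class TldClassifier:
    """Toy per-archive binary classifier (an assumption, not the paper's model).

    Predicts that the archive holds a URI if, in the cached observations,
    the archive held more URIs with that TLD than it missed.
    """

    def __init__(self):
        self.counts = defaultdict(lambda: [0, 0])  # tld -> [misses, hits]

    def fit(self, cached):
        # cached: iterable of (uri, held) pairs taken from the aggregator cache
        for uri, held in cached:
            self.counts[tld(uri)][int(held)] += 1
        return self

    def predict(self, uri: str) -> bool:
        misses, hits = self.counts[tld(uri)]
        return hits > misses


def route(classifiers, uri):
    """Return the archives whose classifier says the URI is worth querying."""
    return [name for name, clf in classifiers.items() if clf.predict(uri)]


def evaluate(classifiers, ground_truth):
    """Compare routed requests with a brute-force query of every archive.

    ground_truth maps each URI to the set of archives that really hold it.
    """
    sent = found = total = 0
    for uri, holders in ground_truth.items():
        selected = set(route(classifiers, uri))
        sent += len(selected)
        found += len(holders & selected)
        total += len(holders)
    brute_force = len(classifiers) * len(ground_truth)
    return {
        "request_reduction": 1 - sent / brute_force,
        "recall": found / total if total else 1.0,
    }
```

A URI with a TLD never seen in the cache is routed nowhere in this toy version, which trades recall for fewer requests; a real system would need a deliberate policy for that case.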
Early prediction of scholar popularity
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910905
Masoumeh Nezhadbiglari, Marcos André Gonçalves, J. Almeida
Prediction of scholar popularity has become an important research topic for a number of reasons. In this paper, we tackle the problem of predicting the popularity trend of scholars, concentrating on making predictions as early and as accurately as possible. In order to perform the prediction task, we first extract the popularity trends of scholars from a training set. To that end, we apply a time series clustering algorithm called K-Spectral Clustering (K-SC) to identify the popularity trends as cluster centroids. We then predict trends for scholars in a test set by solving a classification problem. Specifically, we first compute a set of measures for each scholar based on the distance between the earlier points in that scholar's popularity curve and the identified centroids. We then combine those distance measures with a set of academic features (e.g., number of publications, number of venues, etc.) collected during the same monitoring period, and use them as input to a classification method. One aspect that distinguishes our method from other approaches is that the monitoring period, during which we gather information on each scholar's popularity and academic features, is determined on a per-scholar basis, as part of our approach. Using total citation count as a measure of scientific popularity, we evaluate our solution on the popularity time series of more than 500,000 Computer Science scholars, gathered from Microsoft Azure Marketplace. The experimental results show that our prediction method outperforms alternative prediction methods. We also show how to apply our method jointly with regression models to improve the prediction of scholar popularity values (e.g., number of citations) at a given future time.
Citations: 17
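The feature construction described in the abstract above (distances from the early part of a scholar's popularity curve to each trend centroid, combined with academic features) might look roughly like the sketch below. It rests on stated assumptions: the distance is a scale-invariant distance only, which simplifies the K-SC distance by omitting its time-shift alignment; the centroids are taken as given rather than learned; and the function names and the fixed-length prefix are hypothetical. The abstract's per-scholar choice of monitoring period is not modeled.

```python
import math


def scale_invariant_distance(x, y):
    """min over alpha of ||x - alpha * y|| / ||x||.

    Simplified variant of the K-SC distance (no time shift), used here
    as an assumption for illustration. x and y must have equal length.
    """
    norm_x = math.sqrt(sum(a * a for a in x))
    if norm_x == 0:
        return 0.0  # an all-zero curve carries no shape information
    yy = sum(b * b for b in y)
    # Closed-form optimal scaling: alpha = (x . y) / ||y||^2
    alpha = sum(a * b for a, b in zip(x, y)) / yy if yy else 0.0
    residual = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, y)))
    return residual / norm_x


def feature_vector(curve_prefix, centroids, academic_features):
    """Distances from the observed prefix to each centroid, then academic features.

    Each centroid is truncated to the length of the observed prefix, so only
    the monitoring period is compared.
    """
    n = len(curve_prefix)
    distances = [scale_invariant_distance(curve_prefix, c[:n]) for c in centroids]
    return distances + list(academic_features)
```

The resulting vector can be passed to any off-the-shelf classifier whose labels are the centroid (trend) indices; which classifier the authors use is not stated in the abstract.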
Can academic conferences promote research collaboration?
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925446
Xiaoyan Su, Wei Wang, Shuo Yu, Chenxin Zhang, T. M. Bekele, Feng Xia
This work investigates, based on focal closure theory, whether attending conferences breeds new scientific collaborations. Through the analysis of conference closure at the individual and community levels, we show that attending conferences can promote new scientific collaborations, and that conferences with more attendees and higher field ratings bring more new scientific collaborations.
Citations: 10
Querylog-based assessment of retrievability bias in a large newspaper corpus
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2910907
Myriam C. Traub, Thaer Samar, J. V. Ossenbruggen, Jiyin He, A. D. Vries, L. Hardman
Bias in the retrieval of documents can directly influence the information access of a digital library. In the worst case, systematic favoritism for a certain type of document can render other parts of the collection invisible to users. This potential bias can be evaluated by measuring the retrievability of all documents in a collection. Previous evaluations have been performed on TREC collections using simulated query sets. The question remains, however, how representative this approach is of more realistic settings. To address this question, we investigate the effectiveness of the retrievability measure using a large digitized newspaper corpus, featuring two characteristics that distinguish our experiments from previous studies: (1) compared to TREC collections, our collection contains noise originating from OCR processing, historical spelling and use of language; and (2) instead of simulated queries, the collection comes with real user query logs including click data. First, we assess the retrievability bias imposed on the newspaper collection by different IR models. We assess the retrievability measure and confirm its ability to capture the retrievability bias in our setup. Second, we show how simulated queries differ from real user queries regarding term frequency and prevalence of named entities, and how this affects the retrievability results.
Citations: 23
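The retrievability measure mentioned in the abstract above is commonly computed as a count of how often each document appears within a rank cutoff across a query set, with the inequality of those counts summarized by a Gini coefficient. The sketch below follows that common formulation; the abstract itself does not spell out the formula or the inequality statistic the authors use, so the cumulative cutoff-based scoring, the use of Gini, and the function names are assumptions for illustration.

```python
def retrievability(rankings, doc_ids, cutoff):
    """Cumulative retrievability: r(d) = number of queries ranking d within the cutoff.

    rankings: one ranked list of document ids per query.
    doc_ids:  every document in the collection, including never-retrieved ones.
    """
    scores = {d: 0 for d in doc_ids}
    for ranked in rankings:
        for d in ranked[:cutoff]:
            if d in scores:
                scores[d] += 1
    return scores


def gini(values):
    """Gini coefficient of non-negative values: 0 means all documents are
    equally retrievable; values near 1 mean a few documents dominate."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n
```

Running the same two functions once with simulated queries and once with logged user queries gives two bias estimates for the same retrieval model, which is the kind of comparison the abstract describes.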
Making literature review and manuscript writing tasks easier for novice researchers through Rec4LRW system
Pub Date : 2016-06-19 DOI: 10.1145/2910896.2925445
Aravind Sesagiri Raamkumar, S. Foo, N. Pang
We demonstrate the recently built Rec4LRW system, meant for assisting researchers in three literature review and manuscript writing tasks. The system has been designed to be useful for all researchers, although the evaluation results show that it is more beneficial for research students and beginners. In this demonstration, we provide a walkthrough of the system by executing the tasks with sample research topics. The unique user interface (UI) and the task interconnectivity features are among the highlighted aspects.
Citations: 5