Scholarly events and venues are increasing rapidly in number. This poses a challenge for researchers who seek to identify events and venues related to their work in order to draw more efficiently and comprehensively from published research and to share their own findings more effectively. Such efforts are also hampered by the fact that no rating system yet exists to help researchers identify the venues most relevant to their current reading and interests. This study describes a methodology we developed in response to this need: it recommends scholarly venues related to researchers' specific interests on the basis of personalized social web indicators. Our experiments show that the proposed rating and recommendation method outperforms baseline venue recommendations in terms of accuracy and ranking quality.
Title: How to identify specialized research communities related to a researcher's changing interests; Authors: Hamed Alhoori; Venue: 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL); Date: 2016-06-19; DOI: 10.1145/2910896.2925450 (https://doi.org/10.1145/2910896.2925450)
In this demo, we present Coagmento 2.0, a Web-based, open-source platform that supports people working on individual or group projects that span multiple sessions and involve looking for, collecting, and synthesizing information. The system also provides a highly customizable platform for researchers who want to investigate individual and group information-seeking behavior in a lab or field setting. The demo shows not only the system's back-end components and front-end interaction elements, but also how Coagmento can easily be configured for user studies involving information seeking and retrieval with digital libraries (including the Web).
Title: Coagmento 2.0: A system for capturing individual and group information seeking behavior; Authors: M. Mitsui, C. Shah; Venue: 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL); Date: 2016-06-19; DOI: 10.1145/2910896.2925447 (https://doi.org/10.1145/2910896.2925447)
Video content typically consumes more storage space and bandwidth than other document types, yet users structure their video content with the same organisational tools they use for smaller and simpler items. We analyze the 'native' video management behavior expressed in 35 self-interviews and diary studies produced by New Zealand students to create a 'rich picture' of personal video collections. We find that personal collections can have diffuse boundaries and many different intended uses, and that these information management needs are difficult to fulfill with the students' homegrown video collection management strategies.
Title: Personal video collection management behavior; Authors: S. Cunningham, D. Nichols, Judy Bowen; Venue: 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL); Date: 2016-06-19; DOI: 10.1145/2910896.2925440 (https://doi.org/10.1145/2910896.2925440)
Ke Yuan, Liangcai Gao, Yuehan Wang, Xiaohan Yi, Zhi Tang
Mathematical Information Retrieval (MIR) systems are designed to help users find related formulae and better understand the formulae in scientific documents. However, nearly all ranking models in existing MIR systems are based on the tf-idf model, and little effort has been made to explore features beyond the relevance between the query formula and related formulae. In this paper, we investigate a supervised ranking approach (RankBoost) in an MIR system that considers not only the relevance between a query formula and related formulae, but also features of the query formula itself and a rich set of features of the documents in which the related formulae appear. Experimental results show that our system outperforms state-of-the-art MIR systems.
Title: A mathematical information retrieval system based on RankBoost; Authors: Ke Yuan, Liangcai Gao, Yuehan Wang, Xiaohan Yi, Zhi Tang; Venue: 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL); Date: 2016-06-19; DOI: 10.1145/2910896.2925460 (https://doi.org/10.1145/2910896.2925460)
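RankBoost, the supervised ranking approach named in the MIR abstract above, learns a ranking function from pairwise preferences by boosting weak rankers. The following is a minimal sketch of the generic pairwise algorithm with binary threshold weak rankers, assuming each item is a numeric feature vector (standing in for the formula-relevance, query-formula, and document features the abstract mentions). The function names, the toy data, and the choice of threshold weak learner are illustrative assumptions, not the authors' implementation.

```python
import math

def stump(x, f, theta):
    """Binary weak ranker: 1 if feature f of item x exceeds theta, else 0."""
    return 1.0 if x[f] > theta else 0.0

def rankboost(X, pairs, rounds=10):
    """Pairwise RankBoost with threshold weak rankers.

    X: list of feature vectors (hypothetical formula/document features).
    pairs: list of (i, j) meaning item j should be ranked above item i.
    Returns a list of (alpha, feature, threshold) weak rankers.
    """
    candidates = [(f, t) for f in range(len(X[0])) for t in sorted({x[f] for x in X})]
    D = [1.0 / len(pairs)] * len(pairs)  # distribution over preference pairs
    model = []
    for _ in range(rounds):
        # Pick the weak ranker with the largest weighted agreement r in (-1, 1].
        best, best_r = None, 0.0
        for f, t in candidates:
            r = sum(d * (stump(X[j], f, t) - stump(X[i], f, t))
                    for d, (i, j) in zip(D, pairs))
            if r > best_r:
                best, best_r = (f, t), r
        if best is None:
            break  # no weak ranker does better than chance
        r = min(best_r, 1.0 - 1e-9)  # avoid an infinite alpha on separable data
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        f, t = best
        model.append((alpha, f, t))
        # Increase the weight of pairs this ranker orders incorrectly, then renormalize.
        D = [d * math.exp(alpha * (stump(X[i], f, t) - stump(X[j], f, t)))
             for d, (i, j) in zip(D, pairs)]
        z = sum(D)
        D = [d / z for d in D]
    return model

def score(model, x):
    """Final ranking function H(x) = sum_t alpha_t * h_t(x)."""
    return sum(alpha * stump(x, f, t) for alpha, f, t in model)

# Toy usage: three candidate formulae, preferences implied by feature 0.
X = [[0.1, 5.0], [0.5, 1.0], [0.9, 3.0]]
pairs = [(0, 1), (1, 2), (0, 2)]
model = rankboost(X, pairs)
ranking = sorted(range(len(X)), key=lambda i: score(model, X[i]), reverse=True)
print(ranking)  # [2, 1, 0]
```

In a real MIR setting the preference pairs would come from relevance judgments for each query formula; the sketch only shows the boosting loop itself.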
To facilitate permanence and collaboration in web archives, we built Interplanetary Wayback, which disseminates the contents of WARC files into the IPFS network. IPFS is a peer-to-peer, content-addressable file system that inherently allows deduplication and facilitates opt-in replication. To leverage deduplication, we split the header and payload of WARC response records before disseminating them into IPFS, build a CDXJ index, and recombine the two parts at replay time. On a 1.0 GB sample Archive-It collection of WARCs containing 21,994 mementos, we found that on average 570 files can be indexed and disseminated into IPFS per minute. We also found that in our naive prototype implementation, replay took 370 milliseconds per request on average.
Title: Interplanetary Wayback: The permanent web archive; Authors: Sawood Alam, Mat Kelly, Michael L. Nelson; Venue: 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL); Date: 2016-06-19; DOI: 10.1145/2910896.2925467 (https://doi.org/10.1145/2910896.2925467)
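Splitting header from payload, as described in the Interplanetary Wayback abstract above, is what lets a content-addressed store deduplicate identical payloads across captures whose headers differ (for example, only in the Date header). The following is a minimal sketch under several assumptions: a toy in-memory store keyed by SHA-256 digests stands in for IPFS (real IPFS uses multihash-based content identifiers and a peer-to-peer network), each record is treated as a raw HTTP response rather than a full WARC record, and the JSON field names in the CDXJ-style line (`header_digest`, `payload_digest`) are hypothetical, not the system's actual index format.

```python
import hashlib
import json

class ContentStore:
    """Toy content-addressed store standing in for IPFS (not the real API)."""

    def __init__(self):
        self.blobs = {}

    def add(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)  # identical content is stored only once
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]

SEP = b"\r\n\r\n"  # blank line separating HTTP headers from the payload

def index_record(store, urikey, timestamp, http_response: bytes) -> str:
    """Store header and payload separately; return a CDXJ-style index line."""
    header, sep, payload = http_response.partition(SEP)
    if not sep:
        raise ValueError("record has no header/payload separator")
    entry = {"header_digest": store.add(header), "payload_digest": store.add(payload)}
    return f"{urikey} {timestamp} {json.dumps(entry)}"

def replay(store, cdxj_line: str) -> bytes:
    """Fetch both parts named in the index line and recombine them."""
    _, _, block = cdxj_line.split(" ", 2)
    entry = json.loads(block)
    return store.get(entry["header_digest"]) + SEP + store.get(entry["payload_digest"])

# Two captures of the same page that differ only in their Date header.
store = ContentStore()
body = b"<html>same page</html>"
r1 = b"HTTP/1.1 200 OK\r\nDate: Sun, 19 Jun 2016 10:00:00 GMT" + SEP + body
r2 = b"HTTP/1.1 200 OK\r\nDate: Mon, 20 Jun 2016 10:00:00 GMT" + SEP + body
l1 = index_record(store, "com,example)/", "20160619100000", r1)
l2 = index_record(store, "com,example)/", "20160620100000", r2)
print(len(store.blobs))  # 3: two distinct headers, one shared payload
```

The index line carries only digests, so replay cost is dominated by the two lookups; in a networked store those lookups are where latency would accrue.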
Nicolas J. Bornand, Lyudmila Balakireva, H. Sompel
The Memento protocol provides a uniform approach to querying individual web archives. Soon after its emergence, Memento Aggregator infrastructure was introduced to support querying across multiple archives simultaneously. An Aggregator generates a response by issuing the respective Memento request against each of the distributed archives it covers. As the number of archives grows, it becomes increasingly challenging to deliver aggregate responses while keeping response times and computational costs under control. Ad hoc heuristic approaches have been introduced to address this challenge, and research has been conducted on optimizing query routing based on archive profiles. In this paper, we explore the use of binary, archive-specific classifiers, generated from the content cached by an Aggregator, to determine whether or not to query an archive for a given URI. Our results are readily applicable and can significantly decrease both the number of requests and the overall response time without compromising recall. Among other findings, classifiers can reduce the average number of requests by 77% and the overall response time by 42% compared to a brute-force approach that queries all archives, while maintaining a recall of 0.847.
Title: Routing memento requests using binary classifiers; Authors: Nicolas J. Bornand, Lyudmila Balakireva, H. Sompel; Venue: 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL); Date: 2016-06-19; DOI: 10.1145/2910896.2910899 (https://doi.org/10.1145/2910896.2910899)
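To make the Memento routing idea above concrete, here is a deliberately simple sketch of binary, archive-specific classifiers trained on cached Aggregator responses, together with the two quantities the abstract reports: requests saved relative to querying every archive, and recall. The abstract does not describe the paper's features or learners, so this stand-in looks only at a URI's top-level domain with a hypothetical 0.5 threshold; the archive names and data are invented for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def tld(uri):
    """Top-level domain of a URI's host, e.g. 'uk' for http://bbc.co.uk/."""
    host = urlsplit(uri).hostname or ""
    return host.rsplit(".", 1)[-1]

class TldClassifier:
    """Hypothetical per-archive classifier: query the archive if it held mementos
    for at least `threshold` of cached URIs sharing this URI's TLD.
    Unseen TLDs default to querying, which favors recall."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.seen = defaultdict(int)
        self.hits = defaultdict(int)

    def fit(self, uris, labels):
        for uri, held in zip(uris, labels):
            self.seen[tld(uri)] += 1
            self.hits[tld(uri)] += int(held)
        return self

    def predict(self, uri):
        key = tld(uri)
        if self.seen.get(key, 0) == 0:
            return True
        return self.hits[key] / self.seen[key] >= self.threshold

def train_routers(cache):
    """cache: URI -> set of archives that returned mementos for it."""
    archives = sorted(set().union(*cache.values()))
    uris = list(cache)
    return {a: TldClassifier().fit(uris, [a in cache[u] for u in uris]) for a in archives}

def route(routers, uri):
    """Archives the Aggregator would actually query for this URI."""
    return sorted(a for a, clf in routers.items() if clf.predict(uri))

def evaluate(routers, truth):
    """Return (fraction of requests saved vs. querying every archive, recall)."""
    requests = found = relevant = 0
    for uri, holders in truth.items():
        chosen = set(route(routers, uri))
        requests += len(chosen)
        found += len(chosen & holders)
        relevant += len(holders)
    return 1 - requests / (len(routers) * len(truth)), found / relevant

cache = {
    "http://bbc.co.uk/news": {"uk_archive", "general_archive"},
    "http://parliament.uk/": {"uk_archive", "general_archive"},
    "http://nasa.gov/": {"general_archive"},
    "http://whitehouse.gov/": {"general_archive"},
    "http://example.com/": {"general_archive"},
}
routers = train_routers(cache)
truth = {
    "http://gov.uk/": {"uk_archive", "general_archive"},
    "http://usa.gov/": {"general_archive"},
    "http://example.org/": {"general_archive"},
}
saved, recall = evaluate(routers, truth)
```

On this toy data the router skips one of six requests (the `.gov` lookup at `uk_archive`) with recall 1.0; the unseen `.org` TLD still goes to both archives because of the recall-favoring default.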
Masoumeh Nezhadbiglari, Marcos André Gonçalves, J. Almeida
Predicting scholar popularity has become an important research topic for a number of reasons. In this paper, we tackle the problem of predicting the popularity trends of scholars, aiming to make predictions as early and as accurately as possible. To perform the prediction task, we first extract the popularity trends of scholars from a training set. To that end, we apply a time series clustering algorithm called K-Spectral Clustering (K-SC) to identify the popularity trends as cluster centroids. We then predict trends for scholars in a test set by solving a classification problem. Specifically, we first compute a set of measures for each scholar based on the distance between the early points of that scholar's popularity curve and the identified centroids. We then combine these distance measures with a set of academic features (e.g., number of publications, number of venues) collected during the same monitoring period, and use them as input to a classification method. One aspect that distinguishes our method from other approaches is that the monitoring period, during which we gather information on each scholar's popularity and academic features, is determined on a per-scholar basis as part of our approach. Using total citation count as the measure of scientific popularity, we evaluate our solution on the popularity time series of more than 500,000 Computer Science scholars, gathered from the Microsoft Azure Marketplace. The experimental results show that our prediction method outperforms alternative prediction methods. We also show how to apply our method jointly with regression models to improve the prediction of scholar popularity values (e.g., number of citations) at a given future time.
Title: Early prediction of scholar popularity; Authors: Masoumeh Nezhadbiglari, Marcos André Gonçalves, J. Almeida; Venue: 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL); Date: 2016-06-19; DOI: 10.1145/2910896.2910905 (https://doi.org/10.1145/2910896.2910905)
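K-SC, used in the scholar-popularity abstract above, compares time series with a distance that ignores differences in scale and small shifts in time: d(x, y) = min over alpha and q of ||x - alpha * y_q|| / ||x||, where y_q is y shifted by q steps and, for a fixed q, the optimal alpha is <x, y_q> / ||y_q||^2. The following sketch computes that distance between the early points of a scholar's citation curve and the corresponding points of each centroid, then appends academic features, loosely mirroring the feature construction the abstract describes. The zero-padded shift, the `max_shift` bound, the fixed prefix length `t`, and the function names are assumptions; in the paper the monitoring period is chosen per scholar.

```python
import math

def shift(y, q):
    """Shift series y by q steps with zero padding (q > 0 delays, q < 0 advances)."""
    n = len(y)
    if q >= 0:
        return [0.0] * q + list(y[: n - q])
    return list(y[-q:]) + [0.0] * (-q)

def ksc_distance(x, y, max_shift=2):
    """Scale- and shift-invariant distance min_{alpha, q} ||x - alpha*y_q|| / ||x||."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0:
        raise ValueError("x must not be all zeros")
    max_shift = min(max_shift, len(x) - 1)
    best = 1.0  # alpha = 0 always achieves distance 1
    for q in range(-max_shift, max_shift + 1):
        yq = shift(y, q)
        yy = sum(v * v for v in yq)
        if yy == 0:
            continue
        alpha = sum(a * b for a, b in zip(x, yq)) / yy  # optimal scale for this shift
        resid = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, yq)))
        best = min(best, resid / norm_x)
    return best

def early_features(curve, centroids, t, academic):
    """Distances from the first t points of a scholar's curve to each centroid's
    first t points, followed by academic features (e.g., #publications, #venues)."""
    prefix = curve[:t]
    return [ksc_distance(prefix, c[:t]) for c in centroids] + list(academic)

# Hypothetical centroids: a rising trend and a declining trend.
centroids = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]
feats = early_features([2, 4, 6, 9, 9], centroids, t=3, academic=(12, 4))
```

The resulting vectors would then be fed to any off-the-shelf classifier whose labels are the trend clusters; the sketch stops at feature construction.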
Xiaoyan Su, Wei Wang, Shuo Yu, Chenxin Zhang, T. M. Bekele, Feng Xia
Based on focal closure theory, this work investigates whether attending conferences fosters new scientific collaborations. Through an analysis of conference closure at the individual and community levels, we show that attending conferences can promote new scientific collaborations, and that conferences with more attendees and higher field ratings bring about more new collaborations.
Title: Can academic conferences promote research collaboration?; Authors: Xiaoyan Su, Wei Wang, Shuo Yu, Chenxin Zhang, T. M. Bekele, Feng Xia; Venue: 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL); Date: 2016-06-19; DOI: 10.1145/2910896.2925446 (https://doi.org/10.1145/2910896.2925446)
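One way to operationalize the conference closure studied in the collaboration abstract above is to ask, for a single conference, what fraction of co-attendee pairs who had never co-authored before go on to co-author afterwards. The sketch below computes that ratio; the definition, the input format (co-authorship pairs as `frozenset`s), and the function name are hypothetical and may differ from the measures used in the paper.

```python
from itertools import combinations

def conference_closure(attendees, before, after):
    """Fraction of previously unconnected co-attendee pairs who co-author afterwards.

    attendees: author ids attending one conference
    before, after: sets of frozenset({a, b}) co-authorship pairs observed
                   before and after the conference (hypothetical format)
    """
    candidates = [frozenset(p) for p in combinations(sorted(set(attendees)), 2)
                  if frozenset(p) not in before]
    if not candidates:
        return 0.0
    return sum(p in after for p in candidates) / len(candidates)

before = {frozenset({"a", "b"})}
after = {frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"b", "d"})}
rate = conference_closure(["a", "b", "c", "d"], before, after)
```

Here five of the six co-attendee pairs had no prior collaboration and two of them collaborate afterwards, so the closure rate is 0.4. Comparing such rates across conferences of different sizes or ratings is the kind of analysis the abstract summarizes.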
Myriam C. Traub, Thaer Samar, J. V. Ossenbruggen, Jiyin He, A. D. Vries, L. Hardman
Bias in document retrieval can directly influence access to information in a digital library. In the worst case, systematic favoritism toward a certain type of document can render other parts of the collection invisible to users. This potential bias can be evaluated by measuring the retrievability of every document in a collection. Previous evaluations have been performed on TREC collections using simulated query sets. The question remains, however, how representative this approach is of more realistic settings. To address this question, we investigate the effectiveness of the retrievability measure using a large digitized newspaper corpus with two characteristics that distinguish our experiments from previous studies: (1) compared to TREC collections, our collection contains noise originating from OCR processing, historical spelling, and language use; and (2) instead of simulated queries, the collection comes with real user query logs, including click data. First, we assess the retrievability bias imposed on the newspaper collection by different IR models; we also evaluate the retrievability measure itself and confirm its ability to capture retrievability bias in our setup.
Second, we show how simulated queries differ from real user queries in term frequency and prevalence of named entities, and how this affects the retrievability results.
Title: Querylog-based assessment of retrievability bias in a large newspaper corpus; Authors: Myriam C. Traub, Thaer Samar, J. V. Ossenbruggen, Jiyin He, A. D. Vries, L. Hardman; Venue: 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL); Date: 2016-06-19; DOI: 10.1145/2910896.2910907 (https://doi.org/10.1145/2910896.2910907)
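Retrievability, as used in the newspaper-corpus abstract above, is commonly computed as r(d) = sum over queries q of o_q * f(k_dq, c), where k_dq is the rank of document d for query q and the cumulative variant sets f = 1 when k_dq <= c (else 0); the Gini coefficient of the r(d) values then summarizes how unequal retrievability is across the collection. The sketch below implements that standard formulation. Weighting queries by their frequency in a real query log is an assumption about how such logs might be used here, not necessarily the paper's exact setup, and the toy rankings are invented.

```python
def retrievability(rankings, docs, c=10, query_weights=None):
    """Cumulative retrievability r(d) = sum_q o_q * [rank of d for q <= c].

    rankings: query -> ranked list of document ids returned by an IR model
    query_weights: optional query -> o_q (e.g., frequency in a query log);
                   every query gets weight 1 when omitted
    """
    r = {d: 0.0 for d in docs}
    for q, ranked in rankings.items():
        o_q = 1.0 if query_weights is None else query_weights.get(q, 0.0)
        for d in ranked[:c]:
            if d in r:
                r[d] += o_q
    return r

def gini(values):
    """Gini coefficient: 0 means all documents are equally retrievable;
    values near 1 mean retrievability is concentrated on a few documents."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)

rankings = {"q1": ["d1", "d2"], "q2": ["d1", "d3"]}
docs = ["d1", "d2", "d3", "d4"]
r_top1 = retrievability(rankings, docs, c=1)
r_top2 = retrievability(rankings, docs, c=2)
print(gini(r_top1.values()), gini(r_top2.values()))  # 0.75 0.375
```

A larger cutoff c spreads retrievability over more documents and lowers the Gini coefficient, which is why retrievability studies usually report results for several values of c.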
We demonstrate the recently built Rec4LRW system, which is designed to assist researchers with three literature review and manuscript writing tasks. The system is intended to be useful for all researchers, although the evaluation results show that it is more beneficial for research students and beginners. In this demonstration, we provide a walkthrough of the system by executing the tasks with sample research topics. Highlighted aspects include the unique user interface (UI) and the interconnectivity between tasks.
Title: Making literature review and manuscript writing tasks easier for novice researchers through Rec4LRW system; Authors: Aravind Sesagiri Raamkumar, S. Foo, N. Pang; Venue: 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL); Date: 2016-06-19; DOI: 10.1145/2910896.2925445 (https://doi.org/10.1145/2910896.2925445)