We compared fiction readers' search actions during various query reformulation intervals (QRIs). We aimed to understand how readers' search actions differed between successful and unsuccessful QRIs and which search actions predicted the selection of very interesting novels compared to less interesting ones. We conducted a controlled user study with 80 participants searching for interesting novels. Three types of browsing tasks and two types of catalogs were used. Our results demonstrated that browsing task type was associated with readers' document viewing behavior in terms of observed search result pages, opened book pages and dwell time on book pages. When browsing for topical novels, most effort was required to select somewhat interesting novels. When browsing for good novels, most effort was required to select very interesting ones. Logistic regression analysis showed that the most significant predictors of higher document value were the number of observed search result pages and opened book pages.
{"title":"Books' Interest Grading and Fiction Readers' Search Actions During Query Reformulation Intervals","authors":"A. Mikkonen, P. Vakkari","doi":"10.1145/2756406.2756922","DOIUrl":"https://doi.org/10.1145/2756406.2756922","url":null,"abstract":"We compared fiction readers' search actions during various query reformulation intervals (QRIs). We aimed to understand how readers' search actions differed between successful and unsuccessful QRIs and which search actions predicted the selection of very interesting novels compared to less interesting ones. We conducted a controlled user study with 80 participants searching for interesting novels. Three types of browsing tasks and two types of catalogs were used. Our results demonstrated that browsing task type was associated with readers' document viewing behavior in terms of observed search result pages, opened book pages and dwell time on book pages. When browsing for topical novels, most effort was required to select somewhat interesting novels. When browsing for good novels, most effort was required to select very interesting ones. Logistic regression analysis showed that the most significant predictors of higher document value were the number of observed search result pages and opened book pages.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134261422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper describes a Machine Learning (ML) approach for extracting named entities and disambiguating the location of tweets based on those named entities and related content. We conducted experiments with tweets (e.g., about potholes), and found significant improvement in disambiguating tweet locations using an ML algorithm along with the Stanford NER. Adding state information predicted by our classifiers increases the likelihood of finding the state-level geo-location unambiguously by up to 80%.
{"title":"Read between the lines: A Machine Learning Approach for Disambiguating the Geo-location of Tweets","authors":"Sunshin Lee, M. Farag, Tarek Kanan, E. Fox","doi":"10.1145/2756406.2756971","DOIUrl":"https://doi.org/10.1145/2756406.2756971","url":null,"abstract":"This paper describes a Machine Learning (ML) approach for extracting named entities and disambiguating the location of tweets based on those named entities and related content. We conducted experiments with tweets (e.g., about potholes), and found significant improvement in disambiguating tweet locations using an ML algorithm along with the Stanford NER. Adding state information predicted by our classifiers increases the likelihood of finding the state-level geo-location unambiguously by up to 80%.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133657681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Session 2 - Information Extraction","authors":"Martin Klein","doi":"10.1145/3260510","DOIUrl":"https://doi.org/10.1145/3260510","url":null,"abstract":"","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124548082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhiwu Xie, Yinlin Chen, J. Speer, T. Walters, P. Tarazaga, M. Kasarda
We propose a use- and reuse-driven big data management approach that fuses the data repository and data processing capabilities in a co-located, public cloud. It answers the urgent data management needs of the growing number of researchers who don't fit in the big science/small science dichotomy. This approach will allow researchers to more easily use, manage, and collaborate around big data sets, as well as give librarians the opportunity to work alongside the researchers to preserve and curate data while it is still fresh and being actively used. This also provides the technological foundation to foster a sharing culture more aligned with the open source software development paradigm than the lone-wolf, gift-exchanging small science sharing or the top-down, highly structured big science sharing. To materialize this vision, we provide a system architecture consisting of a scalable digital repository system coupled with the co-located cloud storage and cloud computing, as well as a job scheduler and a deployment management system. Motivated by Virginia Tech's Goodwin Hall instrumentation project, we implemented and evaluated a prototype. The results show not only sufficient capacity for this particular case, but also near-perfect linear scalability of storage and data processing under a moderately high workload.
{"title":"Towards Use And Reuse Driven Big Data Management","authors":"Zhiwu Xie, Yinlin Chen, J. Speer, T. Walters, P. Tarazaga, M. Kasarda","doi":"10.1145/2756406.2756924","DOIUrl":"https://doi.org/10.1145/2756406.2756924","url":null,"abstract":"We propose a use- and reuse-driven big data management approach that fuses the data repository and data processing capabilities in a co-located, public cloud. It answers the urgent data management needs of the growing number of researchers who don't fit in the big science/small science dichotomy. This approach will allow researchers to more easily use, manage, and collaborate around big data sets, as well as give librarians the opportunity to work alongside the researchers to preserve and curate data while it is still fresh and being actively used. This also provides the technological foundation to foster a sharing culture more aligned with the open source software development paradigm than the lone-wolf, gift-exchanging small science sharing or the top-down, highly structured big science sharing. To materialize this vision, we provide a system architecture consisting of a scalable digital repository system coupled with the co-located cloud storage and cloud computing, as well as a job scheduler and a deployment management system. Motivated by Virginia Tech's Goodwin Hall instrumentation project, we implemented and evaluated a prototype. The results show not only sufficient capacity for this particular case, but also near-perfect linear scalability of storage and data processing under a moderately high workload.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124779210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xinran Chen, Sei-Ching Joanna Sin, Y. Theng, C. S. Lee
Widespread misinformation on social media is a cause of concern. Currently, it is unclear what factors prompt regular social media users with no malicious intent to forward misinformation to their online networks. Using a questionnaire informed by the Uses and Gratifications theory and the literature on rumor research, this study asked university students in Singapore why they shared misinformation on social media. Gender differences were also tested. The study found that perceived information characteristics such as its ability to spark conversations and its catchiness were top factors. Self-expression and socializing motivations were also among the top reasons. Women reported a higher prevalence of misinformation sharing. The implications for the design of social media applications and information literacy training were discussed.
{"title":"Why Do Social Media Users Share Misinformation?","authors":"Xinran Chen, Sei-Ching Joanna Sin, Y. Theng, C. S. Lee","doi":"10.1145/2756406.2756941","DOIUrl":"https://doi.org/10.1145/2756406.2756941","url":null,"abstract":"Widespread misinformation on social media is a cause of concern. Currently, it is unclear what factors prompt regular social media users with no malicious intent to forward misinformation to their online networks. Using a questionnaire informed by the Uses and Gratifications theory and the literature on rumor research, this study asked university students in Singapore why they shared misinformation on social media. Gender differences were also tested. The study found that perceived information characteristics such as its ability to spark conversations and its catchiness were top factors. Self-expression and socializing motivations were also among the top reasons. Women reported a higher prevalence of misinformation sharing. The implications for the design of social media applications and information literacy training were discussed.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"2008 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125625236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Session 1 - People and Their Books","authors":"L. Cassel","doi":"10.1145/3260509","DOIUrl":"https://doi.org/10.1145/3260509","url":null,"abstract":"","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126816632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sawood Alam, Fateh ud din B. Mehmood, Michael L. Nelson
We propose an approach to indexing raster images of dictionary pages that requires very little manual effort to enable direct access to the appropriate pages of the dictionary for lookup. Accessibility is further improved by feedback and crowdsourcing that enable highlighting of the specific location on the page where the lookup word is found, annotation, digitization, and fielded searching. This approach is equally applicable to simple scripts and complex writing systems. Using our proposed approach, we have built a Web application called "Dictionary Explorer" which supports word indexes in various languages, and every language can have multiple dictionaries associated with it. Word lookup gives direct access to appropriate pages of all the dictionaries of that language simultaneously. The application has exploration features like searching, pagination, and navigating the word index through a tree-like interface. The application also supports feedback, annotation, and digitization features. Apart from the scanned images, "Dictionary Explorer" aggregates results from various sources and user contributions in Unicode. We have evaluated the time required for indexing dictionaries of different sizes and complexities in the Urdu language and examined various trade-offs in our implementation. Using our approach, a single person can make a dictionary of 1,000 pages searchable in less than an hour.
{"title":"Improving Accessibility of Archived Raster Dictionaries of Complex Script Languages","authors":"Sawood Alam, Fateh ud din B. Mehmood, Michael L. Nelson","doi":"10.1145/2756406.2756926","DOIUrl":"https://doi.org/10.1145/2756406.2756926","url":null,"abstract":"We propose an approach to indexing raster images of dictionary pages that requires very little manual effort to enable direct access to the appropriate pages of the dictionary for lookup. Accessibility is further improved by feedback and crowdsourcing that enable highlighting of the specific location on the page where the lookup word is found, annotation, digitization, and fielded searching. This approach is equally applicable to simple scripts and complex writing systems. Using our proposed approach, we have built a Web application called \"Dictionary Explorer\" which supports word indexes in various languages, and every language can have multiple dictionaries associated with it. Word lookup gives direct access to appropriate pages of all the dictionaries of that language simultaneously. The application has exploration features like searching, pagination, and navigating the word index through a tree-like interface. The application also supports feedback, annotation, and digitization features. Apart from the scanned images, \"Dictionary Explorer\" aggregates results from various sources and user contributions in Unicode. We have evaluated the time required for indexing dictionaries of different sizes and complexities in the Urdu language and examined various trade-offs in our implementation. Using our approach, a single person can make a dictionary of 1,000 pages searchable in less than an hour.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120891723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
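The abstract for the raster dictionary paper does not say how its word index is structured. As a purely illustrative sketch, one low-effort way to map a lookup word to a scanned page is to record only the first headword of each page and binary-search that list. The class name `PageIndex`, the `first_words` input, and the use of plain Python string ordering (rather than a language-specific collation, which a script such as Urdu would likely need) are all assumptions, not details taken from the paper.

```python
import bisect


class PageIndex:
    """Hypothetical sparse index: the first headword printed on each scanned page."""

    def __init__(self, first_words):
        # first_words[i] is the first headword on page i + 1 (assumed input).
        # The list must already be sorted in the same order used for lookups.
        self.first_words = list(first_words)

    def page_for(self, word):
        """Return the 1-based page whose headword range should contain `word`."""
        # Count how many pages start at or before `word`; the last such page
        # is the one to open. Words sorting before page 1 still map to page 1.
        pos = bisect.bisect_right(self.first_words, word)
        return max(pos, 1)


index = PageIndex(["aardvark", "banana", "cherry", "mango"])
page = index.page_for("cat")  # the page that starts at "banana"
```

With such an index, the manual work is one entry per page rather than one per headword, which is the kind of trade-off that would make a large dictionary searchable quickly.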
Sarah Weissman, S. Ayhan, Joshua Bradley, Jimmy J. Lin
In this paper, we identify sentences in Wikipedia articles that are either identical or highly similar by applying techniques for near-duplicate detection of web pages. This is accomplished with a MapReduce implementation of minhash to identify sentences with high Jaccard similarity, followed by a pass to generate sentence clusters. Based on manual examination, we discovered that these clusters can be categorized into six different types: templates, identical sentences, copyediting, factual drift, references, and other. Two of these categories are particularly interesting: identical sentences quantify the extent to which content in Wikipedia is copied and pasted, and near-duplicate sentences that state contradictory facts point to quality issues in Wikipedia.
{"title":"Identifying Duplicate and Contradictory Information in Wikipedia","authors":"Sarah Weissman, S. Ayhan, Joshua Bradley, Jimmy J. Lin","doi":"10.1145/2756406.2756947","DOIUrl":"https://doi.org/10.1145/2756406.2756947","url":null,"abstract":"In this paper, we identify sentences in Wikipedia articles that are either identical or highly similar by applying techniques for near-duplicate detection of web pages. This is accomplished with a MapReduce implementation of minhash to identify sentences with high Jaccard similarity, followed by a pass to generate sentence clusters. Based on manual examination, we discovered that these clusters can be categorized into six different types: templates, identical sentences, copyediting, factual drift, references, and other. Two of these categories are particularly interesting: identical sentences quantify the extent to which content in Wikipedia is copied and pasted, and near-duplicate sentences that state contradictory facts point to quality issues in Wikipedia.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115897890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
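The minhash step mentioned in the abstract above can be illustrated with a small single-machine sketch. This is not the authors' MapReduce implementation: the function names, word-trigram shingling, SHA-1-based seeded hashes, and the choice of 128 hash functions are all assumptions made for illustration. The idea is that the fraction of positions where two minhash signatures agree estimates the Jaccard similarity of the underlying sets.

```python
import hashlib
import re


def shingles(sentence, k=3):
    """Return the set of word k-grams (shingles) of a sentence."""
    words = re.findall(r"\w+", sentence.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over the set.

    Assumes `items` is non-empty.
    """
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing signature positions, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

Sentence pairs whose estimated similarity exceeds a chosen threshold would then be candidates for the clustering pass; the threshold itself is a tuning choice the abstract does not specify.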
Welcome to JCDL 2007! It is our great pleasure to welcome you to the 7th annual meeting of the ACM/IEEE Joint Conference on Digital Libraries (JCDL). JCDL is one of the primary international forums for the presentation and discussion of research, practice and social issues related to digital libraries. The conference theme this year is "Building and Sustaining the Digital Environment" and the program reflects this theme as well as the broader context of digital libraries and the boundary-spanning research that supports their design, development and operation. This year we had a record number of submissions for the conference, with 279 total submissions from digital library researchers in 31 countries. From 119 full paper submissions and 68 short paper submissions, the program committee selected 43 full papers and 28 short papers for presentation at the conference. In addition, 30 posters and 14 demonstrations were selected for presentation at the special poster and demo evening session. As in previous years, we will be awarding the Vannevar Bush Best Paper Award (sponsored by ACM). A full schedule of tutorials and workshops has also been arranged, bracketing the main meeting. This year we also host the largest Doctoral Consortium yet held at JCDL, where student researchers in digital library topics work with an international panel of faculty mentors on exploring and refining their thesis research.
{"title":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","authors":"E. Rasmussen, R. Larson, Elaine Toms, S. Sugimoto","doi":"10.1145/2756406","DOIUrl":"https://doi.org/10.1145/2756406","url":null,"abstract":"Welcome to JCDL 2007! It is our great pleasure to welcome you to the 7th annual meeting of the ACM/IEEE Joint Conference on Digital Libraries (JCDL). JCDL is one of the primary international forums for the presentation and discussion of research, practice and social issues related to digital libraries. The conference theme this year is \"Building and Sustaining the Digital Environment\" and the program reflects this theme as well as the broader context of digital libraries and the boundary-spanning research that supports their design, development and operation. This year we had a record number of submissions for the conference, with 279 total submissions from digital library researchers in 31 countries. From 119 full paper submissions and 68 short paper submissions, the program committee selected 43 full papers and 28 short papers for presentation at the conference. In addition, 30 posters and 14 demonstrations were selected for presentation at the special poster and demo evening session. As in previous years, we will be awarding the Vannevar Bush Best Paper Award (sponsored by ACM). A full schedule of tutorials and workshops has also been arranged, bracketing the main meeting. This year we also host the largest Doctoral Consortium yet held at JCDL, where student researchers in digital library topics work with an international panel of faculty mentors on exploring and refining their thesis research.","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"6 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131839019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}