Dylan A. Simon, Andrew S. Gordon, L. Steiger, R. Gilmore
Video and audio recordings serve as a primary data source in many fields, especially in the social and behavioral sciences. Recordings present unique opportunities for reuse and reanalysis for novel scientific purposes, but also present challenges related to respecting the privacy of individuals depicted. Databrary is a web-based service for sharing and reusing the video data created by researchers in the developmental and learning sciences. By investigating how researchers organize, analyze, and mine their own recordings, we have implemented a system that empowers researchers to capture, store, and share recordings in a standardized way. This demo will provide a tour through the Databrary service, highlighting how it promotes storage, management, sharing, and reuse of research data, controls access privileges to restricted human subject data, and facilitates browsing and discoverability of datasets.
{"title":"Databrary: Enabling Sharing and Reuse of Research Video","authors":"Dylan A. Simon, Andrew S. Gordon, L. Steiger, R. Gilmore","doi":"10.1145/2756406.2756951","DOIUrl":"https://doi.org/10.1145/2756406.2756951","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"55 1","pages":"0"},"publicationDate":"2015-06-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"130099253"}
{"title":"Session details: Session 5 - User Issues","authors":"P. Vakkari","doi":"10.1145/3260513","DOIUrl":"https://doi.org/10.1145/3260513","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"5 1","pages":"0"},"publicationDate":"2015-06-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"121313593"}
{"title":"Session details: Keynotes","authors":"P. Bogen","doi":"10.1145/3260508","DOIUrl":"https://doi.org/10.1145/3260508","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"175 1","pages":"0"},"publicationDate":"2015-06-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"126938964"}
In this paper, we describe the process we used to debug a crowdsourced labeling task with low inter-rater agreement. In the labeling task, the workers' subjective judgment was used to detect high-quality social media content (interesting tweets), with the ultimate aim of building a classifier that would automatically curate Twitter content. We describe the effects of varying the genre and recency of the dataset, of testing the reliability of the workers, and of recruiting workers from different crowdsourcing platforms. We also examined the effect of redesigning the work itself, both to make it easier and to potentially improve inter-rater agreement. As a result of the debugging process, we have developed a framework for diagnosing similar efforts and a technique to evaluate worker reliability. The technique for evaluating worker reliability, Human Intelligence Data-Driven Enquiries (HIDDENs), differs from other such schemes in that it has the potential to produce useful secondary results and enhance performance on the main task. HIDDEN subtasks pivot around the same data as the main task, but ask workers questions with greater expected inter-rater agreement. Both the framework and the HIDDENs are currently in use in a production environment.
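To make "inter-rater agreement" concrete, the following is a minimal sketch of Cohen's kappa for two raters. The abstract does not say which agreement statistic the authors used, so the choice of kappa, the function name `cohens_kappa`, and the example labels are all assumptions made for illustration; with more than two workers per item, a measure such as Fleiss' kappa or Krippendorff's alpha would be the more usual choice.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items (illustrative helper)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which the raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability both raters pick the same label independently,
    # estimated from each rater's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical judgments from two workers on four tweets.
worker_1 = ["interesting", "interesting", "not", "not"]
worker_2 = ["interesting", "not", "interesting", "not"]
kappa = cohens_kappa(worker_1, worker_2)
```

In this made-up example the workers agree on half of the tweets, which is exactly what chance would produce given their label frequencies, so kappa comes out at zero. That is the kind of low agreement the debugging process described above is meant to diagnose.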
{"title":"Debugging a Crowdsourced Task with Low Inter-Rater Agreement","authors":"Omar Alonso, C. Marshall, Marc Najork","doi":"10.1145/2756406.2757741","DOIUrl":"https://doi.org/10.1145/2756406.2757741","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"27 1","pages":"0"},"publicationDate":"2015-06-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"115348232"}
With the increasing number of multilingual webpages on the Internet, cross-language information retrieval has become an important research issue. Using Activity Theory as a theoretical framework, this study employs semi-structured interviews with key informants who are frequent users of Chinese-English mixed language queries in web searching. The findings present the context of and reasons for using Chinese-English mixed language queries, which can inform the design of cross-language controlled vocabularies and information retrieval systems.
{"title":"Studying Chinese-English Mixed Language Queries from the User Perspectives","authors":"Hengyi Fu, Shuheng Wu","doi":"10.1145/2756406.2756958","DOIUrl":"https://doi.org/10.1145/2756406.2756958","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"38 1","pages":"0"},"publicationDate":"2015-06-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"130793384"}
Additional content for video games, such as mods (modifications) or DLC (downloadable content), is increasingly prevalent in the current video game market. For cultural heritage institutions with video game collections, such content introduces various philosophical and practical challenges in multiple areas, including acquisition, description, access/use, and preservation. In this paper, we discuss these challenges and propose a solution that can alleviate the problem of managing a digital library collection including video games with additional content. While our discussion and proposed solution focus on video games, they also have broader implications for cultural heritage institutions that manage other types of digital and multimedia objects with additional content, as well as serial publications.
{"title":"The Problem of \"Additional Content\" in Video Games","authors":"Jin Ha Lee, Jacob Jett, Andrew Perti","doi":"10.1145/2756406.2756949","DOIUrl":"https://doi.org/10.1145/2756406.2756949","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"38 1","pages":"0"},"publicationDate":"2015-06-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"131052414"}
Unmil Karadkar, Karen M. Wickett, Madhura Parikh, R. Furuta, Joshua Sheehy, Meghanath Reddy Junnutula, Jeremy Tzou
The Computational Collection Description project is developing mechanisms for generating field-specific collection-level descriptors from item values. Using the Digital Public Library of America (DPLA) as a sample data set, we describe a flexible, extensible architecture for processing field-level values, an augmented Collection class to record the generated metadata, and our early results of enhancements for a DPLA collection.
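As an illustration of the idea of deriving collection-level descriptors from item values, here is a minimal sketch in which per-field processors summarize the values found across items and a `Collection` class records the result. Everything in it is hypothetical: the class layout, the `describe` method, the two summarizers, and the field names `year` and `subject` are assumptions for this example and are not taken from the project's architecture or from the DPLA metadata schema.

```python
from collections import Counter
from dataclasses import dataclass, field


def summarize_years(values):
    """Collapse item-level years into a collection-level range."""
    years = sorted(int(v) for v in values)
    return {"earliest": years[0], "latest": years[-1]}


def summarize_terms(values, top_n=2):
    """Keep the most frequent terms; ties are broken alphabetically."""
    counts = Counter(values)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_n]]


@dataclass
class Collection:
    """Hypothetical collection that stores descriptors generated from its items."""
    name: str
    items: list
    generated: dict = field(default_factory=dict)

    def describe(self, processors):
        """Run one processor per field over all item values for that field."""
        for field_name, processor in processors.items():
            values = []
            for item in self.items:
                value = item.get(field_name)
                if value is None:
                    continue  # items may lack a field in heterogeneous metadata
                if isinstance(value, list):
                    values.extend(value)
                else:
                    values.append(value)
            if values:
                self.generated[field_name] = processor(values)
        return self.generated


# Made-up item records standing in for aggregated metadata.
items = [
    {"year": "1901", "subject": ["maps", "railroads"]},
    {"year": "1923", "subject": ["maps"]},
    {"subject": ["portraits"]},
]
collection = Collection("demo", items)
descriptors = collection.describe({"year": summarize_years, "subject": summarize_terms})
```

Keeping the processors as plain functions keyed by field name is one way to get the extensibility the abstract mentions: supporting a new field means adding one entry to the mapping, without changing the collection class.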
{"title":"Computationally Supported Collection-level Descriptions in Large Heterogeneous Metadata Aggregations","authors":"Unmil Karadkar, Karen M. Wickett, Madhura Parikh, R. Furuta, Joshua Sheehy, Meghanath Reddy Junnutula, Jeremy Tzou","doi":"10.1145/2756406.2756970","DOIUrl":"https://doi.org/10.1145/2756406.2756970","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"7 1","pages":"0"},"publicationDate":"2015-06-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"130950157"}
Digital libraries are called upon to organize, aggregate, and steward born-digital news collections. Rather than continuously building silos of such non-traditional collections, digital libraries are seeking to manage these collections in conjunction with each other in order to provide the most value to scholars. Here we present the results of a preliminary study analyzing characteristics of items in two collections of digital news media: television broadcasts and social media coverage. Our findings indicate a number of factors that similar efforts will need to take into consideration when linking digital "news" collections like ours.
{"title":"Analyzing News Events in Non-Traditional Digital Library Collections","authors":"Martin Klein, Peter M. Broadwell","doi":"10.1145/2756406.2756948","DOIUrl":"https://doi.org/10.1145/2756406.2756948","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"45 1","pages":"0"},"publicationDate":"2015-06-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"122621642"}
Anderson A. Ferreira, Marcos André Gonçalves, Alberto H. F. Laender
Name ambiguity in the context of bibliographic citation records is a hard problem that affects the quality of services and content in digital libraries and similar systems. This problem occurs when an author publishes works under distinct names or distinct authors publish works under similar names. The challenges of dealing with author name ambiguity have led to a myriad of name disambiguation methods. In this tutorial, we characterize such methods by means of a proposed taxonomy, present an overview of some of the most representative ones and discuss open challenges.
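The two sides of the problem can be seen in a small sketch. It groups citation records by a naive key (first initial plus surname), which is only an illustrative heuristic and not one of the methods covered by the tutorial; the function names, the author names, and the author identifiers are all made up for this example.

```python
from collections import defaultdict


def name_key(full_name):
    """Reduce a name to 'first initial + surname' in lower case (naive heuristic)."""
    parts = full_name.replace(".", " ").split()
    return (parts[0][0] + " " + parts[-1]).lower()


def group_by_key(records):
    """Map each name key to the set of true author ids that share it."""
    groups = defaultdict(set)
    for name, author_id in records:
        groups[name_key(name)].add(author_id)
    return dict(groups)


# Hypothetical citation records: (name as printed, true author id).
records = [
    ("Maria Silva", "p1"),   # same person, full name
    ("M. Silva", "p1"),      # same person, abbreviated name
    ("Marcos Silva", "p2"),  # different person with a similar name
]
groups = group_by_key(records)
```

The coarse key correctly merges the two spellings used by author `p1`, but it also folds the distinct author `p2` into the same group. Resolving one kind of ambiguity at the cost of the other is the tension that dedicated disambiguation methods try to address.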
{"title":"Automatic Methods for Disambiguating Author Names in Bibliographic Data Repositories","authors":"Anderson A. Ferreira, Marcos André Gonçalves, Alberto H. F. Laender","doi":"10.1145/2756406.2756930","DOIUrl":"https://doi.org/10.1145/2756406.2756930","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"66 1","pages":"0"},"publicationDate":"2015-06-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"122674645"}
When users post links to web pages on Twitter, there is a time delta between when the link was shared (t_tweet) and when it was read (t_click). When this time delta is small, the page's state has usually not changed. However, when shared content is read long after it was posted, the dynamic nature of the web means the page's state may have changed, and the intention of the author needs to be inferred. In this work, we enhance a prior temporal intention model and address its shortcomings by incorporating extended linguistic feature analysis, replacing the prior textual similarity measure with a semantic similarity measure based on latent topic detection trained on the English Wikipedia corpus, and enriching and balancing the training dataset. We uncovered three different intention behaviors with respect to time: Stable Intention, Changing Intention from current to past, and Undefined Intention. Using these classes and only the information available at posting time from the tweet and the current state of the resource, we correctly predict the temporal intention classification and strength with 77% accuracy.
{"title":"Predicting Temporal Intention in Resource Sharing","authors":"Hany SalahEldeen, Michael L. Nelson","doi":"10.1145/2756406.2756921","DOIUrl":"https://doi.org/10.1145/2756406.2756921","PeriodicalId":256118,"journal":{"name":"Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries","volume":"1 1","pages":"0"},"publicationDate":"2015-06-21","publicationTypes":"Journal Article","isOpenAccess":false,"platform":"Semanticscholar","paperid":"129260922"}