This study examines the concept of “description” and its theoretical foundations. The literature on the concept is surprisingly limited, and the term's usage is vague and sometimes even conflicting. Description should be considered in relation to other processes, such as representation, data capturing, and categorizing, which raises the question of what it means to describe something. The term is often used for any type of predication but may be better limited to predications based on observations. Research aims to establish criteria for making optimal descriptions; however, the problems involved in describing something have seldom been addressed, and specific ideals are often followed without examining their fruitfulness. This study provides evidence that description cannot be a neutral, objective activity; rather, it is theory-laden and interest-based. In information science, description occurs in processes such as document description, descriptive metadata assignment, and information resource description. In this field, too, the term has been used in conflicting ways that mostly show no recognition of the value- and theory-laden nature of descriptions. It is argued that descriptive activities in information science should always be based on conscious, explicit consideration of the goals that descriptions are meant to serve.
Birger Hjørland. “Description: Its meaning, epistemology, and use with emphasis on information science.” Journal of the Association for Information Science and Technology 74(13): 1532–1549. Published 2023-09-22. DOI: 10.1002/asi.24834
Monica Lestari Paramita, Maria Kasinidou, Styliani Kleanthous, Paolo Rosso, Tsvi Kuflik, Frank Hopfgartner
Bias in news search engines has been shown to influence users' perceptions of a news topic and to contribute to the polarisation of society. There is therefore a need for news search engines that increase user awareness of biases in search results. While technical approaches have been developed to mitigate biases in search, very few studies have investigated users' preferences for interface designs that could raise their awareness of biases in news search engines. In this study, we used a participatory design methodology to develop eight prototypes with different features that could potentially raise user awareness of such biases. We conducted three user studies, involving 132 participants with computer science backgrounds, to evaluate these prototypes. Our findings indicate the importance of news search engines that (a) inform users of possible biases in the results (bias visualization approach) and (b) allow users to access alternative search results (results-reranking approach). Our study provides further insights into the strengths and possible risks of each approach, which are important for future research on designing interfaces that raise user awareness of biases in news search engines.
Monica Lestari Paramita, Maria Kasinidou, Styliani Kleanthous, Paolo Rosso, Tsvi Kuflik, Frank Hopfgartner. “Towards improving user awareness of search engine biases: A participatory design approach.” Journal of the Association for Information Science and Technology 75(5): 581–599. Published 2023-09-20. DOI: 10.1002/asi.24826
Recent dramatic increases in the ability to generate, collect, and use datasets have inspired numerous academic and policy discussions regarding the emerging field of human data interaction (HDI). Given the challenges of interacting with open government data (OGD) and the research gap in this area, our study explores HDI in the OGD domain and investigates how HDI can further contribute to promoting OGD. Building on two existing behavioral models, we proposed an initial conceptual model of OGD interaction and then used it to conduct two studies that empirically examined users' behaviors when interacting with OGD. Finally, we refined the model and invited three experts to validate it, with the aim of enhancing its understandability, comprehensiveness, and reasonableness. This comprehensive model of human OGD interaction will contribute to the theoretical work of the HDI field as well as to the practical design of OGD platforms and to data literacy education.
Fanghui Xiao, Yu Chi, Daqing He. “Promoting data use through understanding user behaviors: A model for human open government data interaction.” Journal of the Association for Information Science and Technology 74(13): 1498–1514. Published 2023-09-08. DOI: 10.1002/asi.24831
As scholars increasingly undertake large-scale analysis of visual materials, advanced computational tools show promise for informing that process. One technique in the toolbox is image recognition, made readily accessible via Google Vision AI, Microsoft Azure Computer Vision, and Amazon's Rekognition service. However, concerns about issues such as bias factors and low reliability have led to warnings against using it in research. A systematic study of cross-service label agreement made these concerns concrete: using eight datasets spanning professionally produced and user-generated images, the work showed that image-recognition services disagree on the most suitable labels for images. Beyond supporting caveats expressed in prior literature, the report articulates two mitigation strategies, both involving the use of multiple image-recognition services. Highly exploratory research could include all the labels, accepting noisier but less restrictive analysis output. Alternatively, scholars may use word-embedding-based approaches to identify concepts that are similar enough for their purposes and then focus on the labels that pass this filter.
Anton Berg, Matti Nelimarkka. “Do you see what I see? Measuring the semantic differences in image-recognition services' outputs.” Journal of the Association for Information Science and Technology 74(11): 1307–1324. Published 2023-09-05. DOI: 10.1002/asi.24827
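For the Berg and Nelimarkka abstract, the two mitigation strategies can be illustrated with a minimal Python sketch. It assumes that each service's output for an image has already been reduced to a set of label strings. Several details are invented for illustration:

- the service names, labels, and three-dimensional "embeddings";
- Jaccard similarity, which is only one possible agreement measure and not necessarily the one used in the study;
- the 0.9 similarity threshold, which is arbitrary.

A real analysis would call the services' own APIs and load pretrained word vectors.

```python
import math
from itertools import combinations

# Hypothetical label sets from three services for ONE image (invented, not real API output).
LABELS_BY_SERVICE = {
    "service_a": {"dog", "pet", "grass", "mammal"},
    "service_b": {"dog", "animal", "lawn"},
    "service_c": {"puppy", "pet", "outdoor"},
}

# Toy 3-d vectors standing in for pretrained word embeddings.
TOY_VECTORS = {
    "dog": (0.9, 0.1, 0.0), "puppy": (0.85, 0.15, 0.05), "pet": (0.8, 0.2, 0.1),
    "animal": (0.7, 0.3, 0.0), "mammal": (0.75, 0.25, 0.0),
    "grass": (0.0, 0.2, 0.9), "lawn": (0.05, 0.15, 0.95), "outdoor": (0.1, 0.4, 0.8),
}


def jaccard(a, b):
    """Overlap of two label sets: 0 = no shared labels, 1 = identical sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_agreement(label_sets):
    """Jaccard agreement for every pair of services."""
    return {(s1, s2): jaccard(label_sets[s1], label_sets[s2])
            for s1, s2 in combinations(sorted(label_sets), 2)}


def union_of_labels(label_sets):
    """Strategy 1 (exploratory): keep every label any service produced."""
    return set().union(*label_sets.values())


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def filter_by_concept(labels, concept, vectors=TOY_VECTORS, threshold=0.9):
    """Strategy 2 (filtered): keep labels whose embedding is close to `concept`.

    Labels without a vector are dropped; the threshold is an arbitrary choice.
    """
    target = vectors[concept]
    return {lab for lab in labels
            if lab in vectors and cosine(vectors[lab], target) >= threshold}


all_labels = union_of_labels(LABELS_BY_SERVICE)
print(pairwise_agreement(LABELS_BY_SERVICE))
print(sorted(filter_by_concept(all_labels, "dog")))  # ['animal', 'dog', 'mammal', 'pet', 'puppy']
```

With these toy inputs:

- Pairwise agreement is low (1/6 for two pairs, 0 for the third).
- The union keeps all eight labels.
- Filtering against "dog" keeps only the animal-related labels.

Lowering or raising `threshold` moves the analysis toward the noisier, more inclusive strategy or toward a stricter filter.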
This paper presents a conceptual framework for the intellectual virtues in the context of online search. Intellectual virtues are dispositions and skills that enable good thinking and wise reasoning, such as intellectual humility and attentiveness. Despite their importance, today the intellectual virtues tend to be underdeveloped across society. In light of the institutional role that online search plays in life today, there is an opportunity (perhaps an obligation) for online search to facilitate the development of intellectual virtue. The framework presented in this paper locates this development in three areas: the Searcher, the System, and Society. Major issues in information ethics and virtue epistemology are discussed for each of these areas, leading to recommendations for education, design, and research. This paper provides specific suggestions in this regard along with an agenda for future research at the intersection of ethics, epistemology, and online search.
Tim Gorichanaz. “Virtuous search: A framework for intellectual virtue in online search.” Journal of the Association for Information Science and Technology 75(5): 538–549. Published 2023-09-01. DOI: 10.1002/asi.24832
Jussara Rowland, Sergi López-Asensio, Ataberk Bagci, Ana Delicado, Ana Prades
Commercial search engines play a central role in shaping, defining, and promoting the information people have access to in contemporary societies. This is particularly true of emergent technologies, for which little information is often available in legacy media and other sources, so search engines have a strong bearing on public perceptions. In this article, we focus on how the Google search engine promotes information on carbon capture and storage (CCS). Through a qualitative analysis comparing results in three countries (France, Spain, and Portugal), we explore how Google's ranking parameters and interface shape the information people access when searching for CCS. We focus on the content of the first search engine results pages (SERPs) and consider both Google's ranking criteria and the content and format of the promoted sources. The study reveals Google's influence in highlighting Wikipedia pages and Q&A-formatted sources and in prioritizing specialized online media and private corporations. Additionally, we observe country-specific variations in the actors and types of content, reflecting the level of national interest and investment in the topic. These findings underscore the significant role of search engine mediation in shaping public perceptions and knowledge of emergent climate change technologies.
Jussara Rowland, Sergi López-Asensio, Ataberk Bagci, Ana Delicado, Ana Prades. “Shaping information and knowledge on climate change technologies: A cross-country qualitative analysis of carbon capture and storage results on Google search.” Journal of the Association for Information Science and Technology 75(5): 625–639. Published 2023-09-01. DOI: 10.1002/asi.24828
Lin Zhang, Zhenyu Gou, Zhichao Fang, Gunnar Sivertsen, Ying Huang
The purpose of this study is to investigate the validity of tweets about scientific publications as an indicator of societal impact by measuring the degree to which the publications are tweeted beyond academia. We introduce methods that allow for using a much larger and broader data set than in previous validation studies. It covers all areas of research and includes almost 40 million tweets by 2.5 million unique tweeters mentioning almost 4 million scientific publications. We find that, although half of the tweeters are external to academia, most of the tweets are from within academia, and most of the external tweets are responses to original tweets within academia. Only half of the tweeted publications are tweeted outside of academia. We conclude that, in general, the tweeting of scientific publications is not a valid indicator of the societal impact of research. However, publications that continue being tweeted after a few days represent recent scientific achievements that catch attention in society. These publications occur more often in the health sciences and in the social sciences and humanities.
Lin Zhang, Zhenyu Gou, Zhichao Fang, Gunnar Sivertsen, Ying Huang. “Who tweets scientific publications? A large-scale study of tweeting audiences in all areas of research.” Journal of the Association for Information Science and Technology 74(13): 1485–1497. Published 2023-08-31. DOI: 10.1002/asi.24830
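For the Zhang et al. abstract, the audience measures it reports reduce to simple ratios once every tweet is labeled:

- the share of external tweeters;
- the share of tweets from within academia;
- the share of external tweets that respond to academic tweets;
- the share of publications tweeted outside academia.

The Python sketch below assumes that each tweeter has already been classified as academic or not and that responses to academic tweets are flagged. How the study performed that classification is not described here. The `Tweet` record layout and the toy records are hypothetical and are not the study's data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    """One tweet mentioning a publication (hypothetical record layout)."""
    tweeter: str
    tweeter_is_academic: bool           # assumed output of an earlier classification step
    publication: str                    # placeholder ID; a real data set would use DOIs
    responds_to_academic: bool = False  # reply/retweet/quote of a tweet posted within academia


def audience_shares(tweets):
    """Compute the four audience ratios for a non-empty list of tweets."""
    academic_by_tweeter = {t.tweeter: t.tweeter_is_academic for t in tweets}
    external = [t for t in tweets if not t.tweeter_is_academic]
    publications = {t.publication for t in tweets}
    tweeted_outside = {t.publication for t in external}
    return {
        # Share of unique tweeters who are outside academia.
        "external_tweeter_share":
            sum(not a for a in academic_by_tweeter.values()) / len(academic_by_tweeter),
        # Share of all tweets posted from within academia.
        "academic_tweet_share": sum(t.tweeter_is_academic for t in tweets) / len(tweets),
        # Among external tweets, share that respond to an academic original.
        "external_response_share":
            sum(t.responds_to_academic for t in external) / len(external) if external else 0.0,
        # Share of tweeted publications that reach at least one external tweeter.
        "tweeted_outside_share": len(tweeted_outside) / len(publications),
    }


# Invented toy data: 2 academic and 2 external tweeters, 8 tweets, 4 publications.
TOY_TWEETS = [
    Tweet("alice", True, "pub-1"),
    Tweet("alice", True, "pub-2"),
    Tweet("bob", True, "pub-1"),
    Tweet("bob", True, "pub-3"),
    Tweet("bob", True, "pub-4"),
    Tweet("carol", False, "pub-1", responds_to_academic=True),
    Tweet("dave", False, "pub-2", responds_to_academic=True),
    Tweet("dave", False, "pub-2"),
]

print(audience_shares(TOY_TWEETS))
```

The toy data are constructed so that the four ratios come out as 0.5, 0.625, about 0.67, and 0.5. This echoes the qualitative pattern described in the abstract: half the tweeters are external, yet most tweets come from academia. These numbers are not findings.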
This study contributes to the recent discussions on indicating interdisciplinarity, that is, going beyond catch-all metrics of interdisciplinarity. We propose a contextual framework to improve the granularity and usability of the existing methodology for interdisciplinary knowledge flow (IKF) in which scientific disciplines import and export knowledge from/to other disciplines. To characterize the knowledge exchange between disciplines, we recognize three aspects of IKF under this framework, namely broadness, intensity, and homogeneity. We show how to utilize them to uncover different forms of interdisciplinarity, especially between disciplines with the largest volume of IKF. We apply this framework in two use cases, one at the level of disciplines and one at the level of journals, to show how it can offer a more holistic and detailed viewpoint on the interdisciplinarity of scientific entities than aggregated and context-unaware indicators. We further compare our proposed framework, an indicating process, with established indicators and discuss how such information tools on interdisciplinarity can assist science policy practices such as performance-based research funding systems and panel-based peer review processes.
Hongyu Zhou, Raf Guns, Tim C. E. Engels. “Towards indicating interdisciplinarity: Characterizing interdisciplinary knowledge flow.” Journal of the Association for Information Science and Technology 74(11): 1325–1340. Published 2023-08-28. DOI: 10.1002/asi.24829
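The Zhou, Guns, and Engels abstract names three aspects of interdisciplinary knowledge flow (broadness, intensity, and homogeneity) without defining them. The Python sketch below shows one plausible operationalization on a discipline-by-discipline citation matrix, purely as an assumption:

- **Intensity:** the share of a discipline's references that go to other disciplines.
- **Broadness:** the number of other disciplines it imports from.
- **Homogeneity:** the normalized Shannon entropy (Pielou evenness) of those imports.

These formulas and the `FLOWS` counts are illustrative. They are not the definitions used in the paper.

```python
import math

# Hypothetical reference counts: FLOWS[a][b] = number of references in discipline a's
# papers that cite papers in discipline b, i.e. knowledge that a imports from b.
FLOWS = {
    "CS":   {"CS": 600, "Math": 200, "Phys": 100, "Bio": 100},
    "Math": {"CS": 50,  "Math": 900, "Phys": 50,  "Bio": 0},
    "Phys": {"CS": 20,  "Math": 150, "Phys": 800, "Bio": 30},
    "Bio":  {"CS": 30,  "Math": 0,   "Phys": 0,   "Bio": 970},
}


def imports_of(flows, focal):
    """Non-zero imports of `focal` from every other discipline."""
    return {d: n for d, n in flows[focal].items() if d != focal and n > 0}


def intensity(flows, focal):
    """Share of the focal discipline's references that point to other disciplines."""
    total = sum(flows[focal].values())
    return sum(imports_of(flows, focal).values()) / total if total else 0.0


def broadness(flows, focal):
    """Number of other disciplines the focal discipline imports from."""
    return len(imports_of(flows, focal))


def homogeneity(flows, focal):
    """Pielou evenness of the imports: 1.0 = perfectly even, near 0 = one source dominates.

    Returns None when there are fewer than two source disciplines (evenness undefined).
    """
    imports = imports_of(flows, focal)
    if len(imports) < 2:
        return None
    total = sum(imports.values())
    entropy = -sum((n / total) * math.log(n / total) for n in imports.values())
    return entropy / math.log(len(imports))


for d in FLOWS:
    print(d, broadness(FLOWS, d), round(intensity(FLOWS, d), 3), homogeneity(FLOWS, d))
```

In this toy matrix:

- CS imports broadly and fairly evenly.
- Math imports from two disciplines in equal measure, so its homogeneity is 1.0.
- Bio imports only from CS, so its homogeneity is undefined.

Keeping the three aspects separate lets such profiles be told apart, whereas a single aggregated score could blur them together.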
This paper explores the application of Fisher and Tronto's four phases of ethics of care (caring about, taking care of, caregiving, and care receiving) to three personal information management (PIM) frameworks, with a focus on PIM maintaining. The author argues that ethics of care can provide a theoretical foundation for PIM. The four phases of caring are used to develop a perspective of PIM as a caring process, organized into the categories of PIM as self-care and PIM as caring for others. The paper begins by reviewing Fisher and Tronto's ethics of care, cites research in related fields that has applied ethics of care, and then describes how ethics of care could be applied to PIM research. To conclude, the author suggests how ethics of care can be applied in future PIM research in the following areas: better understanding the motivations for PIM; the ways in which PIM can contribute to the social concepts of equality, justice, and trust; and how social institutions can facilitate “good” PIM.
Amber L. Cushing. “PIM as a caring: Using ethics of care to explore personal information management as a caring process.” Journal of the Association for Information Science and Technology 74(11): 1282–1292. Published 2023-08-25. DOI: 10.1002/asi.24824
Deep learning technologies have brought us many models that outperform human beings on a few benchmarks. An interesting question is: can these models solve real-world problems well when those problems have settings similar to the benchmark datasets (e.g., identical input/output)? We argue that a model is trained to answer the same information need, in a similar context (e.g., the information available), as the one for which its training dataset was created. The trained model may be used to solve real-world problems that involve a similar information need in a similar context. However, the information need is independent of the format of a dataset's input/output. Although some datasets are structurally very similar, they may represent different research tasks aimed at answering different information needs. Examples are question–answer pairs for the question answering (QA) task and image–caption pairs for the image captioning (IC) task. In this paper, we use the QA and IC tasks as two case studies and compare their widely used benchmark datasets. From the perspective of information need in the context of information retrieval, we show the differences in the dataset creation processes and the differences in morphosyntactic properties between datasets. These differences can be attributed to the different information needs and contexts of the specific research tasks. We encourage all researchers to consider the information need perspective of a research task when selecting appropriate datasets to train a model. Likewise, when creating a dataset, researchers may incorporate the information need perspective as a factor in determining how accurately the dataset reflects the real-world problem or research task they intend to tackle.
Mengying Yu, Aixin Sun. “Dataset versus reality: Understanding model performance from the perspective of information need.” Journal of the Association for Information Science and Technology 74(11): 1293–1306. Published 2023-08-18. DOI: 10.1002/asi.24825
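The Yu and Sun abstract argues that structurally identical text pairs can serve different information needs, and that this shows up as different linguistic properties. As a rough illustration only, the Python sketch below compares a few surface features of invented question and caption samples. It does not use a part-of-speech tagger, so these counts are crude stand-ins for the morphosyntactic properties the paper analyzes. The samples are not taken from any real benchmark.

```python
import re

# Invented toy samples, NOT drawn from any real QA or image-captioning benchmark.
QA_QUESTIONS = [
    "What year was the bridge completed?",
    "Who wrote the novel?",
    "How many people live in the city?",
]
CAPTIONS = [
    "A man riding a bicycle down a busy street.",
    "Two dogs playing with a ball on the grass.",
    "A plate of food on a wooden table.",
]
WH_WORDS = {"what", "who", "whom", "whose", "when", "where", "which", "why", "how"}


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def share(flags):
    """Fraction of True values in a non-empty sequence of booleans."""
    flags = list(flags)
    return sum(flags) / len(flags)


def surface_profile(texts):
    """Crude surface proxies for linguistic differences (no POS tagging)."""
    token_lists = [tokenize(t) for t in texts]
    return {
        "mean_tokens": sum(len(toks) for toks in token_lists) / len(token_lists),
        "wh_initial_share": share(bool(toks) and toks[0] in WH_WORDS for toks in token_lists),
        "question_mark_share": share(t.rstrip().endswith("?") for t in texts),
    }


print("QA:", surface_profile(QA_QUESTIONS))
print("IC:", surface_profile(CAPTIONS))
```

Both sample sets have the same shape (one short text per item), yet the profiles differ:

- The questions are shorter and always open with a wh-word and end with "?".
- The captions are longer noun-phrase descriptions.

A faithful replication would instead run a proper tagger over the actual benchmark datasets.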