GRAFS: Graphical Faceted Search System to Support Conceptual Understanding in Exploratory Search
Pub Date: 2023-05-05, DOI: https://dl.acm.org/doi/10.1145/3588319
Mengtian Guo, Zhilan Zhou, David Gotz, Yue Wang
When people search for information about a new topic within large document collections, they implicitly construct a mental model of the unfamiliar information space to represent what they currently know and guide their exploration into the unknown. Building this mental model can be challenging, as it requires not only finding relevant documents but also synthesizing important concepts and the relationships that connect those concepts both within and across documents. This article describes a novel interactive approach designed to help users construct a mental model of an unfamiliar information space during exploratory search. We propose a new semantic search system to organize and visualize important concepts and their relations for a set of search results. A user study (n=20) was conducted to compare the proposed approach against a baseline faceted search system on exploratory literature search tasks. Experimental results show that the proposed approach is more effective in helping users recognize relationships between key concepts, leading to a more sophisticated understanding of the search topic while maintaining functionality and usability comparable to those of a faceted search system.
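The abstract does not specify how GRAFS derives its concepts and relations, but the general idea of organizing a result set around concepts and their connections can be sketched with a simple document-level co-occurrence count. The Python snippet below is an illustrative sketch, not the authors' implementation; it assumes concepts have already been extracted per document (for example, by a keyphrase extractor), and the helper name build_concept_graph is invented for this example.

```python
from collections import Counter
from itertools import combinations

def build_concept_graph(docs_concepts, min_cooccurrence=2):
    """Build a concept co-occurrence graph from per-document concept sets.

    docs_concepts: list of sets of concept strings, one set per retrieved document.
    Returns (node_weights, edges), where edges map concept pairs to the number of
    documents in which both concepts appear together.
    """
    node_weights = Counter()
    edges = Counter()
    for concepts in docs_concepts:
        node_weights.update(concepts)
        for a, b in combinations(sorted(concepts), 2):
            edges[(a, b)] += 1
    # Keep only relations supported by enough documents.
    edges = {pair: n for pair, n in edges.items() if n >= min_cooccurrence}
    return node_weights, edges

# Example: three search results with pre-extracted concepts.
docs = [{"faceted search", "exploratory search", "mental model"},
        {"faceted search", "visualization"},
        {"exploratory search", "mental model", "visualization"}]
nodes, relations = build_concept_graph(docs)
print(nodes.most_common(3))
print(relations)
```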
{"title":"GRAFS: Graphical Faceted Search System to Support Conceptual Understanding in Exploratory Search","authors":"Mengtian Guo, Zhilan Zhou, David Gotz, Yue Wang","doi":"https://dl.acm.org/doi/10.1145/3588319","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3588319","url":null,"abstract":"<p>When people search for information about a new topic within large document collections, they implicitly construct a mental model of the unfamiliar information space to represent what they currently know and guide their exploration into the unknown. Building this mental model can be challenging as it requires not only finding relevant documents but also synthesizing important concepts and the relationships that connect those concepts both within and across documents. This article describes a novel interactive approach designed to help users construct a mental model of an unfamiliar information space during exploratory search. We propose a new semantic search system to organize and visualize important concepts and their relations for a set of search results. A user study (<i>n</i>=20) was conducted to compare the proposed approach against a baseline faceted search system on exploratory literature search tasks. Experimental results show that the proposed approach is more effective in helping users recognize relationships between key concepts, leading to a more sophisticated understanding of the search topic while maintaining similar functionality and usability as a faceted search system.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Towards Addressing Ambiguous Interactions and Inferring User Intent with Dimension Reduction and Clustering Combinations in Visual Analytics
Pub Date: 2023-04-17, DOI: https://dl.acm.org/doi/10.1145/3588565
John Wenskovitch, Michelle Dowling, Chris North
Direct manipulation interactions on projections are often incorporated in visual analytics applications. These interactions enable analysts to provide incremental feedback to the system in a semi-supervised manner, demonstrating relationships that the analyst wishes to find within the data. However, determining the precise intent of the analyst is a challenge. When an analyst interacts with a projection, the inherent ambiguity of interactions can lead to a variety of possible interpretations that the system can infer. Previous work has demonstrated the utility of clusters as an interaction target to address this “With Respect to What” problem in dimension-reduced projections. However, introducing clusters brings interaction inference challenges of its own. In this work, we discuss the interaction space for the simultaneous use of semi-supervised dimension reduction and clustering algorithms. We introduce a novel pipeline representation to disambiguate between interactions on observations and clusters, as well as which underlying model is responding to those analyst interactions. We use a prototype visual analytics tool to demonstrate the effects of these ambiguous interactions, their properties, and the insights that an analyst can glean from each.
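As a rough illustration of the kind of pipeline discussed here (simultaneous dimension reduction and clustering, plus deciding whether an analyst's gesture targets a single observation or a whole cluster), the following Python sketch combines a standard projection and clustering step with a simple routing heuristic. The heuristic and the function names (project_and_cluster, route_interaction) are hypothetical stand-ins, not the authors' pipeline representation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def project_and_cluster(X, n_clusters=3):
    """Project high-dimensional observations to 2-D and cluster them."""
    coords = PCA(n_components=2).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    return coords, labels

def route_interaction(target, coords, labels):
    """Decide whether a drag on point `target` reads as observation- or cluster-level feedback.

    If most of the dragged point's on-screen neighbors share its cluster label,
    treat the gesture as feedback about the cluster; otherwise about the observation.
    """
    dists = np.linalg.norm(coords - coords[target], axis=1)
    neighbors = np.argsort(dists)[1:6]  # five nearest projected neighbors, excluding the point itself
    same_cluster = np.mean(labels[neighbors] == labels[target])
    return "cluster" if same_cluster > 0.5 else "observation"

X = np.random.rand(100, 10)
coords, labels = project_and_cluster(X)
print(route_interaction(0, coords, labels))
```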
{"title":"Towards Addressing Ambiguous Interactions and Inferring User Intent with Dimension Reduction and Clustering Combinations in Visual Analytics","authors":"John Wenskovitch, Michelle Dowling, Chris North","doi":"https://dl.acm.org/doi/10.1145/3588565","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3588565","url":null,"abstract":"<p>Direct manipulation interactions on projections are often incorporated in visual analytics applications. These interactions enable analysts to provide incremental feedback to the system in a semi-supervised manner, demonstrating relationships that the analyst wishes to find within the data. However, determining the precise intent of the analyst is a challenge. When an analyst interacts with a projection, the inherent ambiguity of interactions can lead to a variety of possible interpretations that the system can infer. Previous work has demonstrated the utility of clusters as an interaction target to address this “With Respect to What” problem in dimension-reduced projections. However, the introduction of clusters introduces interaction inference challenges as well. In this work, we discuss the interaction space for the simultaneous use of semi-supervised dimension reduction and clustering algorithms. We introduce a novel pipeline representation to disambiguate between interactions on observations and clusters, as well as which underlying model is responding to those analyst interactions. We use a prototype visual analytics tool to demonstrate the effects of these ambiguous interactions, their properties, and the insights that an analyst can glean from each.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Explaining Recommendations through Conversations: Dialog Model and the Effects of Interface Type and Degree of Interactivity
Pub Date: 2023-04-12, DOI: https://dl.acm.org/doi/10.1145/3579541
Diana C. Hernandez-Bocanegra, Jürgen Ziegler
Explaining system-generated recommendations based on user reviews can foster users’ understanding and assessment of the recommended items and the recommender system (RS) as a whole. While up to now explanations have mostly been static, shown in a single presentation unit, some interactive explanatory approaches have emerged in explainable artificial intelligence (XAI), making it easier for users to examine system decisions and to explore arguments according to their information needs. However, little is known about how interactive interfaces should be conceptualized and designed to meet the explanatory aims of transparency, effectiveness, and trust in RS. Thus, we investigate the potential of interactive, conversational explanations in review-based RS and propose an explanation approach inspired by dialog models and formal argument structures. In particular, we investigate users’ perception of two different interface types for presenting explanations, a graphical user interface (GUI)-based dialog consisting of a sequence of explanatory steps, and a chatbot-like natural-language interface.
Since providing explanations by means of natural language conversation is a novel approach, there is a lack of understanding of how users would formulate their questions, along with a corresponding lack of datasets. We thus propose an intent model for explanatory queries and describe the development of ConvEx-DS, a dataset containing intent annotations of 1,806 user questions in the domain of hotels, which can be used to train intent detection methods as part of the development of conversational agents for explainable RS. We validate the model by measuring user-perceived helpfulness of answers given on the basis of the implemented intent detection. Finally, we report on a user study investigating users’ evaluation of the two types of interactive explanations proposed (GUI and chatbot) and testing the effect of varying degrees of interactivity that result in greater or lesser access to explanatory information. Using Structural Equation Modeling, we reveal details of the relationships between the perceived quality of an explanation and the explanatory objectives of transparency, trust, and effectiveness. Our results show that providing interactive options for scrutinizing explanatory arguments has a significant positive influence on the evaluation by users (compared to less interactive alternatives). Results also suggest that user characteristics such as decision-making style may have a significant influence on the evaluation of different types of interactive explanation interfaces.
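To make the intent-detection step concrete, the sketch below trains a toy classifier on (question, intent) pairs of the kind ConvEx-DS provides. The example questions and intent label names are invented placeholders; the actual ConvEx-DS label set and the authors' detection method may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy (question, intent) pairs; ConvEx-DS annotates 1,806 hotel-domain questions.
questions = [
    "Why was this hotel recommended to me?",
    "What do guests say about the cleanliness?",
    "Is this hotel better rated than the previous one?",
    "Show me negative reviews about the location.",
]
intents = ["why_recommended", "aspect_opinion", "comparison", "aspect_opinion"]  # hypothetical labels

# TF-IDF features over unigrams and bigrams feeding a linear classifier.
intent_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
intent_clf.fit(questions, intents)

print(intent_clf.predict(["How clean are the rooms according to reviews?"]))
```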
{"title":"Explaining Recommendations through Conversations: Dialog Model and the Effects of Interface Type and Degree of Interactivity","authors":"Diana C. Hernandez-Bocanegra, Jürgen Ziegler","doi":"https://dl.acm.org/doi/10.1145/3579541","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3579541","url":null,"abstract":"<p>Explaining system-generated recommendations based on user reviews can foster users’ understanding and assessment of the recommended items and the recommender system (RS) as a whole. While up to now explanations have mostly been static, shown in a single presentation unit, some interactive explanatory approaches have emerged in explainable artificial intelligence (XAI), making it easier for users to examine system decisions and to explore arguments according to their information needs. However, little is known about how interactive interfaces should be conceptualized and designed to meet the explanatory aims of transparency, effectiveness, and trust in RS. Thus, we investigate the potential of interactive, conversational explanations in review-based RS and propose an explanation approach inspired by dialog models and formal argument structures. In particular, we investigate users’ perception of two different interface types for presenting explanations, a graphical user interface (GUI)-based dialog consisting of a sequence of explanatory steps, and a chatbot-like natural-language interface.</p><p>Since providing explanations by means of natural language conversation is a novel approach, there is a lack of understanding how users would formulate their questions with a corresponding lack of datasets. We thus propose an intent model for explanatory queries and describe the development of ConvEx-DS, a dataset containing intent annotations of 1,806 user questions in the domain of hotels, that can be used to to train intent detection methods as part of the development of conversational agents for explainable RS. We validate the model by measuring user-perceived helpfulness of answers given based on the implemented intent detection. Finally, we report on a user study investigating users’ evaluation of the two types of interactive explanations proposed (GUI and chatbot), and to test the effect of varying degrees of interactivity that result in greater or lesser access to explanatory information. By using Structural Equation Modeling, we reveal details on the relationships between the perceived quality of an explanation and the explanatory objectives of transparency, trust, and effectiveness. Our results show that providing interactive options for scrutinizing explanatory arguments has a significant positive influence on the evaluation by users (compared to low interactive alternatives). Results also suggest that user characteristics such as decision-making style may have a significant influence on the evaluation of different types of interactive explanation interfaces.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RadarSense: Accurate Recognition of Mid-Air Hand Gestures with Radar Sensing and Few Training Examples
Pub Date: 2023-03-31, DOI: https://dl.acm.org/doi/10.1145/3589645
Arthur Sluÿters, Sébastien Lambot, Jean Vanderdonckt, Radu-Daniel Vatavu
Microwave radars bring many benefits to mid-air gesture sensing due to their large field of view and independence from environmental conditions, such as ambient light and occlusion. However, radar signals are high-dimensional and usually require complex deep learning approaches. To understand this landscape, we report results from a systematic literature review of N = 118 scientific papers on radar sensing, unveiling a large variety of radar technologies with different operating frequencies, bandwidths, and antenna configurations, as well as a range of gesture recognition techniques. Although highly accurate, these techniques require a large amount of training data that depend on the type of radar. Therefore, the training results cannot be easily transferred to other radars. To address this aspect, we introduce a new gesture recognition pipeline that implements advanced full-wave electromagnetic modeling and inversion to retrieve physical characteristics of gestures that are radar independent, i.e., independent of the source, antennas, and radar-hand interactions. Inversion of radar signals further reduces the size of the dataset by several orders of magnitude, while preserving the essential information. This approach is compatible with conventional gesture recognizers, such as those based on template matching, which only need a few training examples to deliver high recognition accuracy rates. To evaluate our gesture recognition pipeline, we conducted user-dependent and user-independent evaluations on a dataset of 16 gesture types collected with the Walabot, a low-cost off-the-shelf array radar. We contrast these results with those obtained for the same gesture types collected with an ultra-wideband radar made of a vector network analyzer with a single horn antenna and with a computer vision sensor, respectively. Based on our findings, we suggest some design implications to support future development in radar-based gesture recognition.
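The template-matching recognizers mentioned above can be illustrated with a nearest-template classifier over 1-D feature sequences, for example an inverted, radar-independent hand-distance signal over time, compared with dynamic time warping. This is a generic sketch under those assumptions, not the RadarSense pipeline itself.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify(sample, templates):
    """Return the label of the closest template; `templates` maps label -> list of example sequences."""
    best_label, best_dist = None, np.inf
    for label, examples in templates.items():
        for t in examples:
            d = dtw_distance(sample, t)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label

# One template per gesture is enough for a nearest-template recognizer.
templates = {"push": [np.linspace(0.6, 0.2, 50)], "pull": [np.linspace(0.2, 0.6, 50)]}
print(classify(np.linspace(0.58, 0.22, 45), templates))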
{"title":"RadarSense: Accurate Recognition of Mid-Air Hand Gestures with Radar Sensing and Few Training Examples","authors":"Arthur SluŸters, Sébastien Lambot, Jean Vanderdonckt, Radu-Daniel Vatavu","doi":"https://dl.acm.org/doi/10.1145/3589645","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3589645","url":null,"abstract":"<p>Microwave radars bring many benefits to mid-air gesture sensing due to their large field of view and independence from environmental conditions, such as ambient light and occlusion. However, radar signals are highly dimensional and usually require complex deep learning approaches. To understand this landscape, we report results from a systematic literature review of (<i>N</i> = 118) scientific papers on radar sensing, unveiling a large variety of radar technology of different operating frequencies and bandwidths, antenna configurations, but also various gesture recognition techniques. Although highly accurate, these techniques require a large amount of training data that depend on the type of radar. Therefore, the training results cannot be easily transferred to other radars. To address this aspect, we introduce a new gesture recognition pipeline that implements advanced full-wave electromagnetic modeling and inversion to retrieve physical characteristics of gestures that are radar independent, <i>i.e.</i>, independent of the source, antennas, and radar-hand interactions. Inversion of radar signals further reduces the size of the dataset by several orders of magnitude, while preserving the essential information. This approach is compatible with conventional gesture recognizers, such as those based on template matching, which only need a few training examples to deliver high recognition accuracy rates. To evaluate our gesture recognition pipeline, we conducted user-dependent and user-independent evaluations on a dataset of 16 gesture types collected with the Walabot, a low-cost off-the-shelf array radar. We contrast these results with those obtained for the same gesture types collected with an ultra-wideband radar made of a vector network analyzer with a single horn antenna and with a computer vision sensor, respectively. Based on our findings, we suggest some design implications to support future development in radar-based gesture recognition.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
LIMEADE: From AI Explanations to Advice Taking
Pub Date: 2023-03-28, DOI: https://dl.acm.org/doi/10.1145/3589345
Benjamin Charles Germain Lee, Doug Downey, Kyle Lo, Daniel S. Weld
Research in human-centered AI has shown the benefits of systems that can explain their predictions. Methods that allow an AI to take advice from humans in response to explanations are similarly useful. While both capabilities are well-developed for transparent learning models (e.g., linear models and GA²Ms), and recent techniques (e.g., LIME and SHAP) can generate explanations for opaque models, little attention has been given to advice methods for opaque models. This paper introduces LIMEADE, the first general framework that translates both positive and negative advice (expressed using high-level vocabulary such as that employed by post-hoc explanations) into an update to an arbitrary, underlying opaque model. We demonstrate the generality of our approach with case studies on seventy real-world models across two broad domains: image classification and text recommendation. We show our method improves accuracy compared to a rigorous baseline on the image classification domains. For the text modality, we apply our framework to a neural recommender system for scientific papers on a public website; our user study shows that our framework leads to significantly higher perceived user control, trust, and satisfaction.
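LIMEADE's exact update rule is not given in the abstract; as a hedged illustration of the general idea (folding high-level advice into a retrainable surrogate via weighted pseudo-labeled examples in the interpretable feature space), consider the sketch below. The function apply_advice, the feature encoding, and the weighting scheme are assumptions made for this example, not the paper's algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def apply_advice(X, y, advice, X_unlabeled, weight=0.3):
    """Turn high-level advice into pseudo-labeled examples and refit a surrogate model.

    X, y: labeled data in a high-level (interpretable) feature space.
    advice: {feature_index: +1 or -1}, e.g., "items exhibiting feature 3 are positive".
    X_unlabeled: pool of unlabeled items described in the same high-level space.
    """
    extra_X, extra_y = [], []
    for feat, direction in advice.items():
        mask = X_unlabeled[:, feat] > 0  # items that exhibit the advised feature
        extra_X.append(X_unlabeled[mask])
        extra_y.append(np.full(mask.sum(), 1 if direction > 0 else 0))
    X_aug = np.vstack([X] + extra_X)
    y_aug = np.concatenate([y] + extra_y)
    # Down-weight pseudo-labeled advice examples relative to real labels.
    weights = np.concatenate([np.ones(len(y)), np.full(len(y_aug) - len(y), weight)])
    model = LogisticRegression(max_iter=1000)
    model.fit(X_aug, y_aug, sample_weight=weights)
    return model

rng = np.random.default_rng(0)
X, y = rng.random((40, 5)), rng.integers(0, 2, 40)
model = apply_advice(X, y, advice={3: +1}, X_unlabeled=rng.random((200, 5)))
print(model.predict(X[:5]))
```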
{"title":"LIMEADE: From AI Explanations to Advice Taking","authors":"Benjamin Charles Germain Lee, Doug Downey, Kyle Lo, Daniel S. Weld","doi":"https://dl.acm.org/doi/10.1145/3589345","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3589345","url":null,"abstract":"<p>Research in human-centered AI has shown the benefits of systems that can explain their predictions. Methods that allow an AI to take advice from humans in response to explanations are similarly useful. While both capabilities are well-developed for <i>transparent</i> learning models (e.g., linear models and GA<sup>2</sup>Ms), and recent techniques (e.g., LIME and SHAP) can generate explanations for <i>opaque</i> models, little attention has been given to advice methods for opaque models. This paper introduces LIMEADE, the first general framework that translates both positive and negative advice (expressed using high-level vocabulary such as that employed by post-hoc explanations) into an update to an arbitrary, underlying opaque model. We demonstrate the generality of our approach with case studies on seventy real-world models across two broad domains: image classification and text recommendation. We show our method improves accuracy compared to a rigorous baseline on the image classification domains. For the text modality, we apply our framework to a neural recommender system for scientific papers on a public website; our user study shows that our framework leads to significantly higher perceived user control, trust, and satisfaction.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Crowdsourcing Thumbnail Captions: Data Collection and Validation
Pub Date: 2023-03-28, DOI: https://dl.acm.org/doi/10.1145/3589346
Carlos Aguirre, Shiye Cao, Amama Mahmood, Chien-Ming Huang
Speech interfaces, such as personal assistants and screen readers, read image captions to users. Typically, however, only one caption is available per image, which may not be adequate for all situations (e.g., browsing large quantities of images). Long captions provide a deeper understanding of an image but require more time to listen to, whereas shorter captions may not allow for such thorough comprehension yet have the advantage of being faster to consume. We explore how to effectively collect both thumbnail captions, succinct image descriptions meant to be consumed quickly, and comprehensive captions, which allow individuals to understand visual content in greater detail. We consider text-based instructions and time-constrained methods to collect descriptions at these two levels of detail and find that a time-constrained method is the most effective for collecting thumbnail captions while preserving caption accuracy. Additionally, we verify that caption authors using this time-constrained method are still able to focus on the most important regions of an image by tracking their eye gaze. We evaluate our collected captions along human-rated axes (correctness, fluency, amount of detail, and mentions of important concepts) and discuss the potential for model-based metrics to perform large-scale automatic evaluations in the future.
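The eye-gaze check described above, whether time-constrained caption authors still attend to important image regions, can be approximated by measuring the fraction of fixations that fall inside annotated regions. The box format and example coordinates below are assumptions made for illustration, not the authors' analysis code.

```python
def fixations_in_regions(fixations, regions):
    """Fraction of gaze fixations that land inside any important image region.

    fixations: list of (x, y) gaze points in image coordinates.
    regions: list of (x_min, y_min, x_max, y_max) boxes marking important content.
    """
    def inside(point, box):
        x, y = point
        x0, y0, x1, y1 = box
        return x0 <= x <= x1 and y0 <= y <= y1

    hits = sum(any(inside(p, box) for box in regions) for p in fixations)
    return hits / len(fixations) if fixations else 0.0

# Example: three of four fixations fall on the annotated subject of the image.
fixations = [(120, 80), (130, 95), (400, 300), (125, 90)]
regions = [(100, 60, 200, 160)]
print(fixations_in_regions(fixations, regions))  # 0.75
```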
{"title":"Crowdsourcing Thumbnail Captions: Data Collection and Validation","authors":"Carlos Aguirre, Shiye Cao, Amama Mahmood, Chien-Ming Huang","doi":"https://dl.acm.org/doi/10.1145/3589346","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3589346","url":null,"abstract":"<p>Speech interfaces, such as personal assistants and screen readers, read image captions to users—but typically only one caption is available per image, which may not be adequate for all situations (e.g., browsing large quantities of images). Long captions provide a deeper understanding of an image but require more time to listen to, whereas shorter captions may not allow for such thorough comprehension, yet have the advantage of being faster to consume. We explore how to effectively collect both thumbnail captions—succinct image descriptions meant to be consumed quickly—and comprehensive captions—which allow individuals to understand visual content in greater detail; we consider text-based instructions and time-constrained methods to collect descriptions at these two levels of detail and find that a time-constrained method is the most effective for collecting thumbnail captions while preserving caption accuracy. Additionally, we verify that caption authors using this time-constrained method are still able to focus on the most important regions of an image by tracking their eye gaze. We evaluate our collected captions along human-rated axes—correctness, fluency, amount of detail, and mentions of important concepts—and discuss the potential for model-based metrics to perform large-scale automatic evaluations in the future.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How do Users Experience Traceability of AI Systems? Examining Subjective Information Processing Awareness in Automated Insulin Delivery (AID) Systems
Pub Date: 2023-03-24, DOI: https://dl.acm.org/doi/10.1145/3588594
Tim Schrills, Thomas Franke
When interacting with artificial intelligence (AI) in the medical domain, users frequently face automated information processing, which can remain opaque to them. For example, users with diabetes may interact daily with automated insulin delivery (AID). However, effective AID therapy requires traceability of automated decisions for diverse users. Grounded in research on human-automation interaction, we study Subjective Information Processing Awareness (SIPA) as a key construct to research users’ experience of explainable AI. The objective of the present research was to examine how users experience differing levels of traceability of an AI algorithm. We developed a basic AID simulation to create realistic scenarios for an experiment with N = 80, where we examined the effect of three levels of information disclosure on SIPA and performance. Attributes serving as the basis for insulin needs calculation were shown to users, who predicted the AID system’s calculation after over 60 observations. Results showed a difference in SIPA after repeated observations, associated with a general decline of SIPA ratings over time. Supporting scale validity, SIPA was strongly correlated with trust and satisfaction with explanations. The present research indicates that the effect of different levels of information disclosure may need several repetitions before it manifests. Additionally, high levels of information disclosure may lead to a miscalibration between SIPA and performance in predicting the system’s results. The results indicate that for a responsible design of XAI, system designers could utilize prediction tasks in order to calibrate experienced traceability.
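The abstract does not detail the AID simulation's calculation; a common stand-in is the textbook open-loop bolus formula combining a meal bolus, a correction bolus, and insulin on board, which is the kind of attribute-based computation participants could be asked to predict. The constants below are illustrative assumptions, not the study's parameters and not clinical guidance.

```python
def insulin_bolus(carbs_g, glucose_mgdl, target_mgdl=110,
                  carb_ratio=10.0, correction_factor=40.0, insulin_on_board=0.0):
    """Textbook bolus estimate: meal bolus + correction bolus - insulin still active.

    carb_ratio: grams of carbohydrate covered by 1 unit of insulin.
    correction_factor: mg/dL drop in glucose per 1 unit of insulin.
    """
    meal_bolus = carbs_g / carb_ratio
    correction = max(glucose_mgdl - target_mgdl, 0) / correction_factor
    return max(meal_bolus + correction - insulin_on_board, 0.0)

# A participant shown carbs=60 g, glucose=190 mg/dL, IOB=1 U would try to predict this value.
print(round(insulin_bolus(60, 190, insulin_on_board=1.0), 2))  # 7.0
```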
{"title":"How do Users Experience Traceability of AI Systems? Examining Subjective Information Processing Awareness in Automated Insulin Delivery (AID) Systems","authors":"Tim Schrills, Thomas Franke","doi":"https://dl.acm.org/doi/10.1145/3588594","DOIUrl":"https://doi.org/https://dl.acm.org/doi/10.1145/3588594","url":null,"abstract":"<p>When interacting with artificial intelligence (AI) in the medical domain, users frequently face automated information processing, which can remain opaque to them. For example, users with diabetes may interact daily with automated insulin delivery (AID). However, effective AID therapy requires traceability of automated decisions for diverse users. Grounded in research on human-automation interaction, we study Subjective Information Processing Awareness (SIPA) as a key construct to research users’ experience of explainable AI. The objective of the present research was to examine how users experience differing levels of traceability of an AI algorithm. We developed a basic AID simulation to create realistic scenarios for an experiment with <i>N</i> = 80, where we examined the effect of three levels of information disclosure on SIPA and performance. Attributes serving as the basis for insulin needs calculation were shown to users, who predicted the AID system’s calculation after over 60 observations. Results showed a difference in SIPA after repeated observations, associated with a general decline of SIPA ratings over time. Supporting scale validity, SIPA was strongly correlated with trust and satisfaction with explanations. The present research indicates that the effect of different levels of information disclosure may need several repetitions before it manifests. Additionally, high levels of information disclosure may lead to a miscalibration between SIPA and performance in predicting the system’s results. The results indicate that for a responsible design of XAI, system designers could utilize prediction tasks in order to calibrate experienced traceability.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":null,"pages":null},"PeriodicalIF":3.4,"publicationDate":"2023-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}