Microwave radars bring many benefits to mid-air gesture sensing due to their large field of view and independence from environmental conditions, such as ambient light and occlusion. However, radar signals are highly dimensional and usually require complex deep learning approaches. To understand this landscape, we report results from a systematic literature review of (N = 118) scientific papers on radar sensing, unveiling a large variety of radar technology of different operating frequencies and bandwidths, antenna configurations, but also various gesture recognition techniques. Although highly accurate, these techniques require a large amount of training data that depend on the type of radar. Therefore, the training results cannot be easily transferred to other radars. To address this aspect, we introduce a new gesture recognition pipeline that implements advanced full-wave electromagnetic modeling and inversion to retrieve physical characteristics of gestures that are radar independent, i.e., independent of the source, antennas, and radar-hand interactions. Inversion of radar signals further reduces the size of the dataset by several orders of magnitude, while preserving the essential information. This approach is compatible with conventional gesture recognizers, such as those based on template matching, which only need a few training examples to deliver high recognition accuracy rates. To evaluate our gesture recognition pipeline, we conducted user-dependent and user-independent evaluations on a dataset of 16 gesture types collected with the Walabot, a low-cost off-the-shelf array radar. We contrast these results with those obtained for the same gesture types collected with an ultra-wideband radar made of a vector network analyzer with a single horn antenna and with a computer vision sensor, respectively. Based on our findings, we suggest some design implications to support future development in radar-based gesture recognition.
Research in human-centered AI has shown the benefits of systems that can explain their predictions. Methods that allow an AI to take advice from humans in response to explanations are similarly useful. While both capabilities are well-developed for transparent learning models (e.g., linear models and GA2Ms), and recent techniques (e.g., LIME and SHAP) can generate explanations for opaque models, little attention has been given to advice methods for opaque models. This paper introduces LIMEADE, the first general framework that translates both positive and negative advice (expressed using high-level vocabulary such as that employed by post-hoc explanations) into an update to an arbitrary, underlying opaque model. We demonstrate the generality of our approach with case studies on seventy real-world models across two broad domains: image classification and text recommendation. We show our method improves accuracy compared to a rigorous baseline on the image classification domains. For the text modality, we apply our framework to a neural recommender system for scientific papers on a public website; our user study shows that our framework leads to significantly higher perceived user control, trust, and satisfaction.
Speech interfaces, such as personal assistants and screen readers, read image captions to users—but typically only one caption is available per image, which may not be adequate for all situations (e.g., browsing large quantities of images). Long captions provide a deeper understanding of an image but require more time to listen to, whereas shorter captions may not allow for such thorough comprehension, yet have the advantage of being faster to consume. We explore how to effectively collect both thumbnail captions—succinct image descriptions meant to be consumed quickly—and comprehensive captions—which allow individuals to understand visual content in greater detail; we consider text-based instructions and time-constrained methods to collect descriptions at these two levels of detail and find that a time-constrained method is the most effective for collecting thumbnail captions while preserving caption accuracy. Additionally, we verify that caption authors using this time-constrained method are still able to focus on the most important regions of an image by tracking their eye gaze. We evaluate our collected captions along human-rated axes—correctness, fluency, amount of detail, and mentions of important concepts—and discuss the potential for model-based metrics to perform large-scale automatic evaluations in the future.
When interacting with artificial intelligence (AI) in the medical domain, users frequently face automated information processing, which can remain opaque to them. For example, users with diabetes may interact daily with automated insulin delivery (AID). However, effective AID therapy requires traceability of automated decisions for diverse users. Grounded in research on human-automation interaction, we study Subjective Information Processing Awareness (SIPA) as a key construct to research users’ experience of explainable AI. The objective of the present research was to examine how users experience differing levels of traceability of an AI algorithm. We developed a basic AID simulation to create realistic scenarios for an experiment with N = 80, where we examined the effect of three levels of information disclosure on SIPA and performance. Attributes serving as the basis for insulin needs calculation were shown to users, who predicted the AID system’s calculation after over 60 observations. Results showed a difference in SIPA after repeated observations, associated with a general decline of SIPA ratings over time. Supporting scale validity, SIPA was strongly correlated with trust and satisfaction with explanations. The present research indicates that the effect of different levels of information disclosure may need several repetitions before it manifests. Additionally, high levels of information disclosure may lead to a miscalibration between SIPA and performance in predicting the system’s results. The results indicate that for a responsible design of XAI, system designers could utilize prediction tasks in order to calibrate experienced traceability.
When people are talking together in front of digital signage, advertisements that are aware of the context of the dialogue will work the most effectively. However, it has been challenging for computer systems to retrieve the appropriate advertisement from among the many options presented in large databases. Our proposed system, the Conversational Context-sensitive Advertisement generator (CoCoA), is the first attempt to apply masked word prediction to web information retrieval that takes into account the dialogue context. The novelty of CoCoA is that advertisers simply need to prepare a few abstract phrases, called Core-Queries, and then CoCoA automatically generates a context-sensitive expression as a complete search query by utilizing a masked word prediction technique that adds a word related to the dialogue context to one of the prepared Core-Queries. This automatic generation frees the advertisers from having to come up with context-sensitive phrases to attract users’ attention. Another unique point is that the modified Core-Query offers users speaking in front of the CoCoA system a list of context-sensitive advertisements. CoCoA was evaluated by crowd workers regarding the context-sensitivity of the generated search queries against the dialogue text of multiple domains prepared in advance. The results indicated that CoCoA could present more contextual and practical advertisements than other web-retrieval systems. Moreover, CoCoA acquired a higher evaluation in a particular conversation that included many travel topics to which the Core-Queries were designated, implying that it succeeded in adapting the Core-Queries for the specific ongoing context better than the compared method without any effort on the part of the advertisers. In addition, case studies with users and advertisers revealed that the context-sensitive advertisements generated by CoCoA also had an effect on the content of the ongoing dialogue. Specifically, since pairs unfamiliar with each other more frequently referred to the advertisement CoCoA displayed, the advertisements had an effect on the topics about which the pairs spoke. Moreover, participants of an advertiser role recognized that some of the search queries generated by CoCoA fitted the context of a conversation and that CoCoA improved the effect of the advertisement. In particular, they assimilated the hang of designing a good Core-Query at ease by observing the users’ response to the advertisements retrieved with the generated search queries.