Ruijia Cheng, Ruotong Wang, Thomas Zimmermann, Denae Ford
While revolutionary AI-powered code generation tools have been rising rapidly, we know little about how and how to help software developers form appropriate trust in those AI tools. Through a two-phase formative study, we investigate how online communities shape developers’ trust in AI tools and how we can leverage community features to facilitate appropriate user trust. Through interviewing 17 developers, we find that developers collectively make sense of AI tools using the experiences shared by community members and leverage community signals to evaluate AI suggestions. We then surface design opportunities and conduct 11 design probe sessions to explore the design space of using community features to support user trust in AI code generation systems. We synthesize our findings and extend an existing model of user trust in AI technologies with sociotechnical factors. We map out the design considerations for integrating user community into the AI code generation experience.
{"title":"“It would work for me too”: How Online Communities Shape Software Developers’ Trust in AI-Powered Code Generation Tools","authors":"Ruijia Cheng, Ruotong Wang, Thomas Zimmermann, Denae Ford","doi":"10.1145/3651990","DOIUrl":"https://doi.org/10.1145/3651990","url":null,"abstract":"<p>While revolutionary AI-powered code generation tools have been rising rapidly, we know little about how and how to help software developers form appropriate trust in those AI tools. Through a two-phase formative study, we investigate how online communities shape developers’ trust in AI tools and how we can leverage community features to facilitate appropriate user trust. Through interviewing 17 developers, we find that developers collectively make sense of AI tools using the experiences shared by community members and leverage community signals to evaluate AI suggestions. We then surface design opportunities and conduct 11 design probe sessions to explore the design space of using community features to support user trust in AI code generation systems. We synthesize our findings and extend an existing model of user trust in AI technologies with sociotechnical factors. We map out the design considerations for integrating user community into the AI code generation experience.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":"31 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140072632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Querying structured databases with natural language (NL2SQL) has remained a difficult problem for years. Recently, the advancement of machine learning (ML), natural language processing (NLP), and large language models (LLM) have led to significant improvements in performance, with the best model achieving ∼ 85% percent accuracy on the benchmark Spider dataset. However, there is a lack of a systematic understanding of the types, causes, and effectiveness of error-handling mechanisms of errors for erroneous queries nowadays. To bridge the gap, a taxonomy of errors made by four representative NL2SQL models was built in this work, along with an in-depth analysis of the errors. Second, the causes of model errors were explored by analyzing the model-human attention alignment to the natural language query. Last, a within-subjects user study with 26 participants was conducted to investigate the effectiveness of three interactive error-handling mechanisms in NL2SQL. Findings from this paper shed light on the design of model structure and error discovery and repair strategies for natural language data query interfaces in the future.
{"title":"Insights into Natural Language Database Query Errors: From Attention Misalignment to User Handling Strategies","authors":"Zheng Ning, Yuan Tian, Zheng Zhang, Tianyi Zhang, Toby Jia-Jun Li","doi":"10.1145/3650114","DOIUrl":"https://doi.org/10.1145/3650114","url":null,"abstract":"<p>Querying structured databases with natural language (NL2SQL) has remained a difficult problem for years. Recently, the advancement of machine learning (ML), natural language processing (NLP), and large language models (LLM) have led to significant improvements in performance, with the best model achieving ∼ 85% percent accuracy on the benchmark Spider dataset. However, there is a lack of a systematic understanding of the types, causes, and effectiveness of error-handling mechanisms of errors for erroneous queries nowadays. To bridge the gap, a taxonomy of errors made by four representative NL2SQL models was built in this work, along with an in-depth analysis of the errors. Second, the causes of model errors were explored by analyzing the model-human attention alignment to the natural language query. Last, a within-subjects user study with 26 participants was conducted to investigate the effectiveness of three interactive error-handling mechanisms in NL2SQL. Findings from this paper shed light on the design of model structure and error discovery and repair strategies for natural language data query interfaces in the future.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":"59 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140019034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AI-assisted interactive annotation is a powerful way to facilitate data annotation – a prerequisite for constructing robust AI models. While AI-assisted interactive annotation has been extensively studied in static settings, less is known about its usage in dynamic scenarios where the annotators operate under time and cognitive constraints, e.g., while detecting suspicious or dangerous activities from real-time surveillance feeds. Understanding how AI can assist annotators in these tasks and facilitate consistent annotation is paramount to ensure high performance for AI models trained on these data. We address this gap in interactive machine learning (IML) research, contributing an extensive investigation of the benefits, limitations, and challenges of AI-assisted annotation in dynamic application use cases. We address both the effects of AI on annotators and the effects of (AI) annotations on the performance of AI models trained on annotated data in real-time video annotations. We conduct extensive experiments that compare annotation performance at two annotator levels (expert and non-expert) and two interactive labelling techniques (with and without AI-assistance). In a controlled study with N = 34 annotators and a follow up study with 51963 images and their annotation labels being input to the AI model, we demonstrate that the benefits of AI-assisted models are greatest for non-expert users and for cases where targets are only partially or briefly visible. The expert users tend to outperform or achieve similar performance as AI model. Labels combining AI and expert annotations result in the best overall performance as the AI reduces overflow and latency in the expert annotations. We derive guidelines for the use of AI-assisted human annotation in real-time dynamic use cases.
人工智能辅助交互式注释是一种促进数据注释的强大方法,也是构建强大人工智能模型的先决条件。虽然人工智能辅助交互式注释已在静态环境中得到广泛研究,但对其在动态场景中的应用却知之甚少,在动态场景中,注释者的操作受到时间和认知能力的限制,例如从实时监控馈送中检测可疑或危险活动。要确保在这些数据上训练的人工智能模型的高性能,了解人工智能如何协助注释者完成这些任务并促进注释的一致性至关重要。我们针对交互式机器学习(IML)研究中的这一空白,对动态应用案例中人工智能辅助标注的优势、局限性和挑战进行了广泛的调查。我们既探讨了人工智能对注释者的影响,也探讨了(人工智能)注释对在实时视频注释中根据注释数据训练的人工智能模型性能的影响。我们进行了广泛的实验,比较了两种注释者水平(专家和非专家)和两种交互式标签技术(有人工智能辅助和无人工智能辅助)下的注释性能。在一项由 N = 34 名标注者进行的对照研究和一项由 51963 张图像及其标注标签输入人工智能模型的后续研究中,我们证明了人工智能辅助模型对非专家用户以及目标仅部分可见或短暂可见的情况的优势最大。专家用户的表现往往优于人工智能模型或与之相近。由于人工智能减少了专家注释的溢出和延迟,因此结合人工智能和专家注释的标签可获得最佳的整体性能。我们得出了在实时动态用例中使用人工智能辅助人类注释的指导原则。
{"title":"Man and the Machine: Effects of AI-assisted Human Labeling on Interactive Annotation of Real-Time Video Streams","authors":"Marko Radeta, Ruben Freitas, Claudio Rodrigues, Agustin Zuniga, Ngoc Thi Nguyen, Huber Flores, Petteri Nurmi","doi":"10.1145/3649457","DOIUrl":"https://doi.org/10.1145/3649457","url":null,"abstract":"<p>AI-assisted interactive annotation is a powerful way to facilitate data annotation – a prerequisite for constructing robust AI models. While AI-assisted interactive annotation has been extensively studied in static settings, less is known about its usage in dynamic scenarios where the annotators operate under time and cognitive constraints, e.g., while detecting suspicious or dangerous activities from real-time surveillance feeds. Understanding how AI can assist annotators in these tasks and facilitate consistent annotation is paramount to ensure high performance for AI models trained on these data. We address this gap in interactive machine learning (IML) research, contributing an extensive investigation of the benefits, limitations, and challenges of AI-assisted annotation in dynamic application use cases. We address both the effects of AI on annotators and the effects of (AI) annotations on the performance of AI models trained on annotated data in real-time video annotations. We conduct extensive experiments that compare annotation performance at two annotator levels (expert and non-expert) and two interactive labelling techniques (with and without AI-assistance). In a controlled study with <i>N</i> = 34 annotators and a follow up study with 51963 images and their annotation labels being input to the AI model, we demonstrate that the benefits of AI-assisted models are greatest for non-expert users and for cases where targets are only partially or briefly visible. The expert users tend to outperform or achieve similar performance as AI model. Labels combining AI and expert annotations result in the best overall performance as the AI reduces overflow and latency in the expert annotations. We derive guidelines for the use of AI-assisted human annotation in real-time dynamic use cases.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":"19 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140009301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yi Guo, Danqing Shi, Mingjuan Guo, Yanqiu Wu, Nan Cao, Qing Chen
Through a natural language interface (NLI) for exploratory visual analysis, users can directly “ask” analytical questions about the given tabular data. This process greatly improves user experience and lowers the technical barriers of data analysis. Existing techniques focus on generating a visualization from a concrete question. However, complex questions, requiring multiple data queries and visualizations to answer, are frequently asked in data exploration and analysis, which cannot be easily solved with the existing techniques. To address this issue, in this paper, we introduce Talk2Data, a natural language interface for exploratory visual analysis that supports answering complex questions. It leverages an advanced deep-learning model to resolve complex questions into a series of simple questions that could gradually elaborate on the users’ requirements. To present answers, we design a set of annotated and captioned visualizations to represent the answers in a form that supports interpretation and narration. We conducted an ablation study and a controlled user study to evaluate the Talk2Data’s effectiveness and usefulness.
{"title":"Talk2Data : A Natural Language Interface for Exploratory Visual Analysis via Question Decomposition","authors":"Yi Guo, Danqing Shi, Mingjuan Guo, Yanqiu Wu, Nan Cao, Qing Chen","doi":"10.1145/3643894","DOIUrl":"https://doi.org/10.1145/3643894","url":null,"abstract":"<p>Through a natural language interface (NLI) for exploratory visual analysis, users can directly “ask” analytical questions about the given tabular data. This process greatly improves user experience and lowers the technical barriers of data analysis. Existing techniques focus on generating a visualization from a concrete question. However, complex questions, requiring multiple data queries and visualizations to answer, are frequently asked in data exploration and analysis, which cannot be easily solved with the existing techniques. To address this issue, in this paper, we introduce Talk2Data, a natural language interface for exploratory visual analysis that supports answering complex questions. It leverages an advanced deep-learning model to resolve complex questions into a series of simple questions that could gradually elaborate on the users’ requirements. To present answers, we design a set of annotated and captioned visualizations to represent the answers in a form that supports interpretation and narration. We conducted an ablation study and a controlled user study to evaluate the Talk2Data’s effectiveness and usefulness.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":"1 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139767287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zeinab R. Yousefi, Tung Vuong, Marie AlGhossein, Tuukka Ruotsalo, Giulio Jaccuci, Samuel Kaski
Our digital life consists of activities that are organized around tasks and exhibit different user states in the digital contexts around these activities. Previous works have shown that digital activity monitoring can be used to predict entities that users will need to perform digital tasks. There have been methods developed to automatically detect the tasks of a user. However, these studies typically support only specific applications and tasks and relatively little research has been conducted on real-life digital activities. This paper introduces user state modeling and prediction with contextual information captured as entities, recorded from real-world digital user behavior, called entity footprinting; a system that records users’ digital activities on their screens and proactively provides useful entities across application boundaries without requiring explicit query formulation. Our methodology is to detect contextual user states using latent representations of entities occurring in digital activities. Using topic models and recurrent neural networks, the model learns the latent representation of concurrent entities and their sequential relationships. We report a field study in which the digital activities of thirteen people were recorded continuously for 14 days. The model learned from this data is used to 1) predict contextual user states, and 2) predict relevant entities for the detected states. The results show improved user state detection accuracy and entity prediction performance compared to static, heuristic, and basic topic models. Our findings have implications for the design of proactive recommendation systems that can implicitly infer users’ contextual state by monitoring users’ digital activities and proactively recommending the right information at the right time.
{"title":"Entity Footprinting: Modeling Contextual User States via Digital Activity Monitoring","authors":"Zeinab R. Yousefi, Tung Vuong, Marie AlGhossein, Tuukka Ruotsalo, Giulio Jaccuci, Samuel Kaski","doi":"10.1145/3643893","DOIUrl":"https://doi.org/10.1145/3643893","url":null,"abstract":"<p>Our digital life consists of activities that are organized around tasks and exhibit different user states in the digital contexts around these activities. Previous works have shown that digital activity monitoring can be used to predict entities that users will need to perform digital tasks. There have been methods developed to automatically detect the tasks of a user. However, these studies typically support only specific applications and tasks and relatively little research has been conducted on real-life digital activities. This paper introduces user state modeling and prediction with contextual information captured as entities, recorded from real-world digital user behavior, called <i>entity footprinting</i>; a system that records users’ digital activities on their screens and proactively provides useful entities across application boundaries without requiring explicit query formulation. Our methodology is to detect contextual user states using latent representations of entities occurring in digital activities. Using topic models and recurrent neural networks, the model learns the latent representation of concurrent entities and their sequential relationships. We report a field study in which the digital activities of thirteen people were recorded continuously for 14 days. The model learned from this data is used to 1) predict contextual user states, and 2) predict relevant entities for the detected states. The results show improved user state detection accuracy and entity prediction performance compared to static, heuristic, and basic topic models. Our findings have implications for the design of proactive recommendation systems that can implicitly infer users’ contextual state by monitoring users’ digital activities and proactively recommending the right information at the right time.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":"4 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139767234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Group recommender systems (GRSs) identify items to recommend to a group of people by aggregating group members’ individual preferences into a group profile, and selecting the items that have the largest score in the group profile. The GRS predicts that these recommendations would be chosen by the group, by assuming that the group is applying the same preference aggregation strategy as the one adopted by the GRS. However, predicting the choice of a group is more complex since the GRS is not aware of the exact preference aggregation strategy that is going to be used by the group.
To this end, the aim of this paper is to validate the research hypothesis that, by using a machine learning approach and a data set of observed group choices, it is possible to predict a group’s final choice, better than by using a standard preference aggregation strategy. Inspired by the Decision Scheme theory, which first tried to address the group choice prediction problem, we search for a group profile definition that, in conjunction with a machine learning model, can be used to accurately predict a group choice. Moreover, to cope with the data scarcity problem, we propose two data augmentation methods, which add synthetic group profiles to the training data, and we hypothesize they can further improve the choice prediction accuracy.
We validate our research hypotheses by using a data set containing 282 participants organized in 79 groups. The experiments indicate that the proposed method outperforms baseline aggregation strategies when used for group choice prediction. The method we propose is robust with the presence of missing preference data and achieves a performance superior to what humans can achieve on the group choice prediction task. Finally, the proposed data augmentation method can also improve the prediction accuracy. Our approach can be exploited in novel GRSs to identify the items that the group is likely to choose and to help groups to make even better and fairer choices.
{"title":"Predicting Group Choices from Group Profiles","authors":"Hanif Emamgholizadeh, Amra Delić, Francesco Ricci","doi":"10.1145/3639710","DOIUrl":"https://doi.org/10.1145/3639710","url":null,"abstract":"<p>Group recommender systems (GRSs) identify items to recommend to a group of people by aggregating group members’ individual preferences into a group profile, and selecting the items that have the largest score in the group profile. The GRS predicts that these recommendations would be chosen by the group, by assuming that the group is applying the same preference aggregation strategy as the one adopted by the GRS. However, predicting the choice of a group is more complex since the GRS is not aware of the exact preference aggregation strategy that is going to be used by the group. </p><p>To this end, the aim of this paper is to validate the research hypothesis that, by using a machine learning approach and a data set of observed group choices, it is possible to predict a group’s final choice, better than by using a standard preference aggregation strategy. Inspired by the Decision Scheme theory, which first tried to address the group choice prediction problem, we search for a group profile definition that, in conjunction with a machine learning model, can be used to accurately predict a group choice. Moreover, to cope with the data scarcity problem, we propose two data augmentation methods, which add synthetic group profiles to the training data, and we hypothesize they can further improve the choice prediction accuracy. </p><p>We validate our research hypotheses by using a data set containing 282 participants organized in 79 groups. The experiments indicate that the proposed method outperforms baseline aggregation strategies when used for group choice prediction. The method we propose is robust with the presence of missing preference data and achieves a performance superior to what humans can achieve on the group choice prediction task. Finally, the proposed data augmentation method can also improve the prediction accuracy. Our approach can be exploited in novel GRSs to identify the items that the group is likely to choose and to help groups to make even better and fairer choices.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":"12 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139409668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Carolina Centeio Jorge, Catholijn M. Jonker, Myrthe L. Tielman
In teams composed of humans, we use trust in others to make decisions, such as what to do next, who to help and who to ask for help. When a team member is artificial, they should also be able to assess whether a human teammate is trustworthy for a certain task. We see trustworthiness as the combination of (1) whether someone will do a task and (2) whether they can do it. With building beliefs in trustworthiness as an ultimate goal, we explore which internal factors (krypta) of the human may play a role (e.g. ability, benevolence and integrity) in determining trustworthiness, according to existing literature. Furthermore, we investigate which observable metrics (manifesta) an agent may take into account as cues for the human teammate’s krypta in an online 2D grid-world experiment (n=54). Results suggest that cues of ability, benevolence and integrity influence trustworthiness. However, we observed that trustworthiness is mainly influenced by human’s playing strategy and cost-benefit analysis, which deserves further investigation. This is a first step towards building informed beliefs of human trustworthiness in human-AI teamwork.
{"title":"How should an AI trust its human teammates? Exploring possible cues of artificial trust","authors":"Carolina Centeio Jorge, Catholijn M. Jonker, Myrthe L. Tielman","doi":"10.1145/3635475","DOIUrl":"https://doi.org/10.1145/3635475","url":null,"abstract":"<p>In teams composed of humans, we use trust in others to make decisions, such as what to do next, who to help and who to ask for help. When a team member is artificial, they should also be able to assess whether a human teammate is trustworthy for a certain task. We see trustworthiness as the combination of (1) whether someone will do a task and (2) whether they can do it. With building beliefs in trustworthiness as an ultimate goal, we explore which internal factors (krypta) of the human may play a role (e.g. ability, benevolence and integrity) in determining trustworthiness, according to existing literature. Furthermore, we investigate which observable metrics (manifesta) an agent may take into account as cues for the human teammate’s krypta in an online 2D grid-world experiment (n=54). Results suggest that cues of ability, benevolence and integrity influence trustworthiness. However, we observed that trustworthiness is mainly influenced by human’s playing strategy and cost-benefit analysis, which deserves further investigation. This is a first step towards building informed beliefs of human trustworthiness in human-AI teamwork.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":"16 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2023-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138545567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rui Zhang, Christopher Flathmann, Geoff Musick, Beau Schelble, Nathan J. McNeese, Bart Knijnenburg, Wen Duan
Explanation of artificial intelligence (AI) decision-making has become an important research area in human-computer interaction (HCI) and computer-supported teamwork research. While plenty of research has investigated AI explanations with an intent to improve AI transparency and human trust in AI, how AI explanations function in teaming environments remains unclear. Given that a major benefit of AI giving explanations is to increase human trust understanding how AI explanations impact human trust is crucial to effective human-AI teamwork. An online experiment was conducted with 156 participants to explore this question by examining how a teammate’s explanations impact the perceived trust of the teammate and the effectiveness of the team and how these impacts vary based on whether the teammate is a human or an AI. This study shows that explanations facilitate trust in AI teammates when explaining why AI disobeyed humans’ orders but hindered trust when explaining why an AI lied to humans. In addition, participants’ personal characteristics (e.g., their gender and the individual’s ethical framework) impacted their perceptions of AI teammates both directly and indirectly in different scenarios. Our study contributes to interactive intelligent systems and HCI by shedding light on how an AI teammate’s actions and corresponding explanations are perceived by humans while identifying factors that impact trust and perceived effectiveness. This work provides an initial understanding of AI explanations in human-AI teams, which can be used for future research to build upon in exploring AI explanation implementation in collaborative environments.
{"title":"I Know This Looks Bad, But I Can Explain: Understanding When AI Should Explain Actions In Human-AI Teams","authors":"Rui Zhang, Christopher Flathmann, Geoff Musick, Beau Schelble, Nathan J. McNeese, Bart Knijnenburg, Wen Duan","doi":"10.1145/3635474","DOIUrl":"https://doi.org/10.1145/3635474","url":null,"abstract":"<p>Explanation of artificial intelligence (AI) decision-making has become an important research area in human-computer interaction (HCI) and computer-supported teamwork research. While plenty of research has investigated AI explanations with an intent to improve AI transparency and human trust in AI, how AI explanations function in teaming environments remains unclear. Given that a major benefit of AI giving explanations is to increase human trust understanding how AI explanations impact human trust is crucial to effective human-AI teamwork. An online experiment was conducted with 156 participants to explore this question by examining how a teammate’s explanations impact the perceived trust of the teammate and the effectiveness of the team and how these impacts vary based on whether the teammate is a human or an AI. This study shows that explanations facilitate trust in AI teammates when explaining why AI disobeyed humans’ orders but hindered trust when explaining why an AI lied to humans. In addition, participants’ personal characteristics (e.g., their gender and the individual’s ethical framework) impacted their perceptions of AI teammates both directly and indirectly in different scenarios. Our study contributes to interactive intelligent systems and HCI by shedding light on how an AI teammate’s actions and corresponding explanations are perceived by humans while identifying factors that impact trust and perceived effectiveness. This work provides an initial understanding of AI explanations in human-AI teams, which can be used for future research to build upon in exploring AI explanation implementation in collaborative environments.</p>","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":"54 3","pages":""},"PeriodicalIF":3.4,"publicationDate":"2023-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Whereas most research in AI system explanation for healthcare applications looks at developing algorithmic explanations targeted at AI experts or medical professionals, the question we raise is: How do we build meaningful explanations for laypeople? And how does a meaningful explanation affect user’s trust perceptions? Our research investigates how the key factors affecting human-AI trust change in the light of human expertise, and how to design explanations specifically targeted at non-experts. By means of a stage-based design method, we map the ways laypeople understand AI explanations in a User Explanation Model. We also map both medical professionals and AI experts’ practice in an Expert Explanation Model. A Target Explanation Model is then proposed, which represents how experts’ practice and layperson’s understanding can be combined to design meaningful explanations. Design guidelines for meaningful AI explanations are proposed, and a prototype of AI system explanation for non-expert users in a breast cancer scenario is presented and assessed on how it affect users’ trust perceptions.
{"title":"Meaningful Explanation Effect on User’s Trust in an AI Medical System: Designing Explanations for Non-Expert Users","authors":"Retno Larasati, Anna De Liddo, Enrico Motta","doi":"10.1145/3631614","DOIUrl":"https://doi.org/10.1145/3631614","url":null,"abstract":"Whereas most research in AI system explanation for healthcare applications looks at developing algorithmic explanations targeted at AI experts or medical professionals, the question we raise is: How do we build meaningful explanations for laypeople? And how does a meaningful explanation affect user’s trust perceptions? Our research investigates how the key factors affecting human-AI trust change in the light of human expertise, and how to design explanations specifically targeted at non-experts. By means of a stage-based design method, we map the ways laypeople understand AI explanations in a User Explanation Model. We also map both medical professionals and AI experts’ practice in an Expert Explanation Model. A Target Explanation Model is then proposed, which represents how experts’ practice and layperson’s understanding can be combined to design meaningful explanations. Design guidelines for meaningful AI explanations are proposed, and a prototype of AI system explanation for non-expert users in a breast cancer scenario is presented and assessed on how it affect users’ trust perceptions.","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135390620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chiradeep Roy, Mahsan Nourani, Shivvrat Arya, Mahesh Shanbhag, Tahrima Rahman, Eric D. Ragan, Nicholas Ruozzi, Vibhav Gogate
We consider the following video activity recognition (VAR) task: given a video, infer the set of activities being performed in the video and assign each frame to an activity. Although VAR can be solved accurately using existing deep learning techniques, deep networks are neither interpretable nor explainable and as a result their use is problematic in high stakes decision-making applications (e.g., in healthcare, experimental Biology, aviation, law, etc.). In such applications, failure may lead to disastrous consequences and therefore it is necessary that the user is able to either understand the inner workings of the model or probe it to understand its reasoning patterns for a given decision. We address these limitations of deep networks by proposing a new approach that feeds the output of a deep model into a tractable, interpretable probabilistic model called a dynamic conditional cutset network that is defined over the explanatory and output variables and then performing joint inference over the combined model. The two key benefits of using cutset networks are: (a) they explicitly model the relationship between the output and explanatory variables and as a result the combined model is likely to be more accurate than the vanilla deep model and (b) they can answer reasoning queries in polynomial time and as a result they can derive meaningful explanations by efficiently answering explanation queries. We demonstrate the efficacy of our approach on two datasets, Textually Annotated Cooking Scenes (TACoS), and wet lab, using conventional evaluation measures such as the Jaccard Index and Hamming Loss, as well as a human-subjects study.
{"title":"Explainable Activity Recognition in Videos using Deep Learning and Tractable Probabilistic Models","authors":"Chiradeep Roy, Mahsan Nourani, Shivvrat Arya, Mahesh Shanbhag, Tahrima Rahman, Eric D. Ragan, Nicholas Ruozzi, Vibhav Gogate","doi":"10.1145/3626961","DOIUrl":"https://doi.org/10.1145/3626961","url":null,"abstract":"We consider the following video activity recognition (VAR) task: given a video, infer the set of activities being performed in the video and assign each frame to an activity. Although VAR can be solved accurately using existing deep learning techniques, deep networks are neither interpretable nor explainable and as a result their use is problematic in high stakes decision-making applications (e.g., in healthcare, experimental Biology, aviation, law, etc.). In such applications, failure may lead to disastrous consequences and therefore it is necessary that the user is able to either understand the inner workings of the model or probe it to understand its reasoning patterns for a given decision. We address these limitations of deep networks by proposing a new approach that feeds the output of a deep model into a tractable, interpretable probabilistic model called a dynamic conditional cutset network that is defined over the explanatory and output variables and then performing joint inference over the combined model. The two key benefits of using cutset networks are: (a) they explicitly model the relationship between the output and explanatory variables and as a result the combined model is likely to be more accurate than the vanilla deep model and (b) they can answer reasoning queries in polynomial time and as a result they can derive meaningful explanations by efficiently answering explanation queries. We demonstrate the efficacy of our approach on two datasets, Textually Annotated Cooking Scenes (TACoS), and wet lab, using conventional evaluation measures such as the Jaccard Index and Hamming Loss, as well as a human-subjects study.","PeriodicalId":48574,"journal":{"name":"ACM Transactions on Interactive Intelligent Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136012607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}