Decision-making is increasingly supported by machine recommendations. In healthcare, for example, clinical decision support systems help physicians choose a treatment option for a patient. In such settings, people can rely too much on these systems, which impairs their own reasoning. The European AI Act addresses this risk of over-reliance: Article 14 on human oversight postulates that people should be able "to remain aware of the possible tendency of automatically relying or over-relying on the output". Similarly, the EU High-Level Expert Group identifies human agency and oversight as the first of seven key requirements for trustworthy AI. This position paper proposes a conceptual approach in which the machine generates questions about the decision at hand in order to promote decision-making autonomy. This engagement in turn allows for oversight of recommender systems. A systematic, interdisciplinary investigation (spanning, e.g., machine learning, user experience design, psychology, and philosophy of technology) of human-machine interaction in decision-making can provide insights into questions such as: How can human oversight be increased and over- and under-reliance on machine recommendations be calibrated? How can decision-making autonomy be strengthened so that decision-makers remain aware of possibilities beyond automated suggestions that reproduce the status quo?
{"title":"Questioning AI: Promoting Decision-Making Autonomy Through Reflection","authors":"Simon WS Fischer","doi":"arxiv-2409.10250","DOIUrl":"https://doi.org/arxiv-2409.10250","url":null,"abstract":"Decision-making is increasingly supported by machine recommendations. In\u0000healthcare, for example, a clinical decision support system is used by the\u0000physician to find a treatment option for a patient. In doing so, people can\u0000rely too much on these systems, which impairs their own reasoning process. The\u0000European AI Act addresses the risk of over-reliance and postulates in Article\u000014 on human oversight that people should be able \"to remain aware of the\u0000possible tendency of automatically relying or over-relying on the output\".\u0000Similarly, the EU High-Level Expert Group identifies human agency and oversight\u0000as the first of seven key requirements for trustworthy AI. The following\u0000position paper proposes a conceptual approach to generate machine questions\u0000about the decision at hand, in order to promote decision-making autonomy. This\u0000engagement in turn allows for oversight of recommender systems. The systematic\u0000and interdisciplinary investigation (e.g., machine learning, user experience\u0000design, psychology, philosophy of technology) of human-machine interaction in\u0000relation to decision-making provides insights to questions like: how to\u0000increase human oversight and calibrate over- and under-reliance on machine\u0000recommendations; how to increase decision-making autonomy and remain aware of\u0000other possibilities beyond automated suggestions that repeat the status-quo?","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"101 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper provides a comprehensive survey of sentiment analysis in the context of artificial intelligence (AI) and large language models (LLMs). Sentiment analysis, a core task in natural language processing (NLP), has evolved significantly from traditional rule-based methods to advanced deep learning techniques. The study traces this historical development, highlighting the transition from lexicon-based and pattern-based approaches to more sophisticated machine learning and deep learning models. Key challenges are discussed, including handling bilingual texts, detecting sarcasm, and addressing biases. The paper reviews state-of-the-art approaches, identifies emerging trends, and outlines future research directions to advance the field. By synthesizing current methodologies and exploring future opportunities, the survey aims to provide a thorough understanding of sentiment analysis in the AI and LLM era.
{"title":"Comprehensive Study on Sentiment Analysis: From Rule-based to modern LLM based system","authors":"Shailja Gupta, Rajesh Ranjan, Surya Narayan Singh","doi":"arxiv-2409.09989","DOIUrl":"https://doi.org/arxiv-2409.09989","url":null,"abstract":"This paper provides a comprehensive survey of sentiment analysis within the\u0000context of artificial intelligence (AI) and large language models (LLMs).\u0000Sentiment analysis, a critical aspect of natural language processing (NLP), has\u0000evolved significantly from traditional rule-based methods to advanced deep\u0000learning techniques. This study examines the historical development of\u0000sentiment analysis, highlighting the transition from lexicon-based and\u0000pattern-based approaches to more sophisticated machine learning and deep\u0000learning models. Key challenges are discussed, including handling bilingual\u0000texts, detecting sarcasm, and addressing biases. The paper reviews\u0000state-of-the-art approaches, identifies emerging trends, and outlines future\u0000research directions to advance the field. By synthesizing current methodologies\u0000and exploring future opportunities, this survey aims to understand sentiment\u0000analysis in the AI and LLM context thoroughly.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"208 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recommender systems are powerful decision-making tools, but they are often operationalized as black-box models whose algorithms are neither accessible nor interpretable by human operators. This can cause confusion and frustration for the operator and lead to unsatisfactory outcomes. While the field of explainable AI has made remarkable strides by interpreting and explaining algorithms to human operators, gaps remain in the human's understanding of the recommender system. This paper investigates the relative impact of using context, i.e., properties of the decision-making task and environment, to align human and algorithmic understanding of the state of the world (judgment) and thereby improve joint human-recommender performance, compared to post-hoc algorithmic explanations. We conducted an empirical, between-subjects experiment in which participants worked with an automated recommender system to complete a decision-making task. We manipulated the method of transparency (shared contextual information to support shared judgment vs. algorithmic explanations) and recorded participants' understanding of the task, the recommender system, and their overall performance. Both techniques yielded equivalent agreement on final decisions. However, participants who saw task context were less prone to over-rely on the recommender system and could better pinpoint the conditions under which the AI erred. Both methods improved participants' confidence in their own decision-making and increased mental demand equally, with negligible frustration. These results present an alternative to post-hoc explanations for improving team performance and illustrate the impact of judgment on human cognition when working with recommender systems.
{"title":"Aligning Judgment Using Task Context and Explanations to Improve Human-Recommender System Performance","authors":"Divya Srivastava, Karen M. Feigh","doi":"arxiv-2409.10717","DOIUrl":"https://doi.org/arxiv-2409.10717","url":null,"abstract":"Recommender systems, while a powerful decision making tool, are often\u0000operationalized as black box models, such that their AI algorithms are not\u0000accessible or interpretable by human operators. This in turn can cause\u0000confusion and frustration for the operator and result in unsatisfactory\u0000outcomes. While the field of explainable AI has made remarkable strides in\u0000addressing this challenge by focusing on interpreting and explaining the\u0000algorithms to human operators, there are remaining gaps in the human's\u0000understanding of the recommender system. This paper investigates the relative\u0000impact of using context, properties of the decision making task and\u0000environment, to align human and AI algorithm understanding of the state of the\u0000world, i.e. judgment, to improve joint human-recommender performance as\u0000compared to utilizing post-hoc algorithmic explanations. We conducted an\u0000empirical, between-subjects experiment in which participants were asked to work\u0000with an automated recommender system to complete a decision making task. We\u0000manipulated the method of transparency (shared contextual information to\u0000support shared judgment vs algorithmic explanations) and record the human's\u0000understanding of the task, the recommender system, and their overall\u0000performance. We found that both techniques yielded equivalent agreement on\u0000final decisions. However, those who saw task context had less tendency to\u0000over-rely on the recommender system and were able to better pinpoint in what\u0000conditions the AI erred. Both methods improved participants' confidence in\u0000their own decision making, and increased mental demand equally and frustration\u0000negligibly. These results present an alternative approach to improving team\u0000performance to post-hoc explanations and illustrate the impact of judgment on\u0000human cognition in working with recommender systems.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"65 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patrick Paetzold, David Hägele, Marina Evers, Daniel Weiskopf, Oliver Deussen
Current research provides methods to communicate uncertainty and adapts classical algorithms of the visualization pipeline to take the uncertainty into account. Various existing visualization frameworks include methods to present uncertain data but do not offer transformation techniques tailored to uncertain data. Therefore, we propose a software package for uncertainty-aware data analysis in Python (UADAPy) offering methods for uncertain data along the visualization pipeline. We aim to provide a platform that is the foundation for further integration of uncertainty algorithms and visualizations. It provides common utility functionality to support research in uncertainty-aware visualization algorithms and makes state-of-the-art research results accessible to the end user. The project is available at https://github.com/UniStuttgart-VISUS/uadapy.
{"title":"UADAPy: An Uncertainty-Aware Visualization and Analysis Toolbox","authors":"Patrick Paetzold, David Hägele, Marina Evers, Daniel Weiskopf, Oliver Deussen","doi":"arxiv-2409.10217","DOIUrl":"https://doi.org/arxiv-2409.10217","url":null,"abstract":"Current research provides methods to communicate uncertainty and adapts\u0000classical algorithms of the visualization pipeline to take the uncertainty into\u0000account. Various existing visualization frameworks include methods to present\u0000uncertain data but do not offer transformation techniques tailored to uncertain\u0000data. Therefore, we propose a software package for uncertainty-aware data\u0000analysis in Python (UADAPy) offering methods for uncertain data along the\u0000visualization pipeline. We aim to provide a platform that is the foundation for\u0000further integration of uncertainty algorithms and visualizations. It provides\u0000common utility functionality to support research in uncertainty-aware\u0000visualization algorithms and makes state-of-the-art research results accessible\u0000to the end user. The project is available at\u0000https://github.com/UniStuttgart-VISUS/uadapy.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"208 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
TOTTA refers to guiding the spatial position and rotation of a real or virtual tool (TO) towards a real or virtual target (TA), a key task in Mixed Reality applications. Task error can have critical consequences for safety, performance, and quality, for example in surgical implantology or industrial maintenance scenarios. The TOTTA problem lacks a dedicated study; existing work is scattered across domains with isolated designs. This work contributes a systematic review of TOTTA visual widgets, studying 70 unique designs from 24 papers. TOTTA is commonly guided by visual overlap, an intuitive, pre-attentive 'collimation' feedback of simple-shaped widgets: Box, 3D Axes, 3D Model, 2D Crosshair, Globe, Tetrahedron, Line, and Plane. The review finds that TO and TA are often represented with the same shape and are distinguished by topological elements (e.g., edges, vertices, faces), colors, transparency levels, added shapes, widget quantity, and size. Some designs provide continuous feedback during manipulation, relative to the distance between TO and TA, through text, dynamic color, sonification, and amplified graphical visualization. Some approaches trigger discrete 'target reached' feedback, such as color alteration, added sound, TA shape change, and added text. We found a lack of gold standards, including in testing procedures, as current ones are limited to partial sets with different, incomparable setups (target configurations, avatars, backgrounds, etc.). We also found a participant bias towards right-handed, young, male, non-color-impaired subjects.
{"title":"Precise Tool to Target Positioning Widgets (TOTTA) in Spatial Environments: A Systematic Review","authors":"Mine Dastan, Michele Fiorentino, Antonio E. Uva","doi":"arxiv-2409.10239","DOIUrl":"https://doi.org/arxiv-2409.10239","url":null,"abstract":"TOTTA outlines the spatial position and rotation guidance of a real/virtual\u0000tool (TO) towards a real/virtual target (TA), which is a key task in Mixed\u0000Reality applications. The task error can have critical consequences regarding\u0000safety, performance, and quality, such as in surgical implantology or\u0000industrial maintenance scenarios. The TOTTA problem lacks a dedicated study and\u0000is scattered across different domains with isolated designs. This work\u0000contributes to a systematic review of the TOTTA visual widgets, studying 70\u0000unique designs from 24 papers. TOTTA is commonly guided by visual overlap an\u0000intuitive, pre-attentive 'collimation' feedback of simple-shaped widgets: Box,\u00003D Axes, 3D Model, 2D Crosshair, Globe, Tetrahedron, Line, and Plane. Our\u0000research discovers that TO and TA are often represented with the same shape.\u0000They are distinguished by topological elements (e.g., edges, vertices, faces),\u0000colors, transparency levels, and added shapes, widget quantity, and size.\u0000Meanwhile, some designs provide continuous 'during manipulation feedback'\u0000relative to the distance between TO and TA by text, dynamic color,\u0000sonification, and amplified graphical visualization. Some approaches trigger\u0000discrete 'TA reached feedback,' such as color alteration, added sound, TA shape\u0000change, and added text. We found a lack of golden standards, including in\u0000testing procedures, as current ones are limited to partial sets with different\u0000and incomparable setups (different target configurations, avatar, background,\u0000etc.). We also found a bias in participants: right-handed, young male,\u0000non-color impaired.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiangzhe Yuan, Jiajun Wang, Siying Hu, Andrew Cheung, Zhicong Lu
As the demand for computer science (CS) skills grows, mastering foundational concepts is crucial yet challenging for novice learners. To address this challenge, we present KoroT-3E, an AI-based system that creates personalized musical mnemonics to enhance both memory retention and understanding of CS concepts. KoroT-3E enables users to transform complex concepts into memorable lyrics and to compose melodies that suit their musical preferences. We conducted semi-structured interviews (n=12) to investigate why novice learners find it challenging to memorize and understand CS concepts. The findings, combined with constructivist learning theory, informed our initial design, which was then refined following consultations with CS education experts. An empirical experiment (n=36) showed that participants using KoroT-3E (n=18) significantly outperformed the control group (n=18), with improved memory efficiency, increased motivation, and a positive learning experience. These findings demonstrate the effectiveness of integrating multimodal generative AI into CS education to create personalized and interactive learning experiences.
{"title":"KoroT-3E: A Personalized Musical Mnemonics Tool for Enhancing Memory Retention of Complex Computer Science Concepts","authors":"Xiangzhe Yuan, Jiajun Wang, Siying Hu, Andrew Cheung, Zhicong Lu","doi":"arxiv-2409.10446","DOIUrl":"https://doi.org/arxiv-2409.10446","url":null,"abstract":"As the demand for computer science (CS) skills grows, mastering foundational\u0000concepts is crucial yet challenging for novice learners. To address this\u0000challenge, we present KoroT-3E, an AI-based system that creates personalized\u0000musical mnemonics to enhance both memory retention and understanding of\u0000concepts in CS. KoroT-3E enables users to transform complex concepts into\u0000memorable lyrics and compose melodies that suit their musical preferences. We\u0000conducted semi-structured interviews (n=12) to investigate why novice learners\u0000find it challenging to memorize and understand CS concepts. The findings,\u0000combined with constructivist learning theory, established our initial design,\u0000which was then refined following consultations with CS education experts. An\u0000empirical experiment(n=36) showed that those using KoroT-3E (n=18)\u0000significantly outperformed the control group (n=18), with improved memory\u0000efficiency, increased motivation, and a positive learning experience. These\u0000findings demonstrate the effectiveness of integrating multimodal generative AI\u0000into CS education to create personalized and interactive learning experiences.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large Language Models (LLMs) are widely used in healthcare, but limitations like hallucinations, incomplete information, and bias hinder their reliability. To address these issues, researchers released the Build Your Own expert Bot (BYOeB) platform, which enables developers to create LLM-powered chatbots with integrated expert verification. CataractBot, its first implementation, provides expert-verified responses to cataract surgery questions. A pilot evaluation showed its potential; however, the study had a small sample size and was primarily qualitative. In this work, we conducted a large-scale 24-week deployment of CataractBot involving 318 patients and attendants who sent 1,992 messages, with 91.71% of responses verified by seven experts. Analysis of interaction logs revealed that medical questions significantly outnumbered logistical ones, hallucinations were negligible, and experts rated 84.52% of medical answers as accurate. As the knowledge base expanded with expert corrections, system performance improved by 19.02%, reducing expert workload. These insights guide the design of future LLM-powered chatbots.
{"title":"Learnings from a Large-Scale Deployment of an LLM-Powered Expert-in-the-Loop Healthcare Chatbot","authors":"Bhuvan Sachdeva, Pragnya Ramjee, Geeta Fulari, Kaushik Murali, Mohit Jain","doi":"arxiv-2409.10354","DOIUrl":"https://doi.org/arxiv-2409.10354","url":null,"abstract":"Large Language Models (LLMs) are widely used in healthcare, but limitations\u0000like hallucinations, incomplete information, and bias hinder their reliability.\u0000To address these, researchers released the Build Your Own expert Bot (BYOeB)\u0000platform, enabling developers to create LLM-powered chatbots with integrated\u0000expert verification. CataractBot, its first implementation, provides\u0000expert-verified responses to cataract surgery questions. A pilot evaluation\u0000showed its potential; however the study had a small sample size and was\u0000primarily qualitative. In this work, we conducted a large-scale 24-week\u0000deployment of CataractBot involving 318 patients and attendants who sent 1,992\u0000messages, with 91.71% of responses verified by seven experts. Analysis of\u0000interaction logs revealed that medical questions significantly outnumbered\u0000logistical ones, hallucinations were negligible, and experts rated 84.52% of\u0000medical answers as accurate. As the knowledge base expanded with expert\u0000corrections, system performance improved by 19.02%, reducing expert workload.\u0000These insights guide the design of future LLM-powered chatbots.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yu Fu, Shunan Guo, Jane Hoffswell, Victor S. Bursztyn, Ryan Rossi, John Stasko
Fact-checking data claims requires data evidence retrieval and analysis, which can become tedious and intractable when done manually. This work presents Aletheia, an automated fact-checking prototype designed to facilitate data claims verification and enhance data evidence communication. For verification, we utilize a pre-trained LLM to parse the semantics for evidence retrieval. To effectively communicate the data evidence, we design representations in two forms: data tables and visualizations, tailored to various data fact types. Additionally, we design interactions that showcase a real-world application of these techniques. We evaluate the performance of two core NLP tasks with a curated dataset comprising 400 data claims and compare the two representation forms regarding viewers' assessment time, confidence, and preference via a user study with 20 participants. The evaluation offers insights into the feasibility and bottlenecks of using LLMs for data fact-checking tasks, potential advantages and disadvantages of using visualizations over data tables, and design recommendations for presenting data evidence.
{"title":"\"The Data Says Otherwise\"-Towards Automated Fact-checking and Communication of Data Claims","authors":"Yu Fu, Shunan Guo, Jane Hoffswell, Victor S. Bursztyn, Ryan Rossi, John Stasko","doi":"arxiv-2409.10713","DOIUrl":"https://doi.org/arxiv-2409.10713","url":null,"abstract":"Fact-checking data claims requires data evidence retrieval and analysis,\u0000which can become tedious and intractable when done manually. This work presents\u0000Aletheia, an automated fact-checking prototype designed to facilitate data\u0000claims verification and enhance data evidence communication. For verification,\u0000we utilize a pre-trained LLM to parse the semantics for evidence retrieval. To\u0000effectively communicate the data evidence, we design representations in two\u0000forms: data tables and visualizations, tailored to various data fact types.\u0000Additionally, we design interactions that showcase a real-world application of\u0000these techniques. We evaluate the performance of two core NLP tasks with a\u0000curated dataset comprising 400 data claims and compare the two representation\u0000forms regarding viewers' assessment time, confidence, and preference via a user\u0000study with 20 participants. The evaluation offers insights into the feasibility\u0000and bottlenecks of using LLMs for data fact-checking tasks, potential\u0000advantages and disadvantages of using visualizations over data tables, and\u0000design recommendations for presenting data evidence.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"17 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142268563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mental health disorders are among the most prevalent diseases worldwide, affecting nearly one in four people. Despite their widespread impact, the intervention rate remains below 25%, largely due to the significant cooperation required from patients for both diagnosis and intervention. The core issue behind this low treatment rate is stigma, which discourages over half of those affected from seeking help. This paper presents MindGuard, an accessible, stigma-free, and professional mobile mental healthcare system designed to provide mental health first aid. The heart of MindGuard is an innovative edge LLM, equipped with professional mental health knowledge, that seamlessly integrates objective mobile sensor data with subjective Ecological Momentary Assessment records to deliver personalized screening and intervention conversations. We conduct a broad evaluation of MindGuard using open datasets spanning four years and real-world deployment across various mobile devices involving 20 subjects for two weeks. Remarkably, MindGuard achieves results comparable to GPT-4 and outperforms its counterpart with more than 10 times the model size. We believe that MindGuard paves the way for mobile LLM applications, potentially revolutionizing mental healthcare practices by substituting self-reporting and intervention conversations with passive, integrated monitoring within daily life, thus ensuring accessible and stigma-free mental health support.
{"title":"MindGuard: Towards Accessible and Sitgma-free Mental Health First Aid via Edge LLM","authors":"Sijie Ji, Xinzhe Zheng, Jiawei Sun, Renqi Chen, Wei Gao, Mani Srivastava","doi":"arxiv-2409.10064","DOIUrl":"https://doi.org/arxiv-2409.10064","url":null,"abstract":"Mental health disorders are among the most prevalent diseases worldwide,\u0000affecting nearly one in four people. Despite their widespread impact, the\u0000intervention rate remains below 25%, largely due to the significant cooperation\u0000required from patients for both diagnosis and intervention. The core issue\u0000behind this low treatment rate is stigma, which discourages over half of those\u0000affected from seeking help. This paper presents MindGuard, an accessible,\u0000stigma-free, and professional mobile mental healthcare system designed to\u0000provide mental health first aid. The heart of MindGuard is an innovative edge\u0000LLM, equipped with professional mental health knowledge, that seamlessly\u0000integrates objective mobile sensor data with subjective Ecological Momentary\u0000Assessment records to deliver personalized screening and intervention\u0000conversations. We conduct a broad evaluation of MindGuard using open datasets\u0000spanning four years and real-world deployment across various mobile devices\u0000involving 20 subjects for two weeks. Remarkably, MindGuard achieves results\u0000comparable to GPT-4 and outperforms its counterpart with more than 10 times the\u0000model size. We believe that MindGuard paves the way for mobile LLM\u0000applications, potentially revolutionizing mental healthcare practices by\u0000substituting self-reporting and intervention conversations with passive,\u0000integrated monitoring within daily life, thus ensuring accessible and\u0000stigma-free mental health support.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142268568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Minsuk Chang, Soohyun Lee, Aeri Cho, Hyeon Jeon, Seokhyeon Park, Cindy Xiong Bearfield, Jinwook Seo
We introduce a novel crowdsourcing method for identifying important areas in graphical images through punch-hole labeling. Traditional methods, such as gaze trackers and mouse-based annotations, generate continuous data and can be impractical in crowdsourcing scenarios: they require many participants, and the resulting data can be noisy. In contrast, our method first segments the graphical image with a grid and drops a portion of the patches (punch holes). Then, we iteratively ask the labeler to validate each annotation with holes, narrowing the annotation down to only the most important area. This approach aims to reduce annotation noise in crowdsourcing by standardizing the annotations while enhancing labeling efficiency and reliability. Preliminary findings on fundamental charts demonstrate that punch-hole labeling can effectively pinpoint critical regions, highlighting its potential for broader application in visualization research, particularly in studying graphical perception with large groups of users. Future work aims to make the algorithm faster and to demonstrate its utility through large-scale experiments.
{"title":"Efficiently Crowdsourcing Visual Importance with Punch-Hole Annotation","authors":"Minsuk Chang, Soohyun Lee, Aeri Cho, Hyeon Jeon, Seokhyeon Park, Cindy Xiong Bearfield, Jinwook Seo","doi":"arxiv-2409.10459","DOIUrl":"https://doi.org/arxiv-2409.10459","url":null,"abstract":"We introduce a novel crowdsourcing method for identifying important areas in\u0000graphical images through punch-hole labeling. Traditional methods, such as gaze\u0000trackers and mouse-based annotations, which generate continuous data, can be\u0000impractical in crowdsourcing scenarios. They require many participants, and the\u0000outcome data can be noisy. In contrast, our method first segments the graphical\u0000image with a grid and drops a portion of the patches (punch holes). Then, we\u0000iteratively ask the labeler to validate each annotation with holes, narrowing\u0000down the annotation only having the most important area. This approach aims to\u0000reduce annotation noise in crowdsourcing by standardizing the annotations while\u0000enhancing labeling efficiency and reliability. Preliminary findings from\u0000fundamental charts demonstrate that punch-hole labeling can effectively\u0000pinpoint critical regions. This also highlights its potential for broader\u0000application in visualization research, particularly in studying large-scale\u0000users' graphical perception. Our future work aims to enhance the algorithm to\u0000achieve faster labeling speed and prove its utility through large-scale\u0000experiments.","PeriodicalId":501541,"journal":{"name":"arXiv - CS - Human-Computer Interaction","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142252434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}