From Classification to Clinical Insights

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IF 3.6, Q2, Computer Science, Information Systems) Pub Date: 2024-05-13 DOI: 10.1145/3659604

Zachary Englhardt, Chengqian Ma, Margaret E. Morris, Chun-Cheng Chang, Xuhai "Orson" Xu, Lianhui Qin, Daniel McDuff, Xin Liu, Shwetak Patel, Vikram Iyer


Abstract

Passively collected behavioral health data from ubiquitous sensors could provide mental health professionals with valuable insights into patients' daily lives, but such efforts are impeded by disparate metrics, lack of interoperability, and unclear correlations between the measured signals and an individual's mental health. To address these challenges, we pioneer the exploration of large language models (LLMs) to synthesize clinically relevant insights from multi-sensor data. We develop chain-of-thought prompting methods to generate LLM reasoning on how data pertaining to activity, sleep, and social interaction relate to conditions such as depression and anxiety. We then prompt the LLM to perform binary classification, achieving an accuracy of 61.1%, exceeding the state of the art. We find that models like GPT-4 correctly reference numerical data 75% of the time. Although we began our investigation by developing methods for using LLMs to output binary classifications for conditions like depression, we find that their greatest potential value to clinicians lies not in diagnostic classification but in rigorous analysis of diverse self-tracking data to generate natural language summaries that synthesize multiple data streams and identify potential concerns. Clinicians envisioned using these insights in a variety of ways, principally to foster collaborative investigation with patients that strengthens the therapeutic alliance and guides treatment. We describe this collaborative engagement, additional envisioned uses, and associated concerns that must be addressed before adoption in real-world contexts.
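The abstract does not give the prompts or data schema, so the following is only a minimal sketch, under stated assumptions, of what a chain-of-thought prompt over multi-sensor data followed by binary classification might look like. The `DailySummary` fields, the prompt wording, the `Answer: yes/no` convention, and all function names are hypothetical and are not taken from the paper; no LLM is called here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DailySummary:
    """Hypothetical per-day sensor aggregates (illustrative field names)."""
    date: str
    steps: int
    sleep_hours: float
    calls_made: int


def build_cot_prompt(days: List[DailySummary], condition: str = "depression") -> str:
    """Format sensor data and ask for step-by-step reasoning before a verdict."""
    rows = [
        f"{d.date}: steps={d.steps}, sleep_hours={d.sleep_hours:.1f}, "
        f"calls_made={d.calls_made}"
        for d in days
    ]
    return (
        "The following data were passively collected from a person's devices.\n"
        + "\n".join(rows)
        + "\n\nReason step by step about how activity, sleep, and social "
        f"interaction in this data may relate to {condition}. "
        "Cite the specific numbers you rely on. "
        "Finish with a final line of the form 'Answer: yes' or 'Answer: no'."
    )


def parse_label(response: str) -> Optional[bool]:
    """Extract the binary verdict from the last 'Answer:' line, if any."""
    for line in reversed(response.strip().splitlines()):
        text = line.strip().lower()
        if text.startswith("answer:"):
            verdict = text[len("answer:"):].strip()
            if verdict.startswith("yes"):
                return True
            if verdict.startswith("no"):
                return False
    return None  # the response did not follow the assumed format


def accuracy(predictions: List[bool], labels: List[bool]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and aligned")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

In such a setup, the text returned by the model for each participant would be passed through `parse_label`, and `accuracy` would compare the parsed verdicts against reference labels. The intermediate reasoning text, rather than the verdict, corresponds to the natural language summaries the abstract identifies as most valuable to clinicians.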


