JMIR AI: Latest Articles

How Explainable Artificial Intelligence Can Increase or Decrease Clinicians' Trust in AI Applications in Health Care: Systematic Review.
Pub Date : 2024-10-30 DOI: 10.2196/53207
Rikard Rosenbacke, Åsa Melhus, Martin McKee, David Stuckler

Background: Artificial intelligence (AI) has significant potential in clinical practice. However, its "black box" nature can lead clinicians to question its value. The challenge is to create sufficient trust for clinicians to feel comfortable using AI, but not so much that they defer to it even when it produces results that conflict with their clinical judgment in ways that lead to incorrect decisions. Explainable AI (XAI) aims to address this by providing explanations of how AI algorithms reach their conclusions. However, it remains unclear whether such explanations foster an appropriate degree of trust to ensure the optimal use of AI in clinical practice.

Objective: This study aims to systematically review and synthesize empirical evidence on the impact of XAI on clinicians' trust in AI-driven clinical decision-making.

Methods: A systematic review was conducted in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, searching PubMed and Web of Science databases. Studies were included if they empirically measured the impact of XAI on clinicians' trust using cognition- or affect-based measures. Out of 778 articles screened, 10 met the inclusion criteria. We assessed the risk of bias using standard tools appropriate to the methodology of each paper.

Results: The risk of bias in all papers was moderate or moderate to high. All included studies operationalized trust primarily through cognition-based definitions, with 2 also incorporating affect-based measures. Out of these, 5 studies reported that XAI increased clinicians' trust compared with standard AI, particularly when the explanations were clear, concise, and relevant to clinical practice. In addition, 3 studies found no significant effect of XAI on trust, indicating that the presence of explanations does not automatically improve trust. Notably, 2 studies highlighted that XAI could either enhance or diminish trust, depending on the complexity and coherence of the provided explanations. The majority of studies suggest that XAI has the potential to enhance clinicians' trust in recommendations generated by AI. However, complex or contradictory explanations can undermine this trust. More critically, trust in AI is not inherently beneficial, as AI recommendations are not infallible. These findings underscore the nuanced role of explanation quality and suggest that trust can be modulated through the careful design of XAI systems.

Conclusions: Excessive trust in incorrect advice generated by AI can adversely impact clinical accuracy, just as can happen when correct advice is distrusted. Future research should focus on refining both cognition- and affect-based measures of trust and on developing strategies to achieve an appropriate balance in terms of trust, preventing both blind trust and undue skepticism. Optimizing trust in AI systems is essential for their effective integration into clinical practice.

JMIR AI. 2024;3:e53207. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11561425/pdf/
Citations: 0
Understanding AI's Role in Endometriosis Patient Education and Evaluating Its Information and Accuracy: Systematic Review.
Pub Date : 2024-10-30 DOI: 10.2196/64593
Juliana Almeida Oliveira, Karine Eskandar, Emre Kar, Flávia Ribeiro de Oliveira, Agnaldo Lopes da Silva Filho

Background: Endometriosis is a chronic gynecological condition that affects a significant portion of women of reproductive age, leading to debilitating symptoms such as chronic pelvic pain and infertility. Despite advancements in diagnosis and management, patient education remains a critical challenge. With the rapid growth of digital platforms, artificial intelligence (AI) has emerged as a potential tool to enhance patient education and access to information.

Objective: This systematic review aims to explore the role of AI in facilitating education and improving information accessibility for individuals with endometriosis.

Methods: This review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to ensure rigorous and transparent reporting. We conducted a comprehensive search of PubMed; Embase; the Regional Online Information System for Scientific Journals of Latin America, the Caribbean, Spain and Portugal (LATINDEX); Latin American and Caribbean Literature in Health Sciences (LILACS); Institute of Electrical and Electronics Engineers (IEEE) Xplore; and the Cochrane Central Register of Controlled Trials using the terms "endometriosis" and "artificial intelligence." Studies were selected based on their focus on AI applications in patient education or information dissemination regarding endometriosis. We included studies that evaluated AI-driven tools for assessing patient knowledge and addressed frequently asked questions related to endometriosis. Data extraction and quality assessment were conducted independently by 2 authors, with discrepancies resolved through consensus.

Results: Out of 400 initial search results, 11 studies met the inclusion criteria and were fully reviewed. We ultimately included 3 studies, 1 of which was an abstract. The studies examined the use of AI models, such as ChatGPT (OpenAI), machine learning, and natural language processing, in providing educational resources and answering common questions about endometriosis. The findings indicated that AI tools, particularly large language models, offer accurate responses to frequently asked questions with varying degrees of sufficiency across different categories. AI's integration with social media platforms also highlights its potential to identify patients' needs and enhance information dissemination.

Conclusions: AI holds promise in advancing patient education and information access for endometriosis, providing accurate and comprehensive answers to common queries, and facilitating a better understanding of the condition. However, challenges remain in ensuring ethical use, equitable access, and maintaining accuracy across diverse patient populations. Future research should focus on developing standardized approaches for evaluating AI's impact on patient education and exploring its integration into clinical practice to enhance support for individuals with endometriosis.

JMIR AI. 2024;3:e64593. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11561426/pdf/
Citations: 0
Targeting COVID-19 and Human Resources for Health News Information Extraction: Algorithm Development and Validation.
Pub Date : 2024-10-30 DOI: 10.2196/55059
Mathieu Ravaut, Ruochen Zhao, Duy Phung, Vicky Mengqi Qin, Dusan Milovanovic, Anita Pienkowska, Iva Bojic, Josip Car, Shafiq Joty

Background: Global pandemics like COVID-19 put a high amount of strain on health care systems and health workers worldwide. These crises generate a vast amount of news information published online across the globe. This extensive corpus of articles has the potential to provide valuable insights into the nature of ongoing events and guide interventions and policies. However, the sheer volume of information is beyond the capacity of human experts to process and analyze effectively.

Objective: The aim of this study was to explore how natural language processing (NLP) can be leveraged to build a system that allows for quick analysis of a high volume of news articles. Along with this, the objective was to create a workflow comprising human-computer symbiosis to derive valuable insights to support health workforce strategic policy dialogue, advocacy, and decision-making.

Methods: We conducted a review of open-source news coverage from January 2020 to June 2022 on COVID-19 and its impacts on the health workforce from the World Health Organization (WHO) Epidemic Intelligence from Open Sources (EIOS) by synergizing NLP models, including classification and extractive summarization, and human-generated analyses. Our DeepCovid system was trained on 2.8 million news articles in English from more than 3000 internet sources across hundreds of jurisdictions.

Results: Rules-based classification with hand-designed rules narrowed the data set to 8508 articles with high relevancy confirmed in the human-led evaluation. DeepCovid's automated information targeting component reached a very strong binary classification performance of 98.98 for the area under the receiver operating characteristic curve (ROC-AUC) and 47.21 for the area under the precision recall curve (PR-AUC). Its information extraction component attained good performance in automatic extractive summarization with a mean Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score of 47.76. DeepCovid's final summaries were used by human experts to write reports on the COVID-19 pandemic.
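
As an illustration of what a ROUGE score measures, the following minimal sketch computes ROUGE-1 (unigram overlap) between a toy candidate summary and a toy reference summary. The abstract does not state which ROUGE variant or tokenization DeepCovid used, so the variant, the whitespace tokenizer, the function name `rouge_1`, and the example sentences are all assumptions made purely for illustration. Scores here fall in [0, 1]; the reported 47.76 appears to be on a 0 to 100 scale.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 between a candidate and a reference summary.

    Hypothetical helper for illustration; not the DeepCovid evaluation code.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a word counts at most as often as it occurs in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    precision = overlap / sum(cand_counts.values()) if cand_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy sentences, invented for this example only.
scores = rouge_1(
    "nurses report staff shortages in hospitals",
    "hospitals report severe staff shortages among nurses",
)
print(scores)
```

Five of the six candidate words also occur in the seven-word reference, so precision is 5/6 and recall is 5/7. Established ROUGE implementations typically add stemming and longer n-gram or longest-common-subsequence variants, which this sketch omits.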

Conclusions: It is feasible to synergize high-performing NLP models and human-generated analyses to benefit open-source health workforce intelligence. The DeepCovid approach can contribute to an agile and timely global view, providing complementary information to scientific literature.

JMIR AI. 2024;3:e55059. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11561429/pdf/
Citations: 0
Identifying Marijuana Use Behaviors Among Youth Experiencing Homelessness Using a Machine Learning-Based Framework: Development and Evaluation Study.
Pub Date : 2024-10-17 DOI: 10.2196/53488
Tianjie Deng, Andrew Urbaczewski, Young Jin Lee, Anamika Barman-Adhikari, Rinku Dewri

Background: Youth experiencing homelessness face substance use problems disproportionately compared to other youth. A study found that 69% of youth experiencing homelessness meet the criteria for dependence on at least 1 substance, compared to 1.8% for all US adolescents. In addition, they experience major structural and social inequalities, which further undermine their ability to receive the care they need.

Objective: The goal of this study was to develop a machine learning-based framework that uses the social media content (posts and interactions) of youth experiencing homelessness to predict their substance use behaviors (ie, the probability of using marijuana). With this framework, social workers and care providers can identify and reach out to youth experiencing homelessness who are at a higher risk of substance use.

Methods: We recruited 133 young people experiencing homelessness at a nonprofit organization located in a city in the western United States. After obtaining their consent, we collected the participants' social media conversations for the past year before they were recruited, and we asked the participants to complete a survey on their demographic information, health conditions, sexual behaviors, and substance use behaviors. Building on the social sharing of emotions theory and social support theory, we identified important features that can potentially predict substance use. Then, we used natural language processing techniques to extract such features from social media conversations and reactions and built a series of machine learning models to predict participants' marijuana use.

Results: We evaluated our models based on their predictive performance as well as their conformity with measures of fairness. Without predictive features from survey information, which may introduce sex and racial biases, our machine learning models can reach an area under the curve of 0.72 and an accuracy of 0.81 using only social media data when predicting marijuana use. We also evaluated the false-positive rate for each sex and age segment.
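
A per-segment false-positive rate check of the kind mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `false_positive_rate_by_group`, the group labels, and the toy labels and predictions are all hypothetical. The false-positive rate is computed as FP / (FP + TN) within each group.

```python
from collections import defaultdict


def false_positive_rate_by_group(y_true, y_pred, groups):
    """Return FP / (FP + TN) per group; None when a group has no negatives."""
    fp = defaultdict(int)
    tn = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 0:  # only true negatives and false positives matter for FPR
            if pred == 1:
                fp[group] += 1
            else:
                tn[group] += 1
    rates = {}
    for group in set(groups):
        negatives = fp[group] + tn[group]
        rates[group] = fp[group] / negatives if negatives else None
    return rates


# Toy data, invented for this example: 1 = predicted or actual marijuana use.
y_true = [0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = false_positive_rate_by_group(y_true, y_pred, groups)
print(rates)
```

In this toy data, group "a" has 1 false positive among 3 negatives and group "b" has 2 among 3, so the rates differ. A gap of that kind between segments is what such a fairness check is meant to surface.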

Conclusions: We showed that textual interactions among youth experiencing homelessness and their friends on social media can serve as a powerful resource to predict their substance use. The framework we developed allows care providers to allocate resources efficiently to the youth experiencing homelessness who are in greatest need, with minimal overhead. It can be extended to analyze and predict other health-related behaviors and conditions observed in this vulnerable community.

JMIR AI. 2024;3:e53488. PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11528171/pdf/
Citations: 0
Machine Learning-Based Prediction for High Health Care Utilizers by Using a Multi-Institutional Diabetes Registry: Model Training and Evaluation.
Pub Date : 2024-10-17 DOI: 10.2196/58463
Joshua Kuan Tan, Le Quan, Nur Nasyitah Mohamed Salim, Jen Hong Tan, Su-Yen Goh, Julian Thumboo, Yong Mong Bee

Background: The cost of health care in many countries is increasing rapidly. There is a growing interest in using machine learning for predicting high health care utilizers for population health initiatives. Previous studies have focused on individuals who contribute to the highest financial burden. However, this group is small and represents a limited opportunity for long-term cost reduction.

Objective: We developed a collection of models that predict future health care utilization at various thresholds.

Methods: We utilized data from a multi-institutional diabetes database from the year 2019 to develop binary classification models. These models predict health care utilization in the subsequent year across 6 different outcomes: patients having a length of stay of ≥7, ≥14, and ≥30 days and emergency department attendance of ≥3, ≥5, and ≥10 visits. To address class imbalance, random and synthetic minority oversampling techniques were employed. The models were then applied to unseen data from 2020 and 2021 to predict health care utilization in the following year. A portfolio of performance metrics, with priority on area under the receiver operating characteristic curve, sensitivity, and positive predictive value, was used for comparison. Explainability analyses were conducted on the best performing models.
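
The two preprocessing ideas in this paragraph, turning a utilization count into a binary outcome at a threshold and balancing classes by random oversampling, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names `label_outcome` and `random_oversample`, the toy length-of-stay values, and the use of row indices in place of real features are all hypothetical, and the synthetic minority oversampling variant mentioned in the abstract is not shown.

```python
import random


def label_outcome(value, threshold):
    """Binary outcome: 1 if the utilization count meets the threshold, else 0."""
    return 1 if value >= threshold else 0


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample with replacement to fill the gap up to the majority-class size.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy next-year length of stay in days, invented for this example.
length_of_stay = [0, 2, 9, 1, 0, 15, 3, 0]
labels = [label_outcome(days, 7) for days in length_of_stay]  # outcome: stay >= 7 days
rows = list(range(len(length_of_stay)))  # row indices stand in for feature vectors

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(labels, balanced_labels.count(0), balanced_labels.count(1))
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into validation or test data.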

Results: When trained with random oversampling, 4 models (logistic regression, multivariate adaptive regression splines, boosted trees, and multilayer perceptron) consistently achieved a high area under the receiver operating characteristic curve (>0.80) and sensitivity (>0.60) across training-validation and test data sets. Correcting for class imbalance proved critical for model performance. Important predictors for all outcomes included age, number of emergency department visits in the present year, chronic kidney disease stage, inpatient bed days in the present year, and mean hemoglobin A1c levels. Explainability analyses using partial dependence plots demonstrated that for the best performing models, the learned patterns were consistent with real-world knowledge, thereby supporting the validity of the models.
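The three prioritized metrics (area under the receiver operating characteristic curve, sensitivity, and positive predictive value) can be computed from first principles, as in the sketch below. The labels, scores, and the 0.5 decision threshold are made-up values for illustration, not results from the study, and the helper names are hypothetical.

```python
def sensitivity_and_ppv(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); positive predictive value = TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for 8 patients (1 = high utilizer).
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, ppv = sensitivity_and_ppv(y_true, y_pred)
print(f"AUROC={auroc(y_true, scores):.3f} sensitivity={sens:.3f} PPV={ppv:.3f}")
```

The pairwise definition of AUROC used here is quadratic in the number of samples, which is fine for a toy example; a rank-based implementation would be preferable on registry-scale data.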

Conclusions: We successfully developed machine learning models capable of predicting high service level utilization with strong performance and valid explainability. These models can be integrated into wider diabetes-related population health initiatives.

JMIR AI. 2024;3:e58463. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11528163/pdf/
Citations: 0
Behavioral Nudging With Generative AI for Content Development in SMS Health Care Interventions: Case Study.
Pub Date : 2024-10-15 DOI: 10.2196/52974
Rachel M Harrison, Ekaterina Lapteva, Anton Bibin

Background: Brief message interventions have demonstrated immense promise in health care, yet the development of these messages has suffered from a dearth of transparency and a scarcity of publicly accessible data sets. Moreover, the researcher-driven content creation process has raised resource allocation issues, necessitating a more efficient and transparent approach to content development.

Objective: This research sets out to address the challenges of content development for SMS interventions by showcasing the use of generative artificial intelligence (AI) as a tool for content creation, transparently explaining the prompt design and content generation process, and providing the largest publicly available data set of brief messages and source code for future replication of our process.

Methods: Leveraging the pretrained large language model GPT-3.5 (OpenAI), we generate a collection of messages in the context of medication adherence for individuals with type 2 diabetes using evidence-derived behavior change techniques identified in a prior systematic review. We create an attributed prompt designed to adhere to content (readability and tone) and SMS (character count and encoder type) standards while encouraging message variability to reflect differences in behavior change techniques.

Results: We deliver the most extensive repository of brief messages for a singular health care intervention and the first library of messages crafted with generative AI. In total, our method yields a data set comprising 1150 messages, with 89.91% (n=1034) meeting character length requirements and 80.7% (n=928) meeting readability requirements. Furthermore, our analysis reveals that all messages exhibit diversity comparable to an existing publicly available data set created under the same theoretical framework for a similar setting.
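A check of the SMS standards named in the Methods (character count and encoder type) might look like the sketch below. The abstract does not state the limits used, so the 160-unit GSM-7 and 70-unit UCS-2 single-message limits are assumptions based on the common SMS convention, and `sms_check` and the sample messages are hypothetical.

```python
# GSM 7-bit default alphabet; characters in the extension table cost two units each.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM7_EXTENDED = set("^{}\\[~]|€")

GSM7_LIMIT = 160  # assumed single-message limit for GSM-7
UCS2_LIMIT = 70   # assumed single-message limit for UCS-2


def sms_check(message):
    """Report the encoding a message would need and whether it fits in one SMS."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in message):
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in message)
        return {"encoding": "GSM-7", "units": units, "fits_one_sms": units <= GSM7_LIMIT}
    # Any character outside GSM-7 forces UCS-2, counted in UTF-16 code units.
    units = len(message.encode("utf-16-be")) // 2
    return {"encoding": "UCS-2", "units": units, "fits_one_sms": units <= UCS2_LIMIT}


# Hypothetical messages; the second uses a curly apostrophe and an emoji.
print(sms_check("Take your metformin with breakfast today."))
print(sms_check("You\u2019ve got this \U0001F4AA"))
```

A single typographic apostrophe is enough to switch a message from GSM-7 to UCS-2 and cut the per-message budget by more than half, which is one reason generated text is worth validating for encoder type as well as length.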

Conclusions: This research provides a novel approach to content creation for health care interventions using state-of-the-art generative AI tools. Future research is needed to assess the generated content for ethical, safety, and research standards, as well as to determine whether the intervention is successful in improving the target behaviors.

JMIR AI. 2024;3:e52974. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11522651/pdf/
Citations: 0
The Dual Nature of AI in Information Dissemination: Ethical Considerations.
Pub Date : 2024-10-15 DOI: 10.2196/53505
Federico Germani, Giovanni Spitale, Nikola Biller-Andorno

Infodemics pose significant dangers to public health and to the societal fabric, as the spread of misinformation can have far-reaching consequences. While artificial intelligence (AI) systems have the potential to craft compelling and valuable information campaigns with positive repercussions for public health and democracy, concerns have arisen regarding the potential use of AI systems to generate convincing disinformation. The consequences of this dual nature of AI, capable of both illuminating and obscuring the information landscape, are complex and multifaceted. We contend that the rapid integration of AI into society demands a comprehensive understanding of its ethical implications and the development of strategies to harness its potential for the greater good while mitigating harm. Thus, in this paper we explore the ethical dimensions of AI's role in information dissemination and impact on public health, arguing that potential strategies to deal with AI and disinformation encompass generating regulated and transparent data sets used to train AI models, regulating content outputs, and promoting information literacy.

JMIR AI. 2024;3:e53505. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11522648/pdf/
Citations: 0
The Utility and Implications of Ambient Scribes in Primary Care.
Pub Date : 2024-10-04 DOI: 10.2196/57673
Puneet Seth, Romina Carretas, Frank Rudzicz

Ambient scribe technology, utilizing large language models, represents an opportunity for addressing several current pain points in the delivery of primary care. We explore the evolution of ambient scribes and their current use in primary care. We discuss the suitability of primary care for ambient scribe integration, considering the varied nature of patient presentations and the emphasis on comprehensive care. We also propose the stages of maturation in the use of ambient scribes in primary care and their impact on care delivery. Finally, we call for focused research on safety, bias, patient impact, and privacy in ambient scribe technology, emphasizing the need for early training and education of health care providers in artificial intelligence and digital health tools.

JMIR AI. 2024;3:e57673. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11489790/pdf/
Citations: 0
Leveraging Temporal Trends for Training Contextual Word Embeddings to Address Bias in Biomedical Applications: Development Study.
Pub Date : 2024-10-02 DOI: 10.2196/49546
Shunit Agmon, Uriel Singer, Kira Radinsky

Background: Women have been underrepresented in clinical trials for many years. Machine-learning models trained on clinical trial abstracts may capture and amplify biases in the data. Specifically, word embeddings are models that enable representing words as vectors and are the building block of most natural language processing systems. If word embeddings are trained on clinical trial abstracts, predictive models that use the embeddings will exhibit gender performance gaps.

Objective: We aim to capture temporal trends in clinical trials through temporal distribution matching on contextual word embeddings (specifically, BERT) and explore its effect on the bias manifested in downstream tasks.

Methods: We present TeDi-BERT, a method to harness the temporal trend of increasing women's inclusion in clinical trials to train contextual word embeddings. We implement temporal distribution matching through an adversarial classifier, trying to distinguish old from new clinical trial abstracts based on their embeddings. The temporal distribution matching acts as a form of domain adaptation from older to more recent clinical trials. We evaluate our model on 2 clinical tasks: prediction of unplanned readmission to the intensive care unit and hospital length of stay prediction. We also conduct an algorithmic analysis of the proposed method.

Results: In readmission prediction, TeDi-BERT achieved area under the receiver operating characteristic curve of 0.64 for female patients versus the baseline of 0.62 (P<.001), and 0.66 for male patients versus the baseline of 0.64 (P<.001). In the length of stay regression, TeDi-BERT achieved a mean absolute error of 4.56 (95% CI 4.44-4.68) for female patients versus the baseline of 4.62 (95% CI 4.50-4.74; P<.001), and 4.54 (95% CI 4.44-4.65) for male patients versus the baseline of 4.60 (95% CI 4.50-4.71; P<.001).

Conclusions: In both clinical tasks, TeDi-BERT improved performance for female patients, as expected; but it also improved performance for male patients. Our results show that accuracy for one gender does not need to be exchanged for bias reduction, but rather that good science improves clinical results for all. Contextual word embedding models trained to capture temporal trends can help mitigate the effects of bias that changes over time in the training data.

JMIR AI. 2024;3:e49546. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11483253/pdf/
Citations: 0
Impact of a Digital Scribe System on Clinical Documentation Time and Quality: Usability Study.
Pub Date : 2024-09-23 DOI: 10.2196/60020
Marieke Meija van Buchem, Ilse M J Kant, Liza King, Jacqueline Kazmaier, Ewout W Steyerberg, Martijn P Bauer

Background: Physicians spend approximately half of their time on administrative tasks, which is one of the leading causes of physician burnout and decreased work satisfaction. The implementation of natural language processing-assisted clinical documentation tools may provide a solution.

Objective: This study investigates the impact of a commercially available Dutch digital scribe system on clinical documentation efficiency and quality.

Methods: Medical students with experience in clinical practice and documentation (n=22) created a total of 430 summaries of mock consultations and recorded the time they spent on this task. The consultations were summarized using 3 methods: manual summaries, fully automated summaries, and automated summaries with manual editing. We then randomly reassigned the summaries and evaluated their quality using a modified version of the Physician Documentation Quality Instrument (PDQI-9). We compared the differences between the 3 methods in descriptive statistics, quantitative text metrics (word count and lexical diversity), the PDQI-9, Recall-Oriented Understudy for Gisting Evaluation scores, and BERTScore.
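The quantitative text metrics named in the Methods (word count and lexical diversity) can be sketched as follows. The abstract does not say which lexical diversity measure was used; the type-token ratio below is an assumption chosen because it is the simplest common one. The two summaries are invented English examples, whereas the study itself used Dutch text.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def word_count(text):
    return len(tokenize(text))


def type_token_ratio(text):
    """Unique tokens divided by total tokens; 1.0 means no word is repeated."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Hypothetical summaries of the same mock consultation.
manual = "Patient reports chest pain since Monday. No fever. Plan: ECG and follow-up."
automatic = (
    "The patient reports that the patient has chest pain. "
    "The chest pain started on Monday. The patient has no fever."
)

for name, text in [("manual", manual), ("automatic", automatic)]:
    print(name, word_count(text), round(type_token_ratio(text), 2))
```

The type-token ratio tends to fall as texts get longer, so comparing summaries of very different lengths with it can overstate differences in diversity; length-corrected measures exist for that reason.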

Results: The median time for manual summarization was 202 seconds against 186 seconds for editing an automatic summary. Without editing, the automatic summaries attained a poorer PDQI-9 score than manual summaries (median PDQI-9 score 25 vs 31, P<.001, ANOVA test). Automatic summaries were found to have higher word counts but lower lexical diversity than manual summaries (P<.001, independent t test). The study revealed variable impacts on PDQI-9 scores and summarization time across individuals. Generally, students viewed the digital scribe system as a potentially useful tool, noting its ease of use and time-saving potential, though some criticized the summaries for their greater length and rigid structure.

Conclusions: This study highlights the potential of digital scribes in improving clinical documentation processes by offering a first summary draft for physicians to edit, thereby reducing documentation time without compromising the quality of patient records. Furthermore, digital scribes may be more beneficial to some physicians than to others and could play a role in improving the reusability of clinical documentation. Future studies should focus on the impact and quality of such a system when used by physicians in clinical practice.

JMIR AI. 2024;3:e60020. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11459111/pdf/
Citations: 0