Development of a cloud framework for training and deployment of deep learning models in Radiology: automatic segmentation of the human spine from CT-scans as a case-study
Pub Date: 2024-08-28 · DOI: 10.1101/2024.08.27.24312635
Rui Santos, Nicholas Bünger, Benedikt Herzog, Sebastiano Caprara
Advancements in artificial intelligence (AI) and the digitalization of healthcare are revolutionizing clinical practices, with the deployment of AI models playing a crucial role in enhancing diagnostic accuracy and treatment outcomes. Our study aims to bridge image data collected in a clinical setting with the deployment of deep learning algorithms for segmentation of the human spine. The developed pipeline takes a decentralized approach, where selected clinical images are sent to a trusted research environment, part of a private tenant at a cloud service provider. As a use-case scenario, we used the TotalSegmentator CT-scan dataset, along with its annotated ground-truth spine data, to train a ResSegNet model native to the MONAI-Label framework. Training and validation were conducted using high-performance GPUs available on demand in the trusted research environment. Segmentation model performance was benchmarked with metrics including Dice score, intersection over union, accuracy, precision, sensitivity, specificity, boundary F1 score, Cohen's kappa, area under the curve, and Hausdorff distance. To further assess model robustness, we also trained a state-of-the-art nnU-Net model on the same dataset and compared both models with a pre-trained spine segmentation model available within MONAI-Label. The ResSegNet model, deployable via MONAI-Label, demonstrated performance comparable to the state-of-the-art nnU-Net framework, with both models showing strong results across multiple segmentation metrics. This study successfully trained, evaluated, and deployed a decentralized deep learning model for CT-scan spine segmentation in a cloud environment, and validated the new model against state-of-the-art alternatives. This comprehensive comparison highlights the value of MONAI-Label as an effective tool for label generation, model training, and deployment, and underscores its user-friendly nature and ease of deployment in clinical and research settings. Further, we demonstrate that such tools can be deployed in private, secure, decentralized cloud environments for clinical use.
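The abstract does not spell out its headline overlap metrics; as a reference, Dice and intersection over union for a binary segmentation mask reduce to a few lines of NumPy (a generic sketch, not the authors' benchmarking code):

```python
import numpy as np

def dice_and_iou(pred: np.ndarray, truth: np.ndarray) -> tuple[float, float]:
    """Dice score and intersection-over-union for two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    total = pred.sum() + truth.sum()
    dice = 2.0 * inter / total if total else 1.0   # empty masks count as a perfect match
    iou = inter / union if union else 1.0
    return dice, iou
```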
{"title":"Development of a cloud framework for training and deployment of deep learning models in Radiology: automatic segmentation of the human spine from CT-scans as a case-study","authors":"Rui Santos, Nicholas Bünger, Benedikt Herzog, Sebastiano Caprara","doi":"10.1101/2024.08.27.24312635","DOIUrl":"https://doi.org/10.1101/2024.08.27.24312635","url":null,"abstract":"Advancements in artificial intelligence (AI) and the digitalization of healthcare are revolutionizing clinical practices, with the deployment of AI models playing a crucial role in enhancing diagnostic accuracy and treatment outcomes. Our current study aims at bridging image data collected in a clinical setting, with deployment of deep learning algorithms for the segmentation of the human spine. The developed pipeline takes a decentralized approach, where selected clinical images are sent to a trusted research environment, part of private tenant in a cloud service provider. As a use-case scenario, we used the TotalSegmentator CT-scan dataset, along with its annotated ground-truth spine data, to train a ResSegNet model native to the MONAI-Label framework. Training and validation were conducted using high performance GPUs available on demand in the Trusted Research Environment. Segmentation model performance benchmarking involved metrics such as dice score, intersection over union, accuracy, precision, sensitivity, specificity, bounding F1 score, Cohen’s kappa, area under the curve, and Hausdorff distance. To further assess model robustness, we also trained a state-of-the-art nnU-Net model using the same dataset and compared both models with a pre-trained spine segmentation model available within MONAI-Label. The ResSegNet model, deployable via MONAI-Label, demonstrated performance comparable to the state-of-the-art nnU-Net framework, with both models showing strong results across multiple segmentation metrics. This study successfully trained, evaluated and deployed a decentralized deep learning model for CT-scan spine segmentation in a cloud environment. This new model was validated against state-of-the-art alternatives. This comprehensive comparison highlights the value of the MONAI-Label as an effective tool for label generation, model training, and deployment, further highlighting its user-friendly nature and ease of deployment in clinical and research settings. Further we also demonstrate that such tools can be deployed in private and safe decentralized cloud environments for clinical use.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation synthesis analysis can be accelerated through text mining, searching, and highlighting: A case-study on data extraction from 631 UNICEF evaluation reports
Pub Date: 2024-08-28 · DOI: 10.1101/2024.08.27.24312630
Lena Schmidt, Pauline Addis, Erica Mattellone, Hannah OKeefe, Kamilla Nabiyeva, Uyen Kim Huynh, Nabamallika Dehingia, Dawn Craig, Fiona Campbell
Background: The United Nations Children's Fund (UNICEF) is the United Nations agency dedicated to promoting and advocating for the protection of children's rights, meeting their basic needs, and expanding their opportunities to reach their full potential. It achieves this by working with governments, communities, and other partners via programmes that safeguard children from violence, provide access to quality education, ensure that children survive and thrive, provide access to water, sanitation, and hygiene, and provide life-saving support in emergency contexts. Programmes are evaluated as part of UNICEF's Evaluation Policy, and the publicly available reports include a wealth of information on results, recommendations, and lessons learned. Objective: To critically explore UNICEF's impact, a systematic synthesis of evaluations was conducted to summarize UNICEF's main achievements and areas for improvement, reflecting key recommendations, lessons learned, enablers, and barriers to achieving its goals, and to steer the organization's future direction and strategy. Since the evaluations are extensive, manual analysis was not feasible, so a semi-automated approach was taken. Methods: This paper examines the automation techniques used to increase the feasibility of undertaking broad evaluation synthesis analyses. Our semi-automated, human-in-the-loop methods supported extraction of data for 64 outcomes across 631 evaluation reports, each comprising hundreds of pages of text. The outcomes are derived from the five goal areas of UNICEF's 2022-2025 Strategic Plan. For text pre-processing we implemented PDF-to-text extraction, section parsing, and sentence mining via a neural network. Data extraction was supported by a freely available text-mining workbench, SWIFT-Review. Here, we describe using comprehensive adjacency-search-based queries to rapidly filter reports by outcomes and to highlight relevant sections of text to expedite data extraction. Results: While the methods used were not expected to produce 100% complete results for each outcome, they offer useful automation methods for researchers facing otherwise infeasible evaluation synthesis tasks. We reduced the text volume to 8% using deep learning (recall 0.93) and rapidly identified relevant evaluations across outcomes with a median precision of 0.6. All code is available and open-source. Conclusions: When the classic approach of systematically extracting information for all outcomes across all texts exceeds available resources, the proposed automation methods can be employed to speed up the process while retaining scientific rigour and reproducibility.
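SWIFT-Review's actual query syntax is not reproduced in the abstract, but the core idea of an adjacency (proximity) search — two terms occurring within n words of each other — can be sketched in plain Python; the terms, window size, and sample text below are chosen purely for illustration:

```python
import re

def near(text: str, term_a: str, term_b: str, window: int = 10) -> bool:
    """True if term_a and term_b occur within `window` words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(a - b) <= window for a in pos_a for b in pos_b)

# Flag a report passage as relevant to a hypothetical WASH outcome:
report_text = "Households gained reliable access to safe water and sanitation services."
relevant = near(report_text, "sanitation", "access", window=8)  # -> True
```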
{"title":"Evaluation synthesis analysis can be accelerated through text mining, searching, and highlighting: A case-study on data extraction from 631 UNICEF evaluation reports","authors":"Lena Schmidt, Pauline Addis, Erica Mattellone, Hannah OKeefe, Kamilla Nabiyeva, Uyen Kim Huynh, Nabamallika Dehingia, Dawn Craig, Fiona Campbell","doi":"10.1101/2024.08.27.24312630","DOIUrl":"https://doi.org/10.1101/2024.08.27.24312630","url":null,"abstract":"Background: The United Nations Children's Fund (UNICEF) is the United Nations agency dedicated to promoting and advocating for the protection of children's rights, meeting their basic needs, and expanding their opportunities to reach their full potential. They achieve this by working with governments, communities, and other partners via programmes that safeguard children from violence, provide access to quality education, ensure that children survive and thrive, provide access to water, sanitation and hygiene, and provide life-saving support in emergency contexts. Programmes are evaluated as part of UNICEF Evaluation Policy, and the publicly available reports include a wealth of information on results, recommendations, and lessons learned. Objective: To critically explore UNICEF's impact, a systematic synthesis of evaluations was conducted to provide a summary of UNICEF main achievements and areas where they could improve, as a reflection of key recommendations, lessons learned, enablers, and barriers to achieving their goals and to steer its future direction and strategy. Since the evaluations are extensive, manual analysis was not feasible, so a semi-automated approach was taken. Methods: This paper examines the automation techniques used to try and increase the feasibility of undertaking broad evaluation syntheses analyses. Our semi-automated human-in-the-loop methods supported data extraction of data for 64 outcomes across 631 evaluation reports; each of which comprised hundreds of pages of text. The outcomes are derived from the five goal areas within UNICEF 2022-2025 Strategic Plan. For text pre-processing we implemented PDF-to-text extraction, section parsing, and sentence mining via a neural network. Data extraction was supported by a freely available text-mining workbench, SWIFT-Review. Here, we describe using comprehensive adjacency-search-based queries to rapidly filter reports by outcomes and to highlight relevant sections of text to expedite data extraction. Results: While the methods used were not expected to produce 100% complete results for each outcome, they present useful automation methods for researchers facing otherwise non-feasible evaluation syntheses tasks. We reduced the text volume down to 8% using deep learning (recall 0.93) and rapidly identified relevant evaluations across outcomes with a median precision of 0.6. All code is available and open-source. 
Conclusions: When the classic approach of systematically extracting information from all outcomes across all texts exceeds available resources, the proposed automation methods can be employed to speed up the process while retaining scientific rigour and reproducibility.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"79 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MedSegBench: A Comprehensive Benchmark for Medical Image Segmentation in Diverse Data Modalities
Pub Date: 2024-08-28 · DOI: 10.1101/2024.08.26.24312619
Zeki Kuş, Musa Aydin
MedSegBench is a comprehensive benchmark designed to evaluate deep learning models for medical image segmentation across a wide range of modalities, comprising 35 datasets with over 60,000 images from ultrasound, MRI, and X-ray. The benchmark addresses challenges in medical imaging by providing standardized datasets with train/validation/test splits, accounting for variability in image quality and dataset imbalances. It supports binary and multi-class segmentation tasks with up to 19 classes and uses the U-Net architecture with various encoder/decoder networks, such as ResNets, EfficientNet, and DenseNet, for evaluations. MedSegBench is a valuable resource for developing robust and flexible segmentation algorithms, allows fair comparisons across different models, and promotes the development of universal models for medical tasks. It is the most comprehensive study among medical segmentation datasets. The datasets and source code are publicly available, encouraging further research and development in medical image analysis.
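The abstract does not name the implementation behind its U-Net evaluations; one common way to pair a U-Net decoder with interchangeable encoders such as ResNet, EfficientNet, or DenseNet is the segmentation_models_pytorch package (an assumption for illustration, not necessarily the benchmark's own code):

```python
import segmentation_models_pytorch as smp

# U-Net with a swappable encoder backbone; `classes` matches the benchmark's
# largest multi-class task (up to 19 classes per the abstract).
model = smp.Unet(
    encoder_name="resnet34",      # or "efficientnet-b0", "densenet121", ...
    encoder_weights="imagenet",   # pretrained encoder initialization
    in_channels=1,                # single-channel ultrasound/MRI/X-ray slices
    classes=19,
)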
{"title":"MedSegBench: A Comprehensive Benchmark for Medical Image Segmentation in Diverse Data Modalities","authors":"Zeki Kuş, Musa Aydin","doi":"10.1101/2024.08.26.24312619","DOIUrl":"https://doi.org/10.1101/2024.08.26.24312619","url":null,"abstract":"MedSegBench is a comprehensive benchmark designed to evaluate deep learning models for medical image segmentation across a wide range of modalities. It covers a wide range of modalities, including 35 datasets with over 60,000 images from ultrasound, MRI, and X-ray. The benchmark addresses challenges in medical imaging by providing standardized datasets with train/validation/test splits, considering variability in image quality and dataset imbalances. The benchmark supports binary and multi-class segmentation tasks with up to 19 classes. It supports binary and multi-class segmentation tasks with up to 19 classes and uses the U-Net architecture with various encoder/decoder networks such as ResNets, EfficientNet, and DenseNet for evaluations. MedSegBench is a valuable resource for developing robust and flexible segmentation algorithms and allows for fair comparisons across different models, promoting the development of universal models for medical tasks. It is the most comprehensive study among medical segmentation datasets. The datasets and source code are publicly available, encouraging further research and development in medical image analysis.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large Language Model Augmented Clinical Trial Screening
Pub Date: 2024-08-28 · DOI: 10.1101/2024.08.27.24312646
Jacob Beattie, Dylan Owens, Ann Marie Navar, Luiza Giuliani Schmitt, Kimberly Taing, Sarah Neufeld, Daniel Yang, Christian Chukwuma, Ahmed Gul, Dong Soo Lee, Neil Desai, Dominic Moon, Jing Wang, Steve Jiang, Michael Dohopolski
Purpose: Identifying potential participants for clinical trials using traditional manual screening methods is time-consuming and expensive. Structured data in electronic health records (EHRs) are often insufficient to capture trial inclusion and exclusion criteria adequately. Large language models (LLMs) offer the potential for improved participant screening by searching text notes in the EHR, but optimal deployment strategies remain unclear. Methods: We evaluated the performance of GPT-3.5 and GPT-4 in screening a cohort of 74 patients (35 eligible, 39 ineligible) using EHR data, including progress notes, pathology reports, and imaging reports, for a phase 2 clinical trial in patients with head and neck cancer. Fourteen trial criteria were evaluated, including stage, histology, prior treatments, underlying conditions, and functional status, among others. Manually annotated data served as the ground truth. We tested three prompting approaches: Structured Output (SO), Chain of Thought (CoT), and Self-Discover (SD). SO and CoT were further tested using expert guidance and LLM guidance (EG and LLM-G, respectively). Prompts were developed and refined using 10 patients from each cohort and then assessed on the remaining 54 patients. Each approach was assessed for accuracy, sensitivity, specificity, and micro F1 score. We explored two eligibility predictions: strict eligibility required meeting all criteria, while proportional eligibility used the proportion of criteria met. Screening time and cost were measured, and a failure analysis identified common misclassification issues. Results: Fifty-four patients were evaluated (25 enrolled, 29 not enrolled). At the criterion level, GPT-3.5 showed a median accuracy of 0.761 (range: 0.554-0.910), with the Structured Output + EG approach performing best. GPT-4 demonstrated a median accuracy of 0.838 (range: 0.758-0.886), with the Self-Discover approach achieving the highest Youden Index of 0.729. For strict patient-level eligibility, GPT-3.5's Structured Output + EG approach reached an accuracy of 0.611, while GPT-4's CoT + EG achieved 0.65. Proportional eligibility performed better overall, with GPT-4's CoT + LLM-G approach having the highest AUC (0.82) and Youden Index (0.60). Screening times ranged from 1.4 to 3 minutes per patient for GPT-3.5 and 7.9 to 12.4 minutes for GPT-4, with costs of $0.02-$0.03 for GPT-3.5 and $0.15-$0.27 for GPT-4. Conclusion: LLMs can be used to assess specific clinical trial criteria but had difficulty identifying patients who met all criteria. Instead, using the proportion of criteria met to flag candidates for manual review may be a more practical approach. LLM performance varies by prompt, with GPT-4 generally outperforming GPT-3.5, but at higher costs and longer processing times. LLMs should complement, not replace, manual chart reviews for matching patients to clinical trials.
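The proportional-eligibility idea — rank patients by the fraction of criteria the LLM judges met rather than requiring all of them — is simple to express in code; the criterion names and threshold below are placeholders, not values from the study:

```python
def proportional_eligibility(criteria_met: dict[str, bool],
                             threshold: float = 0.8) -> tuple[float, bool]:
    """Fraction of trial criteria met, plus a flag for manual review."""
    proportion = sum(criteria_met.values()) / len(criteria_met)
    return proportion, proportion >= threshold

# Example: 3 of 4 hypothetical criteria met -> (0.75, False)
score, flag_for_review = proportional_eligibility(
    {"stage_ok": True, "histology_ok": True, "no_prior_rt": True, "ecog_ok": False}
)
```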
{"title":"Large Language Model Augmented Clinical Trial Screening","authors":"Jacob Beattie, Dylan Owens, Ann Marie Navar, Luiza Giuliani Schmitt, Kimberly Taing, Sarah Neufeld, Daniel Yang, Christian Chukwuma, Ahmed Gul, Dong Soo Lee, Neil Desai, Dominic Moon, Jing Wang, Steve Jiang, Michael Dohopolski","doi":"10.1101/2024.08.27.24312646","DOIUrl":"https://doi.org/10.1101/2024.08.27.24312646","url":null,"abstract":"Purpose: Identifying potential participants for clinical trials using traditional manual screening methods is time-consuming and expensive. Structured data in electronic health records (EHR) are often insufficient to capture trial inclusion and exclusion criteria adequately. Large language models (LLMs) offer the potential for improved participant screening by searching text notes in the EHR, but optimal deployment strategies remain unclear.\u0000Methods: We evaluated the performance of GPT-3.5 and GPT-4 in screening a cohort of 74 patients (35 eligible, 39 ineligible) using EHR data, including progress notes, pathology reports, and imaging reports, for a phase 2 clinical trial in patients with head and neck cancer. Fourteen trial criteria were evaluated, including stage, histology, prior treatments, underlying conditions, functional status, etc. Manually annotated data served as the ground truth. We tested three prompting approaches (Structured Output (SO), Chain of Thought (CoT), and Self-Discover (SD)). SO and CoT were further tested using expert and LLM guidance (EG and LLM-G, respectively). Prompts were developed and refined using 10 patients from each cohort and then assessed on the remaining 54 patients. Each approach was assessed for accuracy, sensitivity, specificity, and micro F1 score. We explored two eligibility predictions: strict eligibility required meeting all criteria, while proportional eligibility used the proportion of criteria met. Screening time and cost were measured, and a failure analysis identified common misclassification issues. Results: Fifty-four patients were evaluated (25 enrolled, 29 not enrolled). At the criterion level, GPT-3.5 showed a median accuracy of 0.761 (range: 0.554-0.910), with the Structured Out- put + EG approach performing best. GPT-4 demonstrated a median accuracy of 0.838 (range: 0.758-0.886), with the Self-Discover approach achieving the highest Youden Index of 0.729. For strict patient-level eligibility, GPT-3.5's Structured Output + EG approach reached an accuracy of 0.611, while GPT-4's CoT + EG achieved 0.65. Proportional eligibility performed better over- all, with GPT-4's CoT + LLM-G approach having the highest AUC (0.82) and Youden Index (0.60). Screening times ranged from 1.4 to 3 minutes per patient for GPT-3.5 and 7.9 to 12.4 minutes for GPT-4, with costs of $0.02-$0.03 for GPT-3.5 and $0.15-$0.27 for GPT-4.\u0000Conclusion: LLMs can be used to identify specific clinical trial criteria but had difficulties identifying patients who met all criteria. Instead, using the proportion of criteria met to flag candidates for manual review maybe a more practical approach. LLM performance varies by prompt, with GPT-4 generally outperforming GPT-3.5, but at higher costs and longer processing times. 
LLMs should complement, not replace, manual chart reviews for matching patients to clinical trials.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"22 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluating Anti-LGBTQIA+ Medical Bias in Large Language Models
Pub Date: 2024-08-27 · DOI: 10.1101/2024.08.22.24312464
Crystal Tin-Tin Chang, Neha Srivathsa, Charbel Bou-Khalil, Akshay Swaminathan, Mitchell R Lunn, Kavita Mishra, Roxana Daneshjou, Sanmi Koyejo
From drafting responses to patient messages, to clinical decision support, to patient-facing educational chatbots, large language models (LLMs) present many opportunities for use in clinical situations. In these applications, we must consider potential harms to minoritized groups through the propagation of medical misinformation or previously held misconceptions. In this work, we evaluate the potential of LLMs to propagate anti-LGBTQIA+ medical bias and misinformation. We prompted four LLMs (Gemini 1.5 Flash, Claude 3 Haiku, GPT-4o, and Stanford Medicine Secure GPT (GPT-4.0)) with a set of 38 prompts consisting of explicit questions and synthetic clinical notes created by medically trained reviewers and LGBTQIA+ health experts. The prompts explored clinical situations across two axes: (i) situations where historical bias has been observed vs. not observed, and (ii) situations where LGBTQIA+ identity is relevant to clinical care vs. not relevant. Medically trained reviewers evaluated LLM responses for appropriateness (safety, privacy, hallucination/accuracy, and bias) and clinical utility. We find that all four LLMs evaluated generated inappropriate responses to our prompt set. LLM performance is strongly hampered by learned anti-LGBTQIA+ bias and over-reliance on the conditions mentioned in prompts. Given these results, future work should focus on tailoring output formats according to stated use cases, decreasing sycophancy and reliance on extraneous information in the prompt, and improving accuracy and decreasing bias for LGBTQIA+ patients and care providers.
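The study's 2×2 prompt design (historical bias observed vs. not; identity clinically relevant vs. not) lends itself to a simple tally of reviewer verdicts per cell; the records below are invented placeholders to show the structure, not study data:

```python
from collections import Counter

# One (bias_axis, relevance_axis, reviewer_verdict) tuple per LLM response.
ratings = [
    ("bias_observed",    "identity_relevant",     "inappropriate"),
    ("bias_observed",    "identity_not_relevant", "appropriate"),
    ("no_bias_observed", "identity_relevant",     "inappropriate"),
]

# Count inappropriate responses in each cell of the 2x2 design.
inappropriate_by_cell = Counter(
    (bias, relevance) for bias, relevance, verdict in ratings
    if verdict == "inappropriate"
)
print(inappropriate_by_cell)
```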
{"title":"Evaluating Anti-LGBTQIA+ Medical Bias in Large Language Models","authors":"Crystal Tin-Tin Chang, Neha Srivathsa, Charbel Bou-Khalil, Akshay Swaminathan, Mitchell R Lunn, Kavita Mishra, Roxana Daneshjou, Sanmi Koyejo","doi":"10.1101/2024.08.22.24312464","DOIUrl":"https://doi.org/10.1101/2024.08.22.24312464","url":null,"abstract":"From drafting responses to patient messages to clinical decision support to patient-facing educational chatbots, Large Language Models (LLMs) present many opportunities for use in clinical situations. In these applications, we must consider potential harms to minoritized groups through the propagation of medical misinformation or previously-held misconceptions. In this work, we evaluate the potential of LLMs to propagate anti-LGBTQIA+ medical bias and misinformation. We prompted 4 LLMs (Gemini 1.5 Flash, Claude 3 Haiku, GPT-4o, Stanford Medicine Secure GPT (GPT-4.0)) with a set of 38 prompts consisting of explicit questions and synthetic clinical notes created by medically trained reviewers and LGBTQIA+ health experts. The prompts explored clinical situations across two axes: (i) situations where historical bias has been observed vs. not observed, and (ii) situations where LGBTQIA+ identity is relevant to clinical care vs. not relevant. Medically trained reviewers evaluated LLM responses for appropriateness (safety, privacy, hallucination/accuracy, and bias) and clinical utility. We find that all 4 LLMs evaluated generated inappropriate responses to our prompt set. LLM performance is strongly hampered by learned anti-LGBTQIA+ bias and over-reliance on the mentioned conditions in prompts. Given these results, future work should focus on tailoring output formats according to stated use cases, decreasing sycophancy and reliance on extraneous information in the prompt, and improving accuracy and decreasing bias for LGBTQIA+ patients and care providers.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing Clinical Documentation Workflow with Ambient Artificial Intelligence: Clinician Perspectives on Work Burden, Burnout, and Job Satisfaction
Pub Date: 2024-08-26 · DOI: 10.1101/2024.08.12.24311883
Michael Albrecht, Denton Shanks, Tina Shah, Taina Hudson, Jeffrey Thompson, Tanya Filardi, Kelli Wright, Greg Ator, Timothy Ryan Smith
Objective: This study assessed the effects of an ambient artificial intelligence (AI) documentation platform on clinicians' perceptions of documentation workflow. Materials and Methods: Pre- and post-implementation surveys evaluated ambulatory clinicians' perceptions of the impact of Abridge, an ambient AI documentation platform. Outcomes included clinical documentation burden, work after hours, clinician burnout, work satisfaction, and patient access. Data were analyzed using descriptive statistics and proportional odds logistic regression to compare changes for concordant questions across pre- and post-surveys. Covariate analysis examined the effects of specialty type and duration of use of the AI tool. Results: Survey response rates were 51.1% (94/181) pre-implementation and 75.9% (101/133) post-implementation. Clinician perceptions of ease of documentation workflow (OR = 6.91, 95% CI: 3.90 to 12.56, p<0.001) and of ease in completing notes (OR = 4.95, 95% CI: 2.87 to 8.69, p<0.001) improved significantly with usage of the AI tool. The majority of respondents agreed that the AI tool decreased documentation burden, decreased the time spent documenting outside clinical hours, reduced burnout risk, and increased job satisfaction, with 48% agreeing that an additional patient could be seen if needed. Clinician specialty type and number of days using the AI tool did not significantly affect survey responses. Discussion: Clinician experience and efficiency were dramatically improved with use of Abridge across a breadth of specialties.
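The abstract names proportional odds logistic regression for comparing ordered Likert responses pre- vs. post-implementation; in Python this is available as statsmodels' OrderedModel (a sketch with made-up column names and toy data, shown only to illustrate the method, not the study's analysis code):

```python
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# One row per survey response: `agreement` is an ordered 1-5 Likert item,
# `post` is 0 for pre-implementation and 1 for post-implementation.
df = pd.DataFrame({"agreement": [2, 3, 3, 4, 2, 3, 4, 4, 5, 5, 4, 5],
                   "post":      [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]})

model = OrderedModel(df["agreement"], df[["post"]], distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())  # exp(coef on `post`) is the proportional odds ratio
```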
{"title":"Enhancing Clinical Documentation Workflow with Ambient Artificial Intelligence: Clinician Perspectives on Work Burden, Burnout, and Job Satisfaction","authors":"Michael Albrecht, Denton Shanks, Tina Shah, Taina Hudson, Jeffrey Thompson, Tanya Filardi, Kelli Wright, Greg Ator, Timothy Ryan Smith","doi":"10.1101/2024.08.12.24311883","DOIUrl":"https://doi.org/10.1101/2024.08.12.24311883","url":null,"abstract":"Objective: This study assessed the effects of an ambient artificial intelligence (AI) documentation platform on clinicians' perceptions of documentation workflow.\u0000Materials and Methods: A pre- and post-implementation survey evaluated ambulatory clinician perceptions on impact of Abridge, an ambient AI documentation platform. Outcomes included clinical documentation burden, work after-hours, clinician burnout, work satisfaction, and patient access. Data were analyzed using descriptive statistics and proportional odds logistic regression to compare changes for concordant questions across pre- and post-surveys. Covariate analysis examined effect of specialty type and duration of use of the AI tool. Results: Survey response rates were 51.1% (94/181) pre-implementation and 75.9% (101/133) post-implementation. Clinician perception of ease of documentation workflow (OR = 6.91, 95% CI: 3.90 to 12.56, p<0.001) and in completing notes associated with usage of the AI tool (OR = 4.95, 95% CI: 2.87 to 8.69, p<0.001) was significantly improved. The majority of respondents agreed that the AI tool decreased documentation burden, decreased the time spent documenting outside clinical hours, reduced burnout risk, and increased job satisfaction, with 48% agreeing that an additional patient could be seen if needed. Clinician specialty type and number of days using the AI tool did not significantly affect survey responses.\u0000Discussion: Clinician experience and efficiency was dramatically improved with use of Abridge across a breadth of specialties.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The MSPTDfast photoplethysmography beat detection algorithm: Design, benchmarking, and open-source distribution
Pub Date: 2024-08-26 · DOI: 10.1101/2024.08.23.24312514
Peter H Charlton, Erick Javier Arguello Prada, Jonathan Mant, Panicos A Kyriacou
Objective: Photoplethysmography is widely used for physiological monitoring, whether in clinical devices such as pulse oximeters or consumer devices such as smartwatches. A key step in the analysis of photoplethysmogram (PPG) signals is detecting heartbeats. The MSPTD algorithm has been found to be one of the most accurate PPG beat detection algorithms, but it is less computationally efficient than other algorithms. Therefore, the aim of this study was to develop a more efficient, open-source implementation of the MSPTD algorithm for PPG beat detection, named MSPTDfast (v.2). Approach: Five potential improvements to MSPTD were identified and evaluated on four datasets. MSPTDfast (v.2) was designed by incorporating each improvement that, on its own, reduced execution time whilst maintaining a high F1-score. After internal validation, MSPTDfast (v.2) was benchmarked against state-of-the-art beat detection algorithms on four additional datasets. Main results: MSPTDfast (v.2) incorporated two key improvements: pre-processing PPG signals to reduce the sampling frequency to 20 Hz, and only calculating scalogram scales corresponding to heart rates >30 bpm. During internal validation MSPTDfast (v.2) was found to have an execution time of between approximately one-third and one-twentieth of MSPTD, and a comparable F1-score. During benchmarking MSPTDfast (v.2) had the highest F1-score alongside MSPTD, and among the lowest execution times, with only MSPTDfast (v.1), qppgfast and MMPD (v.2) achieving shorter execution times. Significance: MSPTDfast (v.2) is an accurate and efficient PPG beat detection algorithm, available in an open-source Matlab toolbox.
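The two key optimizations can be made concrete: downsample the PPG to 20 Hz, and cap the local-maxima-scalogram scales at the beat spacing implied by a 30 bpm minimum heart rate. Below is a simplified AMPD-style sketch of this idea in Python, under those assumptions; it is not the authors' Matlab implementation of MSPTD:

```python
import numpy as np
from scipy.signal import resample

def detect_beats(ppg: np.ndarray, fs: float,
                 fs_target: float = 20.0, hr_min: float = 30.0) -> np.ndarray:
    # Optimization 1: downsample to 20 Hz.
    n = int(len(ppg) * fs_target / fs)
    x = resample(ppg, n)
    # Optimization 2: only scales up to half the longest beat interval (HR > 30 bpm).
    k_max = int(fs_target * (60.0 / hr_min) / 2)
    # Local-maxima scalogram: m[k-1, i] is True if x[i] exceeds both k-distant neighbours.
    m = np.zeros((k_max, n), dtype=bool)
    for k in range(1, k_max + 1):
        m[k - 1, k:n - k] = (x[k:n - k] > x[:n - 2 * k]) & (x[k:n - k] > x[2 * k:])
    # Keep scales up to the one with the most local maxima, then require
    # a maximum at every kept scale.
    k_opt = int(np.argmax(m.sum(axis=1))) + 1
    peaks = np.flatnonzero(m[:k_opt].all(axis=0))
    return peaks  # sample indices of detected beats at fs_target
```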
{"title":"The MSPTDfast photoplethysmography beat detection algorithm: Design, benchmarking, and open-source distribution","authors":"Peter H Charlton, Erick Javier Arguello Prada, Jonathan Mant, Panicos A Kyriacou","doi":"10.1101/2024.08.23.24312514","DOIUrl":"https://doi.org/10.1101/2024.08.23.24312514","url":null,"abstract":"Objective: Photoplethysmography is widely used for physiological monitoring, whether in clinical devices such as pulse oximeters, or consumer devices such as smartwatches. A key step in the analysis of photoplethysmogram (PPG) signals is detecting heartbeats. The MSPTD algorithm has been found to be one of the most accurate PPG beat detection algorithms, but is less computationally efficient than other algorithms. Therefore, the aim of this study was to develop a more efficient, open-source implementation of the MSPTD algorithm for PPG beat detection, named MSPTDfast (v.2). Approach: Five potential improvements to MSPTD were identified and evaluated on four datasets. MSPTDfast (v.2) was designed by incorporating each improvement which on its own reduced execu- tion time whilst maintaining a high F1-score. After internal validation, MSPTDfast (v.2) was benchmarked against state-of-the-art beat detection algorithms on four additional datasets. Main results: MSPTDfast (v.2) incorporated two key improvements: pre-processing PPG signals to reduce the sampling frequency to 20 Hz; and only calculating scalogram scales corresponding to heart rates >30 bpm. During internal validation MSPTDfast (v.2) was found to have an execution time of between approximately one-third and one-twentieth of MSPTD, and a comparable F1-score. During benchmarking MSPTDfast (v.2) was found to have the highest F1-score alongside MSPTD, and amongst one of the lowest execution times with only MSPTDfast (v.1), qppgfast and MMPD (v.2) achieving shorter execution times. Significance: MSPTDfast (v.2) is an accurate and efficient PPG beat detection algorithm, available in an open-source Matlab toolbox.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging Large Language Models for Identifying Interpretable Linguistic Markers and Enhancing Alzheimer's Disease Diagnostics
Pub Date: 2024-08-23 · DOI: 10.1101/2024.08.22.24312463
Tingyu Mo, Jacqueline Lam, Victor Li, Lawrence Cheung
Alzheimer's disease (AD) is a progressive and irreversible neurodegenerative disorder. Early detection of AD is crucial for timely disease intervention. This study proposes a novel LLM framework, which extracts interpretable linguistic markers from LLMs and incorporates them into supervised AD detection models, while evaluating model performance and interpretability. Our work consists of the following novelties. First, we design in-context few-shot and zero-shot prompting strategies to facilitate LLMs in extracting high-level linguistic markers discriminative of AD and normal controls (NC), providing interpretation and assessment of their strength, reliability, and relevance to AD classification. Second, we incorporate the linguistic markers extracted by LLMs into a smaller AI-driven model to enhance the performance of downstream supervised learning for AD classification, by assigning higher weights to the high-level linguistic markers/features extracted from LLMs. Third, we investigate whether the linguistic markers extracted by LLMs can enhance the accuracy and interpretability of downstream supervised learning-based models for AD detection. Our findings suggest that the accuracy of the supervised learning model led by LLM-extracted linguistic markers is less desirable than that of counterparts that do not incorporate LLM-extracted markers, highlighting the trade-offs between interpretability and accuracy in supervised AD classification. Although the use of these interpretable markers may not immediately improve detection accuracy, they significantly improve medical diagnosis and trustworthiness. These interpretable markers allow healthcare professionals to gain a deeper understanding of the linguistic changes that occur in individuals with AD, enabling them to make more informed decisions and provide better patient care.
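The in-context few-shot strategy described amounts to prepending labelled examples before the new case in the prompt; a minimal template sketch follows, in which the transcript snippets and marker names are invented for illustration and do not come from the study:

```python
FEW_SHOT_EXAMPLES = """Transcript: "I went to the... the place where you buy food."
Markers: word-finding difficulty; circumlocution. Label: AD

Transcript: "I drove to the supermarket and bought groceries for the week."
Markers: fluent, specific lexical choices. Label: NC
"""

def build_prompt(transcript: str) -> str:
    """Assemble a few-shot prompt asking an LLM for interpretable markers."""
    return (
        "You are a clinical-linguistics assistant. Identify interpretable "
        "linguistic markers of Alzheimer's disease (AD) versus normal "
        "controls (NC), and rate each marker's strength and reliability.\n\n"
        + FEW_SHOT_EXAMPLES
        + f'\nTranscript: "{transcript}"\nMarkers:'
    )
```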
Health Data Nexus: An Open Data Platform for AI Research and Education in Medicine
Pub Date: 2024-08-23 · DOI: 10.1101/2024.08.23.24312060
January L Adams, Rafal Cymerys, Karol Szuster, Daniel Hekman, Zoryana Salo, Rutvik Solanki, Muhammad Mamdani, Alistair Johnson, Katarzyna Ryniak, Tom L Pollard, David Rotenberg, Benjamin Haibe-Kains
We outline the development of the Health Data Nexus, a data platform which enables data storage and access management with a cloud-based computational environment. We describe the importance of this secure platform in an evolving public sector research landscape that utilizes significant quantities of data, particularly clinical data acquired from health systems, as well as the importance of providing meaningful benefits for three targeted user groups: data providers, researchers, and educators. We then describe the implementation of governance practices, technical standards, and data security and privacy protections needed to build this platform, as well as example use-cases highlighting the strengths of the platform in facilitating dataset acquisition, novel research, and hosting educational courses, workshops, and datathons. Finally, we discuss the key principles that informed the platform's development, highlighting the importance of flexible uses, collaborative development, and open-source science.
U-Net as a deep learning-based method for platelets segmentation in microscopic images
Pub Date: 2024-08-23 · DOI: 10.1101/2024.08.23.24312502
Eva Maria Valerio de Sousa, Ajay Kumar, Charlie Coupland, Tânia F. Vaz, Will Jones, Rubén Valcarce-Diñeiro, Simon D.J. Calaminus
Manual counting of platelets in microscopy images is greatly time-consuming. Our goal was to automatically segment and count platelets in these images using a deep learning approach, applying U-Net and Fully Convolutional Network (FCN) modelling. Data preprocessing was done by creating binary masks and utilizing supervised learning with ground-truth labels. Data augmentation was implemented for improved model robustness and detection. The number of detected regions was then retrieved as a count. The study investigated the U-Net model's performance with different datasets, indicating notable improvements in segmentation metrics as the dataset size increased, while FCN performance was only evaluated with the smaller dataset and abandoned due to poor results. U-Net surpassed FCN in both detection and counting measures on the smaller dataset (Dice 0.90, accuracy 0.96 for U-Net vs. Dice 0.60, accuracy 0.81 for FCN). When tested on a bigger dataset, U-Net produced even better values (Dice 0.99, accuracy 0.98). The U-Net model proves particularly effective as the dataset size increases, showcasing its versatility and accuracy in handling varying cell sizes and appearances. These data show potential areas for further improvement and the promising application of deep learning in automating cell segmentation for diverse life science research applications.
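The final counting step the abstract describes (retrieving the number of detected regions as a count) is typically a connected-component labelling over the thresholded network output; a generic sketch with scipy follows, where the 0.5 threshold is an assumption rather than a value from the paper:

```python
import numpy as np
from scipy import ndimage

def count_platelets(prob_map: np.ndarray, threshold: float = 0.5) -> int:
    """Count connected foreground regions in a predicted probability map."""
    mask = prob_map > threshold          # binarize the U-Net output
    _, n_regions = ndimage.label(mask)   # 4-connected components by default
    return n_regions
```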
{"title":"U-Net as a deep learning-based method for platelets segmentation in microscopic images","authors":"Eva Maria Valerio de Sousa, Ajay Kumar, Charlie Coupland, Tânia F. Vaz, Will Jones, Rubén Valcarce-Diñeiro, Simon D.J. Calaminus","doi":"10.1101/2024.08.23.24312502","DOIUrl":"https://doi.org/10.1101/2024.08.23.24312502","url":null,"abstract":"Manual counting of platelets, in microscopy images, is greatly time-consuming. Our goal was to automatically segment and count platelets images using a deep learning approach, applying U-Net and Fully Convolutional Network (FCN) modelling. Data preprocessing was done by creating binary masks and utilizing supervised learning with ground-truth labels. Data augmentation was implemented, for improved model robustness and detection. The number of detected regions was then retrieved as a count. The study investigated the U-Net models performance with different datasets, indicating notable improvements in segmentation metrics as the dataset size increased, while FCN performance was only evaluated with the smaller dataset and abandoned due to poor results. U-Net surpassed FCN in both detection and counting measures in the smaller dataset Dice 0.90, accuracy of 0.96 (U-Net) vs Dice 0.60 and 0.81 (FCN). When tested in a bigger dataset U-Net produced even better values (Dice 0.99, accuracy of 0.98). The U-Net model proves to be particularly effective as the dataset size increases, showcasing its versatility and accuracy in handling varying cell sizes and appearances. These data show potential areas for further improvement and the promising application of deep learning in automating cell segmentation for diverse life science research applications.","PeriodicalId":501454,"journal":{"name":"medRxiv - Health Informatics","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}