medRxiv - Health Informatics: Latest Articles

MedPodGPT: A multilingual audio-augmented large language model for medical research and education
Pub Date : 2024-07-12 DOI: 10.1101/2024.07.11.24310304
Shuyue Jia, Subhrangshu Bit, Edward Searls, Lindsey Claus, Pengrui Fan, Varuna H. Jasodanand, Meagan V. Lauber, Divya Veerapaneni, William M. Wang, Rhoda Au, Vijaya B Kolachalama
The proliferation of medical podcasts has generated an extensive repository of audio content, rich in specialized terminology, diverse medical topics, and expert dialogues. Here we introduce a computational framework designed to enhance large language models (LLMs) by leveraging the informational content of publicly accessible medical podcast data. This dataset, comprising over 4,300 hours of audio content, was transcribed to generate over 39 million text tokens. Our model, MedPodGPT, integrates the varied dialogue found in medical podcasts to improve understanding of natural language nuances, cultural contexts, and medical knowledge. Evaluated across multiple benchmarks, MedPodGPT demonstrated an average improvement of 2.31% over standard open-source benchmarks and showcased an improvement of 2.58% in its zero-shot multilingual transfer ability, effectively generalizing to different linguistic contexts. By harnessing the untapped potential of podcast content, MedPodGPT advances natural language processing, offering enhanced capabilities for various applications in medical research and education.
Citations: 0
2D Transfer Learning for ECG Classification using Continuous Wavelet Transform
Pub Date : 2024-07-11 DOI: 10.1101/2024.07.11.24310258
Wei Zhang
Advanced deep neural networks, when trained on extensive datasets, can outperform cardiologists in diagnosing cardiac arrhythmias. However, the availability of large-scale training data is often impractical. This study explores the use of transfer learning to identify and classify three ECG patterns. It applies knowledge gained from 2D image classification tasks to the domain of 1D time-series ECG signal classification. The research leverages various deep learning models to classify continuous wavelet transform (2D representations) of ECG signals. The effectiveness of these transferred deep learning models in classifying ECG time-series data is then evaluated.
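The 1D-to-2D step described in this abstract can be sketched as follows. This is a minimal illustration, not the author's pipeline: the Morlet wavelet, the scale range, the 250 Hz sampling rate, the synthetic input, and the function names `morlet_scalogram` and `to_rgb_image` are all assumptions made for the example. It uses only NumPy and produces an image-shaped array that a pretrained 2D classifier could take as input.

```python
import numpy as np

def morlet_scalogram(signal, scales, w0=6.0):
    """Return |CWT| of a 1D signal as a 2D array of shape (len(scales), len(signal))."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    rows = []
    for s in scales:
        # Sample the wavelet on +/- 4 scale widths, capped so the kernel
        # is never longer than the signal itself.
        half = int(min(4 * s, (n - 1) // 2))
        t = np.arange(-half, half + 1) / s
        wavelet = np.exp(1j * w0 * t) * np.exp(-0.5 * t**2) / np.sqrt(s)
        # Correlating with the wavelet equals convolving with its reversed conjugate;
        # mode="same" keeps one coefficient per input sample.
        coeffs = np.convolve(signal, np.conj(wavelet)[::-1], mode="same")
        rows.append(np.abs(coeffs))
    return np.array(rows)

def to_rgb_image(scalogram):
    """Scale the scalogram to [0, 1] and repeat it over three channels (H x W x 3)."""
    lo, hi = scalogram.min(), scalogram.max()
    norm = (scalogram - lo) / (hi - lo + 1e-12)
    return np.stack([norm] * 3, axis=-1)

# Synthetic stand-in for an ECG segment (hypothetical 250 Hz sampling rate).
fs = 250
t = np.arange(2 * fs) / fs
ecg_like = np.sin(2 * np.pi * 8 * t)
image = to_rgb_image(morlet_scalogram(ecg_like, scales=np.arange(1, 33)))
print(image.shape)
```

In practice the resulting array would still need resizing to whatever input size the chosen image model expects; that step is omitted here.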
Citations: 0
EARLY LUNG CANCER SCREENING: A COMPARATIVE STUDY OF CNN AND RADIOMICS MODELS WITH PULMONARY NODULE BIOLOGIC CHARACTERIZATION
Pub Date : 2024-07-10 DOI: 10.1101/2024.07.06.24309995
Mukund Gupta, Edbert Victor Fandy, Krrish Ghindani
Lung cancer has become an increasingly prevalent disease, with an estimated 125,070 deaths in the United States alone in 2024 (5). To improve patient outcomes and assist doctors in differentiating between benign and malignant pulmonary nodules, this paper developed a Convolutional Neural Network (CNN) model for early binary detection of pulmonary nodules and assessed its effectiveness compared to other approaches. The CNN model showed an accuracy of 98.47%, while the radiomics-based SVM-LASSO model and the Lung-RADS system showed accuracies of 84.6% and 72.2%, respectively. This demonstrates that the CNN model is significantly more effective for the early binary detection of pulmonary nodules than both the radiomics-based model and the Lung-RADS system. The paper also discusses the applications of Deep Learning in healthcare, concluding that although AI proves to be an effective method for early lung cancer detection, more research is needed to carefully assess the role and impact of AI in healthcare.
Citations: 0
Characteristics of Suicide Prevention Apps: A Content Analysis of Apps Available in Canada and the United Kingdom
Pub Date : 2024-07-10 DOI: 10.1101/2024.07.10.24310091
Laura Bennett-Poynter, Samantha Groves, Jessica Kemp, Hwayeon Danielle Shin, Lydia Sequeira, Karen Lascelles, Gillian Strudwick
Objective: We aimed to examine the characteristics, features, and content of suicide prevention mobile apps available in app stores in Canada and the United Kingdom. Design: Suicide prevention apps were identified from Apple and Android app stores between March and April 2023. Apps were screened against predefined inclusion criteria, and duplicate apps were removed. Data were then extracted based on descriptive (e.g., genre, app developer), security (e.g., password protection), and design features (e.g., personalization options). Content of apps was assessed using the Essential Features Framework. Extracted data were analyzed using a content analysis approach including narrative frequencies and descriptive statistics. Results: Fifty-two (n=52) suicide prevention apps were included within the review. Most were tailored for the general population and were in English language only. One app had the option to increase app accessibility by offering content presented using sign language. Many apps allowed some form of personalization by adding text content; however, most did not facilitate further customization such as the ability to upload photo and audio content. All identified apps included content from at least one of the domains of the Essential Features Framework. The most commonly included domains were sources of suicide prevention support, and information about suicide. The domain least frequently included was screening tools, followed by wellness content. No identified apps had the ability to be linked to patient medical records. Conclusions: The findings of this research present implications for the development of future suicide prevention apps. Development of a co-produced suicide prevention app which is accessible, allows for personalization, and can be integrated into clinical care may present an opportunity to enhance suicide prevention support for individuals experiencing suicidal thoughts and behaviours.
Citations: 0
The Impact of a Primary Aldosteronism Predictive Model in Secondary Hypertension Decision Support
Pub Date : 2024-07-10 DOI: 10.1101/2024.07.09.24310088
Peter Bowman Mack, Casey Cole, Mintaek Lee, Lisa Peterson, Matthew Lundy, Karen Elizabeth Hegarty, William Espinoza
Objective: To determine whether the addition of a primary aldosteronism (PA) predictive model to a secondary hypertension decision support tool increases screening for PA in a primary care setting. Materials and Methods: 153 primary care clinics were randomized to receive a secondary hypertension decision support tool with or without an integrated predictive model between August 2023 and April 2024. Results: For patients with risk scores in the top 1 percentile, 63/2,896 (2.2%) patients where the alert was displayed in model clinics had the order set launched while 12/1,210 (1.0%) in no model clinics had the order set launched (P = 0.014). 19/2,896 (0.66%) of these highest risk patients in model clinics had an ARR ordered compared to 0/1,210 (0.0%) patients in no model clinics (P = 0.010). For patients with scores not in the top 1 percentile, 438/20,493 (2.1%) patients in model clinics had the order set launched compared to 273/17,820 (1.5%) in no model clinics (P < 0.001). 124/20,493 (0.61%) in model clinics had an ARR ordered compared to 34/17,820 (0.19%) in the no model clinics (P < 0.001). Discussion: The addition of a PA predictive model to secondary hypertension alert displays and triggering criteria along with order set displays and order preselection criteria results in a statistically and clinically significant increase in screening for PA, a condition that clinicians insufficiently screen for currently. Conclusion: Addition of a predictive model for an under-screened condition to traditional clinical decision support may increase screening for these conditions.
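The percentages reported in the Results can be recomputed directly from the counts. The sketch below does that and also runs a pooled two-proportion z-test as one plausible way to compare arms. The abstract does not say which statistical test the authors used, so the choice of test is an assumption and the p-values it prints need not match the reported ones. The function name and the dictionary labels are illustrative.

```python
from math import sqrt, erfc

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (p1, p2, z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p1, p2, z, p_value

# Counts as reported in the abstract: (model events, model n, no-model events, no-model n).
comparisons = {
    "top 1 percentile, order set launched": (63, 2896, 12, 1210),
    "top 1 percentile, ARR ordered": (19, 2896, 0, 1210),
    "other patients, order set launched": (438, 20493, 273, 17820),
    "other patients, ARR ordered": (124, 20493, 34, 17820),
}

results = {name: two_proportion_z_test(*counts) for name, counts in comparisons.items()}
for name, (p1, p2, z, p) in results.items():
    print(f"{name}: {100 * p1:.2f}% vs {100 * p2:.2f}%, z = {z:.2f}, p = {p:.4f}")
```

Because clinics rather than patients were randomized, an analysis that accounts for clustering would be more appropriate than this patient-level test; the sketch is only meant to make the arithmetic behind the reported proportions explicit.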
Citations: 0
TLFT: Transfer Learning and Fourier Transform for ECG Classification
Pub Date : 2024-07-10 DOI: 10.1101/2024.07.09.24310152
Erick Wang, Sarah Lee
Electrocardiogram (ECG) provides a non-invasive method for identifying cardiac issues, particularly arrhythmias or irregular heartbeats. In recent years, the fields of artificial intelligence and machine learning have made significant inroads into various healthcare applications, including the development of arrhythmia classifiers using deep learning techniques. However, a persistent challenge in this domain is the limited availability of large, well-annotated ECG datasets, which are crucial for building and evaluating robust machine learning models. To address this limitation, we propose a novel deep transfer learning framework designed to perform effectively on small training datasets. Our approach involves fine-tuning ResNet-18, a general-purpose image classifier, using the MIT-BIH arrhythmia dataset. This method aims to leverage the power of transfer learning to overcome the constraints of limited data availability. Furthermore, this paper conducts a critical examination of existing deep learning models in the field of ECG analysis. Our investigation reveals that many of these models suffer from methodological flaws, particularly in terms of data leakage. This issue potentially leads to overly optimistic performance estimates and raises concerns about the reliability and generalizability of these models in real-world clinical applications. By addressing these challenges, our work contributes to the advancement of more robust and reliable ECG analysis techniques, potentially improving the accuracy and applicability of automated arrhythmia detection in clinical settings.
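The abstract does not say which form of data leakage the authors found. One form that is often discussed for beat-level ECG datasets is letting heartbeats from the same patient appear in both the training and the test set. Assuming that is the kind of leakage at issue, a patient-wise split can be sketched as below; the record layout `(patient_id, beat)` and the function name `split_by_patient` are hypothetical.

```python
import random

def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split (patient_id, beat) pairs so that no patient appears in both sets."""
    patients = sorted({pid for pid, _ in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    # Hold out whole patients, never individual beats.
    n_test = max(1, round(len(patients) * test_fraction))
    test_ids = set(patients[:n_test])
    train = [r for r in records if r[0] not in test_ids]
    test = [r for r in records if r[0] in test_ids]
    return train, test

# Toy data: 10 patients with 5 beats each (beat contents are placeholders).
records = [(f"p{i}", beat) for i in range(10) for beat in range(5)]
train, test = split_by_patient(records)

train_patients = {pid for pid, _ in train}
test_patients = {pid for pid, _ in test}
print(len(train), len(test), train_patients & test_patients)
```

A random split over individual beats would, by contrast, almost always place beats from every patient on both sides, which is what can inflate reported accuracy.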
Citations: 0
Developing a natural language processing system using transformer-based models for adverse drug event detection in electronic health records
Pub Date : 2024-07-10 DOI: 10.1101/2024.07.09.24310100
Jingyuan Wu, Xiaodi Ruan, Elizabeth McNeer, Katelyn M. Rossow, Leena Choi
Objective: To develop a transformer-based natural language processing (NLP) system for detecting adverse drug events (ADEs) from clinical notes in electronic health records (EHRs). Materials and Methods: We fine-tuned BERT Short-Formers and Clinical-Longformer using the processed dataset of the 2018 National NLP Clinical Challenges (n2c2) shared task Track 2. We investigated two data processing methods, window-based and split-based approaches, to find an optimal processing method. We evaluated the generalization capabilities on a dataset extracted from Vanderbilt University Medical Center (VUMC) EHRs. Results: On the n2c2 dataset, the best average macro F-scores of 0.832 and 0.868 were achieved using a 15-word window with PubMedBERT and a 10-chunk split with Clinical-Longformer. On the VUMC dataset, the best average macro F-scores of 0.720 and 0.786 were achieved using a 4-chunk split with PubMedBERT and Clinical-Longformer. Discussion: Our study provided a comparative analysis of data processing methods. The fine-tuned transformer models showed good performance for ADE-related tasks. In particular, the Clinical-Longformer model with the split-based approach showed great potential for practical implementation of ADE detection. While the token limit was crucial, the chunk size also significantly influenced model performance, even when the text length was within the token limit. Conclusion: We provided guidance on model development, including data processing methods for ADE detection from clinical notes using transformer-based models. Our results on two datasets indicated that data processing methods and models should be carefully selected based on the type of clinical notes and the allocation trade-offs of human and computational power in annotation and model fine-tuning.
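The abstract names window-based and split-based processing but does not define them. One plausible reading, shown here purely as an assumption, is that a window keeps a fixed number of words on each side of a target mention, while a split cuts the whole note into a fixed number of contiguous chunks. The function names and the example sentence are invented for illustration.

```python
def word_window(words, center, size):
    """Return the word at index `center` plus up to `size` words on each side."""
    start = max(0, center - size)
    return words[start:center + size + 1]

def split_into_chunks(words, n_chunks):
    """Split a word list into n_chunks contiguous pieces of near-equal length."""
    base, extra = divmod(len(words), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional word each.
        end = start + base + (1 if i < extra else 0)
        chunks.append(words[start:end])
        start = end
    return chunks

note = "Patient developed a rash after starting amoxicillin and the drug was stopped".split()
drug_index = note.index("amoxicillin")

window = word_window(note, drug_index, size=2)
chunks = split_into_chunks(note, n_chunks=5)
print(window)
print([len(c) for c in chunks])
```

Under this reading, the window size and the number of chunks both control how much context reaches the model per input, which is consistent with the abstract's remark that chunk size affected performance even when the text fit within the token limit.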
Citations: 0
Extracting the Sample Size From Randomized Controlled Trials in Explainable Fashion Using Natural Language Processing
Pub Date : 2024-07-10 DOI: 10.1101/2024.07.09.24310155
Paul Windisch, Fabio Dennstaedt, Carole Koechli, Robert Foerster, Christina Schroeder, Daniel M. Aebersold, Daniel R. Zwahlen
Background: Extracting the sample size from randomized controlled trials (RCTs) remains a challenge to developing better search functionalities or automating systematic reviews. Most current approaches rely on the sample size being explicitly mentioned in the abstract. Methods: 847 RCTs from high-impact medical journals were tagged with six different entities that could indicate the sample size. A named entity recognition (NER) model was trained to extract the entities and then deployed on a test set of 150 RCTs. The entities' performance in predicting the actual number of trial participants who were randomized was assessed and possible combinations of the entities were evaluated to create predictive models. Results: The most accurate model could make predictions for 64.7% of trials in the test set, and the resulting predictions were within 10% of the ground truth in 96.9% of cases. A less strict model could make a prediction for 96.0% of trials, and its predictions were within 10% of the ground truth in 88.2% of cases. Conclusion: Training a named entity recognition model to predict the sample size from randomized controlled trials is feasible, not only if the sample size is explicitly mentioned but also if the sample size can be calculated, e.g., by adding up the number of patients in each arm.
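The two figures reported for each model, the share of trials that received a prediction and the share of those predictions within 10% of the ground truth, can be computed as sketched below. The function names and the toy numbers are assumptions for illustration and are not data from the study; the arm-summing helper mirrors the fallback mentioned in the Conclusion.

```python
def evaluate(predictions, truths, tolerance=0.10):
    """Return (coverage, accuracy): share of trials with a prediction, and
    share of those predictions within `tolerance` of the true sample size."""
    made = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    coverage = len(made) / len(truths)
    hits = sum(abs(p - t) <= tolerance * t for p, t in made)
    accuracy = hits / len(made) if made else 0.0
    return coverage, accuracy

def total_from_arms(arm_sizes):
    """Derive the sample size by adding up the patients reported per arm."""
    return sum(arm_sizes)

# Toy example: None marks a trial for which the model made no prediction.
truths = [200, 120, 100, 400, 300]
predictions = [200, None, 95, 330, total_from_arms([148, 152])]

coverage, accuracy = evaluate(predictions, truths)
print(f"coverage = {coverage:.0%}, within 10% = {accuracy:.0%}")
```

Reporting the two numbers separately makes the trade-off in the abstract visible: a stricter model abstains more often but is right more often when it does predict.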
Citations: 0
Ongoing and planned Randomized Controlled Trials of AI in medicine: An analysis of Clinicaltrials.gov registration data
Pub Date : 2024-07-09 DOI: 10.1101/2024.07.09.24310133
Mattia Andreoletti, Berkay Senkalfa, Alessandro Blasimme
The integration of Artificial Intelligence (AI) technologies into clinical practice holds significant promise for revolutionizing healthcare. However, the realization of this potential requires rigorous evaluation and validation of AI applications to ensure their safety, efficacy, and clinical significance. Despite increasing awareness of the need for robust testing, the majority of AI-related Randomized Controlled Trials (RCTs) so far have exhibited notable limitations, impeding the generalizability and proper integration of their findings into clinical settings. To understand whether the field is progressing towards more robust testing, we conducted an analysis of the registration data of ongoing and planned RCTs of AI in medicine available in the Clinicaltrials.gov database. Our analysis highlights several key trends and challenges. Effectively addressing these challenges is essential for advancing the field of medical AI and ensuring its successful integration into clinical practice.
Citations: 0
Autoencoder to Identify Sex-Specific Sub-phenotypes in Alzheimer's Disease Progression Using Longitudinal Electronic Health Records
Pub Date : 2024-07-08 DOI: 10.1101/2024.07.07.24310055
Weimin Meng, Jie Xu, Yu Huang, Cankun Wang, Qianqian Song, Anjun Ma, Lixin Song, Jiang Bian, Qin Ma, Rui Yin
Alzheimer's Disease (AD) is a complex neurodegenerative disorder significantly influenced by sex differences, with approximately two-thirds of AD patients being women. Characterizing sex-specific AD progression and identifying its progression trajectory is a crucial step toward developing effective risk stratification and prevention strategies. In this study, we developed an autoencoder to uncover sex-specific sub-phenotypes in AD progression, leveraging longitudinal electronic health record (EHR) data from the OneFlorida+ Clinical Research Consortium. Specifically, we first constructed temporal patient representations using longitudinal EHRs from a sex-stratified AD cohort. We used a long short-term memory (LSTM)-based autoencoder to extract and generate latent representation embeddings from sequential clinical records of patients. We then applied hierarchical agglomerative clustering to the learned representations, grouping patients based on their progression sub-phenotypes. The experimental results show that we successfully identified five primary sex-based AD sub-phenotypes with corresponding progression pathways with high confidence. These sex-specific sub-phenotypes not only illustrated distinct AD progression patterns but also revealed differences in clinical characteristics and comorbidities between females and males in AD development. These findings could provide valuable insights for advancing personalized AD intervention and treatment strategies.
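The clustering step described above can be sketched on toy data. The abstract does not state the linkage criterion or distance metric, so the average linkage and Euclidean distance used here are assumptions, as are the function name `agglomerative_clusters` and the two-dimensional vectors standing in for the LSTM autoencoder's latent embeddings.

```python
from math import dist
from typing import List, Sequence


def agglomerative_clusters(
    points: Sequence[Sequence[float]], n_clusters: int
) -> List[List[int]]:
    """Bottom-up clustering with average linkage (assumed, not from the paper).

    Starts with one cluster per point and repeatedly merges the two clusters
    with the smallest mean pairwise Euclidean distance until `n_clusters`
    remain. Returns clusters as sorted lists of point indices.
    """
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    dist(points[a], points[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                mean_d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or mean_d < best[0]:
                    best = (mean_d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy "embeddings" (invented): two tight pairs and one outlier.
embeddings = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (5.0, 50.0)]
print(agglomerative_clusters(embeddings, n_clusters=3))
```

The naive pairwise search is cubic in the number of points, which is fine for a demonstration; a cohort-scale analysis would normally rely on an optimized library implementation.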
Citations: 0
Journal
medRxiv - Health Informatics