Karynsa Kilpatrick, Katherine Cahill, Urmila Chandran, Daniel Riskin
Epidemiology, volume 36, issue 1, pages 20-27. Journal Article. Published 2025-01-01 (Epub 2024-11-25). DOI: 10.1097/EDE.0000000000001803 (https://doi.org/10.1097/EDE.0000000000001803). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11594548/pdf/
Citations: 0
Abstract
Advanced Approaches to Generating High-validity Real-world Evidence in Asthma.
Background: Asthma is a phenotypically complex disease requiring nuanced data to generate clinically and scientifically robust real-world evidence. A quantitative measure of data quality is important for variables key to the research questions at hand. Using electronic health record (EHR) data, this study compared accuracy for asthma features between traditional real-world evidence approaches using structured data and advanced approaches applying artificial intelligence technologies to unstructured clinical data.
Methods: We extracted 18 protocol-defined features from 6037 healthcare encounters among 3481 patients. Features included asthma severity subtypes, comorbidities, symptoms, findings, and procedures. We created a manual reference standard through chart abstraction, with two annotators reviewing each record. We assessed inter-rater reliability using Cohen's kappa and measured accuracy against the reference standard as an F1-score.
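The inter-rater reliability measure named in the Methods can be illustrated with a minimal Python sketch. The function name `cohen_kappa` and the two toy annotation lists are hypothetical and are not taken from the study; they only show how agreement between two annotators is corrected for chance.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of marginal frequencies, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data (hypothetical): 1 = feature present in the record, 0 = absent.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")  # 9 of 10 items agree, chance agreement is 0.5
```

In this toy case the annotators agree on 9 of 10 records, but half of that agreement would be expected by chance, so kappa is lower than the raw 90% agreement.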
Results: In the traditional study arm, average recall was 40.8%, precision was 72.5%, and the F1-score across features was 52.2%. In the advanced study arm, average recall was 95.7%, precision was 93.8%, and the F1-score was 94.7%. The F1-score increased by 42.5 percentage points in absolute terms and by 81.4% in relative terms between the traditional and advanced approaches. Cohen's kappa was 0.80, reflecting a credible reference standard.
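The arithmetic behind these figures can be checked with a short Python sketch. It assumes the F1-score is the harmonic mean of the reported average precision and recall; the abstract does not state the aggregation explicitly, but this assumption reproduces the reported values to one decimal place. The function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Average precision and recall as reported in the Results.
traditional_f1 = f1_score(precision=72.5, recall=40.8)
advanced_f1 = f1_score(precision=93.8, recall=95.7)

# Absolute gain is in percentage points; relative gain is a percent of the baseline.
absolute_gain = advanced_f1 - traditional_f1
relative_gain = 100 * absolute_gain / traditional_f1

print(f"traditional F1: {traditional_f1:.1f}%")        # 52.2%
print(f"advanced F1:    {advanced_f1:.1f}%")           # 94.7%
print(f"absolute gain:  {absolute_gain:.1f} points")   # 42.5 points
print(f"relative gain:  {relative_gain:.1f}%")         # 81.4%
```

The distinction between the two gains matters when reading the abstract: 42.5 is a difference between two percentages, while 81.4% expresses that difference as a fraction of the traditional arm's F1-score.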
Conclusions: Use of advanced approaches can enable high-quality real-world data sets in asthma, including granular clinical features such as disease subtypes and symptomatic outcomes. Data quality can be measured and, when high, can support generation of high-validity real-world evidence using routinely collected healthcare data.
About the journal:
Epidemiology publishes original research from all fields of epidemiology. The journal also welcomes review articles and meta-analyses, novel hypotheses, descriptions and applications of new methods, and discussions of research theory or public health policy. We give special consideration to papers from developing countries.