
Journal of Clinical Epidemiology: Latest Publications

Potential waste in evidence synthesis for health screening: a scoping review and call for action
IF 5.2, Medicine (CAS Tier 2), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2025-11-17, DOI: 10.1016/j.jclinepi.2025.112053
Sarah Batson, Matthew J. Randell, Catherine Bane, Julia Geppert, Pranshu Mundada, Chris Stinton, Eleanor Cozens, Maggie Powell, Sian Taylor-Phillips

Objectives

To provide an initial estimate of the extent of potential research waste in the production of evidence-synthesis products for health screening.

Study Design and Setting

Evidence-synthesis products supporting screening recommendations for adult populations, published by the UK National Screening Committee (UK NSC) and the US Preventive Services Task Force (USPSTF) between 2014 and 2024, were identified as anchor reviews. For each anchor review, Embase, Medline, and national and international organization websites were searched for overlapping evidence reviews on the same topic, defined as addressing the same research questions with at least partial overlap in the population, interventions, comparisons, and outcomes.

Results

A total of 48 anchor reviews (covering 33 conditions) were identified from the UK NSC and USPSTF. Overlapping evidence reviews were identified for 92% (44/48) of these, with a median of 4 additional reviews per anchor review (range: 0–60; interquartile range [IQR]: 2–15). Of the overlapping reviews, 11% made partial use of prior work, explicitly updating or building on prior external work, but retained enough new elements and scope differences to remain classified as overlapping. Focusing on a core subset of conditions of shared interest to both organizations, the median overlap increased to 13 (range: 2–47; IQR: 4–17), indicating substantial duplication in priority areas. Seventy percent of all reviews in the evidence base were conducted in North America (28%) and Western Europe (42%), with limited representation from low- and middle-income countries.

Conclusion

The results of this review highlight potential research waste due to duplication in evidence-synthesis efforts. Coordinated action among organizations advising policymakers, such as NSCs, public health agencies, and evidence review bodies, may help establish more efficient, collaborative approaches that enable reuse and adaptation across contexts. Such action could include real-time sharing of ongoing reviews, multiregion comprehensive reviews, and the use of stratified analyses to tailor findings to country-specific needs. These strategies should be explored to determine whether organizations can reduce unnecessary duplication, enhance equity, improve the timeliness and relevance of guidance, and redirect resources toward unmet research priorities and other pressing public health challenges.
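To make the overlap definition in the Study Design section concrete, here is a minimal sketch, assuming reviews are represented as dictionaries of PICO sets (all field names and values are hypothetical): it flags a candidate review as overlapping when it addresses the same question and every PICO element intersects at least partially.

```python
def is_overlapping(anchor: dict, candidate: dict) -> bool:
    """One way to operationalize the overlap rule: same research question,
    with a nonempty intersection in every PICO element. Field names are
    hypothetical, not from the article's protocol."""
    if anchor["question"] != candidate["question"]:
        return False
    pico = ("population", "interventions", "comparisons", "outcomes")
    return all(anchor[k] & candidate[k] for k in pico)

anchor = {
    "question": "screening for condition X",
    "population": {"adults"},
    "interventions": {"test A"},
    "comparisons": {"no screening"},
    "outcomes": {"mortality"},
}
candidate = {
    "question": "screening for condition X",
    "population": {"adults", "older adults"},
    "interventions": {"test A", "test B"},
    "comparisons": {"no screening"},
    "outcomes": {"mortality", "harms"},
}
print(is_overlapping(anchor, candidate))  # True: partial overlap in all four elements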
Citations: 0
Developing a framework for assessing the applicability of the target condition in diagnostic research
IF 5.2, Medicine (CAS Tier 2), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2025-11-15, DOI: 10.1016/j.jclinepi.2025.112059
Eve Tomlinson, Jude Holmes, Anne W.S. Rutjes, Clare Davenport, Mariska Leeflang, Bada Yang, Sue Mallett, Penny Whiting
<div><h3>Objectives</h3><div>Assessment of the applicability of primary studies is an essential but often a challenging aspect of systematic reviews of diagnostic test accuracy studies (DTA reviews). We explored review authors’ applicability assessments for the QUADAS-2 reference standard domain within Cochrane DTA reviews. We highlight applicability concerns, identify potential issues with assessment, and develop a framework for assessing the applicability of the target condition as defined by the reference standard.</div></div><div><h3>Study Design and Setting</h3><div>Methodological review. DTA reviews in the Cochrane Library that used QUADAS-2 and judged applicability for the reference standard domain as “high concern” for at least one study were eligible. One reviewer extracted the rationale for the “high concern” and this was checked by a second reviewer. Two reviewers categorized the rationale inductively into themes, and a third reviewer verified these. Discussions regarding the extracted information informed framework development.</div></div><div><h3>Results</h3><div>We identified 50 eligible reviews. Five themes emerged: study uses different reference standard threshold to define the target condition (six reviews), misclassification by the reference standard in the study such that the target condition in the study does not match the review question (11 reviews), reference standard could not be applied to all participants resulting in a different target condition (five reviews), misunderstanding QUADAS-2 applicability (seven reviews), and insufficient information (21 reviews). Our framework for researchers outlines four potential applicability concerns for the assessment of the target condition as defined by the reference standard: different sub-categories of the target condition, different threshold used to define the target condition, reference standard not applied to full study group, and misclassification of the target condition by the reference standard.</div></div><div><h3>Conclusion</h3><div>Clear sources of applicability concerns are identifiable, but several Cochrane review authors struggle to adequately identify and report them. We have developed an applicability framework to guide review authors in their assessment of applicability concerns for the QUADAS reference standard domain.</div></div><div><h3>Plain Language Summary</h3><div>What is the problem? Doctors use tests to help to decide if a person has a certain condition. They want to know how accurate the test is before they use it. This means how well it can tell people who have the condition from people who do not have it. This information can be found in “diagnostic systematic reviews”. Diagnostic systematic reviews start with a research question. They bring together findings from studies that have already been done to try to answer this question. It is important for researchers to check that the studies match the review question. This is called an “applicability assess
目的:评估初步研究的适用性是诊断试验准确性研究系统评价(DTA评价)的一个重要但经常具有挑战性的方面。我们在Cochrane DTA综述中探讨了综述作者对QUADAS-2参考标准领域的适用性评估。我们强调适用性问题,识别评估中的潜在问题,并制定一个框架来评估参考标准定义的目标条件的适用性。研究设计和设置:方法学回顾。Cochrane图书馆中使用QUADAS-2并判定参考标准领域的适用性为至少一项研究“高度关注”的DTA评价是合格的。一位审稿人提取了“高度关注”的理由,这由另一位审稿人进行了检查。两位审稿人将基本原理归纳为主题,第三位审稿人验证了这些主题。关于抽取信息知情框架开发的讨论。结果:我们确定了50篇符合条件的综述。出现了五个主题:研究使用不同的参考标准阈值来定义目标条件(6篇综述),研究中参考标准的错误分类导致研究中的目标条件与综述问题不匹配(11篇综述),参考标准不能适用于所有参与者导致不同的目标条件(5篇综述),误解QUADAS-2适用性(7篇综述),信息不充分(21篇综述)。我们的研究人员框架概述了参考标准所定义的目标条件评估的四个潜在适用性问题:目标条件的不同子类别,定义目标条件的不同阈值,参考标准不适用于整个研究组,以及参考标准对目标条件的错误分类。结论:明确的适用性问题来源是可识别的,但一些Cochrane综述作者难以充分识别和报告它们。我们已经开发了一个适用性框架来指导评审作者评估QUADAS参考标准领域的适用性问题。问题是什么?医生通过测试来帮助确定一个人是否患有某种疾病。在使用之前,他们想知道测试的准确性。这意味着它可以很好地区分患有这种疾病的人。这些信息可以在“诊断系统评价”中找到。诊断性系统评价从一个研究问题开始。他们汇集了已经完成的研究结果,试图回答这个问题。研究人员检查研究是否符合回顾问题是很重要的。这被称为“适用性评估”。例如,如果综述着眼于儿童,那么检查这些研究是否也关注儿童是很重要的。有一个名为“QUADAS-2”的工具可以用来检查研究与复习问题的匹配程度。这可能很难做到,而且没有很多例子可以帮助人们。我们做了什么?我们想更多地了解人们如何使用QUADAS-2工具来判断适用性。我们还想编写指导来支持关于适用性的判断。我们发现了什么?我们找到了人们如何进行适用性评估的例子。许多评论没有做对,我们解释为什么会这样。我们还做了指导,帮助人们进行适用性评估。
{"title":"Developing a framework for assessing the applicability of the target condition in diagnostic research","authors":"Eve Tomlinson ,&nbsp;Jude Holmes ,&nbsp;Anne W.S. Rutjes ,&nbsp;Clare Davenport ,&nbsp;Mariska Leeflang ,&nbsp;Bada Yang ,&nbsp;Sue Mallett ,&nbsp;Penny Whiting","doi":"10.1016/j.jclinepi.2025.112059","DOIUrl":"10.1016/j.jclinepi.2025.112059","url":null,"abstract":"&lt;div&gt;&lt;h3&gt;Objectives&lt;/h3&gt;&lt;div&gt;Assessment of the applicability of primary studies is an essential but often a challenging aspect of systematic reviews of diagnostic test accuracy studies (DTA reviews). We explored review authors’ applicability assessments for the QUADAS-2 reference standard domain within Cochrane DTA reviews. We highlight applicability concerns, identify potential issues with assessment, and develop a framework for assessing the applicability of the target condition as defined by the reference standard.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Study Design and Setting&lt;/h3&gt;&lt;div&gt;Methodological review. DTA reviews in the Cochrane Library that used QUADAS-2 and judged applicability for the reference standard domain as “high concern” for at least one study were eligible. One reviewer extracted the rationale for the “high concern” and this was checked by a second reviewer. Two reviewers categorized the rationale inductively into themes, and a third reviewer verified these. Discussions regarding the extracted information informed framework development.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Results&lt;/h3&gt;&lt;div&gt;We identified 50 eligible reviews. Five themes emerged: study uses different reference standard threshold to define the target condition (six reviews), misclassification by the reference standard in the study such that the target condition in the study does not match the review question (11 reviews), reference standard could not be applied to all participants resulting in a different target condition (five reviews), misunderstanding QUADAS-2 applicability (seven reviews), and insufficient information (21 reviews). Our framework for researchers outlines four potential applicability concerns for the assessment of the target condition as defined by the reference standard: different sub-categories of the target condition, different threshold used to define the target condition, reference standard not applied to full study group, and misclassification of the target condition by the reference standard.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Conclusion&lt;/h3&gt;&lt;div&gt;Clear sources of applicability concerns are identifiable, but several Cochrane review authors struggle to adequately identify and report them. We have developed an applicability framework to guide review authors in their assessment of applicability concerns for the QUADAS reference standard domain.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Plain Language Summary&lt;/h3&gt;&lt;div&gt;What is the problem? Doctors use tests to help to decide if a person has a certain condition. They want to know how accurate the test is before they use it. This means how well it can tell people who have the condition from people who do not have it. This information can be found in “diagnostic systematic reviews”. Diagnostic systematic reviews start with a research question. They bring together findings from studies that have already been done to try to answer this question. It is important for researchers to check that the studies match the review question. 
This is called an “applicability assess","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"190 ","pages":"Article 112059"},"PeriodicalIF":5.2,"publicationDate":"2025-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145543777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Challenges in using GRADE by systematic review authors and how to overcome them: a response to Andric et al.
IF 5.2, Medicine (CAS Tier 2), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2025-11-15, DOI: 10.1016/j.jclinepi.2025.112055
Miranda W. Langendam, Ignacio Neumann, Holger J. Schünemann
{"title":"Challenges in using GRADE by systematic review authors and how to overcome them: a response to Andric et al.","authors":"Miranda W. Langendam,&nbsp;Ignacio Neumann,&nbsp;Holger J. Schünemann","doi":"10.1016/j.jclinepi.2025.112055","DOIUrl":"10.1016/j.jclinepi.2025.112055","url":null,"abstract":"","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"190 ","pages":"Article 112055"},"PeriodicalIF":5.2,"publicationDate":"2025-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145543732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Estimands: what they are and why we should use them
IF 5.2, Medicine (CAS Tier 2), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2025-11-15, DOI: 10.1016/j.jclinepi.2025.112054
Brennan C. Kahan, Declan Devane
In clinical trials, postrandomization events, such as treatment discontinuation or the use of rescue medication, can complicate the interpretation of results. An estimand is a precise description of the treatment effect that investigators wish to estimate. Estimands facilitate more straightforward interpretation of trial results by explicitly defining how postrandomization “intercurrent” events are incorporated into the research question. This article introduces the five key attributes of estimands (population, treatment conditions, endpoint, summary measure, and strategies for intercurrent events) and explains the five main strategies for managing intercurrent events (treatment policy, composite, while on treatment, hypothetical, and principal stratum). Using a practical example of a trial comparing cognitive behavioral therapy vs medication for mild anxiety, we demonstrate how different estimand choices lead to varying study designs, analyses, and interpretations. Understanding estimands helps researchers design better trials and enables stakeholders to determine if the results are relevant to their situation. We also explain how sensitivity analyses can be used to check the reliability of results by assessing how results change under different statistical assumptions.
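Since this is a methods primer, a small worked example may help. The following sketch uses toy data (all rates and effect sizes are invented, not from the article) to contrast two of the five intercurrent-event strategies named above, treatment policy and composite, in a simulated CBT-vs-medication anxiety trial.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # participants per arm

# Simulated improvement on an anxiety score (higher = better).
improve_cbt = rng.normal(5.0, 3.0, n)   # cognitive behavioral therapy arm
improve_med = rng.normal(4.0, 3.0, n)   # medication arm

# Intercurrent event: treatment discontinuation (hypothetical rates).
disc_cbt = rng.random(n) < 0.10
disc_med = rng.random(n) < 0.25

# In this toy model, discontinuation halves the benefit actually received.
obs_cbt = np.where(disc_cbt, 0.5 * improve_cbt, improve_cbt)
obs_med = np.where(disc_med, 0.5 * improve_med, improve_med)

# Treatment-policy strategy: compare everyone as randomized,
# regardless of discontinuation (the intention-to-treat view).
treatment_policy = obs_cbt.mean() - obs_med.mean()

# Composite strategy: fold the intercurrent event into the endpoint,
# eg "responded (improvement >= 4) AND did not discontinue".
resp_cbt = (obs_cbt >= 4) & ~disc_cbt
resp_med = (obs_med >= 4) & ~disc_med
composite = resp_cbt.mean() - resp_med.mean()

print(f"treatment policy, mean difference: {treatment_policy:.2f}")
print(f"composite, risk difference for response: {composite:.3f}")
```

The two numbers answer different questions: the first targets the effect of being assigned CBT vs medication however treatment actually unfolded, while the second treats discontinuation as part of the outcome itself. Neither analysis is wrong; they estimate different estimands.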
Citations: 0
Identifying ethical issues in stepped-wedge cluster randomized trials to inform the Ottawa Statement update: a systematic review of trials published 2016–2022
IF 5.2, Medicine (CAS Tier 2), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2025-11-13, DOI: 10.1016/j.jclinepi.2025.112051
Cory E. Goldstein, Julia F. Shaw, Ariella Binik, Hayden P. Nix, Luis Ortiz-Reyes, Thais Mazzetti, Anand Sergeant, Charles Weijer, Monica Taljaard
<div><h3>Objectives</h3><div>The <em>Ottawa Statement on the Ethical Design and Conduct of Cluster Randomized Trials</em> (CRTs), published in 2012, is the first and remains the only international ethics guidance document specific to CRTs. However, the stepped-wedge CRT design raises complex ethical issues for which guidance may be lacking. The overarching objective of this review is to inform the forthcoming update of the <em>Ottawa Statement</em>; specific objectives are to characterize the types of interventions and data collection procedures in stepped-wedge CRTs (SW-CRTs), and to examine adherence to key ethical design, conduct, and reporting recommendations.</div></div><div><h3>Study Design and Setting</h3><div>Primary reports of SW-CRTs evaluating health interventions published 2016-2022 in English were reviewed. Two reviewers extracted data from each trial independently; discrepancies were resolved through consensus.</div></div><div><h3>Results</h3><div>Among 160 SW-CRTs, most evaluated multilevel interventions (78, 49%), and uncommonly involved therapeutic patient interventions (19, 12%). Few (10, 6%) exclusively used routinely collected data sources for outcome assessment. Sixty-four trials (40%) provided explicit justifications for using cluster randomization and the stepped-wedge design. Most (157, 98%) included a statement about research ethics committee (REC) review, of which 148 (94%) reported approval. A statement about consent was reported in 145 (91%), with 113 (78%) pertaining to patients only, 10 (7%) to health-care providers only, and 22 (15%) to both. Among 135 trials reporting on patient consent, consent was not obtained in 55 (41%). Justifications for not obtaining consent were provided in 42 (76%).</div></div><div><h3>Conclusion</h3><div>The updated <em>Ottawa Statement</em> should provide guidance about when people should be considered research participants and when their consent is required, justifications for using a stepped-wedge design, the need for REC review, and the burden of data collection procedures in SW-CRTs.</div></div><div><h3>Plain Language Summary</h3><div>We reviewed 160 SW-CRTs published between 2016 and 2022. These trials often evaluate multifaceted interventions delivered at the cluster or professional level rather than individual level, and most had to rely on at least some form of primary data collection. Although almost all SW-CRTs reported ethics review, fewer than half explained why they used the SW-CRT design, and a substantial minority did not include a statement about whether participant informed consent was obtained. Rarely did consent statement pertain to health-care providers; most statements pertained to patients. Among the SW-CRTs that had consent statements pertaining to patients, almost half reported not obtaining or waiving consent from them, and many of these did not report clear reasons for this. Our findings highlight the need for clearer guidance in the forthcoming update to the <e
目的:2012年发布的《关于聚类随机试验伦理设计和行为的渥太华声明》是第一个也是唯一一个专门针对聚类随机试验(crt)的国际伦理指导文件。然而,楔形阴极射线管的设计引发了复杂的伦理问题,可能缺乏指导。本次审查的总体目标是为即将修订的《渥太华声明》提供信息;具体目标是描述sw - crt的干预类型和数据收集程序,并检查对关键道德设计、行为和报告建议的遵守情况。研究设计:回顾2016-2022年发表的英文sw - crt评估健康干预措施的主要报告。两名审稿人独立地从每个试验中提取数据;分歧通过协商一致解决。结果:在160个sw - crt中,大多数评估多级干预(78,49%),并且不常见地涉及治疗性患者干预(19,12%)。很少(10.6%)专门使用常规收集的数据来源进行结果评估。64项试验(40%)为使用聚类随机化和楔形设计提供了明确的理由。大多数(157,98%)包括关于研究伦理委员会(REC)审查的声明,其中148项(94%)报告批准。145家(91%)报告了关于同意的声明,其中113家(78%)仅涉及患者,10家(7%)仅涉及医疗保健提供者,22家(15%)涉及两者。在135项报告患者同意的试验中,55项(41%)未获得患者同意。42例(76%)提供了未获得同意的理由。结论:更新后的渥太华声明应提供有关何时应将受试者视为研究参与者以及何时需要其同意的指导,使用楔形设计的理由,REC审查的必要性以及sw - crt数据收集程序的负担。简单的语言总结:我们回顾了2016年至2022年间发表的160项楔形聚类随机试验(sw - crt)。这些试验通常评估在集群或专业水平上提供的多方面干预措施,而不是个人水平,并且大多数必须依赖于至少某种形式的原始数据收集。尽管几乎所有的SW-CRT都报告了伦理审查,但只有不到一半的人解释了他们为什么使用SW-CRT设计,而且相当少数的人没有包括关于是否获得参与者知情同意的声明。同意声明很少涉及医疗保健提供者;大多数陈述与患者有关。在有患者同意声明的sw - crt中,几乎一半的人报告没有获得或放弃他们的同意,其中许多人没有报告明确的原因。我们的研究结果强调,在即将更新的《关于crt的伦理设计和行为的渥太华声明》中,需要更明确的指导。
{"title":"Identifying ethical issues in stepped-wedge cluster randomized trials to inform the Ottawa Statement update: a systematic review of trials published 2016–2022","authors":"Cory E. Goldstein ,&nbsp;Julia F. Shaw ,&nbsp;Ariella Binik ,&nbsp;Hayden P. Nix ,&nbsp;Luis Ortiz-Reyes ,&nbsp;Thais Mazzetti ,&nbsp;Anand Sergeant ,&nbsp;Charles Weijer ,&nbsp;Monica Taljaard","doi":"10.1016/j.jclinepi.2025.112051","DOIUrl":"10.1016/j.jclinepi.2025.112051","url":null,"abstract":"&lt;div&gt;&lt;h3&gt;Objectives&lt;/h3&gt;&lt;div&gt;The &lt;em&gt;Ottawa Statement on the Ethical Design and Conduct of Cluster Randomized Trials&lt;/em&gt; (CRTs), published in 2012, is the first and remains the only international ethics guidance document specific to CRTs. However, the stepped-wedge CRT design raises complex ethical issues for which guidance may be lacking. The overarching objective of this review is to inform the forthcoming update of the &lt;em&gt;Ottawa Statement&lt;/em&gt;; specific objectives are to characterize the types of interventions and data collection procedures in stepped-wedge CRTs (SW-CRTs), and to examine adherence to key ethical design, conduct, and reporting recommendations.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Study Design and Setting&lt;/h3&gt;&lt;div&gt;Primary reports of SW-CRTs evaluating health interventions published 2016-2022 in English were reviewed. Two reviewers extracted data from each trial independently; discrepancies were resolved through consensus.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Results&lt;/h3&gt;&lt;div&gt;Among 160 SW-CRTs, most evaluated multilevel interventions (78, 49%), and uncommonly involved therapeutic patient interventions (19, 12%). Few (10, 6%) exclusively used routinely collected data sources for outcome assessment. Sixty-four trials (40%) provided explicit justifications for using cluster randomization and the stepped-wedge design. Most (157, 98%) included a statement about research ethics committee (REC) review, of which 148 (94%) reported approval. A statement about consent was reported in 145 (91%), with 113 (78%) pertaining to patients only, 10 (7%) to health-care providers only, and 22 (15%) to both. Among 135 trials reporting on patient consent, consent was not obtained in 55 (41%). Justifications for not obtaining consent were provided in 42 (76%).&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Conclusion&lt;/h3&gt;&lt;div&gt;The updated &lt;em&gt;Ottawa Statement&lt;/em&gt; should provide guidance about when people should be considered research participants and when their consent is required, justifications for using a stepped-wedge design, the need for REC review, and the burden of data collection procedures in SW-CRTs.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Plain Language Summary&lt;/h3&gt;&lt;div&gt;We reviewed 160 SW-CRTs published between 2016 and 2022. These trials often evaluate multifaceted interventions delivered at the cluster or professional level rather than individual level, and most had to rely on at least some form of primary data collection. Although almost all SW-CRTs reported ethics review, fewer than half explained why they used the SW-CRT design, and a substantial minority did not include a statement about whether participant informed consent was obtained. Rarely did consent statement pertain to health-care providers; most statements pertained to patients. 
Among the SW-CRTs that had consent statements pertaining to patients, almost half reported not obtaining or waiving consent from them, and many of these did not report clear reasons for this. Our findings highlight the need for clearer guidance in the forthcoming update to the &lt;e","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"189 ","pages":"Article 112051"},"PeriodicalIF":5.2,"publicationDate":"2025-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145530801","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Scalable medication extraction and discontinuation identification from electronic health records using large language models
IF 5.2, Medicine (CAS Tier 2), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2025-11-11, DOI: 10.1016/j.jclinepi.2025.112049
Chong Shao, Douglas Snyder, Chiran Li, Bowen Gu, Kerry Ngan, Chun-Ting Yang, Jiageng Wu, Richard Wyss, Kueiyu Joshua Lin, Jie Yang
<div><h3>Objectives</h3><div>Identifying medication discontinuations in electronic health records (EHRs) is vital for patient safety but is often hindered by information being buried in unstructured notes. This study aims to evaluate the capabilities of advanced open-sourced and proprietary large language models in extracting medications and classifying their medication status from EHR notes, focusing on their scalability for medication information extraction without human annotation.</div></div><div><h3>Study Design and Setting</h3><div>We collected three EHR datasets from diverse sources to build the evaluation benchmark: 1 publicly available dataset (Reannotated Clinical Acronym Sense Inventory dataset [Re-CASI]), 1 we annotated based on public MIMIC notes (MIMIC-IV Medication Snippet dataset [MIV-Med]), and 1 internally annotated on clinical notes from Mass General Brigham (MGB-Med). We evaluated 12 advanced LLMs, including general-domain open-sourced models (eg, Llama-3.1-70B-Instruct, Qwen2.5-72B-Instruct), medical-specific models (eg, MeLLaMA-70B-chat), and a proprietary model (GPT-4o). We explored multiple LLM prompting strategies, including zero-shot, 5-shot, and Chain-of-Thought (CoT) approaches. Performance on medication extraction, medication status classification, and their joint task (extraction then classification) was systematically compared across all experiments.</div></div><div><h3>Results</h3><div>LLMs showed promising performance on medication extraction, while discontinuation classification and joint tasks were more challenging. GPT-4o consistently achieved the highest average F1 scores in all tasks under zero-shot setting — 94.0% for medication extraction, 78.1% for discontinuation classification, and 72.7% for the joint task. Open-sourced models followed closely, with Llama-3.1-70B-Instruct achieving the highest performance in medication status classification on the MIV-Med dataset (68.7%) and in the joint task on both the Re-CASI (76.2%) and MIV-Med (60.2%) datasets. Medical-specific LLMs demonstrated lower performance compared to advanced general-domain LLMs. Few-shot learning generally improved performance, while CoT reasoning showed inconsistent gains. Notably, open-sourced models occasionally surpassed GPT-4o performance, underscoring their potential in privacy-sensitive clinical research.</div></div><div><h3>Conclusion</h3><div>LLMs demonstrate strong potential for medication extraction and discontinuation identification on EHR notes, with open-sourced models offering scalable alternatives to proprietary systems and few-shot learning further improving LLMs’ capability.</div></div><div><h3>Plain Language Summary</h3><div>Stopping a medicine can affect safety and treatment decisions, yet this detail is often buried in long electronic health record notes. We evaluated whether large language models, which read and summarize text, can automatically find medication names and decide whether each medicine is still being take
目的:在电子健康记录(EHRs)中识别药物停药对患者安全至关重要,但往往受到信息隐藏在非结构化笔记中的阻碍。本研究旨在评估先进的开源和专有的大型语言模型(llm)在从EHR笔记中提取药物和分类药物状态方面的能力,重点关注其在无需人工注释的情况下提取药物信息的可扩展性。研究设计和设置:我们从不同的来源收集了三个EHR数据集来建立评估基准:一个公开可用的数据集(Re-CASI),一个我们基于公共MIMIC笔记(MIV-Med)进行注释的数据集,一个基于麻省总医院布里格姆的临床笔记进行内部注释的数据集(MGB-Med)。我们评估了12个先进的llm,包括通用领域的开源模型(例如,Llama-3.1-70B-Instruct, Qwen2.5-72B-Instruct),医学特定模型(例如,MeLLaMA-70B-chat)和专有模型(gpt - 40)。我们探索了多种LLM提示策略,包括零枪、五枪和思维链(CoT)方法。系统比较各实验在药物提取、药物状态分类及其联合任务(先提取后分类)上的表现。结果:llm在药物提取方面表现良好,但停药分类和联合任务更具挑战性。gpt - 40在零射击设置下的所有任务中均获得最高的平均F1分数,药物提取为94.0%,停药分类为78.1%,联合任务为72.7%。开源模型紧随其后,Llama-3.1-70B-Instruct在MIV-Med数据集上的药物状态分类(68.7%)和Re-CASI(76.2%)和MIV-Med(60.2%)数据集上的联合任务中取得了最高的性能。与高级通用领域法学硕士相比,医学特定法学硕士的性能较低。少数次学习通常会提高表现,而CoT推理则表现出不一致的收益。值得注意的是,开源模型偶尔会超过gpt - 40的性能,这突显了它们在隐私敏感的临床研究中的潜力。结论:llm在EHR记录上的药物提取和停药识别方面显示出强大的潜力,开源模型为专有系统提供了可扩展的替代方案,并且少量学习进一步提高了llm的能力。简单的语言总结:停药会影响安全性和治疗决策,但这些细节往往隐藏在长长的电子健康记录中。我们评估了阅读和总结文本的大型语言模型是否可以自动找到药物名称,并决定每种药物是否仍在服用,已停药,或两者都不服用。我们在三个临床笔记集上测试了12个模型,包括适合安全医院使用的开源选项,并比较了三种简单的教学风格:不提供示例,展示一些示例,并要求逐步推理。所有的模型都产生了可用的结果。在平衡完整性和正确性的0到100的标准衡量标准上,最强的系统在寻找药物名称方面得分约为94分,在决定继续或停止状态方面得分约为78分。展示一些示例通常比逐步提示更有帮助,并且几个开源模型的执行接近领先的专有系统。这些工具可以帮助医院和研究人员大规模监测药物,以支持药物安全性研究、依从性跟踪和临床决策支持,并在临床使用前进行当地验证和保障。
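The F1 scores quoted in the Results are typically micro-averaged over notes; the sketch below shows one standard way to compute them for set-valued extractions. The helper name and the toy notes are hypothetical, not the article's datasets or scoring script.

```python
from typing import Iterable, Tuple

def extraction_f1(gold: Iterable[set], pred: Iterable[set]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over set-valued extractions,
    one set of normalized medication names per note."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # medications found in both gold and prediction
        fp += len(p - g)   # predicted medications not in the gold standard
        fn += len(g - p)   # gold medications the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy notes, not from any of the article's datasets:
gold = [{"metformin", "lisinopril"}, {"warfarin"}]
pred = [{"metformin"}, {"warfarin", "aspirin"}]
print(extraction_f1(gold, pred))  # precision = recall = F1 = 2/3
```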
{"title":"Scalable medication extraction and discontinuation identification from electronic health records using large language models","authors":"Chong Shao ,&nbsp;Douglas Snyder ,&nbsp;Chiran Li ,&nbsp;Bowen Gu ,&nbsp;Kerry Ngan ,&nbsp;Chun-Ting Yang ,&nbsp;Jiageng Wu ,&nbsp;Richard Wyss ,&nbsp;Kueiyu Joshua Lin ,&nbsp;Jie Yang","doi":"10.1016/j.jclinepi.2025.112049","DOIUrl":"10.1016/j.jclinepi.2025.112049","url":null,"abstract":"&lt;div&gt;&lt;h3&gt;Objectives&lt;/h3&gt;&lt;div&gt;Identifying medication discontinuations in electronic health records (EHRs) is vital for patient safety but is often hindered by information being buried in unstructured notes. This study aims to evaluate the capabilities of advanced open-sourced and proprietary large language models in extracting medications and classifying their medication status from EHR notes, focusing on their scalability for medication information extraction without human annotation.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Study Design and Setting&lt;/h3&gt;&lt;div&gt;We collected three EHR datasets from diverse sources to build the evaluation benchmark: 1 publicly available dataset (Reannotated Clinical Acronym Sense Inventory dataset [Re-CASI]), 1 we annotated based on public MIMIC notes (MIMIC-IV Medication Snippet dataset [MIV-Med]), and 1 internally annotated on clinical notes from Mass General Brigham (MGB-Med). We evaluated 12 advanced LLMs, including general-domain open-sourced models (eg, Llama-3.1-70B-Instruct, Qwen2.5-72B-Instruct), medical-specific models (eg, MeLLaMA-70B-chat), and a proprietary model (GPT-4o). We explored multiple LLM prompting strategies, including zero-shot, 5-shot, and Chain-of-Thought (CoT) approaches. Performance on medication extraction, medication status classification, and their joint task (extraction then classification) was systematically compared across all experiments.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Results&lt;/h3&gt;&lt;div&gt;LLMs showed promising performance on medication extraction, while discontinuation classification and joint tasks were more challenging. GPT-4o consistently achieved the highest average F1 scores in all tasks under zero-shot setting — 94.0% for medication extraction, 78.1% for discontinuation classification, and 72.7% for the joint task. Open-sourced models followed closely, with Llama-3.1-70B-Instruct achieving the highest performance in medication status classification on the MIV-Med dataset (68.7%) and in the joint task on both the Re-CASI (76.2%) and MIV-Med (60.2%) datasets. Medical-specific LLMs demonstrated lower performance compared to advanced general-domain LLMs. Few-shot learning generally improved performance, while CoT reasoning showed inconsistent gains. Notably, open-sourced models occasionally surpassed GPT-4o performance, underscoring their potential in privacy-sensitive clinical research.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Conclusion&lt;/h3&gt;&lt;div&gt;LLMs demonstrate strong potential for medication extraction and discontinuation identification on EHR notes, with open-sourced models offering scalable alternatives to proprietary systems and few-shot learning further improving LLMs’ capability.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Plain Language Summary&lt;/h3&gt;&lt;div&gt;Stopping a medicine can affect safety and treatment decisions, yet this detail is often buried in long electronic health record notes. 
We evaluated whether large language models, which read and summarize text, can automatically find medication names and decide whether each medicine is still being take","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"189 ","pages":"Article 112049"},"PeriodicalIF":5.2,"publicationDate":"2025-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145514496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dysinclusion: naming and defining the inequitable absence of marginalized populations in health research
IF 5.2, Medicine (CAS Tier 2), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2025-11-10, DOI: 10.1016/j.jclinepi.2025.112048
Anna Durbin, Lisa Whittingham, Anjali Menezes, Lucie Richard, Janet Durbin, Aaron M. Orkin
<div><h3>Background and Objectives</h3><div>Marginalized populations are frequently absent or invisible in health research. Yet this problem is seldom characterized as a distinct methodological concern. Existing concepts like selection bias or generalizability approach these inequities primarily as technical limitations, not as methodological deficiencies. We introduce <em>dysinclusion</em> to name and define the inequitable absence or invisibility of groups who should be included in research. Our objective is to establish dysinclusion as a distinct concept at the intersection of equity and methods, distinguish it from existing methodological concepts, and examine how it functions, why it matters, and how it can be addressed.</div></div><div><h3>Methods</h3><div>We draw on examples to define dysinclusion and describe its mechanisms. We differentiate dysinclusion from adjacent epidemiological concepts, and propose three types of dysinclusion processes: data coverage, nonparticipation, and invisibility.</div></div><div><h3>Results</h3><div>Dysinclusion reveals how structural marginalization becomes embedded in research methods. It occurs when marginalized groups are absent from data sources, excluded through study design or barriers to participation, or rendered invisible by measurement and reporting practices. These patterned absences compromise the validity, relevance, and ethical foundation of research. We argue that dysinclusion should be identified and managed not only as a source of bias or threat to validity, but as a central criterion of methodological rigor in the design and implementation of health research.</div></div><div><h3>Conclusion</h3><div>Naming dysinclusion challenges the normalization of exclusion and inequity in health research. Dysinclusion offers language to link the ethical concept of equity with research methods. Making dysinclusion visible reframes patterned absence as a threat to both equity and scientific rigor—one that demands deliberate recognition, accountability, and change.</div></div><div><h3>Plain Language Summary</h3><div>Some groups—like people with disabilities, racialized communities, or those living in poverty—are often missing from health research. Even when they face some of the greatest health challenges, these groups are frequently left out of studies, underrepresented in data, or not even recognized as distinct populations. This absence has serious consequences: it limits what we know about their health, weakens the accuracy of research findings, and can make existing health disparities worse. This problem is common, but there is no widely used method or term in health research to describe or address it. Researchers typically think about who is missing from studies in terms of technical issues like bias or generalizability. These concepts do not fully capture the deeper problem of structural inequality, and make it seem as though ethical concerns, like health equity, are separate from the methods that l
背景和目标:边缘化人群在卫生研究中经常缺席或被忽视。然而,这个问题很少被描述为一个独特的方法论问题。现有的概念,如选择偏差或概括性,主要将这些不平等视为技术限制,而不是方法缺陷。我们引入包容性障碍来命名和定义应该被纳入研究的群体的不公平缺席或隐形。我们的目标是在公平和方法的交叉点建立包容性障碍作为一个独特的概念,将其与现有的方法论概念区分开来,并研究它是如何起作用的,为什么它很重要,以及如何解决它。方法:我们通过实例来定义包容性障碍并描述其机制。我们将包容性障碍与相邻的流行病学概念区分开来,并提出了三种类型的包容性障碍过程:数据覆盖、不参与和不可见。结果:包容性障碍揭示了结构性边缘化如何嵌入到研究方法中。当边缘化群体:没有数据来源,因研究设计或参与障碍而被排除在外,或因衡量和报告做法而被忽视时,就会发生这种情况。这些模式缺失损害了研究的有效性、相关性和伦理基础。我们认为,融入障碍不仅应作为偏见或有效性威胁的来源加以识别和管理,而且应作为设计和实施健康研究方法严密性的中心标准。结论:命名包容障碍挑战了健康研究中排斥和不公平的正常化。包容障碍提供了将公平的伦理概念与研究方法联系起来的语言。让包容性障碍变得可见,将模式缺失重新定义为对公平和科学严谨性的威胁——这需要深思熟虑的承认、问责和变革。简单的语言总结:一些群体,如残疾人、种族化社区或生活在贫困中的人,经常被遗漏在健康研究中。即使他们面临一些最大的健康挑战,这些群体也经常被排除在研究之外,在数据中代表性不足,甚至不被视为独特的人群。这种缺失会产生严重的后果:它限制了我们对他们健康状况的了解,削弱了研究结果的准确性,并可能使现有的健康差距进一步恶化。这个问题很常见,但在健康研究中没有广泛使用的方法或术语来描述或解决它。研究人员通常会从偏见或概括性等技术问题上考虑谁在研究中缺失。这些概念并没有完全抓住结构性不平等这一更深层次的问题,并使其看起来像是伦理问题,如卫生公平,与导致严谨科学的方法是分开的。本文引入了一个新术语:失包涵。融入障碍是指不公平或不公正地没有纳入本应成为健康研究一部分的群体。这不仅仅是关于谁失踪了,而是关于他们为什么失踪,以及这说明了研究的设计方式。我们概述了包容障碍发生的三种常见方式:包容障碍是伦理学和研究方法交叉的一个概念。命名和定义包容性障碍,可以帮助指导将公平作为研究质量核心部分的研究。就像我们评估研究的偏倚或混杂一样,我们也应该评估它们是否存在融入障碍,并采取措施通过更好的研究设计、更具包容性的数据收集和更清晰的报告来减少这种情况。解决融入障碍不仅仅是为了公平。这也与更好的科学有关。
{"title":"Dysinclusion: naming and defining the inequitable absence of marginalized populations in health research","authors":"Anna Durbin ,&nbsp;Lisa Whittingham ,&nbsp;Anjali Menezes ,&nbsp;Lucie Richard ,&nbsp;Janet Durbin ,&nbsp;Aaron M. Orkin","doi":"10.1016/j.jclinepi.2025.112048","DOIUrl":"10.1016/j.jclinepi.2025.112048","url":null,"abstract":"&lt;div&gt;&lt;h3&gt;Background and Objectives&lt;/h3&gt;&lt;div&gt;Marginalized populations are frequently absent or invisible in health research. Yet this problem is seldom characterized as a distinct methodological concern. Existing concepts like selection bias or generalizability approach these inequities primarily as technical limitations, not as methodological deficiencies. We introduce &lt;em&gt;dysinclusion&lt;/em&gt; to name and define the inequitable absence or invisibility of groups who should be included in research. Our objective is to establish dysinclusion as a distinct concept at the intersection of equity and methods, distinguish it from existing methodological concepts, and examine how it functions, why it matters, and how it can be addressed.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Methods&lt;/h3&gt;&lt;div&gt;We draw on examples to define dysinclusion and describe its mechanisms. We differentiate dysinclusion from adjacent epidemiological concepts, and propose three types of dysinclusion processes: data coverage, nonparticipation, and invisibility.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Results&lt;/h3&gt;&lt;div&gt;Dysinclusion reveals how structural marginalization becomes embedded in research methods. It occurs when marginalized groups are absent from data sources, excluded through study design or barriers to participation, or rendered invisible by measurement and reporting practices. These patterned absences compromise the validity, relevance, and ethical foundation of research. We argue that dysinclusion should be identified and managed not only as a source of bias or threat to validity, but as a central criterion of methodological rigor in the design and implementation of health research.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Conclusion&lt;/h3&gt;&lt;div&gt;Naming dysinclusion challenges the normalization of exclusion and inequity in health research. Dysinclusion offers language to link the ethical concept of equity with research methods. Making dysinclusion visible reframes patterned absence as a threat to both equity and scientific rigor—one that demands deliberate recognition, accountability, and change.&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;h3&gt;Plain Language Summary&lt;/h3&gt;&lt;div&gt;Some groups—like people with disabilities, racialized communities, or those living in poverty—are often missing from health research. Even when they face some of the greatest health challenges, these groups are frequently left out of studies, underrepresented in data, or not even recognized as distinct populations. This absence has serious consequences: it limits what we know about their health, weakens the accuracy of research findings, and can make existing health disparities worse. This problem is common, but there is no widely used method or term in health research to describe or address it. Researchers typically think about who is missing from studies in terms of technical issues like bias or generalizability. 
These concepts do not fully capture the deeper problem of structural inequality, and make it seem as though ethical concerns, like health equity, are separate from the methods that l","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"189 ","pages":"Article 112048"},"PeriodicalIF":5.2,"publicationDate":"2025-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145507665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Behavior of prediction performance metrics with rare events
IF 5.2, Medicine (CAS Tier 2), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2025-11-10, DOI: 10.1016/j.jclinepi.2025.112046
Emily Minus, R. Yates Coley, Susan M. Shortreed, Brian D. Williamson

Objective

Area under the receiver operating characteristic curve (AUC) is commonly reported alongside prediction models for binary outcomes. Recent articles have raised concerns that AUC might be a misleading measure of prediction performance in the rare event setting. This setting is common since many events of clinical importance are rare. We aimed to determine whether the bias and variance of AUC are driven by the number of events or the event rate. We also investigated the behavior of other commonly used measures of prediction performance, including positive predictive value, accuracy, sensitivity, and specificity.

Study Design and Setting

We conducted a simulation study to determine when or whether AUC is unstable in the rare event setting by varying the size of datasets used to train and evaluate prediction models. This plasmode simulation study was based on data from the Mental Health Research Network; the data contained 149 predictors and the outcome of interest, suicide attempt, which had an event rate of 0.92% in the original dataset.

Results

Our results indicate that poor AUC behavior—as measured by empirical bias, variability of cross-validated AUC estimates, and empirical coverage of confidence intervals—is driven by the number of events in a rare-event setting, not event rate. Performance of sensitivity is driven by the number of events, while that of specificity is driven by the number of nonevents. Other measures, including positive predictive value and accuracy, depend on the event rate even in large samples.

Conclusion

AUC is reliable in the rare event setting provided that the total number of events is moderately large; in our simulations, we observed near zero bias with 1000 events.

Plain Language Summary

Predicting self-harm or suicidal behavior is medically important for guiding clinicians in providing care to patients. Several research teams have developed and evaluated suicide risk prediction models based on health records data. Part of evaluating these models is calculating area under the receiver operating characteristic curve (AUC) and other prediction performance metrics. Self-harm and suicide are rare events. Recent research has raised concerns with using AUC in rare-event settings. We aimed to determine whether having a sufficiently large dataset could remove these concerns. In our experiments, we found that AUC can be used without concern in settings with 1000 events or more. Thus, AUC is a valid measure of suicide risk prediction model performance in many large healthcare databases.
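A minimal sketch of the kind of experiment described above, using synthetic score distributions in place of the article's plasmode design (the event rate, distributions, and replicate count here are assumptions): it varies the number of events at a fixed ~1% event rate and tracks the empirical bias and spread of AUC.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
EVENT_RATE = 0.01      # close to the 0.92% rate in the article's data
TRUE_AUC = 0.7602      # Phi(1/sqrt(2)) for the N(1,1)-vs-N(0,1) scores below

def auc_bias_and_sd(n_events, n_reps=100):
    n_total = int(n_events / EVENT_RATE)
    aucs = []
    for _ in range(n_reps):
        y = np.zeros(n_total, dtype=int)
        y[:n_events] = 1
        # Events score higher on average, giving a true AUC of ~0.76.
        scores = np.where(y == 1,
                          rng.normal(1.0, 1.0, n_total),
                          rng.normal(0.0, 1.0, n_total))
        aucs.append(roc_auc_score(y, scores))
    aucs = np.asarray(aucs)
    return aucs.mean() - TRUE_AUC, aucs.std()

for n_events in (50, 200, 1000):
    bias, sd = auc_bias_and_sd(n_events)
    print(f"events={n_events:5d}  empirical bias={bias:+.4f}  SD={sd:.4f}")
```

Under this setup, the spread of AUC estimates should shrink as events grow from 50 to 1000 even though the event rate never changes, mirroring the article's conclusion that the number of events, not the rate, drives AUC behavior.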
Citations: 0
Influence of focusing on dominant first order loop when assessing the certainty of evidence of network meta-analysis: a case study
IF 5.2, Medicine (CAS Tier 2), Q1 HEALTH CARE SCIENCES & SERVICES, Pub Date: 2025-11-10, DOI: 10.1016/j.jclinepi.2025.112050
Liang Yao, Xu Hui, Liujiao Cao, Meixuan Li, Jinhui Tian, Yizhuo Chen, Peijing Yan, Qi Wang, Xiaoqin Wang, Kehu Yang, Gordon H. Guyatt, Romina Brignardello-Petersen

Objectives

Assessing the certainty of evidence (COE) of both direct and indirect evidence is crucial for enhancing the understanding of network meta-analysis (NMA) findings and drawing appropriate conclusions. To make the certainty rating of indirect evidence feasible, Grading of Recommendations Assessment, Development, and Evaluation (GRADE) guidance suggests focusing on the dominant first order loop, which typically contributes the most information to the indirect estimate. This approach, however, raises the concern that failure to consider most or all loops will yield an inaccurate COE for indirect evidence.

Study Design and Setting

We investigated six NMA publications and compared indirect COE ratings considering only the dominant first order loop vs ratings considering most or all loops.

Results

Across 103 indirect comparisons in the six NMAs, 15% (15 of 103) had no first order loop, and in 47% (42 of 88) of comparisons with a first order loop, the dominant loop contributed less than 50% of the weight of the indirect evidence. In 6% (6 of 103) of indirect comparisons, ratings based on the dominant first order loop differed from ratings that considered most or all loops, with two shifting from moderate to low COE of NMA.

Conclusion

Our findings suggest that rating only the dominant first order loop, rather than considering most or all loops, is very seldom misleading. Further research is necessary to replicate or refute our findings in other network meta-analyses and to assess the implications for clinical decision-making.

Plain Language Summary

NMA allows researchers to compare the relative effectiveness of multiple treatments. The GRADE working group provides guidance for establishing confidence in treatment effects (how likely they are to be true) by assigning a COE to direct, indirect, and NMA estimates, which can be high, moderate, low, or very low. We investigated the influence of two different approaches (dominant first order loop vs consideration of most or all loops) on rating the certainty of NMA. We found similar certainty ratings of indirect evidence and NMA between the two approaches. Therefore, systematic review authors can continue using the simpler approach.
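For orientation, a first order loop is an indirect comparison that runs through a single common comparator. Below is a minimal sketch of the standard Bucher-style calculation for such a loop; the effect estimates are invented purely for illustration.

```python
import math

def bucher_indirect(d_ac, se_ac, d_bc, se_bc):
    """Indirect estimate of A vs B through common comparator C,
    ie a single first order loop: d_AB = d_AC - d_BC."""
    d_ab = d_ac - d_bc
    se_ab = math.sqrt(se_ac**2 + se_bc**2)
    return d_ab, se_ab

# Invented log odds ratios, purely for illustration:
d_ab, se_ab = bucher_indirect(d_ac=-0.40, se_ac=0.15, d_bc=-0.10, se_bc=0.20)
lo, hi = d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab
print(f"indirect logOR, A vs B: {d_ab:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Because the indirect variance is the sum of the two direct variances, each additional loop adds information; the article asks how often ignoring the nondominant loops changes the certainty rating, and finds that it rarely does.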
目的:评估直接和间接证据的证据确定性(COE)对于增强对网络元分析(NMA)结果的理解和得出适当的结论至关重要。为了使间接证据的确定性评级可行,GRADE(建议评估、发展和评估的分级)指南建议将重点放在主导的一阶环路上,该环路通常为间接估计提供了最多的信息。然而,这种方法可能会引起关注,即未能考虑大多数或所有循环将提供不准确的间接证据COE。研究设计和设置:我们调查了六份NMA出版物,并比较了仅考虑主要一阶回路的间接COE评级与考虑大多数或所有回路的评级。结果:在6个nma的103个间接比较中,15%(103个中有15个)的比较没有一阶环路,47%(88个中有42个)的比较具有主导一阶环路,其权重小于间接证据的50%。我们确定了总共6%(103个中有6个)的间接比较,其中基于主导一阶回路的评级导致了不同的COE评级,而考虑了大多数或所有回路,其中两个从中等到低的NMA COE转变。结论:我们的研究结果表明,只使用占主导地位的一阶循环而不是考虑大多数或所有循环很少会产生误导。需要进一步的研究来复制或反驳我们在其他网络荟萃分析中的发现,并评估对临床决策的影响。简单的语言总结:网络荟萃分析允许研究人员比较多种治疗的相对有效性。GRADE工作组通过评估直接、间接和网络meta分析估计的COE,提供指导,以建立对治疗效果的信心(它们是真实的可能性),COE可以是高、中、低或非常低。我们研究了使用两种不同方法(主要一阶循环与考虑大部分或全部循环)对网络元分析确定性评级的影响。我们发现两种方法之间的间接证据和网络meta分析的确定性评级相似。因此,系统综述作者可以继续使用更简单的方法。
{"title":"Influence of focusing on dominant first order loop when assessing the certainty of evidence of network meta-analysis: a case study","authors":"Liang Yao ,&nbsp;Xu Hui ,&nbsp;Liujiao Cao ,&nbsp;Meixuan Li ,&nbsp;Jinhui Tian ,&nbsp;Yizhuo Chen ,&nbsp;Peijing Yan ,&nbsp;Qi Wang ,&nbsp;Xiaoqin Wang ,&nbsp;Kehu Yang ,&nbsp;Gordon H. Guyatt ,&nbsp;Romina Brignardello-Petersen","doi":"10.1016/j.jclinepi.2025.112050","DOIUrl":"10.1016/j.jclinepi.2025.112050","url":null,"abstract":"<div><h3>Objectives</h3><div>Assessing the certainty of evidence (COE) of both direct and indirect evidence of is crucial for enhancing the understanding of network meta-analysis (NMA) findings and drawing appropriate conclusions. To make the certainty rating of indirect evidence feasible, Grading of Recommendations Assessment, Development, and Evaluation (GRADE) guidance suggests focusing on the dominant first order loop that typically contributes the most information to the indirect estimate. This approach, however, can raise concerns that failure to consider most or all loops will provide inaccurate COE of indirect evidence.</div></div><div><h3>Study Design and Setting</h3><div>We investigated six NMA publications and compared indirect COE ratings considering only the dominant first order loop vs ratings considering most or all loops.</div></div><div><h3>Results</h3><div>Across 103 indirect comparisons in the six NMAs, 15% (15 out of 103) comparisons did not have a first order loop, and 47% (42 out of 88) comparisons with dominant first order loops contributed a weight smaller than 50% of the indirect evidence. We identified a total of 6% (6 out of 103) indirect comparisons in which ratings based on the dominant first order loop resulted in different COE ratings from considering most or all loops, with two shifting from moderate to low COE of NMA.</div></div><div><h3>Conclusion</h3><div>Our findings suggest that using only the dominant first order loop vs considering most or all loops is very seldom misleading. Further research is necessary to replicate or refute our findings in other network meta-analyses and assess the implications for clinical decision-making.</div></div><div><h3>Plain Language Summary</h3><div>NMA allows researchers to compare the relative effectiveness of multiple treatments. The GRADE working group provides guidance to establish the confidence in treatment effects (how likely they are to be true) by evaluating the COE to direct, indirect and NMA estimates, which can be high, moderate, low or very low. We investigated the influences of using two different approaches (dominant first order loop vs consideration of most or all loops) for rating the certainty of NMA. We found similar certainty ratings of indirect evidence and NMA between the two approaches. Therefore, systematic review authors can continue using the simpler approach.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"189 ","pages":"Article 112050"},"PeriodicalIF":5.2,"publicationDate":"2025-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145507670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Core GRADE unpacked: a summary of recent innovations in complementary GRADE methodology
IF 5.2 Medicine (CAS Tier 2) Q1 HEALTH CARE SCIENCES & SERVICES Pub Date : 2025-11-08 DOI: 10.1016/j.jclinepi.2025.112047
Luis Enrique Colunga-Lozano , Ying Wang , Thomas Agoritsas , Monica Hultcrantz , Alfonso Iorio , Victor M. Montori , Gordon Guyatt
<div><div>The Grading of Recommendations Assessment, Development and Evaluation (GRADE) framework has become the global standard for rating evidence certainty and grading strength of health-care recommendations in systematic reviews, clinical practice guidelines (CPGs), and health technology assessments (HTAs). However, as the methodology has evolved, its growing complexity and difficulties navigating the many articles providing GRADE guidance have created challenges for users. Moreover, GRADE guidance has been published across multiple journals and platforms, resulting in a body of work with considerable redundancy, modifications, and out-of-date material that requires meticulous appraisal of GRADE writings to identify and implement current guidance. In response, a group of GRADE methodologists developed Core GRADE, a simple yet comprehensive framework focused on the essential elements required to apply GRADE to paired comparisons of interventions. Building on prior GRADE publications, the authors reviewed existing guidance and distilled its fundamental components. This proposal represents an independent approach and has not received formal endorsement from the GRADE working group. During this process, they identified multiple areas where clearer recommendations were warranted and incorporated these improvements into the seven Core GRADE papers. This article presents the resulting innovations introduced by core GRADE, that include the use of flow charts and algorithms to guide GRADE implementation; an emphasis on viewing both individual GRADE domains and overall certainty as continua; and clarification of decisions related to addressing potential relative and absolute subgroup effects when formulating patient population, intervention, comparison, and outcome questions.</div></div><div><h3>Plain Language Summary</h3><div>The GRADE framework was introduced in 2004 to help researchers and health-care professionals assess the certainty (quality) of evidence in systematic reviews and the strength of recommendations in CPGs and HTAs. Over the past 2 decades, it has become overwhelmingly the most widely used approach to making certainty of evidence and strength of recommendation decisions. However, many users now find GRADE increasingly complex, and guidance has appeared in numerous journals and platforms. This has created overlap, outdated information, and inconsistencies that make it difficult to identify the most current best approaches. To address these challenges, GRADE experts developed Core GRADE, a simplified but comprehensive version of GRADE. Core GRADE brings together the essential elements needed to apply GRADE when comparing health-care interventions (1 intervention vs 1 comparator). In addition to this simplification, Core GRADE offers several innovations. These include practical flowcharts and algorithms to guide step-by-step application of the framework; a new emphasis on viewing both individual domains and overall certainty of evidence
Citations: 0