Journal of Clinical Epidemiology: Latest Articles


Visualizing the value of diagnostic tests and prediction models, part I: introduction and expected gain in utility as a function of pretest probability.

IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

Journal of Clinical Epidemiology

Pub Date : 2025-01-24 DOI: 10.1016/j.jclinepi.2025.111689

Michael A Kohn, Thomas B Newman

Background: In this first of a 3-part series, we review expected gain in utility (EGU) calculations and graphs; in later parts, we contrast them with net benefit calculations and graphs. Our example is plasma D-dimer as a test for pulmonary embolism.

Methods: We approach EGU calculations from the perspective of a clinician evaluating a patient. The clinician is considering 1) not testing and not treating; 2) testing and treating according to the test result; or 3) treating without testing. We use simple algebra and graphs to show how EGU depends on pretest probability, the benefit of treating someone with disease (B) relative to the harm of treating someone without the disease (C), and the harm of the testing procedure itself (T).

Results: The treatment threshold probability, i.e., the probability of disease at which the expected benefit of treating those with disease is balanced by the harm of treating those without disease (EGU = 0), is C/(C + B). When a diagnostic test is available, the course of action with the highest EGU depends on C, B, T, the pretest probability of disease, and the test result. For a given C, B, and T, the lower the pretest probability, the more abnormal the test result must be to justify treatment.
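The threshold and the three-way comparison described above can be illustrated with a short Python sketch. It is not taken from the paper: it assumes a simplified dichotomous test (positive or negative) with a known sensitivity and specificity, whereas the abstract refers to graded test results, and the function names and the numeric values in the usage note are hypothetical. Not testing and not treating is taken as the reference, with EGU = 0.

```python
def treatment_threshold(B: float, C: float) -> float:
    """Probability of disease at which treating has EGU = 0, i.e. C / (C + B)."""
    return C / (C + B)


def egu_treat_without_testing(p: float, B: float, C: float) -> float:
    """Expected gain in utility of treating everyone at pretest probability p."""
    return p * B - (1 - p) * C


def egu_test_and_treat(p: float, B: float, C: float, T: float,
                       sensitivity: float, specificity: float) -> float:
    """EGU of testing and treating only those who test positive.

    Assumption (not from the abstract): a dichotomous test; everyone tested
    incurs the testing harm T.
    """
    true_positives = p * sensitivity
    false_positives = (1 - p) * (1 - specificity)
    return true_positives * B - false_positives * C - T


def best_action(p: float, B: float, C: float, T: float,
                sensitivity: float, specificity: float) -> str:
    """Return the option with the highest EGU at pretest probability p."""
    options = {
        "no test, no treatment": 0.0,  # reference strategy
        "test and treat if positive": egu_test_and_treat(
            p, B, C, T, sensitivity, specificity),
        "treat without testing": egu_treat_without_testing(p, B, C),
    }
    return max(options, key=options.get)
```

With hypothetical values B = 4, C = 1, T = 0.01 and a test with 90% sensitivity and specificity, `treatment_threshold(4, 1)` is 0.2, and `best_action` moves from not testing, to testing, to treating without testing as the pretest probability rises from 0.01 through 0.2 to 0.9.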

Conclusion: EGU calculations and graphs allow visualization of how the value of testing can be calculated from the prior probability of the disease, the benefit of treating those with disease, the harm of treating those without disease, and the harm of testing itself.


Citations: 0

Scoping review of registration of observational studies finds inadequate registration policies, increased registration, and a debate converging toward proregistration

IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

Journal of Clinical Epidemiology

Pub Date : 2025-01-23 DOI: 10.1016/j.jclinepi.2025.111686

Daniel Malmsiø, Simon Norlén, Cecilie Jespersen, Victoria Emilie Neesgaard, Zexing Song, An-Wen Chan, Asbjørn Hróbjartsson

Objectives

We aimed to examine a) the policies of national and international clinical trial registries regarding observational studies; b) the time trends of observational study registration; and c) the published arguments for and against observational study registration.

Study Design and Setting

Scoping review of registry practices and published arguments. We searched the websites and databases of all 19 members of the World Health Organization's Registry Network to identify policies relating to observational studies and the number of observational studies registered annually from the beginning of the registries to 2022. Regarding documents with arguments, we searched Medline, Embase, Google Scholar, and top medical and epidemiological journals from 2009 to 2023. We classified arguments as “main” based on the number (n ≥ 3) of documents they occurred in.

Results

Of 19 registries, 15 allowed observational study registration, of which seven (35%) had an explicit policy regarding what to register and two (11%) about when to register. The annual number of observational study registrations increased over time in all registries; for example, ClinicalTrials.gov increased from 313 in 1999 to 9775 in 2022. Fifty documents provided arguments concerning observational study registration: 31 argued for, 18 against, and one was neutral. Since 2012, 19 out of 25 documents argued for. We classified nine arguments as main: five for and four against. The two most prevalent arguments for were the prevention of selective reporting of outcomes (n = 16) and publication bias (n = 12), and against were that it will hinder exploration of new ideas (n = 17) and it will waste resources (n = 6).

Conclusion

Few registries have policies regarding observational studies; an increasing number of observational studies were registered; there was a lively debate on the merits of registration of observational studies, which, since 2012, seems to converge toward proregistration.


Citations: 0

Heterogeneity across outcomes in clinical trials on sodium-glucose cotransporter 2 inhibitors in chronic heart failure: a cross-sectional study

IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

Journal of Clinical Epidemiology

Pub Date : 2025-01-23 DOI: 10.1016/j.jclinepi.2025.111685

Fran Šaler, Marin Viđak, Ružica Tokalić, Livia Puljak

Objectives

This study aimed to analyze the outcomes, outcome domains, and prevalence of the use of clinical outcome endpoints (COE) in clinical trials on sodium-glucose cotransporter 2 (SGLT2) inhibitors for chronic heart failure (CHF) registered on ClinicalTrials.gov and compare them to COE for cardiovascular trials.

Study Design and Setting

We conducted a cross-sectional methodological study. Trials and trial outcomes were extracted from ClinicalTrials.gov, classified, and analyzed. For pivotal trials, registrations were compared with matching publications and supplementary documentation. The adherence of outcomes in pivotal clinical trials to COE developed by the European Society of Cardiology (ESC) was checked.

Results

In 71 included trials, we found 170 individual clinical outcomes and divided them into 11 groups (10 clinical outcome groups and ESC COE). Heart failure with reduced ejection fraction (HFrEF) was analyzed in 33 (46%) trials, and heart failure with preserved ejection fraction (HFpEF) in 25% of trials. ESC COE outcomes were used in less than 30% of trials, and only in 9 as primary outcomes (13%). Trials included 59 different biomarker endpoints. Patient-reported outcomes were highly heterogeneous, utilizing various nonvalidated questionnaires. All five pivotal trials used primary outcomes from ESC COE. The adherence of pivotal trials to the ESC COE was moderately high, with insufficient data on dyspnea and heart failure events such as intensification of diuretic therapy. All pivotal trials had at least one change in study protocol at one point during the trial, in outcome measures, statistical model, enrollment, or trial duration.

Conclusion

Outcomes used in CHF trials of SGLT2 inhibitors were highly heterogeneous. Core outcome sets developed especially for CHF were underutilized. Standardization of outcomes is needed in the CHF field to enable between-trial comparisons and evidence syntheses.


Citations: 0

How the PICAR framework can benefit guideline systematic reviews: a call for greater attention (Letter commenting on: J Clin Epidemiol. 2019; 108:64-76)

IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

Journal of Clinical Epidemiology

Pub Date : 2025-01-22 DOI: 10.1016/j.jclinepi.2025.111679

Yin Yu, Zihan Huang, Hui Liu, Xuanlin Li, Lin Huang, Chengping Wen, Yaolong Chen

Citations: 0

The importance of properly specifying your target trial emulation: commentary on Mésidor et al

IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

Journal of Clinical Epidemiology

Pub Date : 2025-01-21 DOI: 10.1016/j.jclinepi.2025.111683

Andrea L. Schaffer, William J. Hulme

Citations: 0

Risk of bias assessment tools often addressed items not related to risk of bias and used numerical scores

IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

Journal of Clinical Epidemiology

Pub Date : 2025-01-21 DOI: 10.1016/j.jclinepi.2025.111684

Madelin R. Siedler, Hassan Kawtharany, Muayad Azzam, Defne Ezgü, Abrar Alshorman, Ibrahim K. El Mikati, Sadiya Abid, Ali Choaib, Qais Hamarsha, M. Hassan Murad, Rebecca L. Morgan, Yngve Falck-Ytter, Shahnaz Sultan, Philipp Dahm, Reem A. Mustafa

Objectives

We aimed to determine whether the existing risk of bias assessment tools addressed constructs other than risk of bias or internal validity and whether they used numerical scores to express quality, which is discouraged and may be a misleading approach.

Methods

We searched Ovid MEDLINE and Embase to identify quality appraisal tools across all disciplines in human health research. Tools designed specifically to evaluate reporting quality were excluded. Potentially eligible tools were screened by independent pairs of reviewers. We categorized tools according to conceptual constructs and evaluated their scoring methods.

Results

We included 230 tools published from 1995 to 2023. Access to the tool was limited to a peer-reviewed journal article in 63% of the sample. Most tools (76%) provided signaling questions, whereas 39% produced an overall judgment across multiple domains. Most tools (93%) addressed concepts other than risk of bias, such as the appropriateness of statistical analysis (65%), reporting quality (64%), indirectness (41%), imprecision (38%), and ethical considerations and funding (22%). Numerical scoring was used in 25% of tools.

Conclusion

Currently available study quality assessment tools were not explicit about the constructs addressed by their items or signaling questions and addressed multiple constructs in addition to risk of bias. Many tools used numerical scoring systems, which can be misleading. Limitations of the existing tools make the process of rating the certainty of evidence more difficult.

Plain Language Summary

Many tools have been made to assess how well a scientific study was designed, conducted, and written. We searched for these tools to better understand the types of questions they ask and the types of studies to which they apply. We found 230 tools published between 1995 and 2023. One in every four tools used a numerical scoring system. This approach is not recommended because it does not distinguish well between different ways quality can be assessed. Tools assessed quality in a number of different ways, with the most common ways being risk of bias (how a study is designed and run to reduce biased results; 98%), statistical analysis (how the data were analyzed; 65%), and reporting quality (whether important details were included in the article; 64%). People who make tools in the future should carefully consider the aspects of quality that they want the tool to address and distinguish between questions of study design, conduct, analysis, ethics, and reporting.


Citations: 0

Data management and sharing

IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

Journal of Clinical Epidemiology

Pub Date : 2025-01-20 DOI: 10.1016/j.jclinepi.2025.111680

Claude Pellen, Nchangwi Syntia Munung, Anna Catharina Armond, Daniel Kulp, Ulrich Mansmann, Maximilian Siebert, Florian Naudet

Guided by the FAIR principles (Findable, Accessible, Interoperable, Reusable), responsible data sharing requires well-organized, high-quality datasets. However, researchers often struggle with implementing Data Management and Sharing Plans due to lack of knowledge on how to do this, time constraints, and legal, technical, and financial challenges, particularly concerning data ownership and privacy. While patients support data sharing, researchers and funders may hesitate, fearing the loss of intellectual property or competitive advantage. Although some journals and institutions encourage or mandate data sharing, further progress is needed. Additionally, global solutions are vital to ensure equitable participation from low- and middle-income countries. Ultimately, responsible data sharing requires strategic planning, cultural shifts in research, and coordinated efforts from all stakeholders to become standard practice in biomedical research.

Citations: 0

You wait ages, and then two arrive at once: reporting guidelines should not be like buses

IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

Journal of Clinical Epidemiology

Pub Date : 2025-01-20 DOI: 10.1016/j.jclinepi.2025.111682

William T. Gattrell, David Tovey, Patricia Logullo, Amy Price, Paul Blazey, Christopher C. Winchester, Esther J. van Zuuren, Niall Harrison

Citations: 0

Using artificial intelligence to semi-automate trustworthiness assessment of randomized controlled trials: a case study

IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

Journal of Clinical Epidemiology

Pub Date : 2025-01-17 DOI: 10.1016/j.jclinepi.2025.111672

Ling Shan Au, Lizhen Qu, Jeremy Nielsen, Zongyuan Ge, Lyle C. Gurrin, Ben W. Mol, Rui Wang

Background and Objective

Randomized controlled trials (RCTs) are the cornerstone of evidence-based medicine. Unfortunately, not all RCTs are based on real data. This serious breach of research integrity compromises the reliability of systematic reviews and meta-analyses, leading to misinformed clinical guidelines and posing a risk to both individual and public health. While methods to detect problematic RCTs have been proposed, they are time-consuming and labor-intensive. The use of artificial intelligence large language models (LLMs) has the potential to accelerate the data collection needed to assess the trustworthiness of published RCTs.

Methods

We present a case study using ChatGPT powered by OpenAI's GPT-4o to assess an RCT paper. The case study focuses on applying the Trustworthiness in Randomised Controlled Trials (TRACT) checklist and automating data table extraction to accelerate statistical analysis targeting the trustworthiness of the data. We provide a detailed step-by-step outline of the process, along with considerations for potential improvements.

Results

ChatGPT completed all tasks by processing the PDF of the selected publication and responding to specific prompts. ChatGPT addressed items in the TRACT checklist effectively, demonstrating an ability to provide precise “yes” or “no” answers while quickly synthesizing information from both the paper and relevant online resources. A comparison of the results generated by ChatGPT and the human assessor showed agreement on 16 of 19 TRACT items (84%). This substantially accelerated the qualitative assessment process. Additionally, ChatGPT was able to efficiently extract the data tables as Microsoft Excel worksheets and reorganize the data, with three out of four extracted tables achieving an accuracy score of 100%, facilitating subsequent analysis and data verification.
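The agreement figure reported above is a simple proportion of matching checklist answers. A minimal Python sketch follows; it is not the authors' code. The function name and the two answer lists are hypothetical placeholders, built only so that 16 of 19 items match. The abstract does not report the item-level answers.

```python
def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Share of items (in percent) on which two raters gave the same answer."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must answer the same non-empty item list")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Placeholder answers (hypothetical): 19 items, 16 of which agree.
human_answers = ["yes"] * 19
model_answers = ["yes"] * 16 + ["no"] * 3

agreement = percent_agreement(human_answers, model_answers)
```

Here `agreement` is 100 × 16/19, which rounds to 84%. Raw agreement does not correct for agreement expected by chance, so it should be read as a descriptive figure only.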

Conclusion

ChatGPT demonstrates potential in semiautomating the trustworthiness assessment of RCTs, though in our experience this required repeated prompting from the user. Further testing and refinement will involve applying ChatGPT to collections of RCT papers to improve the accuracy of data capture and lessen the role of the user. The ultimate aim is a completely automated process for large volumes of papers, which seems plausible given our initial experience.



Citations: 0

Incomplete reporting of adverse events in duloxetine trials: a meta-research survey of randomized controlled trials vs placebo

IF 7.3 Zone 2 (Medicine) Q1 HEALTH CARE SCIENCES & SERVICES

Journal of Clinical Epidemiology

Pub Date : 2025-01-16 DOI: 10.1016/j.jclinepi.2025.111677

P. Rolland, A. Jutel, K. Douget, F. Naudet, J.C. Roy

Background and Objectives

Relying on published data alone might be insufficient for meta-analyses to be reliable and trustworthy since selective outcome reporting is common, especially for adverse events (AEs). We investigated the existence of selective reporting and its potential for bias in a case study exploring AEs of duloxetine in adults.

Study Design and Setting

We systematically searched PubMed for all previous meta-analyses and pooled analyses of duloxetine in the seven indications approved by the American and European health authorities. We included all randomized controlled trials (RCTs) comparing duloxetine with placebo. For each RCT, we extracted the number of serious adverse events (SAEs), AEs, drop-outs (DOs), and drop-outs for safety reasons (DOSRs) from four information sources: published articles, clinical study registries, clinical study reports, and data available in meta-analyses/pooled analyses. To assess the range of differences resulting from these four extraction strategies, we performed four meta-analyses using random-effects models, as well as a complete meta-analysis combining all sources.
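The pooling step described above can be illustrated with a minimal sketch. The abstract does not say which random-effects estimator or effect measure was used, so the DerSimonian-Laird estimator and the risk ratio are assumptions made here for illustration, and the event counts are hypothetical, not taken from the study.

```python
import math


def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Log risk ratio and its approximate variance for one two-arm trial.

    Assumes at least one event in each arm (no continuity correction).
    """
    lrr = math.log((events_t / n_t) / (events_c / n_c))
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return lrr, var


def pool_random_effects(log_effects, variances):
    """DerSimonian-Laird random-effects pooling (assumed estimator).

    Returns (pooled log effect, standard error, between-study variance).
    """
    k = len(log_effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_effects)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add the between-study variance to each study
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical SAE counts: (events_drug, n_drug, events_placebo, n_placebo)
trials = [(12, 200, 8, 198), (30, 450, 22, 447), (5, 120, 6, 118)]
effects, variances = zip(*(log_risk_ratio(*t) for t in trials))
pooled, se, tau2 = pool_random_effects(effects, variances)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"Pooled RR {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f}), tau^2 = {tau2:.3f}")
```

Running the same function once per information source, and once on the combined data, would give the kind of side-by-side comparison the study design describes.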

Results

A total of 70 RCTs (24,330 patients) were included. Of those, SAEs were identified for 42 studies (61%) in published articles, 58 (84%) in study reports (8 study reports were not retrieved), 24 (34.7%) in registries, and 21 (30.4%) in meta-analyses/pooled analyses. No data on SAEs, AEs, DOs, and DOSRs were found in any source for 2 (2.9%), 2 (2.9%), 2 (2.9%), and 1 (1.4%) studies, respectively. Results were discrepant across sources in 24 (34.5%), 20 (28.5%), 13 (18.6%), and 9 (12.8%) studies for SAEs, AEs, DOs, and DOSRs, respectively. Despite variations in point estimates and their 95% confidence intervals, the conclusions of the meta-analyses did not differ according to the information source used, except for DOs: no effect was found using results published in registries, in contrast to the other information sources.

Conclusion

None of the four information sources yielded complete safety results for duloxetine in adults across the various indications. However, we did not find strong evidence that this underreporting leads to different conclusions in meta-analyses. This finding remains uncertain, as we were unable to obtain complete information for all studies despite extensive searches.


Journal of Clinical Epidemiology, volume 180, Article 111677.

Citations: 0
