Visualizing the value of diagnostic tests and prediction models, part I: introduction and expected gain in utility as a function of pretest probability
Pub Date: 2025-01-24 | DOI: 10.1016/j.jclinepi.2025.111689
Michael A Kohn, Thomas B Newman
Background: In this first of a 3-part series, we review expected gain in utility (EGU) calculations and graphs; in later parts, we contrast them with net benefit calculations and graphs. Our example is plasma D-dimer as a test for pulmonary embolism.
Methods: We approach EGU calculations from the perspective of a clinician evaluating a patient. The clinician is considering 1) not testing and not treating, 2) testing and treating according to the test result, or 3) treating without testing. We use simple algebra and graphs to show how EGU depends on pretest probability and the benefit of treating someone with disease (B) relative to the harm of treating someone without the disease (C) and the harm of the testing procedure itself (T).
Results: The treatment threshold probability, i.e., the probability of disease at which the expected benefit of treating those with disease is balanced by the harm of treating those without disease (EGU = 0), is C/(C + B). When a diagnostic test is available, the course of action with the highest EGU depends on C, B, T, the pretest probability of disease, and the test result. For a given C, B, and T, the lower the pretest probability, the more abnormal the test result must be to justify treatment.
Conclusion: EGU calculations and graphs allow visualization of how the value of testing can be calculated from the prior probability of the disease, the benefit of treating those with disease, the harm of treating those without disease, and the harm of testing itself.
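To make the threshold algebra concrete, here is a minimal numerical sketch of the three strategies described in the abstract. The values of B, C, and T and the test's sensitivity and specificity are illustrative assumptions, not figures from the article; the sketch only shows how the expected gain in utility of each strategy varies with pretest probability p.

```python
# Expected gain in utility (EGU) relative to "no test, no treatment" (EGU = 0).
# B, C, T, sensitivity, and specificity below are illustrative assumptions.

B = 10.0   # benefit of treating a patient who has the disease
C = 2.0    # harm of treating a patient who does not have the disease
T = 0.1    # harm of the testing procedure itself
sens, spec = 0.95, 0.60   # assumed test characteristics (D-dimer-like: sensitive, not specific)

def egu_treat_all(p):
    # Treat everyone without testing: diseased patients gain B, nondiseased lose C.
    return p * B - (1 - p) * C

def egu_test_and_treat(p):
    # Test and treat only positives: true positives gain B, false positives lose C,
    # and every tested patient incurs the testing harm T.
    return p * sens * B - (1 - p) * (1 - spec) * C - T

treatment_threshold = C / (C + B)   # pretest probability at which egu_treat_all(p) = 0

for p in (0.05, treatment_threshold, 0.50, 0.90):
    print(f"p = {p:.3f}: treat-all EGU = {egu_treat_all(p):+.3f}, "
          f"test-and-treat EGU = {egu_test_and_treat(p):+.3f}")
```

With these assumed numbers, treating everyone breaks even at p = C/(C + B) ≈ 0.17, testing and treating positives has the highest EGU over an intermediate band of pretest probabilities, and at high pretest probabilities treating without testing wins, mirroring the threshold framework the article visualizes.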
{"title":"Visualizing the value of diagnostic tests and prediction models, part I: introduction and expected gain in utility as a function of pretest probability.","authors":"Michael A Kohn, Thomas B Newman","doi":"10.1016/j.jclinepi.2025.111689","DOIUrl":"10.1016/j.jclinepi.2025.111689","url":null,"abstract":"<p><strong>Background: </strong>In this first of a 3-part series, we review expected gain in utility (EGU) calculations and graphs; in later parts, we contrast them with net benefit calculations and graphs. Our example is plasma D-dimer as a test for pulmonary embolism.</p><p><strong>Methods: </strong>We approach EGU calculations from the perspective of a clinician evaluating a patient. The clinician is considering 1) not testing and not treating, 2) testing and treating according to the test result; or 3) treating without testing. We use simple algebra and graphs to show how EGU depends on pretest probability and the benefit of treating someone with disease (B) relative to the harms of treating someone without the disease (C) and the harm of the testing the procedure itself (T).</p><p><strong>Results: </strong>The treatment threshold probability, i.e., the probability of disease at which the expected benefit of treating those with disease is balanced by the harm of treating those without disease (EGU = 0) is C/(C + B). When a diagnostic test is available, the course of action with the highest EGU depends on C, B, T, the pretest probability of disease, and the test result. For a given C, B, and T, the lower the pretest probability, the more abnormal the test result must be to justify treatment.</p><p><strong>Conclusion: </strong>EGU calculations and graphs allow visualization of how the value of testing can be calculated from the prior probability of the disease, the benefit of treating those with disease, the harm of treating those without disease, and the harm of testing itself.</p>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":" ","pages":"111689"},"PeriodicalIF":7.3,"publicationDate":"2025-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143048556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scoping review of registration of observational studies finds inadequate registration policies, increased registration, and a debate converging toward proregistration
Pub Date: 2025-01-23 | DOI: 10.1016/j.jclinepi.2025.111686
Daniel Malmsiø, Simon Norlén, Cecilie Jespersen, Victoria Emilie Neesgaard, Zexing Song, An-Wen Chan, Asbjørn Hróbjartsson
Objectives
We aimed to examine a) the policies of national and international clinical trial registries regarding observational studies; b) the time trends of observational study registration; and c) the published arguments for and against observational study registration.
Study Design and Setting
Scoping review of registry practices and published arguments. We searched the websites and databases of all 19 members of the World Health Organization's Registry Network to identify policies relating to observational studies and the number of observational studies registered annually from the beginning of the registries to 2022. Regarding documents with arguments, we searched Medline, Embase, Google Scholar, and top medical and epidemiological journals from 2009 to 2023. We classified arguments as “main” based on the number (n ≥ 3) of documents they occurred in.
Results
Of 19 registries, 15 allowed observational study registration, of which seven (35%) had an explicit policy regarding what to register and two (11%) about when to register. The annual number of observational study registrations increased over time in all registries; for example, ClinicalTrials.gov increased from 313 in 1999 to 9775 in 2022. Fifty documents provided arguments concerning observational study registration: 31 argued for, 18 against, and one was neutral. Since 2012, 19 out of 25 documents argued for. We classified nine arguments as main: five for and four against. The two most prevalent arguments for were the prevention of selective reporting of outcomes (n = 16) and publication bias (n = 12), and against were that it will hinder exploration of new ideas (n = 17) and it will waste resources (n = 6).
Conclusion
Few registries have policies regarding observational studies; an increasing number of observational studies were registered; and there was a lively debate on the merits of registration of observational studies which, since 2012, appears to have converged toward proregistration.
{"title":"Scoping review of registration of observational studies finds inadequate registration policies, increased registration, and a debate converging toward proregistration","authors":"Daniel Malmsiø , Simon Norlén , Cecilie Jespersen , Victoria Emilie Neesgaard , Zexing Song , An-Wen Chan , Asbjørn Hróbjartsson","doi":"10.1016/j.jclinepi.2025.111686","DOIUrl":"10.1016/j.jclinepi.2025.111686","url":null,"abstract":"<div><h3>Objectives</h3><div>We aimed to examine a) the policies of national and international clinical trial registries regarding observational studies; b) the time trends of observational study registration; and c) the published arguments for and against observational study registration.</div></div><div><h3>Study Design and Setting</h3><div>Scoping review of registry practices and published arguments. We searched the websites and databases of all 19 members of the World Health Organization's Registry Network to identify policies relating to observational studies and the number of observational studies registered annually from the beginning of the registries to 2022. Regarding documents with arguments, we searched Medline, Embase, Google Scholar, and top medical and epidemiological journals from 2009 to 2023. We classified arguments as “main” based on the number (<em>n</em> ≥ 3) of documents they occurred in.</div></div><div><h3>Results</h3><div>Of 19 registries, 15 allowed observational study registration, of which seven (35%) had an explicit policy regarding what to register and two (11%) about when to register. The annual number of observational study registrations increased over time in all registries; for example, ClinicalTrials.gov increased from 313 in 1999 to 9775 in 2022. Fifty documents provided arguments concerning observational study registration: 31 argued for, 18 against, and one was neutral. Since 2012, 19 out of 25 documents argued for. We classified nine arguments as main: five for and four against. The two most prevalent arguments for were the prevention of selective reporting of outcomes (<em>n</em> = 16) and publication bias (<em>n</em> = 12), and against were that it will hinder exploration of new ideas (<em>n</em> = 17) and it will waste resources (<em>n</em> = 6).</div></div><div><h3>Conclusion</h3><div>Few registries have policies regarding observational studies; an increasing number of observational studies were registered; there was a lively debate on the merits of registration of observational studies, which, since 2012, seems to converge toward proregistration.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"180 ","pages":"Article 111686"},"PeriodicalIF":7.3,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143043294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heterogeneity across outcomes in clinical trials on sodium-glucose cotransporter 2 inhibitors in chronic heart failure: a cross-sectional study
Pub Date: 2025-01-23 | DOI: 10.1016/j.jclinepi.2025.111685
Fran Šaler, Marin Viđak, Ružica Tokalić, Livia Puljak
Objectives
This study aimed to analyze the outcomes, outcome domains, and prevalence of the use of clinical outcome endpoints (COE) in clinical trials on sodium-glucose cotransporter 2 (SGLT2) inhibitors for chronic heart failure (CHF) registered on ClinicalTrials.gov and compare them to COE for cardiovascular trials.
Study Design and Setting
We conducted a cross-sectional methodological study. Trials and trial outcomes were extracted from ClinicalTrials.gov, classified, and analyzed. For pivotal trials, registrations were compared with matching publications and supplementary documentation. The adherence of outcomes in pivotal clinical trials to COE developed by the European Society of Cardiology (ESC) was checked.
Results
In 71 included trials, we found 170 individual clinical outcomes and divided them into 11 groups (10 clinical outcome groups and the ESC COE). Heart failure with reduced ejection fraction (HFrEF) was analyzed in 33 (46%) trials, and heart failure with preserved ejection fraction (HFpEF) in 25% of trials. ESC COE outcomes were used in less than 30% of trials, and as primary outcomes in only 9 (13%). Trials included 59 different biomarker endpoints. Patient-reported outcomes were highly heterogeneous, utilizing various nonvalidated questionnaires. All five pivotal trials used primary outcomes from the ESC COE. The adherence of pivotal trials to the ESC COE was moderately high, with insufficient data on dyspnea and heart failure events such as intensification of diuretic therapy. All pivotal trials had at least one change in study protocol during the trial, in outcome measures, statistical model, enrollment, or trial duration.
Conclusion
Outcomes used in CHF trials of SGLT2 inhibitors were highly heterogeneous. Core outcome sets developed especially for CHF were underutilized. Standardization of outcomes is needed in the CHF field to enable between-trial comparisons and evidence syntheses.
{"title":"Heterogeneity across outcomes in clinical trials on sodium-glucose cotransporter 2 inhibitors in chronic heart failure: a cross-sectional study","authors":"Fran Šaler , Marin Viđak , Ružica Tokalić , Livia Puljak","doi":"10.1016/j.jclinepi.2025.111685","DOIUrl":"10.1016/j.jclinepi.2025.111685","url":null,"abstract":"<div><h3>Objectives</h3><div>This study aimed to analyze the outcomes, outcome domains, and prevalence of the use of clinical outcome endpoints (COE) in clinical trials on sodium-glucose cotransporter 2 (SGLT2) inhibitors for chronic heart failure (CHF) registered on ClinicalTrials.gov and compare them to COE for cardiovascular trials.</div></div><div><h3>Study Design and Setting</h3><div>We conducted a cross-sectional methodological study. Trials and trial outcomes were extracted from ClinicalTrials.gov, classified, and analyzed. For pivotal trials, registrations were compared with matching publications and supplementary documentation. The adherence of outcomes in pivotal clinical trials to COE developed by the European Society of Cardiology (ESC) was checked.</div></div><div><h3>Results</h3><div>In 71 included trials, we found 170 individual clinical outcomes and divided them into 11 groups (10 clinical outcome groups and ESC COE). Heart failure with reduced ejection fraction (HFrEF) was analyzed in 33 (46%) trials, and heart failure with preserved ejection fraction (HFpEF) in 25% of trials. ESC COE outcomes were used in less than 30% of trials, and only in 9 as primary outcomes (13%). Trials included 59 different biomarker endpoints. Patient-reported outcomes were highly heterogeneous, utilizing various nonvalidated questionnaires. All five pivotal trials used primary outcomes from ESC COE. The adherence of pivotal trials to the ESC COE was moderately high, with insufficient data on dyspnea and heart failure events such as intensification of diuretic therapy. All pivotal trials had at least one change in study protocol at one point during the trial, in outcome measures, statistical model, enrollment, or trial duration.</div></div><div><h3>Conclusion</h3><div>Outcomes used in CHF trials of SGLT2 inhibitors were highly heterogeneous. Core outcome sets developed especially for CHF were underutilized. Standardization of outcomes is needed in the CHF field to enable between-trial comparisons and evidence syntheses.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"180 ","pages":"Article 111685"},"PeriodicalIF":7.3,"publicationDate":"2025-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143043290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The importance of properly specifying your target trial emulation: commentary on Mésidor et al
Pub Date: 2025-01-21 | DOI: 10.1016/j.jclinepi.2025.111683
Andrea L. Schaffer, William J. Hulme
{"title":"The importance of properly specifying your target trial emulation: commentary on Mésidor et al","authors":"Andrea L. Schaffer, William J. Hulme","doi":"10.1016/j.jclinepi.2025.111683","DOIUrl":"10.1016/j.jclinepi.2025.111683","url":null,"abstract":"","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"180 ","pages":"Article 111683"},"PeriodicalIF":7.3,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143029441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Risk of bias assessment tools often addressed items not related to risk of bias and used numerical scores
Pub Date: 2025-01-21 | DOI: 10.1016/j.jclinepi.2025.111684
Madelin R. Siedler, Hassan Kawtharany, Muayad Azzam, Defne Ezgü, Abrar Alshorman, Ibrahim K. El Mikati, Sadiya Abid, Ali Choaib, Qais Hamarsha, M. Hassan Murad, Rebecca L. Morgan, Yngve Falck-Ytter, Shahnaz Sultan, Philipp Dahm, Reem A. Mustafa
Objectives
We aimed to determine whether existing risk of bias assessment tools addressed constructs other than risk of bias or internal validity and whether they used numerical scores to express quality, an approach that is discouraged and may be misleading.
Methods
We searched Ovid MEDLINE and Embase to identify quality appraisal tools across all disciplines in human health research. Tools designed specifically to evaluate reporting quality were excluded. Potentially eligible tools were screened by independent pairs of reviewers. We categorized tools according to conceptual constructs and evaluated their scoring methods.
Results
We included 230 tools published from 1995 to 2023. Access to the tool was limited to a peer-reviewed journal article in 63% of the sample. Most tools (76%) provided signaling questions, whereas 39% produced an overall judgment across multiple domains. Most tools (93%) addressed concepts other than risk of bias, such as the appropriateness of statistical analysis (65%), reporting quality (64%), indirectness (41%), imprecision (38%), and ethical considerations and funding (22%). Numerical scoring was used in 25% of tools.
Conclusion
Currently available study quality assessment tools were not explicit about the constructs addressed by their items or signaling questions and addressed multiple constructs in addition to risk of bias. Many tools used numerical scoring systems, which can be misleading. Limitations of the existing tools make the process of rating the certainty of evidence more difficult.
Plain Language Summary
Many tools have been made to assess how well a scientific study was designed, conducted, and written. We searched for these tools to better understand the types of questions they ask and the types of studies to which they apply. We found 230 tools published between 1995 and 2023. One in every four tools used a numerical scoring system. This approach is not recommended because it does not distinguish well between different ways quality can be assessed. Tools assessed quality in a number of different ways, with the most common ways being risk of bias (how a study is designed and run to reduce biased results; 98%), statistical analysis (how the data were analyzed; 65%), and reporting quality (whether important details were included in the article; 64%). People who make tools in the future should carefully consider the aspects of quality that they want the tool to address and distinguish between questions of study design, conduct, analysis, ethics, and reporting.
{"title":"Risk of bias assessment tools often addressed items not related to risk of bias and used numerical scores","authors":"Madelin R. Siedler , Hassan Kawtharany , Muayad Azzam , Defne Ezgü , Abrar Alshorman , Ibrahim K. El Mikati , Sadiya Abid , Ali Choaib , Qais Hamarsha , M. Hassan Murad , Rebecca L. Morgan , Yngve Falck-Ytter , Shahnaz Sultan , Philipp Dahm , Reem A. Mustafa","doi":"10.1016/j.jclinepi.2025.111684","DOIUrl":"10.1016/j.jclinepi.2025.111684","url":null,"abstract":"<div><h3>Objectives</h3><div>We aimed to determine whether the existing risk of bias assessment tools addressed constructs other than risk of bias or internal validity and whether they used numerical scores to express quality, which is discouraged and may be a misleading approach.</div></div><div><h3>Methods</h3><div>We searched Ovid MEDLINE and Embase to identify quality appraisal tools across all disciplines in human health research. Tools designed specifically to evaluate reporting quality were excluded. Potentially eligible tools were screened by independent pairs of reviewers. We categorized tools according to conceptual constructs and evaluated their scoring methods.</div></div><div><h3>Results</h3><div>We included 230 tools published from 1995 to 2023. Access to the tool was limited to a peer-reviewed journal article in 63% of the sample. Most tools (76%) provided signaling questions, whereas 39% produced an overall judgment across multiple domains. Most tools (93%) addressed concepts other than risk of bias, such as the appropriateness of statistical analysis (65%), reporting quality (64%), indirectness (41%), imprecision (38%), and ethical considerations and funding (22%). Numerical scoring was used in 25% of tools.</div></div><div><h3>Conclusion</h3><div>Currently available study quality assessment tools were not explicit about the constructs addressed by their items or signaling questions and addressed multiple constructs in addition to risk of bias. Many tools used numerical scoring systems, which can be misleading. Limitations of the existing tools make the process of rating the certainty of evidence more difficult.</div></div><div><h3>Plain Language Summary</h3><div>Many tools have been made to assess how well a scientific study was designed, conducted, and written. We searched for these tools to better understand the types of questions they ask and the types of studies to which they apply. We found 230 tools published between 1995 and 2023. One in every four tools used a numerical scoring system. This approach is not recommended because it does not distinguish well between different ways quality can be assessed. Tools assessed quality in a number of different ways, with the most common ways being risk of bias (how a study is designed and run to reduce biased results; 98%), statistical analysis (how the data were analyzed; 65%), and reporting quality (whether important details were included in the article; 64%). 
People who make tools in the future should carefully consider the aspects of quality that they want the tool to address and distinguish between questions of study design, conduct, analysis, ethics, and reporting.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"180 ","pages":"Article 111684"},"PeriodicalIF":7.3,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143030346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data management and sharing
Pub Date: 2025-01-20 | DOI: 10.1016/j.jclinepi.2025.111680
Claude Pellen, Nchangwi Syntia Munung, Anna Catharina Armond, Daniel Kulp, Ulrich Mansmann, Maximilian Siebert, Florian Naudet
Guided by the FAIR principles (Findable, Accessible, Interoperable, Reusable), responsible data sharing requires well-organized, high-quality datasets. However, researchers often struggle to implement Data Management and Sharing Plans because of a lack of knowledge about how to do so, time constraints, and legal, technical, and financial challenges, particularly concerning data ownership and privacy. While patients support data sharing, researchers and funders may hesitate, fearing the loss of intellectual property or competitive advantage. Although some journals and institutions encourage or mandate data sharing, further progress is needed. Additionally, global solutions are vital to ensure equitable participation from low- and middle-income countries. Ultimately, responsible data sharing requires strategic planning, cultural shifts in research, and coordinated efforts from all stakeholders to become standard practice in biomedical research.
{"title":"Data management and sharing","authors":"Claude Pellen , Nchangwi Syntia Munung , Anna Catharina Armond , Daniel Kulp , Ulrich Mansmann , Maximilian Siebert , Florian Naudet","doi":"10.1016/j.jclinepi.2025.111680","DOIUrl":"10.1016/j.jclinepi.2025.111680","url":null,"abstract":"<div><div>Guided by the FAIR principles (Findable, Accessible, Interoperable, Reusable), responsible data sharing requires well-organized, high-quality datasets. However, researchers often struggle with implementing Data Management and Sharing Plans due to lack of knowledge on how to do this, time constraints, and legal, technical, and financial challenges, particularly concerning data ownership and privacy. While patients support data sharing, researchers and funders may hesitate, fearing the loss of intellectual property or competitive advantage. Although some journals and institutions encourage or mandate data sharing, further progress is needed. Additionally, global solutions are vital to ensure equitable participation from low- and middle-income countries. Ultimately, responsible data sharing requires strategic planning, cultural shifts in research, and coordinated efforts from all stakeholders to become standard practice in biomedical research.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"180 ","pages":"Article 111680"},"PeriodicalIF":7.3,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143025463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
You wait ages, and then two arrive at once: reporting guidelines should not be like buses
Pub Date: 2025-01-20 | DOI: 10.1016/j.jclinepi.2025.111682
William T. Gattrell, David Tovey, Patricia Logullo, Amy Price, Paul Blazey, Christopher C. Winchester, Esther J. van Zuuren, Niall Harrison
{"title":"You wait ages, and then two arrive at once: reporting guidelines should not be like buses","authors":"William T. Gattrell, David Tovey, Patricia Logullo, Amy Price, Paul Blazey, Christopher C. Winchester, Esther J. van Zuuren, Niall Harrison","doi":"10.1016/j.jclinepi.2025.111682","DOIUrl":"10.1016/j.jclinepi.2025.111682","url":null,"abstract":"","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"180 ","pages":"Article 111682"},"PeriodicalIF":7.3,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143025481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using artificial intelligence to semi-automate trustworthiness assessment of randomized controlled trials: a case study
Pub Date: 2025-01-17 | DOI: 10.1016/j.jclinepi.2025.111672
Ling Shan Au, Lizhen Qu, Jeremy Nielsen, Zongyuan Ge, Lyle C. Gurrin, Ben W. Mol, Rui Wang
Background and Objective
Randomized controlled trials (RCTs) are the cornerstone of evidence-based medicine. Unfortunately, not all RCTs are based on real data. This serious breach of research integrity compromises the reliability of systematic reviews and meta-analyses, leading to misinformed clinical guidelines and posing a risk to both individual and public health. While methods to detect problematic RCTs have been proposed, they are time-consuming and labor-intensive. The use of artificial intelligence large language models (LLMs) has the potential to accelerate the data collection needed to assess the trustworthiness of published RCTs.
Methods
We present a case study using ChatGPT powered by OpenAI's GPT-4o to assess an RCT paper. The case study focuses on applying the trustworthiness in randomised controlled trials (TRACT) checklist and automating data table extraction to accelerate statistical analysis targeting the trustworthiness of the data. We provide a detailed step-by-step outline of the process, along with considerations for potential improvements.
Results
ChatGPT completed all tasks by processing the PDF of the selected publication and responding to specific prompts. ChatGPT addressed items in the TRACT checklist effectively, demonstrating an ability to provide precise “yes” or “no” answers while quickly synthesizing information from both the paper and relevant online resources. A comparison of results generated by ChatGPT and the human assessor showed an 84% level of agreement (16 of 19 TRACT items). This substantially accelerated the qualitative assessment process. Additionally, ChatGPT was able to extract the data tables efficiently as Microsoft Excel worksheets and reorganize the data, with three out of four extracted tables achieving an accuracy score of 100%, facilitating subsequent analysis and data verification.
Conclusion
ChatGPT demonstrates potential in semiautomating the trustworthiness assessment of RCTs, though in our experience this required repeated prompting from the user. Further testing and refinement will involve applying ChatGPT to collections of RCT papers to improve the accuracy of data capture and lessen the role of the user. The ultimate aim, which seems plausible given our initial experience, is a completely automated process for large volumes of papers.
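A sketch of how such a semi-automated check might be scripted is given below. The authors worked interactively in the ChatGPT interface; this sketch instead assumes the OpenAI Python SDK, local PDF text extraction with pypdf, a hypothetical file name, and a paraphrased checklist item, so it illustrates the general approach rather than the authors' actual workflow.

```python
# Illustrative sketch only: the study used interactive ChatGPT (GPT-4o) prompting;
# here we assume the OpenAI Python SDK and local PDF text extraction instead.
from openai import OpenAI      # pip install openai
from pypdf import PdfReader    # pip install pypdf

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def load_pdf_text(path: str) -> str:
    # Concatenate the text of every page of the trial report.
    # Long reports may need chunking to fit the model's context window.
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

# A paraphrased example item; the published TRACT checklist wording should be used in practice.
TRACT_ITEM = ("Are there implausible similarities or differences in baseline "
              "characteristics between the randomized groups?")

def assess_item(paper_text: str, item: str) -> str:
    # Ask the model for a yes/no judgment with a supporting quotation.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "You assess the trustworthiness of randomized controlled trials. "
                        "Answer 'yes' or 'no' and quote the passage that supports your answer."},
            {"role": "user",
             "content": f"Checklist item: {item}\n\nTrial report:\n{paper_text}"},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    text = load_pdf_text("trial_report.pdf")  # hypothetical file name
    print(assess_item(text, TRACT_ITEM))
```

In practice, each of the 19 TRACT items would be looped over and the model's answers reconciled against a human assessor, as the case study did.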
{"title":"Using artificial intelligence to semi-automate trustworthiness assessment of randomized controlled trials: a case study","authors":"Ling Shan Au , Lizhen Qu , Jeremy Nielsen , Zongyuan Ge , Lyle C. Gurrin , Ben W. Mol , Rui Wang","doi":"10.1016/j.jclinepi.2025.111672","DOIUrl":"10.1016/j.jclinepi.2025.111672","url":null,"abstract":"<div><h3>Background and Objective</h3><div>Randomized controlled trials (RCTs) are the cornerstone of evidence-based medicine. Unfortunately, not all RCTs are based on real data. This serious breach of research integrity compromises the reliability of systematic reviews and meta-analyses, leading to misinformed clinical guidelines and posing a risk to both individual and public health. While methods to detect problematic RCTs have been proposed, they are time-consuming and labor-intensive. The use of artificial intelligence large language models (LLMs) has the potential to accelerate the data collection needed to assess the trustworthiness of published RCTs.</div></div><div><h3>Methods</h3><div>We present a case study using ChatGPT powered by OpenAI's GPT-4o to assess an RCT paper. The case study focuses on applying the trustworthiness in randomised controlled trials (TRACT checklist) and automating data table extraction to accelerate statistical analysis targeting the trustworthiness of the data. We provide a detailed step-by-step outline of the process, along with considerations for potential improvements.</div></div><div><h3>Results</h3><div>ChatGPT completed all tasks by processing the PDF of the selected publication and responding to specific prompts. ChatGPT addressed items in the TRACT checklist effectively, demonstrating an ability to provide precise “yes” or “no” answers while quickly synthesizing information from both the paper and relevant online resources. A comparison of results generated by ChatGPT and the human assessor showed an 84% level of agreement of (16/19) TRACT items. This substantially accelerated the qualitative assessment process. Additionally, ChatGPT was able to extract efficiently the data tables as Microsoft Excel worksheets and reorganize the data, with three out of four extracted tables achieving an accuracy score of 100%, facilitating subsequent analysis and data verification.</div></div><div><h3>Conclusion</h3><div>ChatGPT demonstrates potential in semiautomating the trustworthiness assessment of RCTs, though in our experience this required repeated prompting from the user. Further testing and refinement will involve applying ChatGPT to collections of RCT papers to improve the accuracy of data capture and lessen the role of the user. The ultimate aim is a completely automated process for large volumes of papers that seems plausible given our initial experience.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"180 ","pages":"Article 111672"},"PeriodicalIF":7.3,"publicationDate":"2025-01-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143015518","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Incomplete reporting of adverse events in duloxetine trials: a meta-research survey of randomized controlled trials vs placebo
Pub Date: 2025-01-16 | DOI: 10.1016/j.jclinepi.2025.111677
P. Rolland, A. Jutel, K. Douget, F. Naudet, J.C. Roy
Background and Objectives
Relying on published data alone might be insufficient for meta-analyses to be reliable and trustworthy since selective outcome reporting is common, especially for adverse events (AEs). We investigated the existence of selective reporting and its potential for bias in a case study exploring AEs of duloxetine in adults.
Study Design and Setting
We systematically searched all previous meta-analyses/pooled analyses on duloxetine published on PubMed for seven indications approved by the American and European health authorities. We included all randomized controlled trials (RCTs) vs placebo. For each RCT, we extracted the number of serious adverse events (SAEs), AEs, drop-outs (DOs), and drop-outs for safety reasons (DOSRs) using four information sources: published articles, clinical study registries, clinical study reports, and data available in meta-analyses/pooled analyses. To assess the range of differences resulting from these four extraction strategies, we performed four meta-analyses using random-effects models, as well as a complete meta-analysis combining all sources.
Results
A total of 70 RCTs (including 24,330 patients) were included. Of those, SAEs were identified for 42 studies (61%) in published articles, 58 (84%) in study reports (8 study reports were not retrieved), 24 (34.7%) in registries, and 21 (30.4%) in meta-analyses/pooled analyses. For 2 (2.9%), 2 (2.9%), 2 (2.9%), and 1 (1.4%) studies, respectively, we found no data on SAEs, AEs, DOs, and DOSRs in any source. Discrepant results across sources were found in 24 (34.5%), 20 (28.5%), 13 (18.6%), and 9 (12.8%) studies for SAEs, AEs, DOs, and DOSRs, respectively. Despite variations in point estimates and their 95% confidence intervals, we did not find different results in the conclusions of the meta-analyses depending on the information source used, except for DOs, for which no effect was found using results published in registries, in contrast to the other information sources.
Conclusion
None of the four information sources provided complete retrieval of safety results for duloxetine in adults across various indications. However, we did not find strong evidence that this underreporting leads to different conclusions in meta-analyses. Nonetheless, this finding remains uncertain, as we were unable to obtain complete information for all studies despite extensive searches.
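The comparison of extraction strategies rests on standard random-effects pooling of event counts. Below is a minimal sketch of DerSimonian-Laird random-effects pooling of log odds ratios, of the kind one might apply to SAE counts drawn from each information source; the trial counts are invented for illustration, and the abstract does not specify the authors' exact estimator or software.

```python
# DerSimonian-Laird random-effects pooling of log odds ratios, as one might
# pool serious adverse events (SAEs) extracted from a given information source.
# The trial counts below are invented for illustration only.
import math

# (events_drug, n_drug, events_placebo, n_placebo) per trial -- hypothetical data
trials = [(4, 150, 2, 148), (7, 210, 5, 205), (3, 90, 4, 95), (10, 300, 6, 290)]

def log_or_and_var(a, n1, c, n2):
    # Log odds ratio and its variance, with a 0.5 continuity correction
    # so that zero cells do not break the calculation.
    b, d = n1 - a, n2 - c
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

y, v = zip(*(log_or_and_var(*t) for t in trials))
w = [1 / vi for vi in v]                                   # fixed-effect weights

# DerSimonian-Laird estimate of the between-trial variance tau^2.
y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))
c_dl = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
tau2 = max(0.0, (q - (len(trials) - 1)) / c_dl)

# Random-effects pooled log odds ratio and 95% confidence interval.
w_re = [1 / (vi + tau2) for vi in v]
pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
se = math.sqrt(1 / sum(w_re))
print(f"Pooled OR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(pooled - 1.96 * se):.2f} to {math.exp(pooled + 1.96 * se):.2f})")
```

Repeating such a pooling once per information source (published articles, registries, clinical study reports, and prior meta-analyses/pooled analyses) and comparing the four estimates is the kind of sensitivity check the study reports.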
{"title":"Incomplete reporting of adverse events in duloxetine trials: a meta-research survey of randomized controlled trials vs placebo","authors":"P. Rolland , A. Jutel , K. Douget , F. Naudet , J.C. Roy","doi":"10.1016/j.jclinepi.2025.111677","DOIUrl":"10.1016/j.jclinepi.2025.111677","url":null,"abstract":"<div><h3>Background and Objectives</h3><div>Relying on published data alone might be insufficient for meta-analyses to be reliable and trustworthy since selective outcome reporting is common, especially for adverse events (AEs). We investigated the existence of selective reporting and its potential for bias in a case study exploring AEs of duloxetine in adults.</div></div><div><h3>Study Design and Setting</h3><div>We systematically searched all previous meta-analyses/pooled analyses on duloxetine published on PubMed for seven indications approved by the American and European health authorities. We included all randomized controlled trials (RCTs) vs placebo. For each RCT, we extracted the number of serious adverse events (SAEs), AEs, drop-outs (DOs) and drop-outs for safety reasons (DOSRs) using four information sources: published articles, clinical study registries, clinical study reports and data available in meta-analyses/pooled analyses. To assess the range of differences resulting from these four extraction strategies, we performed 4 meta-analyses using random effect models as well as a complete meta-analysis combining all sources.</div></div><div><h3>Results</h3><div>A total of <em>70</em> RCTs (including 24,330 patients) were included. Of those, SAEs were identified for 42 studies (61%) in published articles, 58 (84%) in study reports (8 study reports were not retrieved), 24 (34.7%) in registries, and 21 (30.4%) in meta-analyses/pooled analyses. For 2 (2.9%), 2 (2.9%), 2 (2.9%) and 1 (1.4%) studies, we found respectively no data on SAEs, AEs, DOs, and DOSRs in any sources. Discrepant results across sources were found in 24 (34.5%), 20 (28.5%), 13 (18.6%), and 9 (12.8%) studies, respectively for SAEs, AEs, DOs, and DOSRs. Despite variations in point estimates and their 95% confidence intervals, we did not find different results in the conclusions of meta-analyses depending on the different information sources used, except for DOs, for which no effect was found using results published in registries, in contrast to other information sources.</div></div><div><h3>Conclusion</h3><div>None of the four information sources provided complete retrieval of safety results for duloxetine in adults across various indications. However, we did not find strong evidence that this underreporting leads to different conclusions in meta-analyses. Nonetheless, this finding remains uncertain, as we were unable to obtain complete information for all studies despite extensive searches.</div></div>","PeriodicalId":51079,"journal":{"name":"Journal of Clinical Epidemiology","volume":"180 ","pages":"Article 111677"},"PeriodicalIF":7.3,"publicationDate":"2025-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143015508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}