Pub Date: 2023-09-12; DOI: 10.1186/s41512-023-00155-z
Jane I Grove, Camilla Stephens, M Isabel Lucena, Raúl J Andrade, Sabine Weber, Alexander Gerbes, Einar S Bjornsson, Guido Stirnimann, Ann K Daly, Matthias Hackl, Kseniya Khamina-Kotisch, Jose J G Marin, Maria J Monte, Sara A Paciga, Melanie Lingaya, Shiva S Forootan, Christopher E P Goldring, Oliver Poetz, Rudolf Lombaard, Alexandra Stege, Helgi K Bjorrnsson, Mercedes Robles-Diaz, Dingzhou Li, Thi Dong Binh Tran, Shashi K Ramaiah, Sophia L Samodelov, Gerd A Kullak-Ublick, Guruprasad P Aithal
A lack of biomarkers that accurately detect drug-induced liver injury (DILI) continues to hinder early- and late-stage drug development and remains a challenge in clinical practice. The Innovative Medicines Initiative's TransBioLine consortium, comprising academic and industry partners, is developing a prospective repository of deeply phenotyped cases and controls, with biological samples collected during liver injury progression, to facilitate biomarker discovery, evaluation, validation and qualification. In a nested case-control design, patients are enrolled if they meet one of the following criteria: alanine transaminase (ALT) ≥ 5 × the upper limit of normal (ULN), alkaline phosphatase ≥ 2 × ULN, or ALT ≥ 3 × ULN with total bilirubin > 2 × ULN. After clinical investigations are complete, Roussel Uclaf Causality Assessment and expert panel review are used to adjudicate episodes as DILI or alternative liver diseases (acute non-DILI controls). Two blood samples are taken: one at recruitment and one at follow-up. The sample size is 300 cases of DILI and 130 acute non-DILI controls. Additional cross-sectional cohorts (one visit) are also enrolled: healthy volunteers (n = 120), controls with chronic alcohol-related or non-alcoholic fatty liver disease (n = 100 each) and patients with psoriasis or rheumatoid arthritis (n = 100, 50 treated with methotrexate). Candidate biomarkers prioritised for evaluation include osteopontin, glutamate dehydrogenase, cytokeratin-18 (full length and caspase cleaved), macrophage-colony-stimulating factor 1 receptor and high mobility group protein B1, as well as bile acids, sphingolipids and microRNAs. The TransBioLine project is enabling biomarker discovery and validation that could improve detection, diagnostic accuracy and prognostication of DILI in premarketing clinical trials and in clinical healthcare.
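The biochemical enrolment thresholds described in this abstract can be written as a single predicate. The sketch below is illustrative only: the function name and argument names are hypothetical, and it assumes every laboratory value has already been divided by its upper limit of normal, so that each threshold is a plain multiple of ULN.

```python
def meets_enrolment_threshold(alt_x_uln: float, alp_x_uln: float, bili_x_uln: float) -> bool:
    """Return True if any one of the three biochemical criteria is met.

    All arguments are multiples of the upper limit of normal (ULN);
    the names are hypothetical and not taken from the protocol.
    """
    high_alt = alt_x_uln >= 5                               # ALT >= 5 x ULN
    high_alp = alp_x_uln >= 2                               # alkaline phosphatase >= 2 x ULN
    alt_with_bilirubin = alt_x_uln >= 3 and bili_x_uln > 2  # ALT >= 3 x ULN with bilirubin > 2 x ULN
    return high_alt or high_alp or alt_with_bilirubin
```

Meeting a threshold only makes a patient eligible; whether the episode counts as DILI or as an acute non-DILI control is decided later by causality assessment and expert review, which this predicate does not model.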
Title: Study design for development of novel safety biomarkers of drug-induced liver injury by the translational safety biomarker pipeline (TransBioLine) consortium: a study protocol for a nested case-control study. Journal: Diagnostic and prognostic research 7(1):18. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10496294/pdf/
Pub Date: 2023-09-05; DOI: 10.1186/s41512-023-00154-0
Jeroen Hoogland, Toshihiko Takada, Maarten van Smeden, Maroeska M Rovers, An I de Sutter, Daniel Merenstein, Laurent Kaiser, Helena Liira, Paul Little, Heiner C Bucher, Karel G M Moons, Johannes B Reitsma, Roderick P Venekamp
Background: A previous individual participant data meta-analysis (IPD-MA) of antibiotics for adults with clinically diagnosed acute rhinosinusitis (ARS) showed a marginal overall effect of antibiotics, but was unable to identify patients that are most likely to benefit from antibiotics when applying conventional (i.e. univariable or one-variable-at-a-time) subgroup analysis. We updated the systematic review and investigated whether multivariable prediction of patient-level prognosis and antibiotic treatment effect may lead to more tailored treatment assignment in adults presenting to primary care with ARS.
Methods: An IPD-MA of nine double-blind placebo-controlled trials of antibiotic treatment (n=2539) was conducted, with the probability of being cured at 8-15 days as the primary outcome. A logistic mixed effects model was developed to predict the probability of being cured based on demographic characteristics, signs and symptoms, and antibiotic treatment assignment. Predictive performance was quantified based on internal-external cross-validation in terms of calibration and discrimination performance, overall model fit, and the accuracy of individual predictions.
Results: Prognosis, expressed as the probability of cure, could not be reliably predicted (c-statistic 0.58 and Brier score 0.24). Similarly, patient-level treatment effect predictions did not reliably distinguish between those who did and did not benefit from antibiotics (c-for-benefit 0.50).
Conclusions: Multivariable prediction based on patient demographics and common signs and symptoms did not reliably predict the patient-level probability of cure and antibiotic effect in this IPD-MA. Therefore, these characteristics cannot be expected to reliably distinguish those who do and do not benefit from antibiotics in adults presenting to primary care with ARS.
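To put the reported c-statistic and Brier score in context, it helps to see how the two measures are computed and what an uninformative model scores. The sketch below uses their standard definitions; the function names are hypothetical, and it does not reproduce the internal-external cross-validation used in the study.

```python
def c_statistic(predicted, observed):
    """Share of (cured, not cured) pairs where the cured patient got the higher prediction.

    Tied predictions count as half a concordant pair.
    """
    cured = [p for p, y in zip(predicted, observed) if y == 1]
    not_cured = [p for p, y in zip(predicted, observed) if y == 0]
    if not cured or not not_cured:
        raise ValueError("need at least one patient with and one without the outcome")
    score = 0.0
    for p_cured in cured:
        for p_not in not_cured:
            if p_cured > p_not:
                score += 1.0
            elif p_cured == p_not:
                score += 0.5
    return score / (len(cured) * len(not_cured))


def brier_score(predicted, observed):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(predicted, observed)) / len(predicted)
```

A model that predicts 0.5 for every patient has a c-statistic of 0.5 and a Brier score of 0.25, so the values of 0.58 and 0.24 reported above sit close to that uninformative baseline.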
Title: Prognosis and prediction of antibiotic benefit in adults with clinically diagnosed acute rhinosinusitis: an individual participant data meta-analysis. Journal: Diagnostic and prognostic research 7(1):16. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10478354/pdf/
Pub Date: 2023-08-18; DOI: 10.1186/s41512-023-00153-1
S Faye Williamson, Cameron J Williams, B Clare Lendrem, Kevin J Wilson
Background: In a pandemic setting, it is critical to evaluate and deploy accurate diagnostic tests rapidly. This relies heavily on the sample size chosen to assess the test accuracy (e.g. sensitivity and specificity) during the diagnostic accuracy study. Too small a sample size will lead to imprecise estimates of the accuracy measures, whereas too large a sample size may delay the development process unnecessarily. This study considers use of a Bayesian method to guide sample size determination for diagnostic accuracy studies, with application to COVID-19 rapid viral detection tests. Specifically, we investigate whether utilising existing information (e.g. from preceding laboratory studies) within a Bayesian framework can reduce the required sample size, whilst maintaining test accuracy to the desired precision.
Methods: The method presented is based on the Bayesian concept of assurance which, in this context, represents the unconditional probability that a diagnostic accuracy study yields sensitivity and/or specificity intervals with the desired precision. We conduct a simulation study to evaluate the performance of this approach in a variety of COVID-19 settings, and compare it to commonly used power-based methods. An accompanying interactive web application is available, which can be used by researchers to perform the sample size calculations.
Results: Compared with standard methods, the Bayesian assurance method can reduce the required sample size for COVID-19 diagnostic accuracy studies by making better use of laboratory data, without loss of performance. Increasing the size of the laboratory study can further reduce the required sample size in the diagnostic accuracy study.
Conclusions: The method considered in this paper is an important advancement for increasing the efficiency of the evidence development pathway. It has highlighted that the trade-off between lab study sample size and diagnostic accuracy study sample size should be carefully considered, since establishing an adequate lab sample size can bring longer-term gains. Although emphasis is on its use in the COVID-19 pandemic setting, where we envisage it will have the most impact, it can be usefully applied in other clinical areas.
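The idea of assurance can be illustrated with a small Monte Carlo sketch: draw a plausible true sensitivity from a prior informed by laboratory data, simulate the diagnostic accuracy study, and record how often the resulting interval is narrow enough. Everything specific here is an assumption made for illustration and not the authors' implementation: the Beta prior, the use of a Wilson interval in place of whatever interval the paper uses, and all function and parameter names.

```python
import math
import random


def wilson_width(successes: int, n: int, z: float = 1.96) -> float:
    """Width of the Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return 2 * half


def assurance(n: int, prior_a: float, prior_b: float, target_width: float,
              n_sims: int = 2000, seed: int = 1) -> float:
    """Estimate the unconditional probability that the interval is narrow enough.

    n is the number of reference-positive participants; Beta(prior_a, prior_b)
    is a hypothetical prior for sensitivity, e.g. derived from a lab study.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        true_sensitivity = rng.betavariate(prior_a, prior_b)               # draw from the prior
        detected = sum(rng.random() < true_sensitivity for _ in range(n))  # simulate the study
        if wilson_width(detected, n) <= target_width:
            hits += 1
    return hits / n_sims
```

A sample size search would call `assurance` for increasing `n` and stop at the first value that reaches the desired assurance level (for example 0.8). A more concentrated prior, which is what a larger laboratory study would supply, shifts that stopping point.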
Title: Sample size determination for point-of-care COVID-19 diagnostic tests: a Bayesian approach. Journal: Diagnostic and prognostic research 7(1):17. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10436636/pdf/
Pub Date: 2023-08-03; DOI: 10.1186/s41512-023-00152-2
Daniel Molano-Franco, Ingrid Arevalo-Rodriguez, Alfonso Muriel, Laura Del Campo-Albendea, Silvia Fernández-García, Ana Alvarez-Méndez, Daniel Simancas-Racines, Andres Viteri, Guillermo Sanchez, Borja Fernandez-Felix, Jesus Lopez-Alcalde, Ivan Solà, Dimelza Osorio, Khalid Saeed Khan, Xavier Nuvials, Ricard Ferrer, Javier Zamora
Background: Numerous biomarkers have been proposed for diagnosis, therapy, and prognosis in sepsis. Previous evaluations of the value of biomarkers for predicting mortality due to this life-threatening condition fail to address the complexity of this condition and the risk of bias associated with prognostic studies. We evaluate the predictive performance of four of these biomarkers in the prognosis of mortality through a methodologically sound evaluation.
Methods: We conducted a systematic review and meta-analysis to determine, in critically ill adults with sepsis, whether procalcitonin (PCT), C-reactive protein (CRP), interleukin-6 (IL-6), and presepsin (sCD14) are independent prognostic factors for mortality. We searched MEDLINE, EMBASE, and the Cochrane Central Register of Controlled Trials up to March 2023. Only Phase-2 confirmatory prognostic factor studies among critically ill septic adults were included. Random effects meta-analyses pooled the prognostic association estimates.
Results: We included 60 studies (15,681 patients) with 99 biomarker assessments. The QUIPS tool showed a high risk of bias in the statistical analysis and reporting domains in > 60% of assessments. When the biomarker was modelled as a continuous variable, with adjustment for key covariates (age and severity score), the association with mortality at 28-30 days was null or near null for basal PCT (pooled OR = 0.99, 95% CI = 0.99 to 1.003), CRP (OR = 1.01, 95% CI = 0.87 to 1.17), IL-6 (OR = 1.02, 95% CI = 1.01 to 1.03), and sCD14 (pooled HR = 1.003, 95% CI = 1.000 to 1.006). Additional meta-analyses accounting for other prognostic covariates had similarly null findings.
Conclusion: Baseline, isolated measurement of PCT, CRP, IL-6, and sCD14 has not been shown to help predict mortality in critically ill patients with sepsis. The role of these biomarkers should be evaluated in new studies in which patient selection and the measurement of biomarker results are standardized.
Trial registration: PROSPERO (CRD42019128790).
Title: Basal procalcitonin, C-reactive protein, interleukin-6, and presepsin for prediction of mortality in critically ill septic patients: a systematic review and meta-analysis. Journal: Diagnostic and prognostic research 7(1):15. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10399020/pdf/
Background: Rapid antigen tests detecting SARS-CoV-2 were shown to be a useful tool in managing the COVID-19 pandemic. Here, we report on the results of a prospective diagnostic accuracy study of four SARS-CoV-2 rapid antigen tests in a South African setting.
Methods: Rapid antigen test evaluations were performed through drive-through testing centres in Durban, South Africa, from July to December 2021. Two evaluation studies were performed: nasal Panbio COVID-19 Ag Rapid Test Device (Abbott) was evaluated in parallel with the nasopharyngeal Espline SARS-CoV-2 Ag test (Fujirebio), followed by the evaluation of nasal RightSign COVID-19 Antigen Rapid test Cassette (Hangzhou Biotest Biotech) in parallel with the nasopharyngeal STANDARD Q COVID-19 Ag test (SD Biosensor). The Abbott RealTime SARS-CoV-2 assay was used as a reference test.
Results: Evaluation of Panbio and Espline Ag tests was performed on 494 samples (31% positivity), while the evaluation of STANDARD Q and RightSign Ag tests was performed on 539 samples (13.17% positivity). The overall sensitivity for all four tests ranged between 60 and 72%, with excellent specificity values (> 98%). Sensitivity increased to > 80% in all tests in samples with a cycle number value < 20. All four tests performed best in samples from patients presenting within the first week of symptom onset.
Conclusions: All four evaluated tests detected a majority of the cases within the first week of symptom onset with high viral load.
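The accuracy measures reported above, including the subgroup with low cycle number values, follow from comparing each antigen result with the reference assay. The sketch below shows the calculation on made-up records; the record layout and function names are hypothetical and none of the numbers come from the study.

```python
def sensitivity(records, max_cycle_number=None):
    """Proportion of reference-positive samples that the antigen test also calls positive.

    Each record is (reference_positive, antigen_positive, cycle_number).
    If max_cycle_number is given, only reference positives below it are counted.
    """
    positives = [r for r in records if r[0]]
    if max_cycle_number is not None:
        positives = [r for r in positives if r[2] is not None and r[2] < max_cycle_number]
    return sum(1 for r in positives if r[1]) / len(positives)


def specificity(records):
    """Proportion of reference-negative samples that the antigen test also calls negative."""
    negatives = [r for r in records if not r[0]]
    return sum(1 for r in negatives if not r[1]) / len(negatives)


# Hypothetical records for illustration only.
example = [
    (True, True, 15.0), (True, True, 18.0), (True, True, 25.0),
    (True, False, 28.0), (True, False, 30.0),
    (False, False, None), (False, False, None), (False, True, None),
]
```

Restricting the denominator to samples with low cycle numbers, that is, high viral load, is what raises sensitivity in the subgroup analysis; specificity is unaffected because it depends only on reference-negative samples.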
Title: Field evaluations of four SARS-CoV-2 rapid antigen tests during SARS-CoV-2 Delta variant wave in South Africa. Authors: Natasha Samsunder, Gila Lustig, Slindile Ngubane, Thando Glory Maseko, Santhuri Rambaran, Sinaye Ngcapu, Stanley Nzuzo Magini, Lara Lewis, Cherie Cawood, Ayesha B M Kharsany, Quarraisha Abdool Karim, Salim Abdool Karim, Kogieleum Naidoo, Aida Sivro. Pub Date: 2023-07-25; DOI: 10.1186/s41512-023-00151-3. Journal: Diagnostic and prognostic research 7(1):14. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10369830/pdf/
Pub Date: 2023-07-13; DOI: 10.1186/s41512-023-00150-4
Robert Touitou, Philippe Bidet, Constance Dubois, Henri Partouche, Stéphane Bonacorsi, Camille Jung, Robert Cohen, Corinne Levy, Jérémie F Cohen
Background: Group A streptococcus is found in 20-40% of cases of childhood pharyngitis; the remaining cases are viral. Streptococcal pharyngitis ("strep throat") is usually treated with antibiotics, while these are not indicated in viral cases. Most guidelines recommend relying on a diagnostic test confirming the presence of group A streptococcus before prescribing antibiotics. Conventional first-line tests are rapid antigen detection tests based on throat swabs. Recently, rapid nucleic acid tests were developed; they allow the detection of elements of the genome of group A streptococcus. We hypothesize that these rapid nucleic acid tests are sensitive enough to be performed on saliva samples instead of throat swabs, which could be more convenient in practice.
Methods: This is a multicenter, prospective diagnostic accuracy study evaluating the performance of a rapid nucleic acid test for group A streptococcus (Abbott ID NOW STREP A2) in saliva, compared with a conventional pharyngeal rapid antigen detection test (EXACTO PRO STREPTATEST, lateral flow assay, comparator test), with a composite reference standard of throat culture and group A streptococcus PCR, in children with pharyngitis in primary care (i.e., seen by 27 primary care pediatricians or general practitioners). To ensure group A streptococcus is not missed, the minimally acceptable sensitivity of the salivary rapid nucleic acid test (primary outcome) is set at 80%. Assuming 35% of participants will have group A streptococcus, we will recruit 800 consecutive children with pharyngitis. Secondary outcomes will include the difference in sensitivity between the pharyngeal rapid antigen detection test and the salivary rapid nucleic acid test; variability in sensitivity and specificity of the salivary rapid nucleic acid test with the level of McIsaac score; time to obtain the result of the salivary rapid nucleic acid test; patient, parent, and physician satisfaction; and barriers and facilitators to using rapid tests for group A streptococcus in primary care.
Ethics and dissemination: Approved by the Institutional Review Board "Comité de protection des personnes Ile de France I" (no. 2022-A00085-38). Results will be presented at international meetings and disseminated in peer-reviewed journals.
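The recruitment target and assumed prevalence in this protocol imply a rough precision for the sensitivity estimate, which the following arithmetic sketch makes explicit. It is not the protocol's own sample size justification, which the abstract does not give; the normal-approximation (Wald) interval and the function names are assumptions.

```python
import math


def expected_positives(n_children: int, prevalence: float) -> int:
    """Expected number of children with group A streptococcus on the reference standard."""
    return round(n_children * prevalence)


def wald_half_width(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)


n_positive = expected_positives(800, 0.35)   # 800 children at 35% prevalence
half_width = wald_half_width(0.80, n_positive)
```

Under these assumptions about 280 children would be reference-positive, and a sensitivity of 80% would be estimated to within roughly plus or minus 4.7 percentage points.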
Trial registration number: ClinicalTrials.gov: NCT05521568.
Title: Diagnostic accuracy of a rapid nucleic acid test for group A streptococcal pharyngitis using saliva samples: protocol for a prospective multicenter study in primary care. Journal: Diagnostic and prognostic research 7(1):13. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10347703/pdf/
Pub Date: 2023-06-13 DOI: 10.1186/s41512-023-00149-x
Maxime Pautrat, Remy Palluau, Loic Druilhe, Jean Pierre Lebeau
Background: Clinical scores help physicians to make clinical decisions, and some are recommended by health authorities for primary care use. As an increasing number of scores are becoming available, there is a need to understand general practitioner expectations for their use in primary care. The aim of this study was to explore general practitioner opinions about using scores in general practice.
Method: This qualitative study, with a grounded theory approach, used focus groups with general practitioners, recruited from their own surgeries, to collect verbatim transcripts. Two investigators analyzed the transcripts to ensure data triangulation. The transcripts were double-blind labeled for inductive categorization to conceptualize score use in general practice.
Results: Five focus groups were planned, and 21 general practitioners from central France participated. Participants appreciated scores for their clinical efficacy but felt that they were difficult to use in primary care. Their opinions revolved around validity, acceptability, and feasibility. Participants had little regard for score validity; they felt that many scores are difficult to accept and do not capture contextual and human elements. Participants also felt that scores are unfeasible for primary care use: there are too many, they are hard to find, and they are either too short or too long. They also felt that scores were complex to administer and took up time for both patient and physician. Many participants felt that learned societies should choose appropriate scores.
Discussion: This study conceptualizes general practitioner opinions about score use in primary care. The participants weighed score effectiveness against efficiency. For some participants, scores helped make decisions faster; others expressed disappointment with the lack of patient-centeredness and the limited bio-psycho-social approach.
Title: Exploring the general practitioners' point of view about clinical scores: a qualitative study. Journal: Diagnostic and prognostic research, 7(1):12. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10262349/pdf/
Pub Date: 2023-06-06 DOI: 10.1186/s41512-023-00148-y
Andrew J Vickers, Ben Van Calster, Laure Wynants, Ewout W Steyerberg
Background: A number of recent papers have proposed methods to calculate confidence intervals and p values for net benefit used in decision curve analysis. These papers are sparse on the rationale for doing so. We aim to assess the relation between sampling variability, inference, and decision-analytic concepts.
Methods and results: We review the underlying theory of decision analysis. When we are forced into a decision, we should choose the option with the highest expected utility, irrespective of p values or uncertainty. This contrasts with traditional hypothesis testing, where a decision such as whether to reject a given hypothesis can be postponed. Application of inference for net benefit would generally be harmful. In particular, insisting that differences in net benefit be statistically significant would dramatically change the criteria by which we consider a prediction model to be of value. We argue instead that uncertainty related to sampling variation for net benefit should be thought of in terms of the value of further research. Decision analysis tells us which decision to make for now, but we may also want to know how much confidence we should have in that decision. If we are insufficiently confident that we are right, further research is warranted.
Conclusion: Null hypothesis testing or simple consideration of confidence intervals is of questionable value for decision curve analysis, and methods such as value of information analysis or approaches to assess the probability of benefit should be considered instead.
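The decision rule described above (choose the option with the highest expected utility) can be sketched with the standard net benefit formula used in decision curve analysis, net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability. The outcomes, predicted risks, and threshold below are hypothetical values for illustration only, not data from the paper.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is at or above the threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of treating everyone regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical outcomes (1 = event) and predicted risks.
y_true = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
y_prob = [0.80, 0.60, 0.40, 0.10, 0.30, 0.05, 0.20, 0.70, 0.10, 0.15]
threshold = 0.25

strategies = {
    "model": net_benefit(y_true, y_prob, threshold),
    "treat all": net_benefit_treat_all(y_true, threshold),
    "treat none": 0.0,  # treating no one has net benefit zero by definition
}

# Pick the strategy with the highest net benefit; no p value is involved.
best = max(strategies, key=strategies.get)
for name, value in strategies.items():
    print(f"{name}: {value:.4f}")
print("chosen strategy:", best)
```

The choice is made by comparing point estimates only, which mirrors the argument that a forced decision should not wait for statistical significance; sampling uncertainty would instead inform whether further research is worthwhile.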
Title: Decision curve analysis: confidence intervals and hypothesis testing for net benefit. Journal: Diagnostic and prognostic research, 7(1):11. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10243069/pdf/
Pub Date: 2023-05-16 DOI: 10.1186/s41512-023-00147-z
Yuan Xia, Paul Gustafson, Mohsen Sadatsafavi
Prediction algorithms that quantify the expected benefit of a given treatment conditional on patient characteristics can critically inform medical decisions. Quantifying the performance of treatment benefit prediction algorithms is an active area of research. A recently proposed metric, the concordance statistic for benefit (cfb), evaluates the discriminative ability of a treatment benefit predictor by directly extending the concept of the concordance statistic from a risk model with a binary outcome to a model for treatment benefit. In this work, we scrutinize cfb on multiple fronts. Through numerical examples and theoretical developments, we show that cfb is not a proper scoring rule. We also show that it is sensitive to the unestimable correlation between counterfactual outcomes and to the definition of matched pairs. We argue that measures of statistical dispersion applied to predicted benefits do not suffer from these issues and can be an alternative metric for the discriminatory performance of treatment benefit predictors.
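As a rough illustration of what "measures of statistical dispersion applied to predicted benefits" could look like, the sketch below computes two common dispersion measures on a vector of predicted treatment benefits. The abstract does not name a specific measure, so the standard deviation and the mean absolute difference are assumed, illustrative choices, and the predicted benefits are hypothetical numbers.

```python
from itertools import combinations
from statistics import mean, pstdev


def mean_absolute_difference(values):
    """Average absolute difference over all distinct pairs of values."""
    return mean(abs(a - b) for a, b in combinations(values, 2))


# Hypothetical predicted benefits (absolute risk reductions) for five patients.
heterogeneous = [0.02, 0.05, 0.05, 0.10, 0.18]
# A predictor that assigns everyone the same benefit cannot discriminate at all.
homogeneous = [0.08] * 5

for name, benefits in [("heterogeneous", heterogeneous), ("homogeneous", homogeneous)]:
    print(
        f"{name}: mean={mean(benefits):.3f}, "
        f"sd={pstdev(benefits):.4f}, "
        f"mean abs diff={mean_absolute_difference(benefits):.4f}"
    )
```

Both measures depend only on the predicted benefits themselves, so they require neither matched pairs nor any assumption about the correlation between counterfactual outcomes.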
Title: Methodological concerns about "concordance-statistic for benefit" as a measure of discrimination in predicting treatment benefit. Journal: Diagnostic and prognostic research, 7(1):10. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10186693/pdf/
Pub Date: 2023-05-02 DOI: 10.1186/s41512-023-00146-0
Angelika Geroldinger, Lara Lusa, Mariana Nold, Georg Heinze
Background: The performance of models for binary outcomes can be described by measures such as the concordance statistic (c-statistic, area under the curve), the discrimination slope, or the Brier score. At internal validation, data resampling techniques, e.g., cross-validation, are frequently employed to correct for optimism in these model performance criteria. Especially with small samples or rare events, leave-one-out cross-validation is a popular choice.
Methods: Using simulations and a real data example, we compared the effect of different resampling techniques on the estimation of c-statistics, discrimination slopes, and Brier scores for three estimators of logistic regression models, including the maximum likelihood and two maximum penalized likelihood estimators.
Results: Our simulation study confirms earlier studies reporting that leave-one-out cross-validated c-statistics can be strongly biased towards zero. In addition, our study reveals that this bias is even more pronounced for model estimators shrinking estimated probabilities towards the observed event fraction, such as ridge regression. Leave-one-out cross-validation also provided pessimistic estimates of the discrimination slope but nearly unbiased estimates of the Brier score.
Conclusions: We recommend using leave-pair-out cross-validation, fivefold cross-validation with repetitions, the enhanced bootstrap, or the .632+ bootstrap to estimate c-statistics, and leave-pair-out or fivefold cross-validation to estimate discrimination slopes.
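The downward bias of the leave-one-out c-statistic can be seen in a deliberately extreme toy case: an intercept-only model, which predicts the event fraction of the training data for everyone and therefore has a true c-statistic of 0.5. This is a minimal sketch under that assumption, not the simulation design of the study; the data (10 events among 40 observations) are hypothetical.

```python
def c_statistic(pred_events, pred_nonevents):
    """Share of event/non-event pairs ranked correctly; ties count as 0.5."""
    score = 0.0
    for pe in pred_events:
        for pn in pred_nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(pred_events) * len(pred_nonevents))


def loo_c_statistic(y):
    """Pooled leave-one-out c-statistic for an intercept-only model."""
    n, k = len(y), sum(y)
    # Each held-out prediction is the event fraction among the other n - 1 observations,
    # so events systematically receive lower predictions than non-events.
    preds = [(k - yi) / (n - 1) for yi in y]
    events = [p for p, yi in zip(preds, y) if yi == 1]
    nonevents = [p for p, yi in zip(preds, y) if yi == 0]
    return c_statistic(events, nonevents)


def lpo_c_statistic(y):
    """Leave-pair-out c-statistic for the same intercept-only model."""
    n, k = len(y), sum(y)
    total, pairs = 0.0, 0
    for yi in y:
        for yj in y:
            if yi == 1 and yj == 0:
                # Refit without this event/non-event pair: both get the same prediction.
                p_event = p_nonevent = (k - 1) / (n - 2)
                total += c_statistic([p_event], [p_nonevent])
                pairs += 1
    return total / pairs


y = [1] * 10 + [0] * 30  # hypothetical outcomes: 10 events, 30 non-events
print("leave-one-out c-statistic:", loo_c_statistic(y))
print("leave-pair-out c-statistic:", lpo_c_statistic(y))
```

In leave-one-out, removing an event lowers the training event fraction and removing a non-event raises it, so pooled predictions are inversely related to the outcome. Leave-pair-out compares the two members of a pair under the same fitted model, which removes this artifact.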
Title: Leave-one-out cross-validation, penalization, and differential bias of some prediction model performance measures-a simulation study. Journal: Diagnostic and prognostic research, 7(1):9. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10152625/pdf/