Pub Date: 2024-06-18; DOI: 10.1080/10543406.2024.2364716
Shijie Yuan, Zhanbo Huang, Jiaxin Liu, Yuan Ji
We consider a dose-optimization design for a first-in-human oncology trial that aims to identify a suitable dose for late-phase drug development. The proposed approach, called the Pharmacometrics-Enabled DOse OPtimization (PEDOOP) design, incorporates observed patient-level pharmacokinetics (PK) measurements and latent pharmacodynamics (PD) information for trial decision-making and dose optimization. PEDOOP consists of two seamless phases. In phase I, patient-level time-course drug concentrations, derived PD effects, and the toxicity outcomes from patients are integrated into a statistical model to estimate the dose-toxicity response. A simple dose-finding design guides dose escalation in phase I. At the end of the phase I dose finding, a graduation rule is used to assess the safety and efficacy of all the doses and select those with promising efficacy and acceptable safety for a randomized comparison against a control arm in phase II. In phase II, patients are randomized to the selected doses based on a fixed or adaptive randomization ratio. At the end of phase II, an optimal biological dose (OBD) is selected for late-phase development. We conduct simulation studies to assess the PEDOOP design in comparison to an existing seamless design that also combines phases I and II in a single trial.
Title: Pharmacometrics-Enabled DOse OPtimization (PEDOOP) for seamless phase I-II trials in oncology.
Journal: Journal of Biopharmaceutical Statistics, pp. 1-20
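As a rough illustration of how a graduation rule at the end of phase I might screen doses, the sketch below uses independent Beta-binomial models for toxicity and efficacy at each dose. This is a deliberately simplified stand-in and not PEDOOP's PK/PD-informed model. The class `DoseData`, the function `graduate`, the Beta(1, 1) priors, the target toxicity rate, the efficacy threshold, the posterior cutoffs, and the phase I counts are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """CDF of Beta(a, b) at x for positive integers a and b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))


@dataclass
class DoseData:
    dose: float
    n: int       # patients treated at this dose
    n_tox: int   # patients with a dose-limiting toxicity
    n_eff: int   # patients with an efficacy response


def graduate(doses, tox_target=0.30, eff_min=0.20, c_tox=0.95, c_eff=0.25):
    """Return the doses that pass hypothetical safety and futility screens.

    With Beta(1, 1) priors, a dose is dropped if
    P(toxicity rate > tox_target | data) > c_tox, or if
    P(efficacy rate > eff_min | data) < c_eff.
    """
    selected = []
    for d in doses:
        # Posterior is Beta(1 + events, 1 + non-events) for each endpoint.
        p_too_toxic = 1 - beta_cdf_int(tox_target, 1 + d.n_tox, 1 + d.n - d.n_tox)
        p_promising = 1 - beta_cdf_int(eff_min, 1 + d.n_eff, 1 + d.n - d.n_eff)
        if p_too_toxic <= c_tox and p_promising >= c_eff:
            selected.append(d.dose)
    return selected


# Hypothetical phase I data at four dose levels.
phase1 = [
    DoseData(dose=10, n=6, n_tox=0, n_eff=0),
    DoseData(dose=20, n=6, n_tox=1, n_eff=2),
    DoseData(dose=40, n=9, n_tox=2, n_eff=4),
    DoseData(dose=80, n=6, n_tox=5, n_eff=4),
]
print(graduate(phase1))  # → [20, 40]
```

In this toy example, dose 10 is dropped for futility because no patient responded. Dose 80 is dropped for toxicity because 5 of 6 patients had a DLT. The two surviving doses would then go on to randomization against control in phase II.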
Pub Date: 2024-06-18; DOI: 10.1080/10543406.2024.2364723
Tingting Hu, Berkman Sahiner, Nicholas Petrick, Kenny Cha, Si Wen, Gene Pennello
Background: Positive and negative likelihood ratios (PLR and NLR) are important metrics of accuracy for diagnostic devices with a binary output. However, the properties of Bayesian and frequentist interval estimators of PLR/NLR have not been extensively studied and compared. In this study, we explore the potential use of the Bayesian method for interval estimation of PLR/NLR, and, more broadly, for interval estimation of the ratio of two independent proportions.
Methods: We develop a Bayesian approach for interval estimation of PLR/NLR as part of diagnostic device performance evaluation. The approach applies more broadly to interval estimation of any ratio of two independent proportions. We compare score and Bayesian interval estimators for the ratio of two proportions in terms of coverage probability (CP) and expected interval width (EW) through extensive experiments and two case studies. A supplementary experiment assesses the performance of the proposed exact Bayesian method under different priors.
Results: Our experimental results show that the overall mean CP of the Bayesian interval estimator is consistent with that of the score method (0.950 vs. 0.952), and the overall mean EW of the Bayesian estimator is shorter than that of the score method (15.929 vs. 19.724). In the two case studies, the intervals estimated with the Bayesian and frequentist approaches were very similar.
Discussion: Our numerical results indicate that the proposed Bayesian approach achieves CP comparable to that of the score method while yielding higher precision (i.e., a shorter EW).
Title: A comparison of Bayesian and score methods for interval estimates of positive/negative likelihood ratios in support of diagnostic device performance evaluation.
Journal: Journal of Biopharmaceutical Statistics, pp. 1-19
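To make the Bayesian interval for a ratio of two independent proportions concrete, here is a minimal sketch. It assumes Jeffreys Beta(0.5, 0.5) priors and approximates the posterior of the ratio by Monte Carlo. That is a swapped-in technique: the paper describes an exact Bayesian method, which this sketch does not reproduce. It relies on PLR = sensitivity / (1 - specificity) and NLR = (1 - sensitivity) / specificity, where the numerator and denominator come from the independent diseased and non-diseased samples. The function names and the 2x2 counts are hypothetical.

```python
import random


def bayes_ratio_interval(x1, n1, x2, n2, level=0.95, draws=20000, a=0.5, b=0.5, seed=1):
    """Equal-tailed posterior interval for p1 / p2 from independent binomials.

    Each proportion gets a Beta(a, b) prior (Jeffreys by default), so the
    posteriors are Beta(a + x, b + n - x). The posterior of the ratio is
    approximated by Monte Carlo draws.
    """
    rng = random.Random(seed)
    ratios = sorted(
        rng.betavariate(a + x1, b + n1 - x1) / rng.betavariate(a + x2, b + n2 - x2)
        for _ in range(draws)
    )
    lo_idx = int((1 - level) / 2 * draws)
    hi_idx = int((1 + level) / 2 * draws) - 1
    return ratios[lo_idx], ratios[hi_idx]


def plr_interval(tp, fn, fp, tn, **kw):
    """PLR = sensitivity / (1 - specificity) = [tp/(tp+fn)] / [fp/(fp+tn)]."""
    return bayes_ratio_interval(tp, tp + fn, fp, fp + tn, **kw)


def nlr_interval(tp, fn, fp, tn, **kw):
    """NLR = (1 - sensitivity) / specificity = [fn/(tp+fn)] / [tn/(fp+tn)]."""
    return bayes_ratio_interval(fn, tp + fn, tn, fp + tn, **kw)


# Hypothetical 2x2 counts: sensitivity 90/100 and specificity 80/100,
# so the point estimates are PLR = 4.5 and NLR = 0.125.
lo, hi = plr_interval(tp=90, fn=10, fp=20, tn=80)
nlo, nhi = nlr_interval(tp=90, fn=10, fp=20, tn=80)
```

A fixed seed keeps the Monte Carlo interval reproducible. Increasing `draws` reduces simulation noise in the interval endpoints.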
Pub Date: 2024-06-18; DOI: 10.1080/10543406.2024.2365389
Jackson P Lautier, Stella Grosser, Jessica Kim, Hyewon Kim, Junghi Kim
Pharmaceutical researchers continually seek techniques that improve both drug development processes and patient outcomes. One area of recent interest is the application of machine learning (ML) in pharmacology. One application that has not yet been studied closely is the unsupervised clustering of plasma concentration-time curves, hereafter pharmacokinetic (PK) curves. In this paper, we present our findings on how to cluster PK curves by their similarity. Specifically, we find clustering to be effective at identifying similarly shaped PK curves and informative for understanding patterns within each cluster. Because PK curves are time series data objects, our approach takes the extensive body of research on time series clustering as its starting point. Accordingly, we examine many dissimilarity measures between time series to find those most suitable for PK curves. We identify Euclidean distance as generally the most appropriate measure for clustering PK curves. We further show that dynamic time warping, Fréchet distance, and structure-based dissimilarity measures such as correlation may produce unexpected results. As an illustration, we apply these methods in a case study of 250 PK curves from a previous pharmacogenomic study. The case study finds that unsupervised ML clustering with Euclidean distance, without any subject genetic information, independently reproduces the conclusions of the reference pharmacogenomic study. To our knowledge, this is the first such demonstration. Further, the case study shows how clustering PK curves may generate insights that would be difficult to perceive from population-level summary statistics of PK metrics alone.
Title: Clustering plasma concentration-time curves: applications of unsupervised learning in pharmacogenomics.
Journal: Journal of Biopharmaceutical Statistics, pp. 1-19
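The contrast between Euclidean and correlation-based dissimilarity can be shown with simulated curves. The sketch below generates one-compartment oral PK curves with the Bateman function on a common time grid and clusters them with plain k-means under Euclidean distance. It also shows that correlation distance treats a curve and its doubled copy as identical, which is one way a structure-based measure can give unexpected groupings. The helper names, the parameter values, and the fast/slow eliminator split are hypothetical. The paper's own clustering pipeline may differ.

```python
import math


def one_compartment_oral(t, dose=100.0, ka=1.0, ke=0.2, v=10.0):
    """Bateman function: concentration after an oral dose in a one-compartment
    model with first-order absorption (complete bioavailability, ka != ke)."""
    return dose * ka / (v * (ka - ke)) * (math.exp(-ke * t) - math.exp(-ka * t))


def euclidean(a, b):
    """Pointwise Euclidean distance between curves on a common time grid."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """1 - Pearson correlation. Scale-invariant, so it compares shape only."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1 - cov / (sa * sb)


def kmeans(curves, k, iters=50):
    """Lloyd's k-means with Euclidean distance. The first k curves seed the
    centroids, which keeps this small example deterministic."""
    centroids = [list(c) for c in curves[:k]]
    labels = [0] * len(curves)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: euclidean(c, centroids[j])) for c in curves]
        for j in range(k):
            members = [c for c, lab in zip(curves, labels) if lab == j]
            if members:
                centroids[j] = [sum(vals) / len(members) for vals in zip(*members)]
    return labels


times = [0.5, 1, 2, 4, 6, 8, 12, 24]  # hours
# Hypothetical subjects alternating between fast (ke=0.4/h) and slow (ke=0.1/h)
# elimination, with small differences in absorption rate.
curves = [
    [one_compartment_oral(t, ka=ka, ke=ke) for t in times]
    for ka in (0.9, 1.0, 1.1)
    for ke in (0.4, 0.1)
]
labels = kmeans(curves, k=2)
```

Here, Euclidean k-means separates slow from fast eliminators, because slow eliminators keep much higher concentrations at later time points.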
Pub Date: 2024-06-18; DOI: 10.1080/10543406.2024.2365976
Nan Miles Xi, Dalong Patrick Huang
The evaluation of drug-induced Torsades de pointes (TdP) risk is crucial in drug safety assessment. In this study, we discuss machine learning approaches for predicting drug-induced TdP risk from preclinical data. Specifically, a random forest model was trained on a dataset generated by the rabbit ventricular wedge assay, and its prediction performance was measured on 28 drugs from the Comprehensive In Vitro Proarrhythmia Assay (CiPA) initiative. Leave-one-drug-out cross-validation provided an unbiased estimate of model performance, and a stratified bootstrap quantified the uncertainty in the asymptotic model prediction. Our study validates the utility of machine learning approaches for predicting drug-induced TdP risk from preclinical data. The methods can be extended to other preclinical protocols and can serve as a supplementary evaluation in drug safety assessment.
Title: Drug safety assessment by machine learning models.
Journal: Journal of Biopharmaceutical Statistics, pp. 1-12
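The two resampling ideas mentioned in the abstract, leave-one-drug-out cross-validation and a stratified bootstrap, can be sketched with the standard library alone. This is a minimal sketch, assuming several assay records per drug and a CiPA-style three-level risk label. The function names and data are hypothetical, and the random forest itself is left out.

```python
import random
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out one group (drug)
    at a time so that no drug contributes to both training and testing."""
    for g in sorted(set(groups)):
        test = [i for i, gi in enumerate(groups) if gi == g]
        train = [i for i, gi in enumerate(groups) if gi != g]
        yield g, train, test


def stratified_bootstrap(labels, rng):
    """Resample indices with replacement within each class, so every
    bootstrap replicate keeps the original class counts."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choice(idx) for _ in idx)
    return sample


# Hypothetical records: two assay records per drug, labeled with a risk category.
drugs = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "F", "F"]
risk = ["high", "high", "low", "low", "intermediate", "intermediate",
        "high", "high", "low", "low", "intermediate", "intermediate"]

folds = list(leave_one_group_out(drugs))
# In practice, fit a classifier (e.g., a random forest) on each fold's train
# indices and score it on the held-out drug's records.
boot = stratified_bootstrap(risk, random.Random(0))
```

Holding out whole drugs, rather than individual records, prevents records of the same drug from leaking between training and test sets. Leakage of that kind would inflate the performance estimate.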
Pub Date: 2024-06-14; DOI: 10.1080/10543406.2024.2365972
Xutong Zhao, Jing Sun, Dalong Huang
One objective of meta-analysis, which synthesizes evidence across multiple studies, is to assess consistency and investigate heterogeneity across studies. In this project, we performed a meta-analysis of moxifloxacin data (moxifloxacin is the positive control in QT assessment studies) from 26 thorough QT studies submitted to the FDA. The goals were to characterize the exposure-response relationship and to determine the safety margin associated with a 10-msec QTc effect. Multiple meta-analysis methods, including two novel methods, were used to evaluate the exposure-response relationship. Based on concentration-QTc models, they were also used to estimate the critical concentration of moxifloxacin associated with a 10-msec QTc effect, together with its confidence interval. These methods (aggregate data vs. individual participant data; fixed effect vs. random effects) were compared in terms of precision and robustness. With the selected meta-analysis method, we examined the homogeneity and heterogeneity of the moxifloxacin concentration-QTc relationship across studies. We also estimated the critical concentration of moxifloxacin, which can be used to calculate the hERG safety margin of this drug.
Title: Meta-analysis application to hERG safety evaluation in clinical trials.
Journal: Journal of Biopharmaceutical Statistics, pp. 1-13
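The fixed-effect vs. random-effects contrast and the critical-concentration calculation can be illustrated on aggregate data. This is a sketch under stated assumptions, not the paper's methods (which include two novel approaches and individual-participant models). It pools hypothetical per-study concentration-QTc slopes with inverse-variance (fixed-effect) weights and with the classical DerSimonian-Laird random-effects estimator. It then solves a linear model, change in QTc = intercept + slope × C, for the concentration at which the effect reaches 10 msec. The slopes, standard errors, and zero intercept are illustrative assumptions.

```python
import math


def fixed_effect(estimates, ses):
    """Inverse-variance weighted mean and its standard error."""
    w = [1 / s**2 for s in ses]
    mean = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    return mean, math.sqrt(1 / sum(w))


def dersimonian_laird(estimates, ses):
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator.
    Returns (pooled mean, its standard error, tau^2)."""
    w = [1 / s**2 for s in ses]
    fe_mean, _ = fixed_effect(estimates, ses)
    q = sum(wi * (e - fe_mean) ** 2 for wi, e in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    w_star = [1 / (s**2 + tau2) for s in ses]
    mean = sum(wi * e for wi, e in zip(w_star, estimates)) / sum(w_star)
    return mean, math.sqrt(1 / sum(w_star)), tau2


def critical_concentration(intercept, slope, threshold=10.0):
    """Concentration at which intercept + slope * C reaches `threshold` msec."""
    return (threshold - intercept) / slope


# Hypothetical per-study C-QTc slopes (msec per unit concentration) and SEs.
slopes = [3.1, 3.6, 2.8, 4.0, 3.3]
ses = [0.4, 0.5, 0.3, 0.6, 0.4]
fe_slope, fe_se = fixed_effect(slopes, ses)
re_slope, re_se, tau2 = dersimonian_laird(slopes, ses)
# Assuming a zero intercept purely for illustration:
c_crit = critical_concentration(0.0, re_slope)
```

When the between-study variance tau^2 is positive, the random-effects standard error is larger than the fixed-effect one. This is the precision-robustness trade-off the abstract refers to.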
Pub Date: 2024-06-13; DOI: 10.1080/10543406.2024.2365966
Jinxiang Hu, Xiaohang Mei, Sam Pepper, Yu Wang, Bo Zhang, Colin Cernik, Byron Gajewski
Patient-reported outcomes (PROs) are widely used in quality of life (QOL) studies, health outcomes research, and clinical trials, and health authorities have advocated for their importance. We present PROpwr, an R Shiny web application that uses item response theory (the graded response model, GRM) and simulation to estimate power for two-arm clinical trials with PRO measures as endpoints. To make effect-size estimation convenient, PROpwr also supports analysis of PRO data. PROpwr has seven function tabs: Frequentist Analysis, Bayesian Analysis, GRM power, T-test Power Given Sample Size, T-test Sample Size Given Power, Download, and References. Its point-and-click interface lets researchers analyze PRO data and calculate power and sample size for PRO endpoints in clinical trials without prior programming knowledge.
Title: PROpwr: a Shiny R application to analyze patient-reported outcomes data and estimate power.
Journal: Journal of Biopharmaceutical Statistics, pp. 1-12
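The calculations behind tabs like "T-test Power Given Sample Size" and "T-test Sample Size Given Power" can be approximated in a few lines. This is not PROpwr's code. It is a minimal sketch that uses the normal approximation for a two-sided, two-sample comparison of means with equal arms and a standardized effect size d. Exact t-distribution calculations usually give a slightly larger n (64 per arm for d = 0.5, 80% power, alpha = 0.05). The function names are hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal


def power_two_arm(effect_size, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means with equal
    arms (normal approximation, ignoring the negligible opposite tail)."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    return _Z.cdf(effect_size * math.sqrt(n_per_arm / 2) - z_crit)


def n_per_arm(effect_size, power=0.80, alpha=0.05):
    """Smallest per-arm sample size whose approximate power is >= `power`."""
    z_a = _Z.inv_cdf(1 - alpha / 2)
    z_b = _Z.inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 / effect_size**2)


n = n_per_arm(0.5)  # → 63 under the normal approximation
achieved = power_two_arm(0.5, n)
```

The GRM-based power in PROpwr instead simulates item-level responses. This closed form is only useful as a quick check on the t-test tabs.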
Pub Date: 2024-06-13; DOI: 10.1080/10543406.2024.2358803
Yougui Wu
The accuracy of a screening test is often measured by the area under its receiver operating characteristic (ROC) curve (AUC). Two-phase designs have been widely used in diagnostic studies to estimate a single AUC or to compare two AUCs. In these designs, screening test results are measured on a large sample (the Phase one sample), while disease status is verified by a gold standard only for a subset of that sample (the Phase two sample). In this paper, we consider the optimal two-phase sampling design for comparing the performance of two ordinal screening tests in classifying disease status. Specifically, we derive an analytical variance formula for the AUC difference estimator and use it to find the sampling probabilities that minimize this variance. Under the proposed optimal design, strata in which the two tests' levels are far apart should be over-sampled, while strata in which the levels are close should be under-sampled. Simulation results indicate that two-phase sampling under optimal allocation (OA) achieves a substantial variance reduction compared with two-phase sampling under proportional allocation (PA). Furthermore, two-phase sampling under OA or PA has a clear advantage over one-phase random sampling in reducing the variance of the AUC difference estimator. This holds when the variances of the two screening test results in the diseased population differ greatly from their counterparts in the non-diseased population.
Title: Optimal two-phase sampling for comparing correlated areas under the ROC curves of two screening tests in the presence of verification bias.
Journal: Journal of Biopharmaceutical Statistics, pp. 1-17
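The OA vs. PA comparison has a classical analogue in stratified sampling. Neyman allocation, which sets the second-phase size in each stratum proportional to W_h × S_h, minimizes the variance of a stratified mean for a fixed total sample size. The sketch below uses that swapped-in textbook setting. It does not implement the paper's variance formula for the AUC difference estimator. The strata stand for hypothetical cells of the two tests' joint levels. The stratum shares and standard deviations are assumed inputs, with larger spread simply assumed for the more discordant cells.

```python
def stratified_variance(weights, sds, n_alloc):
    """Variance of a stratified mean, sum_h W_h^2 S_h^2 / n_h
    (no finite-population correction)."""
    return sum(w**2 * s**2 / n for w, s, n in zip(weights, sds, n_alloc))


def proportional_allocation(weights, n):
    """PA: second-phase sample size in each stratum proportional to W_h."""
    return [n * w for w in weights]


def neyman_allocation(weights, sds, n):
    """Neyman (optimal) allocation: n_h proportional to W_h * S_h."""
    total = sum(w * s for w, s in zip(weights, sds))
    return [n * w * s / total for w, s in zip(weights, sds)]


# Hypothetical strata: shares of the phase-one sample and assumed
# within-stratum standard deviations (larger for more discordant cells).
weights = [0.4, 0.3, 0.2, 0.1]
sds = [0.5, 1.0, 2.0, 3.0]
n2 = 200  # phase-two (verification) sample size
v_pa = stratified_variance(weights, sds, proportional_allocation(weights, n2))
v_oa = stratified_variance(weights, sds, neyman_allocation(weights, sds, n2))
```

With these inputs the PA variance is 2.1 / 200 and the Neyman variance is 1.2² / 200, so optimal allocation is smaller, as the Cauchy-Schwarz inequality guarantees. The sketch ignores integer rounding and the cap n_h ≤ N_h that a real two-phase design must respect.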
Pub Date: 2024-06-11; DOI: 10.1080/10543406.2024.2364722
Sheng Zhong, Jane Zhang, Jenny Jiao, Hongjian Zhu, Yunzhao Xing, Li Wang
Accurate prediction of a rare but clinically important event following study treatment is crucial in drug development. For instance, the rarity of an adverse event is often commensurate with the seriousness of its medical consequences, and delayed detection of a rare adverse event can pose significant or even life-threatening health risks to patients. In this machine learning case study, we use an example originating from a real clinical trial setting to demonstrate how to define and solve a rare clinical event prediction problem with machine learning in the pharmaceutical industry. The unique contributions of this work include a six-step investigation framework. The framework facilitates communication with non-technical stakeholders and the interpretation of model performance in terms of its practical consequences for patient screening in a future clinical trial. Methodologically, to split the data into training and test sets, we adapt the rare-event stratified split approach (from scikit-learn) so that it also performs group splitting, keeping all records of a patient in the same set. To handle the class imbalance caused by rare events during model training, we employ cost-sensitive learning, which gives more weight to the minority class. We also use precision together with recall, rather than the raw accuracy rate, to measure prediction performance. Finally, we demonstrate how to apply state-of-the-art SHAP values to identify important risk factors and improve model interpretability.
Title: A machine learning case study to predict rare clinical event of interest: imbalanced data, interpretability, and practical considerations.
Journal: Journal of Biopharmaceutical Statistics, pp. 1-14
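Three of the methodological ingredients can be sketched in plain Python: a split that is both stratified and grouped, "balanced" class weights for cost-sensitive learning, and precision/recall in place of accuracy. This is a minimal sketch, not the authors' adapted scikit-learn splitter. It stratifies at the patient level by whether a patient has any event, and keeps every record of a patient on the same side of the split. The class-weight formula n_samples / (n_classes × count_c) is the one scikit-learn documents for `class_weight="balanced"`. The function names and toy data are hypothetical.

```python
import random
from collections import defaultdict


def stratified_group_split(patient_ids, labels, test_frac=0.25, seed=0):
    """Split record indices so each patient's records stay together and the
    share of patients with any event is similar in train and test."""
    has_event = defaultdict(bool)
    for pid, y in zip(patient_ids, labels):
        has_event[pid] = has_event[pid] or y == 1
    rng = random.Random(seed)
    test_patients = set()
    for flag in (True, False):  # stratify patients by event status
        stratum = sorted(p for p, e in has_event.items() if e == flag)
        rng.shuffle(stratum)
        test_patients.update(stratum[: round(test_frac * len(stratum))])
    train = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train, test


def balanced_class_weights(labels):
    """n_samples / (n_classes * count_c): up-weights the rare class."""
    counts = defaultdict(int)
    for y in labels:
        counts[y] += 1
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def precision_recall(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical data: 20 patients with 2 records each; 4 patients have an event.
patient_ids = [f"P{i // 2}" for i in range(40)]
labels = [1 if i in (0, 2, 4, 6) else 0 for i in range(40)]
train_idx, test_idx = stratified_group_split(patient_ids, labels)
```

With these toy data, one of the four event patients and four of the sixteen event-free patients land in the test set. A purely record-level split could place records of the same patient on both sides.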
Physiologically based pharmacokinetic (PBPK) modeling is a valuable tool for determining the distribution and disposition of substances in an organism. It involves a mathematical representation of the interrelationships among crucial physiological, biochemical, and physicochemical parameters. Missing values for pharmacokinetic parameters can make constructing a PBPK model challenging. Herein, we propose an artificial intelligence framework to estimate a key pharmacokinetic parameter, the intestinal effective permeability (Peff). The publicly available Peff dataset was used to develop regression machine learning models. Among them, the XGBoost model achieved the best test performance, with an R² (coefficient of determination) of 0.68. The model was then applied to compute the Peff of asiaticoside and madecassoside, the parent compounds found in Centella asiatica. Subsequently, PBPK modeling was conducted to evaluate the biodistribution of these herbal substances following oral administration in a rat model. The simulation results were validated and agreed with existing in vivo studies in rats. This in silico pipeline offers a potential approach for investigating the pharmacokinetic parameters and profiles of drugs or herbal substances. It can be used independently or integrated into other modeling systems.
Investigating pharmacokinetic profiles of Centella asiatica using machine learning and PBPK modelling. Siriwan Pumkathin, Yuranan Hanlumyuang, Worawat Wattanathana, Teeraphan Laomettachit, Monrudee Liangruksa. Journal of Biopharmaceutical Statistics, pp. 1-16. Published 2024-06-11. DOI: 10.1080/10543406.2024.2358797 (https://doi.org/10.1080/10543406.2024.2358797)
Pub Date: 2024-06-10, DOI: 10.1080/10543406.2024.2358810
Abd El-Raheem M Abd El-Raheem, Haidy N Mohamed, Ehab F Abd-Elfattah
This paper approximates the exact p-values of a class of nonparametric two-sample location-scale tests. The best-known nonparametric two-sample location-scale tests are formulated as members of a class of linear rank tests, and the permutation distribution of this class is derived under a random allocation design. This allows the exact p-values of the tests in this class to be approximated with the saddlepoint approximation method. The proposed method approximates the exact p-value more accurately than the normal approximation method. Moreover, the proposed method requires only a few calculations and little time, as in the case of the simulation method. The proposed procedure is illustrated with four real datasets drawn from several different fields. In addition, a simulation study compares the proposed method with traditional methods for approximating the exact p-values of the tests in this class.
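To make the comparison in the abstract above concrete, the following Python sketch contrasts an exact permutation p-value, a normal approximation, and a Lugannani-Rice saddlepoint approximation (with Daniels' first continuity correction for an integer-valued statistic) for a linear rank statistic T = Σ a_i Z_i. It is not the authors' method. The helper names are hypothetical, the scores a_i = 1, ..., 10 are illustrative Wilcoxon-type scores rather than location-scale scores, and each observation is assumed to be allocated to the first sample independently with probability 1/2. If the paper's random allocation design fixes both sample sizes, a conditional (double) saddlepoint approximation would be needed instead.

```python
import itertools
import math

# Hypothetical helpers for illustration (not from the paper):
# T = sum_i a_i * Z_i, where a_i are integer rank scores and the Z_i are
# independent Bernoulli(1/2) allocation indicators (an assumed design).


def exact_upper_p(scores, t_obs):
    """Exact P(T >= t_obs) by enumerating all 2**N equally likely allocations."""
    hits = sum(
        1
        for z in itertools.product((0, 1), repeat=len(scores))
        if sum(a * zi for a, zi in zip(scores, z)) >= t_obs
    )
    return hits / 2 ** len(scores)


def normal_upper_p(scores, t_obs):
    """Normal approximation to P(T >= t_obs), no continuity correction."""
    mean = 0.5 * sum(scores)
    sd = math.sqrt(0.25 * sum(a * a for a in scores))
    return 0.5 * math.erfc((t_obs - mean) / (sd * math.sqrt(2)))


def _k(s, scores):  # cumulant generating function K(s)
    return sum(math.log1p(math.exp(s * a)) - math.log(2) for a in scores)


def _k1(s, scores):  # K'(s), strictly increasing in s
    return sum(a / (1 + math.exp(-s * a)) for a in scores)


def _k2(s, scores):  # K''(s)
    total = 0.0
    for a in scores:
        p = 1 / (1 + math.exp(-s * a))
        total += a * a * p * (1 - p)
    return total


def saddlepoint_upper_p(scores, t_obs):
    """Lugannani-Rice approximation to P(T >= t_obs) with Daniels' first
    continuity correction; assumes T is integer-valued with lattice span 1."""
    t_min = sum(min(a, 0) for a in scores)
    t_max = sum(max(a, 0) for a in scores)
    if not t_min < t_obs < t_max:
        raise ValueError("t_obs must lie strictly inside the support of T")
    lo, hi = -1.0, 1.0  # bracket the root of K'(s) = t_obs
    while _k1(lo, scores) > t_obs:
        lo *= 2
    while _k1(hi, scores) < t_obs:
        hi *= 2
    for _ in range(200):  # bisection for the saddlepoint s_hat
        mid = 0.5 * (lo + hi)
        if _k1(mid, scores) < t_obs:
            lo = mid
        else:
            hi = mid
    s_hat = 0.5 * (lo + hi)
    if abs(s_hat) < 1e-8:
        raise ValueError("t_obs is at the mean of T, where the formula is singular")
    w = math.copysign(math.sqrt(2 * (s_hat * t_obs - _k(s_hat, scores))), s_hat)
    u = (1 - math.exp(-s_hat)) * math.sqrt(_k2(s_hat, scores))
    phi_w = math.exp(-0.5 * w * w) / math.sqrt(2 * math.pi)
    return 0.5 * math.erfc(w / math.sqrt(2)) + phi_w * (1 / u - 1 / w)


if __name__ == "__main__":
    scores = list(range(1, 11))  # hypothetical Wilcoxon-type scores
    t_obs = 45
    print(exact_upper_p(scores, t_obs))        # 0.0419921875 (= 43/1024)
    print(normal_upper_p(scores, t_obs))       # about 0.037
    print(saddlepoint_upper_p(scores, t_obs))  # about 0.042
```

Under these assumptions, the saddlepoint value lands very close to the exact tail probability of 43/1024, while the uncorrected normal approximation noticeably understates it. Full enumeration costs 2^N evaluations, whereas the saddlepoint approach needs only a one-dimensional root search.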
Saddlepoint p-values for a class of location-scale tests. Journal of Biopharmaceutical Statistics, pp. 1-20.