Characterization of uncertainty in the classification of multivariate assays: application to PAM50 centroid-based genomic predictors for breast cancer treatment plans.

Journal of Clinical Bioinformatics, 1:37. Published 2011-12-23. DOI: 10.1186/2043-9113-1-37

Mark T.W. Ebbert, Roy R.L. Bastien, Kenneth M. Boucher, Miguel Martín, Eva Carrasco, Rosalía Caballero, Inge J. Stijleman, Philip S. Bernard, Julio C. Facelli


Citations: 27

Abstract

Background: Multivariate assays (MVAs) for assisting clinical decisions are becoming commonly available, but because of their complexity they are often considered a high-risk approach. A key concern is that the uncertainty in an assay's final results is not well understood. This study focuses on developing a process to characterize how intrinsic error in the laboratory process (sample preparation and measurement of contributing factors such as gene expression) propagates into the MVA's results.

Methods: Using the PAM50 Breast Cancer Intrinsic Classifier, we show how to characterize error within an MVA and how these errors may affect the results reported to clinicians. First, we estimated the error distribution of the measured factors within the PAM50 assay by performing repeated measurements on four archetypal samples representative of the major breast cancer tumor subtypes. Then, using these error distributions and the original archetypal sample data, we ran Monte Carlo simulations to generate a sufficient number of simulated samples. The effect of these errors on PAM50 tumor subtype classification was estimated by classifying all simulated samples and measuring subtype reproducibility, defined as the percentage of simulated samples classified identically to their parent sample. The simulation was then repeated on a large, independent data set of samples from the GEICAM 9906 clinical trial. Simulated samples from the GEICAM set were used to explore a more realistic scenario in which, unlike the archetypal samples, many samples are not easily classified.
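The Methods steps (estimate measurement error from replicates, simulate perturbed copies of a parent sample, reclassify, count agreement) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes independent, additive Gaussian error per gene on log-scale expression, with a pooled per-gene standard deviation taken from replicate runs; it assumes the nearest-centroid rule assigns the subtype whose centroid has the highest Spearman rank correlation with the sample, as in the original PAM50 description; and the toy centroids, replicate data, and helper names (`pooled_gene_sd`, `spearman_corr`, `classify`, `subtype_reproducibility`) are hypothetical. Reproducibility is computed as defined above: the fraction of simulated samples that receive the parent sample's subtype.

```python
import numpy as np

# Hypothetical labels for the five PAM50 intrinsic subtypes (order is arbitrary).
SUBTYPES = ("LumA", "LumB", "Her2", "Basal", "Normal")


def pooled_gene_sd(replicates: np.ndarray) -> np.ndarray:
    """Pooled per-gene standard deviation from repeated measurements.

    replicates has shape (n_samples, n_replicates, n_genes). Assumes every
    sample was measured the same number of times (assumption, see lead-in).
    """
    per_sample_var = replicates.var(axis=1, ddof=1)  # (n_samples, n_genes)
    return np.sqrt(per_sample_var.mean(axis=0))      # (n_genes,)


def spearman_corr(samples: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Spearman correlation of every sample (row) with every centroid (row)."""
    # Ranks via double argsort; ties are not expected for continuous data.
    rs = np.argsort(np.argsort(samples, axis=1), axis=1).astype(float)
    rc = np.argsort(np.argsort(centroids, axis=1), axis=1).astype(float)
    rs -= rs.mean(axis=1, keepdims=True)
    rc -= rc.mean(axis=1, keepdims=True)
    norms = np.outer(np.linalg.norm(rs, axis=1), np.linalg.norm(rc, axis=1))
    return (rs @ rc.T) / norms  # (n_samples, n_centroids)


def classify(samples: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Nearest-centroid call: index of the most correlated centroid."""
    return spearman_corr(samples, centroids).argmax(axis=1)


def subtype_reproducibility(parent, gene_sd, centroids, n_sim=1000, seed=None):
    """Fraction of Monte Carlo copies of `parent` that keep the parent's call."""
    rng = np.random.default_rng(seed)
    parent_call = classify(parent[None, :], centroids)[0]
    # Each simulated sample = parent + independent Gaussian error per gene.
    simulated = parent + rng.normal(0.0, gene_sd, size=(n_sim, parent.size))
    sim_calls = classify(simulated, centroids)
    return int(parent_call), float(np.mean(sim_calls == parent_call)), sim_calls


# --- Usage with toy data (NOT the real PAM50 centroids or assay data) ---
rng = np.random.default_rng(0)
centroids = rng.normal(size=(5, 50))  # 5 subtypes x 50 genes
# Three hypothetical replicate runs of four "archetypal" samples.
replicates = centroids[:4, None, :] + rng.normal(0.0, 0.1, size=(4, 3, 50))
gene_sd = pooled_gene_sd(replicates)

# An archetypal sample sits on a centroid; a borderline one is the mixture of
# two centroids whose correlations with those two centroids are most nearly tied.
archetypal = centroids[3]
mixes = np.array([w * centroids[0] + (1 - w) * centroids[1]
                  for w in np.linspace(0, 1, 201)])
corr = spearman_corr(mixes, centroids)
borderline = mixes[np.argmin(np.abs(corr[:, 0] - corr[:, 1]))]

for name, parent in [("archetypal", archetypal), ("borderline", borderline)]:
    call, repro, _ = subtype_reproducibility(parent, gene_sd, centroids, seed=1)
    print(name, SUBTYPES[call], f"{repro:.1%}")
```

With these toy inputs the archetypal sample should keep its call in essentially every replicate, while the borderline mixture should flip between the two nearly tied subtypes. This is the same qualitative contrast the abstract reports between archetypal and GEICAM samples, although the numbers here carry no clinical meaning.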

Results: All simulated samples derived from the archetypal samples were classified identically to their parent sample. Subtypes for simulated samples from the GEICAM set were also highly reproducible, but a non-negligible number of samples exhibited significant variability in their classification.

Conclusions: We have developed a general methodology to estimate the effects of intrinsic errors within MVAs. Applying it to the PAM50 assay, we showed that PAM50 results are resilient to the assay's intrinsic errors, but also found that for non-archetypal samples, experimental error can lead to a quite different classification of a tumor. Finally, we propose a way to present this uncertainty information to clinicians in a usable form.
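The abstract does not say what form the clinician-facing uncertainty report takes. One possible format, sketched here under explicit assumptions and not claimed to be the authors' proposal, summarizes the Monte Carlo calls for a single tumor as the share of simulated samples assigned to each subtype and flags the result when the leading subtype falls below a cut-off. The function name `subtype_report`, the subtype label order, and the 0.95 threshold are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical subtype labels; indices in sim_calls refer to this order.
SUBTYPES = ("LumA", "LumB", "Her2", "Basal", "Normal")


def subtype_report(sim_calls, threshold=0.95):
    """Summarize simulated subtype calls for one tumor.

    sim_calls: integer subtype indices, one per Monte Carlo replicate.
    threshold: hypothetical cut-off below which the call is flagged as uncertain.
    Returns (leading subtype, its share, "stable"/"uncertain", per-subtype lines).
    """
    counts = np.bincount(np.asarray(sim_calls), minlength=len(SUBTYPES))
    freqs = counts / counts.sum()
    order = np.argsort(-freqs, kind="stable")  # most frequent subtype first
    lines = [f"{SUBTYPES[i]}: {freqs[i]:.0%}" for i in order if freqs[i] > 0]
    top = order[0]
    status = "stable" if freqs[top] >= threshold else "uncertain"
    return SUBTYPES[top], float(freqs[top]), status, lines


calls = np.array([0] * 870 + [1] * 120 + [2] * 10)  # made-up replicate calls
print(subtype_report(calls))
# → ('LumA', 0.87, 'uncertain', ['LumA: 87%', 'LumB: 12%', 'Her2: 1%'])
```

Reporting the full distribution rather than a single label lets a clinician see both how likely the leading subtype is under assay error and which alternative subtype competes with it.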



