
Systems biomedicine (Austin, Tex.): latest articles

Prediction of gene expression in human using rat in vivo gene expression in Japanese Toxicogenomics Project
Pub Date : 2014-01-02 DOI: 10.4161/sysb.29412
Martin Otava, Z. Shkedy, Adetayo S Kasim
The Japanese Toxicogenomics Project (TGP) provides a large amount of data for the toxicology and safety framework. We focus on gene expression data from rat in vivo and human in vitro experiments. We consider two different analyses of the TGP data. The first analysis is based on a two-way analysis of variance model, and its goal is to detect genes with a significant dose-response relationship in both humans and rats. The second analysis consists of a trend analysis at each time point, and its goal is to detect genes in the rat that can be used to predict gene expression in humans. The first analysis leads us to conclusions about the heterogeneity of the compound set and suggests how to address this issue to improve future analyses. In the second part, we identify, for particular compounds, groups of genes that are translatable from rats to humans, so they can be used for prediction of human in vitro data based on rat in vivo data.
Journal: Systems biomedicine (Austin, Tex.); Volume: 2 1; Pages: 15 - 8; Type: Journal Article; DOI URL: https://doi.org/10.4161/sysb.29412; Semantic Scholar paper ID: 70655941
Citations: 7
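
As a rough illustration of the two-way analysis of variance mentioned in the abstract above, the following Python sketch decomposes the variance of a single gene's expression values. It assumes a balanced design with dose and time as the two factors, which the abstract does not state. The function name `two_way_anova` and the toy numbers are hypothetical and are not taken from the TGP data or from the authors' code.

```python
from itertools import product
from statistics import mean


def two_way_anova(data):
    """Balanced two-way ANOVA with interaction for one gene.

    data maps (dose_level, time_level) -> list of replicate expression values.
    Every cell must hold the same number of replicates (at least two).
    Returns the sums of squares and the F statistics for both main effects
    and for the interaction.
    """
    doses = sorted({d for d, _ in data})
    times = sorted({t for _, t in data})
    n = len(next(iter(data.values())))  # replicates per cell
    a, b = len(doses), len(times)

    grand = mean(v for cell in data.values() for v in cell)
    dose_mean = {d: mean(v for t in times for v in data[(d, t)]) for d in doses}
    time_mean = {t: mean(v for d in doses for v in data[(d, t)]) for t in times}
    cell_mean = {key: mean(cell) for key, cell in data.items()}

    # Sums of squares for the two main effects, the interaction and the residual.
    ss_dose = b * n * sum((dose_mean[d] - grand) ** 2 for d in doses)
    ss_time = a * n * sum((time_mean[t] - grand) ** 2 for t in times)
    ss_inter = n * sum(
        (cell_mean[(d, t)] - dose_mean[d] - time_mean[t] + grand) ** 2
        for d, t in product(doses, times)
    )
    ss_error = sum(
        (v - cell_mean[key]) ** 2 for key, cell in data.items() for v in cell
    )

    ms_error = ss_error / (a * b * (n - 1))
    return {
        "ss": {
            "dose": ss_dose,
            "time": ss_time,
            "interaction": ss_inter,
            "error": ss_error,
        },
        "F": {
            "dose": (ss_dose / (a - 1)) / ms_error,
            "time": (ss_time / (b - 1)) / ms_error,
            "interaction": (ss_inter / ((a - 1) * (b - 1))) / ms_error,
        },
    }


# Hypothetical toy data: three dose levels, two time points, two replicates.
toy = {
    (0, 2): [5.0, 5.2], (0, 8): [5.1, 4.9],
    (1, 2): [5.6, 5.8], (1, 8): [6.1, 6.3],
    (2, 2): [6.4, 6.6], (2, 8): [7.4, 7.2],
}
result = two_way_anova(toy)
```

A large F statistic for dose would flag the gene as a candidate for a dose-response relationship. Converting F into a p-value requires the F distribution, which is left out of this sketch.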
Classification of lung adenocarcinoma and squamous cell carcinoma samples based on their gene expression profile in the sbv IMPROVER Diagnostic Signature Challenge
Pub Date : 2013-09-20 DOI: 10.4161/sysb.25983
Rotem Ben-Hamo, S. Boué, F. Martin, M. Talikka, S. Efroni
Barriers, such as the lack of confidence in the robustness of disease signatures based on gene expression measurements, still hinder progress toward personalized medicine. It is therefore important that once derived, a signature is verified via an unbiased process. The IMPROVER initiative was set up to establish an impartial view of methods and results for the classification of patients, based on molecular profiles of disease-relevant or surrogate tissues. Here, the focus is on the Lung Cancer Signature Challenge, in which participants have been asked to classify lung tumor gene expression profiles into 4 classes: adenocarcinoma (AC) and squamous cell carcinoma (SCC), each at either stage 1 or 2. The method reported here was the best performing method in the 4-way classification. The original method is presented as well as an algorithmic approach to replace the empirical (non-computational) steps used in the challenge. In the discussion, the difficulty in classifying stages of tumors as compared with the relatively good classification of subtypes is examined. Hypotheses are made concerning possible reasons for erroneous classification of some of the samples, in view of additional information on the test samples that was not made available to challenge participants.
Journal: Systems biomedicine (Austin, Tex.); Volume: 1 1; Pages: 268 - 277; Type: Journal Article; DOI URL: https://doi.org/10.4161/sysb.25983; Semantic Scholar paper ID: 70655362
Citations: 19
Hierarchical-TGDR
Pub Date : 2013-09-20 DOI: 10.4161/sysb.25979
S. Tian, M. Suárez-Fariñas
Regularization methods that simultaneously select a small set of the most relevant features and build a classifier using the selected features have gained much attention recently in problems of classification of “omics” data. In many multi-class classification problems, which are of practical importance, the classes are naturally endowed with a hierarchical structure. However, such natural hierarchical structure is often ignored. Here, we use an existing regularization algorithm, Threshold Gradient Descent Regularization, in a hierarchical fashion, which takes advantage of natural biological structure to specifically tackle multi-class classification of microarray data. We apply this approach to one of the tasks presented by the sbv IMPROVER Diagnostic Signature Challenge: the Lung Cancer Sub-Challenge. Gene expression data from non-small cell lung carcinoma were used to classify tumors into adenocarcinoma and squamous cell carcinoma subtypes, and their clinical stages (I and II). Genetic and transcriptomic differences between AC and SCC have been reported, indicating a potentially different pathological mechanism of differentiation and invasion. The results from this analysis show that hierarchical-TGDR outperforms pairwise TGDRs in terms of predictive performance, and is substantially more parsimonious. In conclusion, the hierarchical-TGDR approach trains classifiers in a top-down fashion by considering the naturally existing structure within the data, reducing the number of pairwise-TGDRs to be trained. It also highlights different mechanisms of “invasion” in the two subtypes. This work suggests that incorporating known biological information into classification algorithms, such as data hierarchies, can improve the discriminative performance and biological interpretation of this classifier.
Journal: Systems biomedicine (Austin, Tex.); Volume: 1 1; Pages: 278 - 287; Type: Journal Article; DOI URL: https://doi.org/10.4161/sysb.25979; Semantic Scholar paper ID: 70655337
Citations: 9
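
The top-down idea described in the Hierarchical-TGDR abstract above (decide the subtype first, then the stage within that subtype) can be sketched in a few lines of Python. This is only an illustration of the hierarchy: a nearest-centroid classifier stands in for Threshold Gradient Descent Regularization, which is not implemented here, and all function names and the two-feature toy data are hypothetical.

```python
from math import dist
from statistics import mean


def fit_centroids(samples, labels):
    """Return {label: centroid} for a nearest-centroid classifier."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in groups.items()}


def predict_centroid(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


def fit_hierarchical(samples, subtypes, stages):
    """Train one subtype classifier, then one stage classifier per subtype."""
    top = fit_centroids(samples, subtypes)
    per_subtype = {}
    for s in set(subtypes):
        idx = [i for i, y in enumerate(subtypes) if y == s]
        per_subtype[s] = fit_centroids(
            [samples[i] for i in idx], [stages[i] for i in idx]
        )
    return top, per_subtype


def predict_hierarchical(model, x):
    """Walk the hierarchy: subtype first, then stage within that subtype."""
    top, per_subtype = model
    subtype = predict_centroid(top, x)
    stage = predict_centroid(per_subtype[subtype], x)
    return subtype, stage


# Hypothetical toy data: feature 0 separates subtypes, feature 1 separates stages.
samples = [
    [0.0, 0.0], [0.2, 0.1], [0.1, 1.0], [0.0, 1.2],
    [2.0, 0.1], [2.1, 0.0], [2.0, 1.1], [1.9, 1.0],
]
subtypes = ["AC", "AC", "AC", "AC", "SCC", "SCC", "SCC", "SCC"]
stages = [1, 1, 2, 2, 1, 1, 2, 2]
model = fit_hierarchical(samples, subtypes, stages)
prediction = predict_hierarchical(model, [0.1, 0.9])
```

With four classes, this layout trains three binary classifiers (one for subtype, one stage classifier per subtype), whereas an all-pairs scheme would need six.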
Kernel-based method for feature selection and disease diagnosis using transcriptomics data
Pub Date : 2013-09-19 DOI: 10.4161/sysb.25978
Ji-Hoon Cho, Alan Lin, Kai Wang
Global transcriptome profiling is the foundation of systems biology and has been extensively used in biomarker discovery. Tools have been developed to extract meaningful biological information and useful gene features from transcriptomics data. However, there is no commonly accepted method for such purposes. The first IMPROVER (industrial methodology for process verification of research) challenge was launched to assess and verify classification methods using transcriptomics data from clinical samples. We established a computational approach that combined a kernel Fisher discriminant classifier and a feature selection scheme, which used scaled alignment selection and recursive feature elimination methods. A simple and reliable batch effect correction approach was also used. With this approach, a set of informative genes, i.e., biomarker candidates, could be identified for disease diagnosis and classification. We applied this approach to the sbv IMPROVER Challenge and achieved the highest rank in the psoriasis sub-challenge. Here, we describe our methodology and results for the sub-challenge.
Journal: Systems biomedicine (Austin, Tex.); Volume: 1 1; Pages: 254 - 260; Type: Journal Article; DOI URL: https://doi.org/10.4161/sysb.25978; Semantic Scholar paper ID: 70655160
Citations: 5
Relapsing-remitting multiple sclerosis classification using elastic net logistic regression on gene expression data
Pub Date : 2013-09-19 DOI: 10.4161/sysb.26131
Cheng Zhao, A. Deshwar, Q. Morris
As part of the first Industrial Methodology for Process Verification in Research Challenge, the aim of the MS Diagnostic sub-challenge was to identify a robust diagnostic signature for relapsing-remitting multiple sclerosis (RRMS) from gene expression data. In this regard, we built a classifier that discriminates samples into two phenotype groups, either RRMS or controls, using the transcriptome of peripheral blood mononuclear cells. For our classifier, we used logistic regression with elastic net regularization as implemented in the glmnet package in R. We selected the values of the regularization hyper-parameters based on cross-validation performance on the provided training data, the number of non-zero parameters in our model, and the distribution of output values when the input vectors for the test data were used with our classifier. We analyzed our classifier performance with two different strategies for feature extraction, using either only genes or including additional constructed features from gene pathway data. The two strategies produced little difference in performance when comparing the 10-fold cross-validation on the training data and prediction on the test data. Our final submission for the sub-challenge used only genes as features and identified a diagnostic signature consisting of 58 genes, which was ranked second out of a total of 39 submissions.
Journal: Systems biomedicine (Austin, Tex.); Volume: 1 1; Pages: 247 - 253; Type: Journal Article; DOI URL: https://doi.org/10.4161/sysb.26131; Semantic Scholar paper ID: 70655749
Citations: 8
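
To make the elastic net penalty in the abstract above more concrete, here is a minimal pure-Python sketch of penalized logistic regression. It is not the glmnet implementation, which uses coordinate descent in R; this version uses proximal gradient descent as a stand-in and minimizes the mean log-loss plus lam * (alpha * L1 + (1 - alpha) / 2 * squared L2). The function names, hyper-parameter values, and toy data are assumptions made for illustration.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.5, lr=0.1, n_iter=2000):
    """Fit weights w and intercept b by proximal gradient descent.

    The intercept is left unpenalized. alpha=1 gives the lasso penalty,
    alpha=0 gives the ridge penalty.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        probs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        resid = [pi - yi for pi, yi in zip(probs, y)]
        grad_b = sum(resid) / n
        # Gradient of the smooth part: log-loss plus the ridge term.
        grad_w = [
            sum(r * row[j] for r, row in zip(resid, X)) / n
            + lam * (1 - alpha) * w[j]
            for j in range(p)
        ]
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding for the L1 term.
        w = [
            soft_threshold(wj - lr * gj, lr * lam * alpha)
            for wj, gj in zip(w, grad_w)
        ]
    return w, b


def predict_proba(w, b, x):
    """Probability of the positive class for one sample."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical toy data: feature 0 is informative, feature 1 is weak noise.
X = [[-1.0, 0.2], [-1.2, -0.1], [-0.8, 0.1], [1.0, -0.2], [1.2, 0.1], [0.8, 0.2]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_elastic_net_logistic(X, y)
n_nonzero = sum(1 for wj in w if wj != 0.0)
```

Counting the non-zero weights, as in `n_nonzero`, mirrors one of the model selection criteria the abstract mentions: a stronger L1 component drives more coefficients exactly to zero and yields a smaller signature.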
Predicting COPD status with a random generalized linear model
Pub Date : 2013-09-19 DOI: 10.4161/sysb.25981
Lin Song, S. Horvath
Sample classification, especially disease status prediction, is an important area of investigation for gene expression studies. Many machine learning methods have been developed to tackle this problem. To evaluate different prediction methods, the IMPROVER Challenge made several data sets available. Here we focus on one sub-challenge: chronic obstructive pulmonary disease (COPD). We outlined critical preprocessing steps to make training and test data comparable. We compared our recently introduced random generalized linear model (RGLM) predictor with Leo Breiman’s random forest (RF) predictor on the COPD data set. We discussed potential reasons for the superior performance of the RGLM predictor in this sub-challenge. Interestingly, we found that although several genes were highly predictive of COPD status, none were necessary to achieve accurate prediction when the demographic features smoking status and age were used. In conclusion, RGLM achieved superior accuracy in predicting COPD status with smoking status and age as mandatory features. Future cohort studies could evaluate whether the resulting predictor has clinical utility.
Journal: Systems biomedicine (Austin, Tex.); Volume: 1 1; Pages: 261 - 267; Type: Journal Article; DOI URL: https://doi.org/10.4161/sysb.25981; Semantic Scholar paper ID: 70655046
Citations: 5
sbv IMPROVER Diagnostic Signature Challenge
Pub Date : 2013-09-17 DOI: 10.4161/sysb.26325
K. Rhrissorrakrai, John Rice, S. Boué, M. Talikka, E. Bilal, F. Martin, Pablo Meyer, R. Norel, Yang Xiang, G. Stolovitzky, J. Hoeng, M. Peitsch
The sbv IMPROVER (systems biology verification: Industrial Methodology for Process Verification in Research) process aims to help companies verify component steps or tasks in larger research workflows for industrial applications. IMPROVER is built on challenges posed to the community that draw on the wisdom of crowds to assess the most suitable methods for a given research task. The Diagnostic Signature Challenge, open to the public from Mar. 5 to Jun. 21, 2012, was the first instantiation of the IMPROVER methodology and evaluated a fundamental biological question: whether there is sufficient information in gene expression data to diagnose diseases. Fifty-four teams used publicly available data to develop prediction models in four disease areas: multiple sclerosis, lung cancer, psoriasis, and chronic obstructive pulmonary disease. The predictions were scored against unpublished, blinded data provided by the organizers, and the results, including the methods of the top performers, were presented at a conference in Boston on Oct. 2–3, 2012. This paper offers an overview of the Diagnostic Signature Challenge and the accompanying symposium, and is the first article in a special issue of Systems Biomedicine, providing focused reviews of the submitted methods and general conclusions from the challenge. Overall, optimal method choice and performance appeared largely dependent on endpoint: the psoriasis and lung cancer subtype sub-challenges were predicted more accurately, while the remaining classification tasks were much more challenging. Though no one approach was superior for every sub-challenge, some methods, such as linear discriminant analysis, were found to perform consistently well in all.
Journal: Systems biomedicine (Austin, Tex.); Volume: 1 1; Pages: 196 - 207; Type: Journal Article; DOI URL: https://doi.org/10.4161/sysb.26325; Semantic Scholar paper ID: 70655482
Citations: 13
sbv IMPROVER Diagnostic Signature Challenge
Pub Date : 2013-09-17 DOI: 10.4161/sysb.26326
R. Norel, E. Bilal, Nathalie Conrad-Chemineau, Richard Bonneau, A. G. de la Fuente, I. Jurisica, D. Marbach, Pablo Meyer, J. Rice, T. Tuller, G. Stolovitzky
Evaluating the performance of computational methods to analyze high-throughput data is an integral component of model development and critical to progress in computational biology. In collaborative competitions, model performance evaluation is crucial to determine the best-performing submission. Here we present the scoring methodology used to assess 54 submissions to the IMPROVER Diagnostic Signature Challenge. Participants were tasked with classifying patients’ disease phenotype based on gene expression data in four disease areas: Psoriasis, Chronic Obstructive Pulmonary Disease, Lung Cancer, and Multiple Sclerosis. We discuss the criteria underlying the choice of the three scoring metrics used to assess the performance of the submitted models. The statistical significance of the difference in performance between individual submissions and classification tasks varied according to these different metrics. Accordingly, we consider an aggregation of these three assessment methods and present the approaches considered for aggregating the rankings and ultimately determining the overall best performer.
Journal: Systems biomedicine (Austin, Tex.); Volume: 1 1; Pages: 208 - 216; Type: Journal Article; DOI URL: https://doi.org/10.4161/sysb.26326; Semantic Scholar paper ID: 70655635
Citations: 3
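The rank aggregation mentioned in the scoring abstract above can be illustrated with a minimal sketch. It assumes a simple average-rank scheme, which is only one of several possible aggregation approaches and is not confirmed by the abstract as the one finally used. The metric names (`metric_a`, `metric_b`, `metric_c`), team names, and scores are hypothetical placeholders.

```python
from statistics import mean


def rank_descending(scores):
    """Map each team to its rank (1 = best); tied scores share their average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover every team tied with the team at position i.
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for team in ordered[i:j + 1]:
            ranks[team] = average_rank
        i = j + 1
    return ranks


def aggregate_ranks(metric_scores):
    """Average each team's rank over all metrics; a lower mean rank is better."""
    per_metric = [rank_descending(s) for s in metric_scores.values()]
    teams = list(per_metric[0])
    return {team: mean(r[team] for r in per_metric) for team in teams}


# Hypothetical scores: higher is better for every metric (an assumption).
scores = {
    "metric_a": {"team1": 0.90, "team2": 0.80, "team3": 0.70},
    "metric_b": {"team1": 0.60, "team2": 0.75, "team3": 0.75},
    "metric_c": {"team1": 0.85, "team2": 0.80, "team3": 0.65},
}

overall = aggregate_ranks(scores)
best = min(overall, key=overall.get)
print(best, overall)
```

Averaging ranks rather than raw scores keeps metrics with different scales from dominating one another, at the cost of discarding how large the score differences are.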
sbv IMPROVER Diagnostic Signature Challenge
Pub Date : 2013-09-12 DOI: 10.4161/sysb.26324
J. Hoeng, G. Stolovitzky, M. Peitsch
The task of predicting disease phenotype from gene expression data has been addressed hundreds if not thousands of times in the recent literature. This expanding body of work not only indicates that the problem is of great importance and general interest, but also reveals that neither the experimental nor the computational limitations of translating data to disease information have been satisfactorily understood. To contribute to the advancement of the field, promote collaborative thinking, and enable a fair and unbiased comparison of methods, IMPROVER revisited the problem of gene-expression-to-phenotype prediction using a collaborative-competition paradigm. This special issue of Systems Biomedicine reports the results of the sbv IMPROVER Diagnostic Signature Challenge, designed to identify the best analytic approaches to predict phenotype from gene expression data.
Systems biomedicine (Austin, Tex.), volume 1, pages 193-195.
Citations: 1
Methodological approach from the Best Overall Team in the sbv IMPROVER Diagnostic Signature Challenge
Pub Date : 2013-09-12 DOI: 10.4161/sysb.25980
A. Tarca, N. Than, R. Romero
The sbv IMPROVER Diagnostic Signature Challenge used crowdsourcing to identify the best methods to classify clinical samples using transcriptomics data. Participating teams used public microarray data sets to develop prediction models in four disease areas, and then made predictions on blinded test data generated by the organizers. Here we describe the approach of the team from the Perinatology Research Branch (Team PRB; AL Tarca, R Romero), which was awarded the best-performing entrant prize out of 54 entrants. The key elements of our approach included: (1) selection of training data sets by trial and error; (2) removal of batch effects by pre-processing the test and training data together; (3) the use of statistical significance and magnitude of change to select biomarkers; and (4) optimization of the number of biomarkers via the cross-validated performance of a simple linear discriminant analysis (LDA) model. Not only were our resulting models ranked consistently high, but they also generated parsimonious signatures of as few as two genes, unlike most of the other top-ranked teams, which used hundreds of genes for prediction.
Systems biomedicine (Austin, Tex.), volume 1, pages 217-227.
Citations: 24
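Step (3) of the Team PRB abstract above, selecting biomarkers by both statistical significance and magnitude of change, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the team's actual code: it uses a threshold on Welch's t-statistic as a stand-in for a significance test, assumes expression values are on a log2 scale, and the function names, thresholds (`min_abs_t`, `min_abs_diff`), gene names, and data are all hypothetical.

```python
from math import copysign, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances allowed)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        # Degenerate case: no within-group spread at all.
        return 0.0 if diff == 0 else copysign(float("inf"), diff)
    return diff / se


def select_biomarkers(expr, labels, min_abs_t=2.0, min_abs_diff=1.0):
    """Return genes passing both filters, ordered by decreasing |t|.

    expr:   gene name -> list of log2 expression values, one per sample
    labels: list of 0/1 class labels aligned with the samples
    """
    kept = []
    for gene, values in expr.items():
        case = [v for v, y in zip(values, labels) if y == 1]
        ctrl = [v for v, y in zip(values, labels) if y == 0]
        diff = mean(case) - mean(ctrl)  # log2 fold change between groups
        t = welch_t(case, ctrl)
        if abs(t) >= min_abs_t and abs(diff) >= min_abs_diff:
            kept.append((abs(t), gene))
    return [gene for _, gene in sorted(kept, reverse=True)]


# Hypothetical data: three case samples followed by three control samples.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "geneA": [8.1, 7.9, 8.3, 5.0, 5.2, 4.9],    # large upward shift
    "geneB": [6.0, 6.1, 5.9, 6.0, 6.05, 5.95],  # no shift
    "geneC": [7.0, 7.4, 6.6, 6.5, 6.9, 6.1],    # shift too small
    "geneD": [4.0, 4.5, 3.5, 6.0, 6.4, 5.6],    # large downward shift
}

selected = select_biomarkers(expr, labels)
print(selected)
```

Requiring both filters screens out genes whose difference is statistically detectable but too small to be useful, as well as genes with a large but noisy difference. In a full pipeline along the lines of step (4), the length of the ranked list would then be tuned by cross-validated classifier performance.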