FC1000: normalized gene expression changes of systematically perturbed human cells.

IF 0.8 | Zone 4 (Mathematics) | Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY | Statistical Applications in Genetics and Molecular Biology | Pub Date: 2017-09-26 | DOI: 10.1515/sagmb-2016-0072

Ingrid M Lönnstedt, Sven Nelander

Statistical Applications in Genetics and Molecular Biology, volume 16, issue 4, pages 217-242.

Citations: 4

Abstract



The systematic study of transcriptional responses to genetic and chemical perturbations in human cells is still in its early stages. The largest available dataset to date is the newly released L1000 compendium. With its 1.3 million gene expression profiles of treated human cells it offers many opportunities for biomedical data mining, but also data normalization challenges of new dimensions. We developed a novel and practical approach to obtain accurate estimates of fold change response profiles from L1000, based on the RUV (Remove Unwanted Variation) statistical framework. Extending RUV to a big data setting, we propose an estimation procedure, in which an underlying RUV model is tuned by feedback through dataset specific statistical measures, reflecting p-value distributions and internal gene knockdown controls. Applying these metrics - termed evaluation endpoints - to disjoint data splits and integrating the results to select an optimal normalization, the procedure reduces bias and noise in the L1000 data, which in turn broadens the potential of this resource for pharmacological and functional genomic analyses. Our pipeline and normalization results are distributed as an R package (nelanderlab.org/FC1000.html).
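The tuning idea in the abstract (score candidate normalizations with an evaluation endpoint on disjoint data splits, then integrate the scores to pick one) can be illustrated with a toy sketch. This is not the FC1000 pipeline or the RUV estimator: the simulated data, the gene indices, the `alpha` parameter, and every function name below are assumptions made for illustration. A simple "subtract a scaled mean of negative-control genes" step stands in for a fitted RUV model, and only a knockdown-control endpoint is shown (the p-value-distribution endpoint is omitted).

```python
import random
import statistics

# All names and values below are hypothetical, chosen only for illustration.
CONTROL_GENES = range(10)  # genes assumed unaffected by any treatment
TARGET_GENE = 10           # gene targeted by the simulated knockdown
N_GENES = 12

def simulate(n_per_group=60, seed=0):
    """Toy log2 expression profiles with a per-sample unwanted shift."""
    rng = random.Random(seed)
    samples = []
    for knocked_down in [True] * n_per_group + [False] * n_per_group:
        shift = rng.gauss(0.0, 1.0)  # unwanted variation shared by all genes
        profile = [8.0 + shift + rng.gauss(0.0, 0.1) for _ in range(N_GENES)]
        if knocked_down:
            profile[TARGET_GENE] -= 1.0  # true knockdown effect (log2 scale)
        samples.append((knocked_down, profile))
    return samples

def normalize(profile, alpha):
    """Subtract alpha times the control-gene mean (stand-in for an RUV fit)."""
    unwanted = statistics.fmean(profile[g] for g in CONTROL_GENES)
    return [x - alpha * unwanted for x in profile]

def knockdown_endpoint(split, alpha):
    """Standardized knockdown effect on the target gene; larger is better."""
    kd = [normalize(p, alpha)[TARGET_GENE] for is_kd, p in split if is_kd]
    veh = [normalize(p, alpha)[TARGET_GENE] for is_kd, p in split if not is_kd]
    baseline = statistics.fmean(veh)
    fold_changes = [x - baseline for x in kd]  # log2 fold change vs. vehicle
    return -statistics.fmean(fold_changes) / statistics.stdev(fold_changes)

def disjoint_splits(samples, n_splits, seed=0):
    """Partition samples into non-overlapping splits, balanced by group."""
    rng = random.Random(seed)
    kd = [s for s in samples if s[0]]
    veh = [s for s in samples if not s[0]]
    rng.shuffle(kd)
    rng.shuffle(veh)
    return [kd[i::n_splits] + veh[i::n_splits] for i in range(n_splits)]

def select_alpha(samples, candidates=(0.0, 0.5, 1.0), n_splits=3):
    """Average the endpoint over splits and return the best candidate."""
    splits = disjoint_splits(samples, n_splits)
    scores = {
        a: statistics.fmean(knockdown_endpoint(s, a) for s in splits)
        for a in candidates
    }
    return max(scores, key=scores.get), scores

best_alpha, scores = select_alpha(simulate())
print("selected alpha:", best_alpha)
print("mean endpoint per candidate:", scores)
```

In this toy setting the candidate that removes the shared shift most completely yields the cleanest knockdown signal. Averaging the endpoint over disjoint splits is one simple way to integrate the per-split results; the paper's actual integration rule may differ.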


Source journal

Statistical Applications in Genetics and Molecular Biology (BIOCHEMISTRY & MOLECULAR BIOLOGY; MATHEMATICAL & COMPUTATIONAL BIOLOGY)

Self-citation rate: 11.10%


About the journal: Statistical Applications in Genetics and Molecular Biology seeks to publish significant research on the application of statistical ideas to problems arising from computational biology. The focus of the papers should be on the relevant statistical issues but should contain a succinct description of the relevant biological problem being considered. The range of topics is wide and will include topics such as linkage mapping, association studies, gene finding and sequence alignment, protein structure prediction, design and analysis of microarray data, molecular evolution and phylogenetic trees, DNA topology, and database search strategies. Both original research and review articles will be warmly received.