About the variability, quality and reproducibility of ChIP-seq data

Hamzavi-Pinon Violaine, Cholley, M. Mendoza-Parra, H. Gronemeyer
Journal: ScienceOpen research. Published: 2016-07-07. DOI: 10.14293/S2199-1006.1.SOR-LIFE.ARGGHM.V1 (https://doi.org/10.14293/S2199-1006.1.SOR-LIFE.ARGGHM.V1)
Citations: 0

Abstract

The emergence of high-throughput technologies that produce gigabyte-scale omics datasets has led to revolutionary changes in molecular biology and functional genomics. Despite the incorporation of increasingly quantitative technologies, the field suffers from serious reproducibility problems. Some causes have been identified: they include poor quality management, competition for publications, funding and jobs, and problems in the experimental and statistical design of assays. The consequences include, among others, delays in the implementation of efficient and specific anti-cancer treatments, the unnecessary duplication or validation of improperly conducted studies, and the waste of public funding.

Here we wish to discuss another cause of poor reproducibility, which will become increasingly important with the advent of personalized medicine: the generation of poor-quality datasets from Next Generation Sequencing (NGS) technologies, specifically those that involve enrichment assays such as ChIP-sequencing. NGS-derived applications are becoming increasingly popular, a trend supported by decreasing sequencing costs, the rapid development of novel sequencing-based technologies, and the power of genome-wide data interpretation by functional genomics and systems biology approaches. However, the complexity and sensitivity of these technologies carry the risk of introducing various types of bias. It is therefore rather surprising that only very few quality indicators have been developed to date. The public availability of omics data in large repositories, such as GEO, is no doubt an enormously valuable resource. However, by working extensively with such datasets, we realized that the lack of universal quality control indicators in publications and data repositories seriously limits the use of existing data and can contribute to irreproducibility.

Here we provide examples that illustrate the problems generated by the use of poor-quality datasets, and we propose solutions that would ultimately enhance reproducibility and encourage scientists to use existing datasets in the design and interpretation of their own research projects. Our goal is to increase awareness in the scientific community of the need to link quality assessment to datasets, and to initiate a discussion on the quality control of big data.