合成生物学中的数据危害。

IF 2.6 Q2 BIOCHEMICAL RESEARCH METHODS Synthetic biology (Oxford, England) Pub Date : 2024-06-21 eCollection Date: 2024-01-01 DOI:10.1093/synbio/ysae010

Natalie R Zelenka, Nina Di Cara, Kieren Sharma, Seeralan Sarvaharman, Jasdeep S Ghataora, Fabio Parmeggiani, Jeff Nivala, Zahraa S Abdallah, Lucia Marucci, Thomas E Gorochowski

{"title":"合成生物学中的数据危害。","authors":"Natalie R Zelenka, Nina Di Cara, Kieren Sharma, Seeralan Sarvaharman, Jasdeep S Ghataora, Fabio Parmeggiani, Jeff Nivala, Zahraa S Abdallah, Lucia Marucci, Thomas E Gorochowski","doi":"10.1093/synbio/ysae010","DOIUrl":null,"url":null,"abstract":"Data science is playing an increasingly important role in the design and analysis of engineered biology. This has been fueled by the development of high-throughput methods like massively parallel reporter assays, data-rich microscopy techniques, computational protein structure prediction and design, and the development of whole-cell models able to generate huge volumes of data. Although the ability to apply data-centric analyses in these contexts is appealing and increasingly simple to do, it comes with potential risks. For example, how might biases in the underlying data affect the validity of a result and what might the environmental impact of large-scale data analyses be? Here, we present a community-developed framework for assessing data hazards to help address these concerns and demonstrate its application to two synthetic biology case studies. We show the diversity of considerations that arise in common types of bioengineering projects and provide some guidelines and mitigating steps. Understanding potential issues and dangers when working with data and proactively addressing them will be essential for ensuring the appropriate use of emerging data-intensive AI methods and help increase the trustworthiness of their applications in synthetic biology.","PeriodicalId":74902,"journal":{"name":"Synthetic biology (Oxford, England)","volume":"9 1","pages":"ysae010"},"PeriodicalIF":2.6000,"publicationDate":"2024-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11227101/pdf/","citationCount":"0","resultStr":"{\"title\":\"Data hazards in synthetic biology.\",\"authors\":\"Natalie R Zelenka, Nina Di Cara, Kieren Sharma, Seeralan Sarvaharman, Jasdeep S Ghataora, Fabio Parmeggiani, Jeff Nivala, Zahraa S Abdallah, Lucia Marucci, Thomas E Gorochowski\",\"doi\":\"10.1093/synbio/ysae010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Data science is playing an increasingly important role in the design and analysis of engineered biology. This has been fueled by the development of high-throughput methods like massively parallel reporter assays, data-rich microscopy techniques, computational protein structure prediction and design, and the development of whole-cell models able to generate huge volumes of data. Although the ability to apply data-centric analyses in these contexts is appealing and increasingly simple to do, it comes with potential risks. For example, how might biases in the underlying data affect the validity of a result and what might the environmental impact of large-scale data analyses be? Here, we present a community-developed framework for assessing data hazards to help address these concerns and demonstrate its application to two synthetic biology case studies. We show the diversity of considerations that arise in common types of bioengineering projects and provide some guidelines and mitigating steps. Understanding potential issues and dangers when working with data and proactively addressing them will be essential for ensuring the appropriate use of emerging data-intensive AI methods and help increase the trustworthiness of their applications in synthetic biology.\",\"PeriodicalId\":74902,\"journal\":{\"name\":\"Synthetic biology (Oxford, England)\",\"volume\":\"9 1\",\"pages\":\"ysae010\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2024-06-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11227101/pdf/\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Synthetic biology (Oxford, England)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/synbio/ysae010\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/1/1 0:00:00\",\"PubModel\":\"eCollection\",\"JCR\":\"Q2\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Synthetic biology (Oxford, England)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/synbio/ysae010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}

引用次数: 0

摘要

数据科学在工程生物学的设计和分析中发挥着越来越重要的作用。这得益于高通量方法的发展，如大规模并行报告检测、数据丰富的显微镜技术、计算蛋白质结构预测和设计，以及能够生成大量数据的全细胞模型的发展。虽然在这些情况下应用以数据为中心的分析能力很有吸引力，而且越来越容易做到，但它也伴随着潜在的风险。例如，基础数据中的偏差会如何影响结果的有效性？在这里，我们提出了一个社区开发的数据危害评估框架，以帮助解决这些问题，并展示了该框架在两个合成生物学案例研究中的应用。我们展示了常见类型的生物工程项目中出现的各种考虑因素，并提供了一些指导原则和缓解步骤。了解数据工作中的潜在问题和危险并积极主动地加以解决，对于确保适当使用新兴的数据密集型人工智能方法至关重要，并有助于提高其在合成生物学中应用的可信度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Data hazards in synthetic biology.

Data science is playing an increasingly important role in the design and analysis of engineered biology. This has been fueled by the development of high-throughput methods like massively parallel reporter assays, data-rich microscopy techniques, computational protein structure prediction and design, and the development of whole-cell models able to generate huge volumes of data. Although the ability to apply data-centric analyses in these contexts is appealing and increasingly simple to do, it comes with potential risks. For example, how might biases in the underlying data affect the validity of a result and what might the environmental impact of large-scale data analyses be? Here, we present a community-developed framework for assessing data hazards to help address these concerns and demonstrate its application to two synthetic biology case studies. We show the diversity of considerations that arise in common types of bioengineering projects and provide some guidelines and mitigating steps. Understanding potential issues and dangers when working with data and proactively addressing them will be essential for ensuring the appropriate use of emerging data-intensive AI methods and help increase the trustworthiness of their applications in synthetic biology.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Synthetic biology (Oxford, England)

自引率

0.00%

发文量