Harmonizing the Generation and Pre-publication Stewardship of FAIR bioimage data.

ArXiv Pub Date: 2025-09-04

Nikki Bialy, Frank Alber, Brenda Andrews, Michael Angelo, Brian Beliveau, Lacramioara Bintu, Alistair Boettiger, Ulrike Boehm, Claire M Brown, Mahmoud Bukar Maina, James J Chambers, Beth A Cimini, Kevin Eliceiri, Rachel Errington, Orestis Faklaris, Nathalie Gaudreault, Ronald N Germain, Wojtek Goscinski, David Grunwald, Michael Halter, Dorit Hanein, John W Hickey, Judith Lacoste, Alex Laude, Emma Lundberg, Jian Ma, Leonel Malacrida, Josh Moore, Glyn Nelson, Elizabeth Kathleen Neumann, Roland Nitschke, Shuichi Onami, Jaime A Pimentel, Anne L Plant, Andrea J Radtke, Bikash Sabata, Denis Schapiro, Johannes Schöneberg, Jeffrey M Spraggins, Damir Sudar, Wouter-Michiel Adrien Maria Vierdag, Niels Volkmann, Carolina Wählby, Siyuan Steven Wang, Ziv Yaniv, Caterina Strambio-De-Castillia

Citation count: 0

Abstract

Biological imaging, combined with molecular insights into genes and proteins, holds immense promise for deepening our understanding of complex cellular systems and accelerating the development of predictive, personalized therapies for human health. To fully realize this potential at scale, and to harness the power of AI/ML to extract novel biological insights and identify therapeutic interventions, it is necessary to transition from siloed datasets to globally shared, rigorously annotated, and computationally ready image data. This demands systematic harmonization of multidimensional bioimaging data, in which interoperable formats, standardized context-rich annotation, quality controls, and analytical pipelines transform scattered observations into a coherent knowledge base ripe for computational mining. Only this machine-actionable aggregation can provide the substrate for AI/ML to extract mechanistic insights into fundamental biological processes, novel diagnostic biomarkers, and intervention targets. Enabling seamless image data sharing in the life sciences requires addressing two key areas. The first is outlined in an accompanying publication, Enabling Global Image Data Sharing in the Life Sciences, which focuses on the publicly available repositories needed to share digital array data ¹. This White Paper addresses the second: it details a comprehensive set of requirements for integrated image data and metadata management, from acquisition through dissemination, ensuring that the contextual information necessary for assessing quality, interpreting scientific validity, and enabling meaningful reuse remains intrinsically linked to the data throughout its lifecycle.
Critically, it recognizes that generating harmonized, well-annotated, publicly available corpora of FAIR bioimage data requires these datasets to be "FAIR-from-the-start", an objective that can only be achieved by enabling experimental scientists to manage, organize, and analyze their data according to community standards from the very first experiment. Building on recent progress made by the bioimaging field towards establishing shared practices for bioimaging Quality Control (QC) and metadata capture, we present actionable recommendations to advance these efforts by embedding researcher-friendly, integrated software infrastructure directly into pre-publication workflows, thus transforming disorganized data capture into structured, shareable resources ready for aggregation and reuse. Our ultimate goal is to expand the use of streamlined tools and practices, transforming how researchers capture, annotate, analyze, and eventually publish bioimaging data and laying the foundation for a new era of data-driven discovery.


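Harmonizing the generation and pre-publication stewardship of FAIR image data.

Biological images, together with molecular knowledge of genes and proteins, promise to greatly improve scientific understanding of complex cellular systems and to deliver predictive and personalized therapeutic products for human health. To realize this potential, laboratories must share quality-assured image data globally so that it can be compared, pooled, and reanalyzed, unlocking enormous potential beyond the original purpose for which the data were generated. Achieving image data sharing in the life sciences requires meeting two broad categories of requirements. One set is described in a companion white paper titled "Enabling Global Image Data Sharing in the Life Sciences", released at the same time, which addresses the need to build the cyberinfrastructure for sharing digital array data. In this white paper, we detail a broad set of requirements covering the collection, management, presentation, and dissemination of the contextual information that is essential for assessing quality, understanding content, interpreting scientific significance, and reusing image data in the context of experimental details. We first outline the main lessons learned so far from international community activities, which have recently made significant progress in developing community standard practices for imaging quality control (QC) and metadata. We then present a clear set of recommendations for expanding this work. The driving goal is to address the remaining challenges and to make everyday practices and tools accessible to all biomedical researchers, regardless of their expertise, access to resources, or geographic location.

This text was translated by a computer program; in case of discrepancies, the English original prevails.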

