科学数据分析工作流程管理系统评价标准

Aleyna Dilan Kiran, Mehmet Can Ay, J. Allmer
{"title":"科学数据分析工作流程管理系统评价标准","authors":"Aleyna Dilan Kiran, Mehmet Can Ay, J. Allmer","doi":"10.26502/jbsb.5107055","DOIUrl":null,"url":null,"abstract":"Many scientific endeavors, such as molecular biology, have become dependent on big data and its analysis. For example, precision medicine depends on molecular measurements and data analysis per patient. Data analyses supporting medical decisions must be standardized and performed consistently across patients. While perhaps not life-threatening, data analyses in basic research have become increasingly complex. RNA-seq data, for example, entails a multi-step analysis ranging from quality assessment of the measurements to statistical analyses. Workflow management systems (WFMS) enable the development of data analysis workflows (WF), their reproduction, and their application to datasets of the same type. However, far more than a hundred WFMS are available, and there is no way to convert data analysis WFs among WFMS. Therefore, the initial choice of a WFMS is important as it entails a lock-in to the system. The reach in their particular field (number of citations) can be used as a proxy for selecting a WFMS, but of the about 25 WFMS we mention in this work, at least 5 have a large reach in scientific data analysis. Hence other criteria are needed to delineate among WFMS. By extracting such criteria from selected studies concerning WFMS and adding additional criteria, we arrived at five critical criteria: reproducibility, reusability, FAIRness, versioning support, and security. Another five criteria (providing a graphical user interface, WF flexibility, WF scalability, WF shareability, and computational transparency) we deemed important but not critical for the assessment of WFMS. We applied the criteria to the most cited WFMS in PubMed and found none that support all criteria. We hope that suggesting these criteria will spark a discussion on what features are important for WFMS in scientific data analysis and may lead to developing WFMS that fulfill such criteria.","PeriodicalId":73617,"journal":{"name":"Journal of bioinformatics and systems biology : Open access","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Criteria for the Evaluation of Workflow Management Systems for Scientific Data Analysis\",\"authors\":\"Aleyna Dilan Kiran, Mehmet Can Ay, J. Allmer\",\"doi\":\"10.26502/jbsb.5107055\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many scientific endeavors, such as molecular biology, have become dependent on big data and its analysis. For example, precision medicine depends on molecular measurements and data analysis per patient. Data analyses supporting medical decisions must be standardized and performed consistently across patients. While perhaps not life-threatening, data analyses in basic research have become increasingly complex. RNA-seq data, for example, entails a multi-step analysis ranging from quality assessment of the measurements to statistical analyses. Workflow management systems (WFMS) enable the development of data analysis workflows (WF), their reproduction, and their application to datasets of the same type. However, far more than a hundred WFMS are available, and there is no way to convert data analysis WFs among WFMS. Therefore, the initial choice of a WFMS is important as it entails a lock-in to the system. The reach in their particular field (number of citations) can be used as a proxy for selecting a WFMS, but of the about 25 WFMS we mention in this work, at least 5 have a large reach in scientific data analysis. Hence other criteria are needed to delineate among WFMS. By extracting such criteria from selected studies concerning WFMS and adding additional criteria, we arrived at five critical criteria: reproducibility, reusability, FAIRness, versioning support, and security. Another five criteria (providing a graphical user interface, WF flexibility, WF scalability, WF shareability, and computational transparency) we deemed important but not critical for the assessment of WFMS. We applied the criteria to the most cited WFMS in PubMed and found none that support all criteria. We hope that suggesting these criteria will spark a discussion on what features are important for WFMS in scientific data analysis and may lead to developing WFMS that fulfill such criteria.\",\"PeriodicalId\":73617,\"journal\":{\"name\":\"Journal of bioinformatics and systems biology : Open access\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of bioinformatics and systems biology : Open access\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.26502/jbsb.5107055\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of bioinformatics and systems biology : Open access","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.26502/jbsb.5107055","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

许多科学研究,如分子生物学,都依赖于大数据及其分析。例如,精准医疗依赖于每位患者的分子测量和数据分析。支持医疗决策的数据分析必须标准化,并在患者之间一致执行。虽然可能不会危及生命,但基础研究中的数据分析已经变得越来越复杂。例如,RNA-seq数据需要从测量的质量评估到统计分析的多步骤分析。工作流管理系统(WFMS)支持数据分析工作流(WF)的开发、复制以及对同一类型数据集的应用。然而,可用的WFMS远远超过100个,并且没有办法在WFMS之间转换数据分析wf。因此,WFMS的初始选择很重要,因为它需要对系统进行锁定。在其特定领域的影响力(引用次数)可以用作选择WFMS的代理,但在我们在本工作中提到的大约25个WFMS中,至少有5个在科学数据分析方面具有很大的影响力。因此,需要其他标准来描述WFMS。通过从有关WFMS的选定研究中提取这些标准并添加其他标准,我们得出了五个关键标准:再现性、可重用性、公平性、版本支持和安全性。另外五个标准(提供图形用户界面、WF灵活性、WF可伸缩性、WF可共享性和计算透明性)我们认为对评估WFMS很重要,但不是关键。我们将这些标准应用到PubMed中被引用最多的WFMS中,发现没有一个符合所有标准。我们希望提出这些标准将引发关于WFMS在科学数据分析中哪些特征是重要的讨论,并可能导致开发满足这些标准的WFMS。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Criteria for the Evaluation of Workflow Management Systems for Scientific Data Analysis
Many scientific endeavors, such as molecular biology, have become dependent on big data and its analysis. For example, precision medicine depends on molecular measurements and data analysis per patient. Data analyses supporting medical decisions must be standardized and performed consistently across patients. While perhaps not life-threatening, data analyses in basic research have become increasingly complex. RNA-seq data, for example, entails a multi-step analysis ranging from quality assessment of the measurements to statistical analyses. Workflow management systems (WFMS) enable the development of data analysis workflows (WF), their reproduction, and their application to datasets of the same type. However, far more than a hundred WFMS are available, and there is no way to convert data analysis WFs among WFMS. Therefore, the initial choice of a WFMS is important as it entails a lock-in to the system. The reach in their particular field (number of citations) can be used as a proxy for selecting a WFMS, but of the about 25 WFMS we mention in this work, at least 5 have a large reach in scientific data analysis. Hence other criteria are needed to delineate among WFMS. By extracting such criteria from selected studies concerning WFMS and adding additional criteria, we arrived at five critical criteria: reproducibility, reusability, FAIRness, versioning support, and security. Another five criteria (providing a graphical user interface, WF flexibility, WF scalability, WF shareability, and computational transparency) we deemed important but not critical for the assessment of WFMS. We applied the criteria to the most cited WFMS in PubMed and found none that support all criteria. We hope that suggesting these criteria will spark a discussion on what features are important for WFMS in scientific data analysis and may lead to developing WFMS that fulfill such criteria.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Linear Regression of Sampling Distributions of the Mean. Transcriptional and Translational Regulation of Differentially Expressed Genes in Yucatan Miniswine Brain Tissues following Traumatic Brain Injury. The Growing Liberality Observed in Primary Animal and Plant Cultures is Common to the Social Amoeba. Role of Transcription Factors and MicroRNAs in Regulating Fibroblast Reprogramming in Wound Healing. CDKs Functional Analysis in Low Proliferating Early-Stage Pancreatic Ductal Adenocarcinoma.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1