机器学习任务的规范工作流

IF 1.3 3区计算机科学 Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS Data Intelligence Pub Date : 2022-04-01 DOI:10.1162/dint_a_00124

Christophe Blanchi, B. Gebre, P. Wittenburg

{"title":"机器学习任务的规范工作流","authors":"Christophe Blanchi, B. Gebre, P. Wittenburg","doi":"10.1162/dint_a_00124","DOIUrl":null,"url":null,"abstract":"Abstract There is a huge gap between (1) the state of workflow technology on the one hand and the practices in the many labs working with data driven methods on the other and (2) the awareness of the FAIR principles and the lack of changes in practices during the last 5 years. The CWFR concept has been defined which is meant to combine these two intentions, increasing the use of workflow technology and improving FAIR compliance. In the study described in this paper we indicate how this could be applied to machine learning which is now used by almost all research disciplines with the well-known effects of a huge lack of repeatability and reproducibility. Researchers will only change practices if they can work efficiently and are not loaded with additional tasks. A comprehensive CWFR framework would be an umbrella for all steps that need to be carried out to do machine learning on selected data collections and immediately create a comprehensive and FAIR compliant documentation. The researcher is guided by such a framework and information once entered can easily be shared and reused. The many iterations normally required in machine learning can be dealt with efficiently using CWFR methods. Libraries of components that can be easily orchestrated using FAIR Digital Objects as a common entity to document all actions and to exchange information between steps without the researcher needing to understand anything about PIDs and FDO details is probably the way to increase efficiency in repeating research workflows. As the Galaxy project indicates, the availability of supporting tools will be important to let researchers use these methods. Other as the Galaxy framework suggests, however, it would be necessary to include all steps necessary for doing a machine learning task including those that require human interaction and to document all phases with the help of structured FDOs.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"4 1","pages":"173-185"},"PeriodicalIF":1.3000,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Canonical Workflow for Machine Learning Tasks\",\"authors\":\"Christophe Blanchi, B. Gebre, P. Wittenburg\",\"doi\":\"10.1162/dint_a_00124\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract There is a huge gap between (1) the state of workflow technology on the one hand and the practices in the many labs working with data driven methods on the other and (2) the awareness of the FAIR principles and the lack of changes in practices during the last 5 years. The CWFR concept has been defined which is meant to combine these two intentions, increasing the use of workflow technology and improving FAIR compliance. In the study described in this paper we indicate how this could be applied to machine learning which is now used by almost all research disciplines with the well-known effects of a huge lack of repeatability and reproducibility. Researchers will only change practices if they can work efficiently and are not loaded with additional tasks. A comprehensive CWFR framework would be an umbrella for all steps that need to be carried out to do machine learning on selected data collections and immediately create a comprehensive and FAIR compliant documentation. The researcher is guided by such a framework and information once entered can easily be shared and reused. The many iterations normally required in machine learning can be dealt with efficiently using CWFR methods. Libraries of components that can be easily orchestrated using FAIR Digital Objects as a common entity to document all actions and to exchange information between steps without the researcher needing to understand anything about PIDs and FDO details is probably the way to increase efficiency in repeating research workflows. As the Galaxy project indicates, the availability of supporting tools will be important to let researchers use these methods. Other as the Galaxy framework suggests, however, it would be necessary to include all steps necessary for doing a machine learning task including those that require human interaction and to document all phases with the help of structured FDOs.\",\"PeriodicalId\":34023,\"journal\":{\"name\":\"Data Intelligence\",\"volume\":\"4 1\",\"pages\":\"173-185\"},\"PeriodicalIF\":1.3000,\"publicationDate\":\"2022-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Data Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1162/dint_a_00124\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Data Intelligence","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1162/dint_a_00124","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

摘要

摘要：（1）工作流技术的现状与许多实验室使用数据驱动方法的实践之间存在巨大差距，（2）对FAIR原则的认识以及过去5年实践中缺乏变化。CWFR概念已经定义，旨在将这两个意图结合起来，增加工作流技术的使用并提高FAIR合规性。在本文描述的研究中，我们指出了如何将其应用于机器学习，现在几乎所有的研究学科都在使用机器学习，其众所周知的影响是严重缺乏可重复性和再现性。研究人员只有在能够有效工作且没有额外任务的情况下才会改变实践。一个全面的CWFR框架将是所有需要执行的步骤的保护伞，以对选定的数据收集进行机器学习，并立即创建一个全面且符合FAIR的文档。研究人员在这样一个框架的指导下，输入的信息可以很容易地共享和重用。使用CWFR方法可以有效地处理机器学习中通常需要的许多迭代。可以使用FAIR数字对象作为一个通用实体来轻松编排组件库，以记录所有操作并在步骤之间交换信息，而研究人员无需了解PID和FDO细节，这可能是提高重复研究工作流程效率的方法。正如银河项目所表明的那样，支持工具的可用性对于让研究人员使用这些方法至关重要。然而，正如银河系框架所建议的那样，有必要包括执行机器学习任务所需的所有步骤，包括需要人工交互的步骤，并在结构化FDO的帮助下记录所有阶段。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Canonical Workflow for Machine Learning Tasks

Abstract There is a huge gap between (1) the state of workflow technology on the one hand and the practices in the many labs working with data driven methods on the other and (2) the awareness of the FAIR principles and the lack of changes in practices during the last 5 years. The CWFR concept has been defined which is meant to combine these two intentions, increasing the use of workflow technology and improving FAIR compliance. In the study described in this paper we indicate how this could be applied to machine learning which is now used by almost all research disciplines with the well-known effects of a huge lack of repeatability and reproducibility. Researchers will only change practices if they can work efficiently and are not loaded with additional tasks. A comprehensive CWFR framework would be an umbrella for all steps that need to be carried out to do machine learning on selected data collections and immediately create a comprehensive and FAIR compliant documentation. The researcher is guided by such a framework and information once entered can easily be shared and reused. The many iterations normally required in machine learning can be dealt with efficiently using CWFR methods. Libraries of components that can be easily orchestrated using FAIR Digital Objects as a common entity to document all actions and to exchange information between steps without the researcher needing to understand anything about PIDs and FDO details is probably the way to increase efficiency in repeating research workflows. As the Galaxy project indicates, the availability of supporting tools will be important to let researchers use these methods. Other as the Galaxy framework suggests, however, it would be necessary to include all steps necessary for doing a machine learning task including those that require human interaction and to document all phases with the help of structured FDOs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊