Constraint-Driven Complexity-Aware Data Science Workflow for AutoBDA
Authors: Akila Siriweera; Incheon Paik; Huawei Huang
Journal: IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1438-1457 (published 2023-03-13)
DOI: 10.1109/TBDATA.2023.3256043
URL: https://ieeexplore.ieee.org/document/10066508/
Impact factor: 7.5 (JCR Q1, Computer Science, Information Systems)
The Internet of Things, privacy, and technical constraints increase the demand for edge-based data-driven services, which are among the major goals of Industry 4.0 and Society 5.0. Big data analysis (BDA) is the preferred approach to uncovering hidden knowledge. However, BDA consumes excessive resources and time. These limitations hamper the meaningful adoption of BDA, especially in time- and situation-critical edge use cases, and hinder the goals of Industry 4.0 and Society 5.0. Automating the BDA process at the edge is a cognitive approach to addressing these concerns. Generating the data science workflow is an unavoidable challenge for successful automation. Therefore, as our first contribution, we conducted a systematic literature survey on data science workflow platforms. Moreover, we learned that the BDA workflow depends on diversified constraints and passes through rigorous data-mining stages. These factors enlarge the solution space and give rise to dynamic constraints, complexity issues, and the NP-hardness of the BDA workflow problem. Graphplan is a heuristic AI-planning technique that can address the concerns associated with the BDA workflow. Therefore, as our second contribution, we adopted Graphplan to generate a workflow for edge-based BDA automation. Experiments demonstrate that the proposed method achieved our objectives.
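To make the planning idea in the abstract more concrete, the following is a minimal sketch of relaxed planning-graph expansion, i.e. only the forward phase of Graphplan, with mutex reasoning, delete effects, and backward plan extraction omitted. It is not the authors' implementation. The `Action` type, the `expand_planning_graph` function, and the toy workflow steps (`ingest`, `clean`, `profile`, `select_features`, `train_model`) are all hypothetical names chosen for illustration.

```python
from typing import NamedTuple, Optional


class Action(NamedTuple):
    """A workflow step with the facts it needs and the facts it produces."""
    name: str
    preconditions: frozenset
    effects: frozenset


# Hypothetical BDA workflow steps; none of these come from the paper.
ACTIONS = [
    Action("ingest", frozenset({"raw_data"}), frozenset({"loaded"})),
    Action("clean", frozenset({"loaded"}), frozenset({"cleaned"})),
    Action("profile", frozenset({"loaded"}), frozenset({"profiled"})),
    Action("select_features", frozenset({"cleaned"}), frozenset({"features"})),
    Action("train_model", frozenset({"features"}), frozenset({"model"})),
]


def expand_planning_graph(initial, goal, actions) -> Optional[list]:
    """Grow proposition layers until the goal is covered or nothing new appears.

    Returns one list of action names per level (the actions that first become
    applicable at that level), or None if the goal is unreachable.
    """
    props = frozenset(initial)
    goal = frozenset(goal)
    used = set()
    layers = []
    while not goal <= props:
        # Actions whose preconditions hold in the current proposition layer.
        applicable = [
            a for a in actions
            if a.name not in used and a.preconditions <= props
        ]
        new_props = props.union(*(a.effects for a in applicable))
        if new_props == props:
            # The graph has levelled off without reaching the goal.
            return None
        used.update(a.name for a in applicable)
        layers.append([a.name for a in applicable])
        props = new_props
    return layers


if __name__ == "__main__":
    for level, names in enumerate(
        expand_planning_graph({"raw_data"}, {"model"}, ACTIONS) or []
    ):
        print(level, names)
```

Each returned layer groups the steps that could run in parallel at that level. A full Graphplan planner would additionally track mutual-exclusion relations between actions and propositions and then search backward from the goal layer to extract a valid plan, which is where constraints such as those described in the abstract would have to be enforced.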
About the journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.