Wail Y. Alkowaileet, Sattam Alsubaiee, M. Carey, Chen Li, H. Ramampiaro, Phanwadee Sinthong, Xikui Wang
{"title":"End-to-End Machine Learning with Apache AsterixDB","authors":"Wail Y. Alkowaileet, Sattam Alsubaiee, M. Carey, Chen Li, H. Ramampiaro, Phanwadee Sinthong, Xikui Wang","doi":"10.1145/3209889.3209894","DOIUrl":null,"url":null,"abstract":"Recent developments in machine learning and data science provide a foundation for extracting underlying information from Big Data. Unfortunately, current platforms and tools often require data scientists to glue together and maintain custom-built platforms consisting of multiple Big Data component technologies. In this paper, we explain how Apache AsterixDB, an open source Big Data Management System, can help to reduce the burden involved in using machine learning algorithms in Big Data analytics. In particular, we describe how AsterixDB's built-in support for user-defined functions (UDFs), the availability of UDFs in data ingestion pipelines and queries, and the provision of machine learning platform and notebook inter-operation capabilities can together enable data analysts to more easily create and manage end-to-end analytical dataflows.","PeriodicalId":92710,"journal":{"name":"Proceedings of the Second Workshop on Data Management for End-to-End Machine Learning. Workshop on Data Management for End-to-End Machine Learning (2nd : 2018 : Houston, Tex.)","volume":"10 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2018-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Second Workshop on Data Management for End-to-End Machine Learning. Workshop on Data Management for End-to-End Machine Learning (2nd : 2018 : Houston, Tex.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3209889.3209894","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11
Abstract
Recent developments in machine learning and data science provide a foundation for extracting underlying information from Big Data. Unfortunately, current platforms and tools often require data scientists to glue together and maintain custom-built platforms consisting of multiple Big Data component technologies. In this paper, we explain how Apache AsterixDB, an open source Big Data Management System, can help to reduce the burden involved in using machine learning algorithms in Big Data analytics. In particular, we describe how AsterixDB's built-in support for user-defined functions (UDFs), the availability of UDFs in data ingestion pipelines and queries, and the provision of machine learning platform and notebook inter-operation capabilities can together enable data analysts to more easily create and manage end-to-end analytical dataflows.