Scientific Discovery and Rigor with ML
Authors: Abdul-Gafoor Mohamed, P. Mahanta
Venue: 2020 Third International Conference on Multimedia Processing, Communication & Information Technology (MPCIT)
Publication date: 2020-12-11
DOI: 10.1109/MPCIT51588.2020.9350455
URL: https://doi.org/10.1109/MPCIT51588.2020.9350455
Citation count: 0
Abstract
The evolution of Data Management scenarios, augmented by scientific discovery and rigor, is apparent in the industry, judging by the attention analysts and others have given it over the past couple of years. Machine Learning plays a significant part in simplifying enterprise data landscapes and contributes to many aspects of Data Management. We see value in focusing on the Data Discovery and Data Quality aspects in this context, because enterprises today have complex landscapes: the average enterprise uses more than five cloud storage services in addition to its on-premises data sources. A greater affinity for enterprise-grade Machine Learning has created a significant pull on system design, leading platforms towards capabilities such as standard APIs for scaled database queries and integration scenarios. This paper explores the integration of Machine Learning tools and customized libraries with any Cloud Platform to enhance stakeholders' experience with Analytics. Conceptually, we propose a hypothesis for scaling an existing platform to a community-based approach that enables sharing of experimental iterations, ideally translating into industry-specific solutions that remain highly reusable. The intent is to offer a data model flexible enough to handle diverse data scenarios, evaluating confidence scores for each of them. It should enable reproducible, shared experiments with consistently evaluated scores, thereby easing the integration process through automated guidance. The paper also touches upon the good practices and architectural recommendations that need to be considered for general Machine Learning applications.