An Adapter for IBM Streams and Apache Spark to Facilitate Multi-level Data Analytics

2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), pp. 0230-0235. Pub Date: 2020-11-04. DOI: 10.1109/IEMCON51383.2020.9284818

Yinchen Shi, Sazia Mahfuz, F. Zulkernine, Peter Nicholls


Citations: 0

Abstract

Data analytics with unsupervised clustering of data streams has produced breakthroughs in fields such as healthcare and e-commerce. IBM Streams and Apache Spark are among the most widely used data analytics tools, helping engineers and researchers store, analyze, transform, and visualize data for business use. IBM Streams can ingest, filter, analyze, and correlate massive volumes of continuous data streams, and its Streams Processing Language (SPL) supports coding custom stream graphs that process data and handle real-time events. Apache Spark is a unified analytics engine for large-scale data processing that delivers high performance on both batch and streaming data. We developed adapters, without using third-party tools, that transfer data between IBM Streams and Apache Spark to support both new and legacy data analytics systems. In an example use case, IBM Streams ingests and processes real-time data streams and then passes the data to Spark, which trains or updates machine learning algorithms in real time; these can then be redeployed in the IBM Streams data processing pipeline. This paper gives an overview of the structure of the data processing pipeline and describes the implementation details and the principles behind the design.
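
The abstract does not say how the adapters move data between the two systems. As a rough illustration of the stream-to-batch hand-off in the example use case, the sketch below assumes a plain TCP socket carrying newline-delimited JSON, using only the Python standard library. The transport, the message format, and the names `serve_tuples`, `receive_batches`, and `run_demo` are all hypothetical; neither IBM Streams nor Apache Spark is invoked, and this is not the authors' implementation.

```python
import json
import socket
import threading


def serve_tuples(tuples, server_sock):
    """Stand-in for the streaming side: send each tuple as one JSON line."""
    conn, _ = server_sock.accept()
    with conn:
        for item in tuples:
            conn.sendall((json.dumps(item) + "\n").encode("utf-8"))
    # Closing the connection signals end-of-stream to the receiver.


def receive_batches(host, port, batch_size):
    """Stand-in for the batch side: read JSON lines, yield fixed-size batches."""
    batch = []
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("r", encoding="utf-8") as lines:
            for line in lines:
                batch.append(json.loads(line))
                if len(batch) == batch_size:
                    yield batch
                    batch = []
    if batch:
        yield batch  # flush the final, possibly partial, batch


def run_demo():
    """Send five hypothetical readings over loopback and collect the batches."""
    readings = [{"id": i, "value": i * 0.5} for i in range(5)]
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    sender = threading.Thread(target=serve_tuples, args=(readings, server))
    sender.start()
    batches = list(receive_batches("127.0.0.1", port, batch_size=2))
    sender.join()
    server.close()
    return batches
```

Calling `run_demo()` returns the readings grouped into batches of at most two records. In a real pipeline each batch would be handed to a training or model-update step rather than collected in a list.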
