An Adapter for IBM Streams and Apache Spark to Facilitate Multi-level Data Analytics

Yinchen Shi, Sazia Mahfuz, F. Zulkernine, Peter Nicholls
Published in: 2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), pp. 0230-0235
Publication date: 2020-11-04
DOI: 10.1109/IEMCON51383.2020.9284818 (https://doi.org/10.1109/IEMCON51383.2020.9284818)

Abstract

Data analytics with unsupervised clustering of data streams has enabled breakthroughs in fields such as healthcare and e-commerce. IBM Streams and Apache Spark are widely used data analytics tools that help engineers and researchers store, analyze, transform, and visualize data for business use. IBM Streams can ingest, filter, analyze, and correlate massive volumes of continuous data streams, and its Streams Processing Language (SPL) lets developers code custom stream graphs that process data and handle real-time events. Apache Spark is a unified analytics engine for large-scale data processing that delivers high performance on both batch and streaming data. We developed adapters, without relying on third-party tools, that transfer data between IBM Streams and Apache Spark and support both new and legacy data analytics systems. In an example use case, IBM Streams ingests and processes real-time data streams and then passes the data to Spark, which trains or updates machine learning models in real time; the updated models can then be redeployed in the IBM Streams data processing pipeline. This paper gives an overview of the structure of the data processing pipeline and describes the implementation details and the principles behind the design.
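The abstract does not say which transport the adapters use. As a minimal sketch of one way to move records between a stream processor and an analytics engine without third-party middleware, the example below assumes a plain TCP socket carrying newline-delimited CSV records. Both sides are stand-ins written in plain Python: `serve_records` plays the role of the stream-side sink, `consume_records` the analytics-side source, and `column_means` is a toy placeholder for a model update. All function names are hypothetical and none of this is taken from the paper.

```python
import socket
import threading


def serve_records(server_sock, records):
    """Stand-in for the stream-side sink: send each record as one CSV line."""
    conn, _ = server_sock.accept()
    with conn:
        for rec in records:
            line = ",".join(str(v) for v in rec) + "\n"
            conn.sendall(line.encode("utf-8"))


def consume_records(host, port):
    """Stand-in for the analytics-side source: read lines until the peer closes."""
    rows = []
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("r", encoding="utf-8") as reader:
            for line in reader:
                rows.append([float(v) for v in line.strip().split(",")])
    return rows


def column_means(rows):
    """Toy placeholder for a model update: per-column mean of the batch."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def run_demo():
    """Send three records over a loopback socket and summarize them."""
    records = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    sender = threading.Thread(target=serve_records, args=(server, records))
    sender.start()
    rows = consume_records("127.0.0.1", port)
    sender.join()
    server.close()
    return rows, column_means(rows)


if __name__ == "__main__":
    received, means = run_demo()
    print(received, means)
```

In a real deployment the two stand-ins would be replaced by each platform's own socket adapters, for example a TCP sink operator on the IBM Streams side and a socket source on the Spark side. Whether the authors took this route is not stated in the abstract.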