Robust, Scalable, Real-Time Event Time Series Aggregation at Twitter
Peilin Yang, S. Thiagarajan, Jimmy J. Lin
Proceedings of the 2018 International Conference on Management of Data, published 2018-05-27
DOI: 10.1145/3183713.3190663 (https://doi.org/10.1145/3183713.3190663)
Citation count: 5
Abstract
Twitter's data engineering team is faced with the challenge of processing billions of events every day in batch and in real time, and we have built various tools to meet these demands. In this paper, we describe TSAR (TimeSeries AggregatoR), a robust, scalable, real-time event time series aggregation framework built primarily for engagement monitoring: aggregating interactions with Tweets, segmented along a multitude of dimensions such as device and engagement type. TSAR is built on top of Summingbird, an open-source framework for integrating batch and online MapReduce computations, and removes much of the tedium associated with building end-to-end aggregation pipelines, from the ingestion and processing of events to the publication of results in heterogeneous datastores. Clients are provided with a query interface that powers dashboards and supports downstream ad hoc analytics.
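The core idea of segmented engagement aggregation can be illustrated with a minimal Python sketch. This is not TSAR's implementation or API: the `Event` schema, the two dimensions, the one-minute time bucket, the all-subsets rollup, and the helpers `keys_for`, `aggregate`, and `merge` are hypothetical and chosen for illustration. The sketch relies on the property that Summingbird-style aggregation depends on: values combine with an associative sum (here, integer addition of counts). Because of this property, a batch slice and an online slice can be aggregated separately and then merged, and the result matches aggregating all events at once.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Event:
    # Hypothetical engagement event; TSAR's real schema is not given in the abstract.
    tweet_id: int
    engagement_type: str
    device: str
    timestamp: int  # seconds since the Unix epoch


# Assumed segmentation dimensions and time granularity (illustrative only).
DIMENSIONS = ("engagement_type", "device")
BUCKET_SECONDS = 60


def keys_for(event):
    """Yield one (tweet_id, segment, bucket_start) key per subset of DIMENSIONS."""
    bucket_start = event.timestamp - event.timestamp % BUCKET_SECONDS
    for r in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, r):
            # The empty subset gives the unsegmented total for the tweet.
            segment = tuple((d, getattr(event, d)) for d in dims)
            yield (event.tweet_id, segment, bucket_start)


def aggregate(events):
    """Count events per key: a map step emits keys, a reduce step sums the 1s."""
    counts = Counter()
    for event in events:
        for key in keys_for(event):
            counts[key] += 1
    return counts


def merge(batch_counts, online_counts):
    """Sum two partial aggregates key by key; integer addition is associative and commutative."""
    merged = Counter(batch_counts)
    merged.update(online_counts)  # adds counts; unlike `+`, keeps zero/negative values
    return merged


# Usage: pretend the first two events went through the batch path
# and the last one through the online path.
events = [
    Event(1, "like", "ios", 120),
    Event(1, "retweet", "android", 130),
    Event(1, "like", "android", 185),
]
batch = aggregate(events[:2])
online = aggregate(events[2:])
total = merge(batch, online)

assert total == aggregate(events)  # split-then-merge equals aggregate-all
print(total[(1, (), 120)])  # → 2 (all engagements on tweet 1 in [120, 180))
```

Enumerating every subset of dimensions turns a query over any segment into a single key lookup, at the cost of writing 2^d keys per event for d dimensions. The abstract does not say whether TSAR materializes segments this way.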