The big data ecosystem at LinkedIn

Roshan Sumbaly, J. Kreps, Sam Shah
{"title":"The big data ecosystem at LinkedIn","authors":"Roshan Sumbaly, J. Kreps, Sam Shah","doi":"10.1145/2463676.2463707","DOIUrl":null,"url":null,"abstract":"The use of large-scale data mining and machine learning has proliferated through the adoption of technologies such as Hadoop, with its simple programming semantics and rich and active ecosystem. This paper presents LinkedIn's Hadoop-based analytics stack, which allows data scientists and machine learning researchers to extract insights and build product features from massive amounts of data. In particular, we present our solutions to the ``last mile'' issues in providing a rich developer ecosystem. This includes easy ingress from and egress to online systems, and managing workflows as production processes. A key characteristic of our solution is that these distributed system concerns are completely abstracted away from researchers. For example, deploying data back into the online system is simply a 1-line Pig command that a data scientist can add to the end of their script. We also present case studies on how this ecosystem is used to solve problems ranging from recommendations to news feed updates to email digesting to descriptive analytical dashboards for our members.","PeriodicalId":87344,"journal":{"name":"Proceedings. ACM-SIGMOD International Conference on Management of Data","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2013-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"140","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. ACM-SIGMOD International Conference on Management of Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2463676.2463707","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 140

Abstract

The use of large-scale data mining and machine learning has proliferated through the adoption of technologies such as Hadoop, with its simple programming semantics and rich and active ecosystem. This paper presents LinkedIn's Hadoop-based analytics stack, which allows data scientists and machine learning researchers to extract insights and build product features from massive amounts of data. In particular, we present our solutions to the ``last mile'' issues in providing a rich developer ecosystem. This includes easy ingress from and egress to online systems, and managing workflows as production processes. A key characteristic of our solution is that these distributed system concerns are completely abstracted away from researchers. For example, deploying data back into the online system is simply a 1-line Pig command that a data scientist can add to the end of their script. We also present case studies on how this ecosystem is used to solve problems ranging from recommendations to news feed updates to email digesting to descriptive analytical dashboards for our members.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
领英的大数据生态系统
通过采用Hadoop等技术,大规模数据挖掘和机器学习的使用已经激增,Hadoop具有简单的编程语义和丰富而活跃的生态系统。本文介绍了LinkedIn基于hadoop的分析堆栈,它允许数据科学家和机器学习研究人员从大量数据中提取见解并构建产品功能。特别是,我们在提供丰富的开发者生态系统方面提出了我们对“最后一英里”问题的解决方案。这包括易于进出在线系统,以及将工作流作为生产流程进行管理。我们的解决方案的一个关键特征是这些分布式系统关注点完全从研究人员中抽象出来。例如,将数据部署回在线系统只是简单的一行Pig命令,数据科学家可以将其添加到脚本的末尾。我们还提供了案例研究,说明如何使用这个生态系统来解决各种问题,从推荐到新闻提要更新,从电子邮件消化到为我们的成员提供描述性分析仪表板。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Protecting Data Markets from Strategic Buyers XLJoins Convergence of Array DBMS and Cellular Automata: A Road Traffic Simulation Case Near-Optimal Distributed Band-Joins through Recursive Partitioning. Optimal Join Algorithms Meet Top-k.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1