Data Infrastructure at LinkedIn
Aditya Auradkar, C. Botev, Shirshanka Das, Dave De Maagd, Alex Feinberg, Phanindra Ganti, L. Gao, B. Ghosh, K. Gopalakrishna, B. Harris, J. Koshy, Kevin Krawez, J. Kreps, Shih-Hui Lu, S. Nagaraj, N. Narkhede, S. Pachev, I. Perisic, Lin Qiao, Tom Quiggle, Jun Rao, Bob Schulman, Abraham Sebastian, Oliver Seeliger, Adam Silberstein, Boris Shkolnik, Chinmay Soman, Roshan Sumbaly, Kapil Surlaker, Sajid Topiwala, C. Tran, B. Varadarajan, Jemiah Westerman, Zach White, David Zhang, Jason Zhang
2012 IEEE 28th International Conference on Data Engineering (ICDE), April 2012. DOI: 10.1109/ICDE.2012.147
Citations: 64
Abstract
LinkedIn is among the largest social networking sites in the world. As the company has grown, our core data sets and request-processing requirements have grown as well. In this paper, we describe a few selected data infrastructure projects at LinkedIn that have helped us accommodate this increasing scale. Most of these projects build on existing open source projects and are themselves available as open source. The projects covered in this paper are: (1) Voldemort, a scalable and fault-tolerant key-value store; (2) Databus, a framework for delivering database changes to downstream applications; (3) Espresso, a distributed data store that supports flexible schemas and secondary indexing; (4) Kafka, a scalable and efficient messaging system for collecting various user activity events and log data.
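The abstract only names these systems, so as a concrete illustration of the activity-event pipeline Kafka serves here, the following is a minimal sketch of publishing a user-activity event with the modern Apache Kafka Java client (org.apache.kafka:kafka-clients), which postdates the client API described in the 2012 paper. The topic name, broker address, and event payload are hypothetical, not taken from the paper.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ActivityEventProducer {
    public static void main(String[] args) {
        // Client configuration; the broker address is an assumption for local testing.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key = member id, value = a serialized activity event (JSON here for brevity).
            ProducerRecord<String, String> record =
                new ProducerRecord<>("page-view-events", "member-42",
                                     "{\"page\":\"/jobs\",\"ts\":1333238400}");
            // Asynchronous publish; the client batches records before sending to the broker.
            producer.send(record);
        }
    }
}
```

Downstream consumers (for example, log aggregation or metrics jobs of the kind the paper describes) would read the same topic independently at their own pace, which is the decoupling Kafka is used for at LinkedIn.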