TopK Ordering on Distributed Systems
Prarthana, N. Karamchandani
Proceedings of the 2018 on Technologies for the Wireless Edge Workshop, 2018-10-01
DOI: https://doi.org/10.1145/3266276.3266280
Ranking has a wide range of applications, including social choice \cite{SC}, recommendation systems \cite{RS}, web search \cite{WS}, and crowdsourcing \cite{CS}. \texttt{TeraSort} is a distributed algorithm, commonly used in systems such as Hadoop MapReduce, for sorting large datasets. In most applications of interest, however, a complete ordering of the data is not needed; only the few highest-ranked items are. In this paper we propose Coded Partial Sort, which obtains partially sorted data from large datasets using distributed computing systems. The goal is to find the \texttt{topK} ordered elements of a dataset by optimally utilizing the servers in a distributed network.

Coded Partial Sort modifies the conventional \texttt{TeraSort} algorithm to discard data that is irrelevant to the partial ordering, and applies ideas from "coding" to improve run-time performance by significantly reducing the communication load compared with Uncoded Partial Sort \cite{Us}. We empirically evaluate the performance of Coded and Uncoded Partial Sort on Amazon EC2 clusters for experimental settings of interest.
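
The idea of dropping data that cannot affect the top K before sorting can be illustrated with a small single-process sketch. It is not the paper's algorithm and does not attempt the coding step; it only assumes a TeraSort-style layout in which values are routed to range partitions by a list of splitters. The function name `partial_sort_topk`, the `shards` input, and the `splitters` argument are all hypothetical names chosen for this illustration.

```python
import heapq
from bisect import bisect_right


def partial_sort_topk(shards, k, splitters):
    """Return the k largest values across all shards, in descending order.

    shards    -- list of lists, one per (hypothetical) worker node
    splitters -- sorted range boundaries; partition i receives the values v
                 for which bisect_right(splitters, v) == i
    """
    # Map phase (assumed pruning rule): a value outside a worker's local
    # top k cannot be in the global top k, so each worker keeps only those.
    pruned = [heapq.nlargest(k, shard) for shard in shards]

    # Shuffle phase: route each surviving value to its range partition,
    # as a TeraSort-style range partitioner would.
    partitions = [[] for _ in range(len(splitters) + 1)]
    for local_top in pruned:
        for value in local_top:
            partitions[bisect_right(splitters, value)].append(value)

    # Reduce phase: visit partitions from the highest range downwards and
    # stop as soon as k values have been collected; lower ranges are never
    # sorted.
    result = []
    for part in reversed(partitions):
        if len(result) >= k:
            break
        part.sort(reverse=True)
        result.extend(part)
    return result[:k]


if __name__ == "__main__":
    shards = [[5, 1, 9], [7, 3, 8], [2, 6, 4]]
    print(partial_sort_topk(shards, k=3, splitters=[4, 7]))
```

In this sketch only the partitions needed to fill the top K are sorted, which mirrors the motivation stated above: a complete ordering is unnecessary when only the highest-ranked items are wanted.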