Federated Aggregation of Mallows Rankings: A Comparative Analysis of Borda and Lehmer Coding
Authors: Jin Sima, Vishal Rana, Olgica Milenkovic
Identifier: arxiv-2409.00848
Published: 2024-09-01, arXiv - CS - Distributed, Parallel, and Cluster Computing
Citations: 0
Abstract
Rank aggregation combines multiple ranked lists into a consensus ranking. In
fields like biomedical data sharing, rankings may be distributed and require
privacy. This motivates federated rank aggregation protocols,
which support distributed, private, and communication-efficient learning across
multiple clients with local data. We present the first known federated rank
aggregation methods using Borda scoring and Lehmer codes, focusing on the
sample complexity of federated algorithms for Mallows distributions with a
known scaling factor $\phi$ and an unknown centroid permutation $\sigma_0$.
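For reference, here is a minimal sketch of sampling from a Mallows distribution with scaling factor $\phi$ and centroid $\sigma_0$. It assumes the standard Kendall-tau form $P(\pi) \propto \phi^{d_K(\pi,\sigma_0)}$, which this abstract does not spell out, and it uses the repeated insertion method. `kendall_tau` and `sample_mallows` are hypothetical helper names, not code from the paper.

```python
import random

def kendall_tau(a, b):
    """Number of item pairs that rankings a and b order differently."""
    pos = {item: i for i, item in enumerate(b)}
    seq = [pos[item] for item in a]
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])

def sample_mallows(sigma0, phi, rng=random):
    """Draw one ranking with P(pi) proportional to phi ** kendall_tau(pi, sigma0).

    Assumption: Kendall-tau Mallows model with 0 <= phi < 1.
    Repeated insertion: the i-th item of sigma0 is inserted k slots before the end
    of the partial ranking, creating exactly k discordant pairs with the items
    already placed. Drawing k in {0, ..., i} with weight phi ** k therefore gives
    total weight phi ** (total number of discordant pairs).
    """
    ranking = []
    for i, item in enumerate(sigma0):
        weights = [phi ** j for j in range(i + 1)]
        k = rng.choices(range(i + 1), weights=weights)[0]
        ranking.insert(len(ranking) - k, item)
    return ranking
```

With $\phi = 0$ every draw equals $\sigma_0$, and at $\phi = 1$ the draws would be uniform over all $N!$ permutations. The offsets $k$ drawn in the loop are independent, with $P(k) \propto \phi^k$ on $\{0,\dots,i\}$.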
The federated Borda approach involves local client scoring, nontrivial
quantization, and privacy-preserving protocols. We show that for $\phi \in [0,1)$
and an arbitrary $\sigma_0$ of length $N$, it suffices for each of the $L$
clients to locally aggregate $\max\{C_1(\phi), C_2(\phi)\frac{1}{L}\log\frac{N}{\delta}\}$
rankings, where $C_1(\phi)$ and $C_2(\phi)$ are constants that depend only on $\phi$,
quantize the result, and send it to the server, which can then recover $\sigma_0$
with probability $\geq 1-\delta$. The communication complexity scales as
$NL \log N$. Our results represent the first rigorous analysis of Borda's method in
centralized and distributed settings under the Mallows model. The federated Lehmer
coding approach has each client compute Lehmer codes of its local rankings and
combines them by coordinate-wise majority, using specialized quantization methods
for efficiency and privacy. We show that for $\phi+\phi^2<1+\phi^N$ and an
arbitrary $\sigma_0$ of length $N$, it suffices for each of the $L$ clients to
locally aggregate $\max\{C_3(\phi), C_4(\phi)\frac{1}{L}\log\frac{N}{\delta}\}$
rankings, where $C_3(\phi)$ and $C_4(\phi)$ are constants that depend only on $\phi$.
Clients send truncated Lehmer coordinate histograms to the server, which can
recover $\sigma_0$ with probability $\geq 1-\delta$. The communication complexity
is $\sim O(N\log NL\log L)$.
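Here is a minimal, non-private sketch of the two aggregation pipelines summarized above. It assumes items are labelled $0,\dots,N-1$ and that each ranking lists items from most to least preferred. Borda scores are taken as summed 0-indexed positions (lower is better), and the Lehmer code uses the convention $c_i = |\{j > i : \pi_j < \pi_i\}|$. Both conventions are assumptions. The quantization, truncation, and privacy mechanisms described in the paper are deliberately omitted, and all function names are hypothetical.

```python
from collections import Counter

# --- Federated Borda (no quantization or privacy layer) ---
def client_borda(rankings, n):
    """Client side: summed 0-indexed position of each item over local rankings."""
    totals = [0] * n
    for ranking in rankings:
        for pos, item in enumerate(ranking):
            totals[item] += pos
    return totals

def server_borda(client_totals):
    """Server side: add client scores and sort items by total (ties by label)."""
    n = len(client_totals[0])
    scores = [sum(t[item] for t in client_totals) for item in range(n)]
    return sorted(range(n), key=lambda item: (scores[item], item))

# --- Federated Lehmer coding (full histograms; truncation omitted) ---
def lehmer_encode(ranking):
    """c_i = number of entries after position i that are smaller than ranking[i]."""
    n = len(ranking)
    return [sum(1 for j in range(i + 1, n) if ranking[j] < ranking[i]) for i in range(n)]

def lehmer_decode(code):
    """Inverse of lehmer_encode; any code with c_i <= N-1-i yields a permutation."""
    remaining = list(range(len(code)))
    return [remaining.pop(c) for c in code]

def client_lehmer(rankings, n):
    """Client side: one histogram of observed values per Lehmer coordinate."""
    hists = [Counter() for _ in range(n)]
    for ranking in rankings:
        for i, c in enumerate(lehmer_encode(ranking)):
            hists[i][c] += 1
    return hists

def server_lehmer(client_hists):
    """Server side: merge histograms, take the per-coordinate majority, decode."""
    n = len(client_hists[0])
    code = []
    for i in range(n):
        merged = Counter()
        for hists in client_hists:
            merged.update(hists[i])
        # Most frequent value; ties broken toward the smaller value.
        code.append(max(merged.items(), key=lambda kv: (kv[1], -kv[0]))[0])
    return lehmer_decode(code)
```

For example, with two clients holding `[[0, 1, 2, 3], [1, 0, 2, 3]]` and `[[0, 1, 3, 2], [0, 1, 2, 3]]`, both `server_borda` and `server_lehmer` return `[0, 1, 2, 3]`. Because each coordinate $c_i$ ranges over $\{0,\dots,N-1-i\}$, the coordinate-wise majority is itself a valid Lehmer code, so decoding always produces a permutation. A position-wise majority over the raw rankings has no such guarantee.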