Spike overlaps occur frequently in dense neuronal network recordings and complicate spike sorting. Brain-machine interfaces and in vivo studies of neuronal network dynamics often require accurate spike sorting in real time, with low execution latency (on the order of milliseconds). Moreover, modern neuronal recording systems that feature thousands of electrodes require processing of tens to hundreds of neurons in parallel. Existing algorithms capable of decomposing spike overlaps are generally very complex and unsuitable for real-time implementation, especially on-chip implementation. Here we present a hardware device capable of processing pair-wise spike overlaps in real time. A previously published spike sorting algorithm, which is not suitable for low-latency processing of data from large neuronal networks, has been optimized for high-throughput, low-latency hardware implementation. The designed hardware architecture has been verified on an FPGA platform. Low spike sorting error rates (0.05) for overlapping spikes have been achieved with a latency of 2.75 ms, rendering the system particularly suitable for use in closed-loop experiments.