Abstract: Communication overhead is a primary bottleneck in distributed deep learning, impeding training scalability. Although existing gradient sparsification techniques reduce network ...