GPU-based Low-precision Detection Approach for Massive MIMO Systems

Tuesday, May 23, 2023 9:50 AM to 10:15 AM · 25 min. (Europe/Berlin)
Hall F - 2nd Floor
Research Paper
Industrial Use Cases of HPC, ML and QC · Mixed Precision Algorithms

Information

Massive Multiple-Input Multiple-Output (M-MIMO) uses hundreds of antennas in mobile communications base stations to increase the amount of transmitted data and the number of connected devices in 5G and beyond. However, this increases the complexity of recovering the transmitted data (the detection phase). To address this challenge, we leverage low-precision arithmetic in recent NVIDIA GPUs to improve the latency, scalability, and accuracy of M-MIMO detection. We propose a GPU tree-based detection algorithm that aggregates multiple tree levels and formulates the computation as a matrix multiplication followed by a squared-norm calculation and a sorting phase. This process is repeated until the last level of the detection tree is reached. The results show near-optimal data detection with a 10x speedup compared to a two-socket 28-core Ice Lake CPU implementation. We further deploy low-precision arithmetic operations. We show that moving from single-precision 32-bit floating-point arithmetic (FP32) to half-precision 16-bit representation (FP16) does not affect accuracy while translating into an additional 1.7x speedup. Furthermore, exploiting 8-bit integer representation results in an acceptable error rate degradation that can be compensated for by increasing the number of aggregated levels. Finally, we propose a multi-GPU version that computes the matrix multiplications of subsequent iterations in parallel. This operation represents more than 80% of the elapsed time for dense constellations. Results with four A100 GPUs show an additional 2.3x relative speedup compared to our single-GPU version. The achieved accuracy/scalability balance may accelerate the deployment of this technology and promote low-precision GPU computations within the wireless communication community.
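The matrix-multiplication, squared-norm, and sorting phases described in the abstract can be illustrated with a small CPU-only sketch. This is not the authors' GPU implementation. It assumes the channel matrix has already been QR-decomposed, so detection works on an upper-triangular system y = R s. The function names `matmul` and `detect` and the parameters `levels` (tree levels aggregated per iteration) and `keep` (surviving paths after each sort) are hypothetical, as are the example matrix and symbols.

```python
from itertools import product


def matmul(A, B):
    """Plain matrix product of A (m x n) and B (n x p), given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def detect(R, y, constellation, levels, keep):
    """Tree-based detection on an upper-triangular system y = R s (+ noise).

    levels: number of tree levels aggregated per iteration (hypothetical name)
    keep:   number of paths that survive each sorting phase (hypothetical name)
    """
    n = len(R)
    # Each path is (symbols for the rows processed so far, accumulated distance).
    paths = [((), 0.0)]
    row_end = n
    while row_end > 0:
        row_start = max(0, row_end - levels)
        rows = range(row_start, row_end)
        # Every combination of new symbols for the aggregated levels.
        blocks = list(product(constellation, repeat=row_end - row_start))
        # One candidate per (surviving path, new block) pair.
        cands = [(block + tail, dist) for tail, dist in paths for block in blocks]
        # Candidate matrix S: one column per candidate, one row per symbol position.
        S = [list(position) for position in zip(*(symbols for symbols, _ in cands))]
        R_blk = [R[i][row_start:] for i in rows]
        # Phase 1: a single matrix multiplication evaluates all candidates.
        P = matmul(R_blk, S)
        # Phase 2: squared norm of the residual, added to the path distance.
        scored = []
        for j, (symbols, dist) in enumerate(cands):
            err = sum(abs(y[i] - P[k][j]) ** 2 for k, i in enumerate(rows))
            scored.append((symbols, dist + err))
        # Phase 3: sort by distance and keep the best paths.
        scored.sort(key=lambda cand: cand[1])
        paths = scored[:keep]
        row_end = row_start
    best_symbols, best_dist = paths[0]
    return list(best_symbols), best_dist


# Noise-free example with illustrative values: 4 streams, 4-QAM symbols.
qam4 = (1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j)
R = [
    [2.0, 0.5, -1.0, 0.25],
    [0.0, 1.5, 0.5, -0.5],
    [0.0, 0.0, 1.0, 0.75],
    [0.0, 0.0, 0.0, 2.0],
]
s = [1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
y = [sum(R[i][j] * s[j] for j in range(4)) for i in range(4)]

detected, dist = detect(R, y, qam4, levels=2, keep=4)
print(detected, dist)
```

In this sketch, raising `levels` makes each iteration evaluate more candidates in one larger matrix product, which is the kind of workload the abstract maps onto GPU matrix-multiplication hardware. Reduced-precision arithmetic is not modeled here.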
Contributors:
Format
On-site · On Demand
Beginner Level: 20%
Intermediate Level: 55%
Advanced Level: 25%