TY - GEN
T1 - Acceleration of radar echo coherent accumulation system based on half-precision format and tensor core
AU - Wang, Luming
AU - Chen, Defeng
AU - Wang, Dongliang
AU - Wang, Chao
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - The processing speed of a radar echo coherent accumulation system is an important factor affecting the real-time performance of space target detection. In this paper, we design a radar echo coherent accumulation system on a V100 GPU using the half-precision format and tensor cores, and we accelerate it. The design includes optimizing the coherent accumulation process, designing the scaling coefficient, and using the tcFFT library to implement the FFT with WMMA. We compare the speed of the coherent accumulation system in FP32, FP16, and FP16 with tensor cores. In FP32 and FP16, we use the cuFFT library to perform the FFT; in FP16 with tensor cores, we call the tcFFT library. Nsight Compute is used to measure the speed. The test results show that: (a) creating an FFT plan takes less time in tcFFT than in cuFFT; (b) in the single-batch case, FP16 achieves a 1.18X-1.39X speedup over FP32 for the whole coherent accumulation process; in the multi-batch case, for which a parallel batch processing method is proposed, FP16 with tensor cores achieves a 2.23X-3.17X speedup over FP16 in the two-dimensional FFT and a 1.54X-1.77X speedup in the whole coherent accumulation process.
AB - The processing speed of a radar echo coherent accumulation system is an important factor affecting the real-time performance of space target detection. In this paper, we design a radar echo coherent accumulation system on a V100 GPU using the half-precision format and tensor cores, and we accelerate it. The design includes optimizing the coherent accumulation process, designing the scaling coefficient, and using the tcFFT library to implement the FFT with WMMA. We compare the speed of the coherent accumulation system in FP32, FP16, and FP16 with tensor cores. In FP32 and FP16, we use the cuFFT library to perform the FFT; in FP16 with tensor cores, we call the tcFFT library. Nsight Compute is used to measure the speed. The test results show that: (a) creating an FFT plan takes less time in tcFFT than in cuFFT; (b) in the single-batch case, FP16 achieves a 1.18X-1.39X speedup over FP32 for the whole coherent accumulation process; in the multi-batch case, for which a parallel batch processing method is proposed, FP16 with tensor cores achieves a 2.23X-3.17X speedup over FP16 in the two-dimensional FFT and a 1.54X-1.77X speedup in the whole coherent accumulation process.
KW - GPU
KW - coherent accumulation system
KW - half precision
KW - tensor core
UR - http://www.scopus.com/inward/record.url?scp=85147690214&partnerID=8YFLogxK
U2 - 10.1109/IMCEC55388.2022.10019890
DO - 10.1109/IMCEC55388.2022.10019890
M3 - Conference contribution
AN - SCOPUS:85147690214
T3 - IMCEC 2022 - IEEE 5th Advanced Information Management, Communicates, Electronic and Automation Control Conference
SP - 990
EP - 995
BT - IMCEC 2022 - IEEE 5th Advanced Information Management, Communicates, Electronic and Automation Control Conference
A2 - Xu, Bing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference, IMCEC 2022
Y2 - 16 December 2022 through 18 December 2022
ER -