This paper aims to better understand the performance differences between FPGAs and GPUs. We intentionally begin with a widely used GPU-friendly benchmark suite, Rodinia, and port 15 of its kernels onto FPGAs using HLS C. We then propose an analytical model to compare the performance of the two platforms. We find that for 6 of the 15 ported kernels, today's FPGAs can match or even exceed the performance of the GPU, while consuming on average 28% of the GPU power. In addition to running at a lower clock frequency, FPGAs usually achieve a higher number of operations per cycle in each customized deep pipeline but a lower effective parallel factor, owing to their far lower off-chip memory bandwidth. With 4x more memory bandwidth, 8 of the 15 FPGA kernels are projected to achieve at least half of the GPU kernel performance.
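The abstract names three factors (clock frequency, operations per cycle per pipeline, and effective parallel factor) but does not state how the analytical model combines them. The Python sketch below is therefore only one plausible reading, not the paper's model: it assumes throughput is the product of the three factors and that off-chip bandwidth caps the effective parallel factor. The function names, the `bytes_per_op` parameter, and every number in the usage lines are hypothetical placeholders, not measurements from the paper.

```python
def effective_parallel_factor(bw_gbs, freq_mhz, ops_per_cycle, bytes_per_op, max_parallel):
    """Number of pipelines that off-chip bandwidth can keep busy.

    Assumption (not from the paper): every operation moves `bytes_per_op`
    bytes of off-chip traffic, so one pipeline needs
    freq * ops_per_cycle * bytes_per_op bytes per second.
    """
    bytes_per_sec_per_pipeline = freq_mhz * 1e6 * ops_per_cycle * bytes_per_op
    bandwidth_bound = (bw_gbs * 1e9) / bytes_per_sec_per_pipeline
    # The factor cannot exceed the number of pipelines that fit on the device.
    return min(max_parallel, bandwidth_bound)


def throughput_mops(freq_mhz, ops_per_cycle, parallel_factor):
    """Assumed multiplicative model, in millions of operations per second."""
    return freq_mhz * ops_per_cycle * parallel_factor


# Hypothetical FPGA kernel: 200 MHz, 10 ops/cycle per pipeline, 4 bytes/op,
# at most 16 pipelines, 10 GB/s of off-chip bandwidth.
base_factor = effective_parallel_factor(10, 200, 10, 4, 16)
base_perf = throughput_mops(200, 10, base_factor)

# Same kernel with 4x the off-chip bandwidth.
wide_factor = effective_parallel_factor(40, 200, 10, 4, 16)
wide_perf = throughput_mops(200, 10, wide_factor)

print(base_factor, base_perf)
print(wide_factor, wide_perf)
```

Under these assumptions a bandwidth-bound kernel scales with the added bandwidth until it reaches the pipeline limit `max_parallel`, which is the kind of effect the 4x-bandwidth projection in the abstract describes.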
@INPROCEEDINGS{8457638,
author={Cong, Jason and Fang, Zhenman and Lo, Michael and Wang, Hanrui and Xu, Jingxian and Zhang, Shaochong},
booktitle={2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)},
title={Understanding Performance Differences of FPGAs and GPUs},
year={2018},
pages={93-96},
keywords={Field programmable gate arrays;Pipelines;Graphics processing units;Kernel;Benchmark testing;Arrays;Bandwidth;FPGA;GPU;Analytical model;Performance comparison},
doi={10.1109/FCCM.2018.00023}}
This work is funded by the Center for Domain-Specific Computing and Center for Future Architectures Research. We also thank Falcon Computing for open-sourcing the memory coalescing APIs.