Understanding Performance Differences of FPGAs and GPUs

Jason Cong, Zhenman Fang, Michael Lo, Hanrui Wang, Jingxian Xu, Shaochong Zhang
UCLA

Abstract

This paper aims to better understand the performance differences between FPGAs and GPUs. We intentionally begin with a widely used GPU-friendly benchmark suite, Rodinia, and port 15 of its kernels onto FPGAs using HLS C. We then propose an analytical model to compare the performance of the two platforms. We find that for 6 of the 15 ported kernels, today's FPGAs provide performance comparable to or better than the GPU, while consuming on average 28% of the GPU power. Although FPGAs run at a lower clock frequency, they usually achieve a higher number of operations per cycle in each customized deep pipeline; their effective parallel factor is lower, however, because of their far lower off-chip memory bandwidth. With 4x more memory bandwidth, 8 of the 15 FPGA kernels are projected to achieve at least half of the GPU kernel performance.
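The abstract names three factors behind the comparison: clock frequency, operations per cycle in each pipeline, and an effective parallel factor limited by off-chip memory bandwidth. The sketch below is one simple way to combine them. It is an illustrative assumption, not the analytical model from the paper: the `Kernel` class, its field names, the bandwidth cap formula, and every number used with it are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Kernel:
    """Hypothetical per-kernel parameters; none of these values come from the paper."""
    freq_mhz: float        # clock frequency in MHz
    ops_per_cycle: float   # operations per cycle in one pipeline
    max_pipelines: int     # parallel pipelines the chip resources allow
    bytes_per_cycle: float # off-chip bytes one pipeline consumes per cycle (assumed > 0)
    bandwidth_gb_s: float  # off-chip memory bandwidth in GB/s (1 GB = 1e9 bytes)

    def effective_parallel_factor(self) -> float:
        # Assumption: a pipeline only contributes if memory can keep it fed.
        # GB/s * 1e3 gives MB/s; dividing by MHz * bytes/cycle gives a pipeline count.
        fed = self.bandwidth_gb_s * 1e3 / (self.freq_mhz * self.bytes_per_cycle)
        return min(float(self.max_pipelines), fed)

    def throughput_mops(self) -> float:
        # Throughput in millions of operations per second:
        # frequency x operations per cycle per pipeline x effective parallel factor.
        return self.freq_mhz * self.ops_per_cycle * self.effective_parallel_factor()


if __name__ == "__main__":
    # Made-up FPGA-like configuration, used only to exercise the formula.
    base = Kernel(freq_mhz=200.0, ops_per_cycle=4.0, max_pipelines=16,
                  bytes_per_cycle=8.0, bandwidth_gb_s=16.0)
    # Same kernel with 4x the off-chip bandwidth.
    wide = replace(base, bandwidth_gb_s=4 * base.bandwidth_gb_s)
    for name, k in (("baseline", base), ("4x bandwidth", wide)):
        print(name, k.effective_parallel_factor(), k.throughput_mops())
```

In this toy configuration the baseline is bandwidth-bound, so raising the bandwidth increases the effective parallel factor until the pipeline count that fits on the chip becomes the limit. That mirrors the qualitative point of the abstract, but the magnitudes here say nothing about the measured Rodinia results.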

Citation

@INPROCEEDINGS{8457638,
 author={Cong, Jason and Fang, Zhenman and Lo, Michael and Wang, Hanrui and Xu, Jingxian and Zhang, Shaochong},
 booktitle={2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)},
 title={Understanding Performance Differences of FPGAs and GPUs},
 year={2018},
 volume={},
 number={},
 pages={93-96},
 keywords={Field programmable gate arrays;Pipelines;Graphics processing units;Kernel;Benchmark testing;Arrays;Bandwidth;FPGA;GPU;Analytical model;Performance comparison},
 doi={10.1109/FCCM.2018.00023}}

Acknowledgment

This work is funded by the Center for Domain-Specific Computing and the Center for Future Architectures Research. We also thank Falcon Computing for open-sourcing the memory coalescing APIs.
