Rough-n-ready Roofline: Titan V edition
In this post we discuss rules of thumb for identifying performance limiters when using shared memory in a CUDA compute kernel running on a Titan V. Coincidentally, this is also the topic of today's lecture in my VT course on advanced GPU+FEM topics. According to the Volta micro-architecture wiki entry, the Titan V card has the following characteristics: theoretical device memory bandwidth of 652 GB/s. Using cudaMemcpy (via OCCA) we measure a typical achievable memory bandwidth of 540 GB/s [ note that for some