Basic GPU optimization strategies
When I started writing GPU code, I often heard that using shared memory was the only way to get good performance. As I dove deeper into CUDA programming and performance analysis, I came to understand that obtaining good performance on a GPU is a far more involved task. In this post I will discuss basic optimization strategies for GPUs.

Understanding memory and its hierarchy

In terms of speed, GPU memory can be organized as follows:

Global memory + loc