February 26, 2018

When I started writing GPU code, I often heard that using shared memory is the only way to get good performance out of my code. As I dove deeper into CUDA programming and performance analysis, I came to understand that obtaining good performance on a GPU is a fa...
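The advice above usually refers to staging data through on-chip shared memory so threads in a block can reuse it. As a minimal sketch (the kernel and names are illustrative, not from the post), here is the classic pattern of a block-level reduction that accumulates partial sums in shared memory:

```cuda
#include <cuda_runtime.h>

// Illustrative sketch: each block stages its slice of the input in shared
// memory, then reduces it with a tree of pairwise adds. One partial sum
// per block is written out; a second pass (or atomicAdd) finishes the sum.
__global__ void blockSum(const float *in, float *out, int n) {
  extern __shared__ float s[];              // one float per thread
  int t = threadIdx.x;
  int i = blockIdx.x * blockDim.x + t;
  s[t] = (i < n) ? in[i] : 0.f;             // guard the tail of the array
  __syncthreads();

  // Tree reduction entirely in shared memory
  for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
    if (t < stride) s[t] += s[t + stride];
    __syncthreads();
  }
  if (t == 0) out[blockIdx.x] = s[0];       // block's partial sum
}
```

Whether this buys anything over a straightforward global-memory version depends on the kernel's arithmetic intensity and occupancy, which is the nuance the post is getting at.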

February 13, 2018

A new Titan V arrived at Virginia Tech today. Installation went relatively smoothly thanks to the patience of Bill Reilly. 

The Titan V differs from the NVIDIA Tesla V100 in a couple of significant ways. The Tesla V100 has a peak bandwidth of 900GB/s and an L2 cache of 6GB...

February 8, 2018

In this post we discuss rules of thumb for performance limiters when using shared memory in an NVIDIA V100 CUDA compute kernel.

The V100 16GB PCI-E card has:

  1. Theoretical device memory bandwidth of 900GB/s. Using cudaMemcpy we measure achievable memory bandwidth of 790...
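One way to measure achievable bandwidth of the kind quoted above is to time a large device-to-device cudaMemcpy with CUDA events. A minimal sketch (buffer size and names are my own, not from the post):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Rough bandwidth probe: time a large device-to-device copy and
// compare against the 900GB/s theoretical peak of the V100.
int main() {
  const size_t bytes = 1 << 28;  // 256 MB, large enough to saturate DRAM
  void *src, *dst;
  cudaMalloc(&src, bytes);
  cudaMalloc(&dst, bytes);

  cudaEvent_t start, stop;
  cudaEventCreate(&start);
  cudaEventCreate(&stop);

  cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);  // warm up
  cudaEventRecord(start);
  cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
  cudaEventRecord(stop);
  cudaEventSynchronize(stop);

  float ms = 0.f;
  cudaEventElapsedTime(&ms, start, stop);
  // A device-to-device copy both reads and writes every byte,
  // so the traffic over DRAM is 2x the buffer size.
  double gbps = 2.0 * bytes / (ms * 1e-3) / 1e9;
  printf("achieved bandwidth: %.0f GB/s\n", gbps);

  cudaFree(src);
  cudaFree(dst);
  return 0;
}
```

On a V100 this typically lands well below the 900GB/s theoretical figure, which is why the measured number is the more useful performance bound.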

February 7, 2018

Overview: This week we have been experimenting with instances on Amazon AWS and Paperspace that come equipped with NVIDIA V100 GPUs. These GPUs are hot properties and not widely available, so we had to request special access to V100-equipped instances on both systems....

February 6, 2018

We created an Amazon EC2 instance with NVIDIA V100 GPU. We will discuss that process in more detail in a future posting. As usual this is not an endorsement of a particular cloud server provider or of a particular GPU model or manufacturer.

Running occaBench with def...

February 1, 2018

VT CEED BP Software Release: the VT Parallel Numerical Algorithms team has released GPU-optimized implementations for the Center for Efficient Exascale Discretization (CEED) bake-off competition on GitHub here. The details are described in this report on arXi...



225 Stanger St
Blacksburg, VA 24061