Rough-n-Ready Roofline: NVIDIA V100 edition
In this post we discuss rules of thumb for performance limiters when using shared memory in an NVIDIA V100 CUDA compute kernel. The V100...
Concurrent Cloud Computing: installing occaBench for V100
Overview: This week we have been experimenting with instances on Amazon AWS and Paperspace that come equipped with NVIDIA V100 GPUs....
Vaunted Volta Verified: initial comparison of the NVIDIA V100 & P100 GPUs
We created an Amazon EC2 instance with an NVIDIA V100 GPU. We will discuss that process in more detail in a future posting. As usual, this is...
CEED Code Competition: VT software release
VT CEED BP Software Release: the VT Parallel Numerical Algorithms team has released GPU-optimized implementations for the Center for...
Concurrent Cloud Computing: running OCCA
Alas, our NVIDIA Titan V order didn't go through. Instead, I am gearing up to run on an NVIDIA V100 GPU equipped server at paperspace.com ...
Limiting Performance: an interesting read
There is an interesting personal essay on the history and developments in the art and design of numerical schemes to limit spurious...
Portable Performance Profiling: occaBench
The mixbench micro-benchmarking utility (available on GitHub and documented in the references below) is a tool for measuring the...
Spurious Solution Suppression: the Goldilocks upwind discontinuous Galerkin Time-domain method
There are roughly three schools of thought about how much stabilization should be added to control the continuity of solutions obtained...
High-order Discontinuous Galerkin Simulations: is single precision enough?
It is tempting to use 32-bit floating point arithmetic (FP32) on GPUs. Modern consumer-grade cards from NVIDIA have theoretical peak...
Spherical Shear-flow Solver
First flow from our new flow solver, rendered using ParaView. Stay tuned for more details.