Concurrent Cloud Computing: running OCCAAlas our NVIDIA Titan V order didn't go through. Instead I am gearing up to run on an NVIDIA V100 GPU equipped server at paperspace.com ...
Limiting Performance: an interesting readThere is an interesting personal essay on the history and developments in the art and design of numerical schemes to limit spurious...
Portable Performance Profiling: occaBenchThe mixbench micro-benchmarking utility (available on github here and documented in the references below) is a tool for measuring the...
Spurious Solution Suppression: the Goldilocks upwind discontinuous Galerkin Time-domain methodThere are roughly three schools of thought about how much stabilization should be added to control the continuity of solutions obtained...
High-order Discontinuous Galerkin Simulations: is single precision enough?It is tempting to use 32 bit floating point arithmetic (FP32) on GPUs. Modern consumer grade cards from NVIDIA have theoretical peak...
Spherical Shear-flow SolverFirst flow from our new flow solver rendered using Paraview. Stay tuned for more details.
Dude Where's My FLOPS ?Computing the Mandelbrot fractal is seemingly a perfect application for the GPU. It is simple to state: iterate z = z^2 + c, starting...
Pascal Processor Powerhouse: the beer fridge sized clusterThis is the new Pascal GPU cluster hosted in the Math Department at VT. It consists of four compute nodes, each equipped with six NVIDIA...
Climbing the Pascal Performance Pyramid: 10 rules of thumbRules of thumb for optimizing FP64 CUDA kernels on the Pascal class NVIDIA P100 GPU [ numbers vary depending on the specific model, the...