January 25, 2018

Alas, our NVIDIA Titan V order didn't go through. Instead, I am gearing up to run on an NVIDIA V100 GPU-equipped server at paperspace.com [ This is not an endorsement, proceed at your own risk, your mileage may vary, and there are other cloud providers with GPU compute...

January 22, 2018

An interesting personal essay (linked here) covers the history and development of the art and design of numerical schemes that limit spurious oscillations in solutions of nonlinear PDEs, even in the presence of shocks.

I will echo one point that the Sweby diagram (...

January 19, 2018

The mixbench micro-benchmarking utility (available on GitHub here and documented in the references below) is a tool for measuring the data throughput and computational throughput of a mixed streaming and compute workload. We have found it to be extremely useful for...

January 16, 2018

There are roughly three schools of thought about how much stabilization should be added to control the continuity of solutions obtained through discontinuous Galerkin discretizations of time-dependent linear wave problems (e.g. acoustics, electromagnetics, linear elast...

January 12, 2018

It is tempting to use 32-bit floating point arithmetic (FP32) on GPUs. Modern consumer-grade cards from NVIDIA have theoretical peak performance of 10 TFLOPS. However, we do have to be careful about when it is safe to use single precision in high-order calculations....
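
As a generic illustration of why FP32 needs care (not the specific high-order test case from the post), here is a minimal Python sketch that accumulates the same step many times in emulated FP32 and in FP64. The helper names `to_fp32` and `accumulate` are hypothetical, and FP32 is emulated on the CPU by rounding through the `struct` module rather than run on a GPU.

```python
import struct


def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(n, step, fp32=False):
    """Add `step` to a running total `n` times.

    With fp32=True the step and every partial sum are rounded to FP32,
    which mimics a naive single precision accumulation loop.
    """
    if fp32:
        step = to_fp32(step)
    total = 0.0
    for _ in range(n):
        total = total + step
        if fp32:
            total = to_fp32(total)
    return total


def accumulation_errors(n=100_000, step=0.1):
    """Return (FP32 error, FP64 error) relative to the exact value n*step."""
    exact = n * step
    err32 = abs(accumulate(n, step, fp32=True) - exact)
    err64 = abs(accumulate(n, step, fp32=False) - exact)
    return err32, err64


if __name__ == "__main__":
    err32, err64 = accumulation_errors()
    print("FP32 accumulation error:", err32)
    print("FP64 accumulation error:", err64)
```

The FP32 error grows as the running total gets large relative to the step, because each addition is rounded to about seven significant digits. Whether an error of that size matters depends on the calculation, which is the judgment the post is about.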

January 11, 2018

First flow from our new flow solver rendered using Paraview.

Stay tuned for more details.

January 6, 2018

 Produced using the FRAX app (link).

January 4, 2018

Computing the Mandelbrot fractal is seemingly a perfect application for the GPU. It is simple to state: iterate z = z^2 + c, starting with z=0 for a set of values c drawn from the complex plane (wiki). The number of iterations it takes for |z| to exceed 4 is recor...
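
The iteration described above can be sketched in a few lines of Python. This is a CPU reference under stated assumptions, not the GPU kernel from the post: the function names, the `max_iter` cap, and the sampled window of the complex plane are all hypothetical choices. The bailout test follows the text (|z| exceeding 4); any radius of at least 2 identifies the same escaping points.

```python
def escape_count(c, max_iter=100, bailout=4.0):
    """Iterate z = z^2 + c from z = 0 and return the number of
    iterations taken for |z| to exceed `bailout`.

    Returns `max_iter` if the orbit has not escaped by then.
    """
    z = 0j
    for n in range(max_iter):
        if abs(z) > bailout:
            return n
        z = z * z + c
    return max_iter


def mandelbrot_grid(width, height, re_min=-2.0, re_max=1.0,
                    im_min=-1.5, im_max=1.5, max_iter=100):
    """Compute escape counts on a width x height grid of c values.

    Every grid point is independent of the others, which is why the
    problem maps naturally onto one GPU thread per point.
    """
    rows = []
    for j in range(height):
        im = im_min + (im_max - im_min) * j / (height - 1)
        row = []
        for i in range(width):
            re = re_min + (re_max - re_min) * i / (width - 1)
            row.append(escape_count(complex(re, im), max_iter))
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Coarse ASCII rendering: '#' marks points that never escaped.
    for row in mandelbrot_grid(60, 24, max_iter=50):
        print("".join("#" if n == 50 else "." for n in row))
```

Note that the iteration count varies from point to point, so neighbouring grid points can do very different amounts of work.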

January 3, 2018

This is the new Pascal GPU cluster hosted in the Math Department at VT. It consists of four compute nodes, each equipped with six NVIDIA GTX 1080 Ti GPUs. Each GPU has a nominal peak FP32 performance of approximately 10 TFLOPS (link) hence the cluster can potent...

January 1, 2018

Rules of thumb for optimizing FP64 CUDA kernels on the Pascal-class NVIDIA P100 GPU [ numbers vary depending on the specific model; the following are measured on the 12GB PCI-Express version ]

Rule #1 - device global memory bandwidth is limited

The device memory bandwi...

225 Stanger St
Blacksburg, VA 24061