October 11, 2018

In this post we discuss rules of thumb for performance limiters when using shared memory in a CUDA compute kernel running on a Titan V - coincidentally the topic of my advanced GPU+FEM topics VT course lecture today.

According to the Volta micro-architecture wiki entry,...

September 26, 2018

Simulation: flow constrained to the surface of a sphere, modeled using the Galerkin-Boltzmann equations (Tölke et al '00), discretized with tenth-order discontinuous Galerkin spectral elements in space and adaptive semi-analytic Runge-Kutta time stepping. See th...

September 18, 2018

Added ellipsoids to the set of objects supported by the simple Whitted-style ray tracer [1] we developed for the CMDA 3634 course @Virginia Tech for Fall 2018.

To detect collisions between spheres and ellipsoids we use a Newton-based algorithm for finding the nearest p...

September 2, 2018

Capabilities of the Paranumal Accelerated Ray Tracer:

  • Whitted-based ray transport (link).

  • Stack-based multiple scattering.

  • GPU acceleration. 

  • Primitives: spheres, cylinders, cones, planes, triangles, disks. 

  • Field of view emulated using Monte Ca...

August 24, 2018

Adding more primitives and learning about the dreaded ray tracing "acne" caused by finite precision issues when computing ray-shape intersections. All rendering in the above movie is done in CUDA on a Titan V. Unfortunately the YouTube compression algorithm is a lit h...
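One common way to suppress this kind of acne is to reject intersections closer than a small tolerance and to nudge secondary ray origins slightly off the surface. The sketch below illustrates that general technique for a sphere in plain C++; it is not taken from the course ray tracer, and the names `Vec3`, `kHitEpsilon`, `intersectSphere` and `offsetOrigin`, as well as the tolerance value, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3-vector type (hypothetical, for illustration only).
struct Vec3 { double x, y, z; };

inline Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 scale(double s, Vec3 a) { return {s * a.x, s * a.y, s * a.z}; }
inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Assumed tolerance; a suitable value depends on scene scale and precision.
constexpr double kHitEpsilon = 1e-6;

// Returns the smallest t > tMin with origin + t*dir on the sphere,
// or -1 if there is no such hit. dir is assumed to have unit length.
double intersectSphere(Vec3 origin, Vec3 dir, Vec3 center, double radius,
                       double tMin) {
  assert(radius > 0.0);
  const Vec3 oc = sub(origin, center);
  const double b = dot(oc, dir);
  const double c = dot(oc, oc) - radius * radius;
  const double disc = b * b - c;
  if (disc < 0.0) return -1.0;  // ray misses the sphere
  const double s = std::sqrt(disc);
  const double t0 = -b - s;     // near root
  const double t1 = -b + s;     // far root
  if (t0 > tMin) return t0;
  if (t1 > tMin) return t1;
  return -1.0;                  // both roots at or behind the tolerance
}

// Moves a hit point a small distance along the unit surface normal so that
// a secondary ray does not start exactly on the surface it just left.
Vec3 offsetOrigin(Vec3 hit, Vec3 normal) {
  return add(hit, scale(kHitEpsilon, normal));
}
```

A secondary ray would then be traced from `offsetOrigin(hit, normal)` with `tMin = kHitEpsilon`, so that a root near zero produced by rounding is not reported as a new hit on the same surface.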

August 22, 2018

This semester the students in CMDA 3634 @ Virginia Tech will be building up ray tracing codes that run threaded with OpenMP, distributed with MPI, and/or accelerated with CUDA on GPUs.

Gearing up the basic ray tracer just to make sure I understand everythi...

May 18, 2018

Four VT undergraduates have joined the paranumal team as summer research assistants. From left to right: Nick Polidoro, Dallas Viar, Tulika Chaudhary, and Weichen Li.

They have all taken Computer Science Foundations for CMDA (CMDA 3634) and are using their GPU programmi...

May 18, 2018


Jesse Chan gave a colloquium talk in the Math Department @VT on novel entropy stable flux differencing discontinuous Galerkin formulations as described in his article.

February 13, 2018

A new Titan V arrived at Virginia Tech today. Installation went relatively smoothly thanks to the patience of Bill Reilly. 

The Titan V differs from the NVIDIA Tesla V100 in a couple of significant ways. The Tesla V100 has a peak bandwidth of 900GB/s and an L2 cache of 6MB...

February 8, 2018

In this post we discuss rules of thumb for performance limiters when using shared memory in an NVIDIA V100 CUDA compute kernel.

The V100 16GB PCI-E card has:

  1. Theoretical device memory bandwidth of 900GB/s. Using cudaMemcpy we measure achievable memory bandwidth of 790...


225 Stanger St
Blacksburg, VA 24061