The above movie visualizes a low Mach number gas kinetic simulation computed with a D2Q9 Lattice Boltzmann Method solver (see LBM wiki). The domain was specified by an input PNG image. Technical details: the simulation code is available on GitHub here. The code is accelerated using OCCA. The movie consists of 2400 frames, output once every one hundred time steps. It took a few minutes to compute on a single NVIDIA GTX 1080 Ti using the OCCA CUDA backe
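For readers new to the method, here is a minimal sketch of one D2Q9 collide-and-stream update in plain Python. It is not the OCCA code linked above: it assumes a single-relaxation-time (BGK) collision, a fully periodic grid with no walls (so no PNG-defined domain), and illustrative names (`equilibrium`, `moments`, `step`, `total_mass`, `tau`) chosen only for this example.

```python
# D2Q9 lattice: 9 discrete velocities (rest, 4 axis, 4 diagonal) and weights.
CX = [0, 1, 0, -1, 0, 1, -1, -1, 1]
CY = [0, 0, 1, 0, -1, 1, 1, -1, -1]
W = [4.0 / 9.0] + [1.0 / 9.0] * 4 + [1.0 / 36.0] * 4


def equilibrium(rho, ux, uy):
    """Second-order (low Mach number) equilibrium populations at one node."""
    usq = ux * ux + uy * uy
    feq = []
    for cx, cy, w in zip(CX, CY, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def moments(f):
    """Density and velocity recovered from the 9 populations at one node."""
    rho = sum(f)
    ux = sum(c * fi for c, fi in zip(CX, f)) / rho
    uy = sum(c * fi for c, fi in zip(CY, f)) / rho
    return rho, ux, uy


def step(f, nx, ny, tau):
    """One BGK collision followed by streaming on a periodic nx-by-ny grid.

    f[y][x] is a list of 9 populations; tau is the relaxation time
    (an assumed parameter of this sketch, in lattice units).
    """
    new = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            rho, ux, uy = moments(f[y][x])
            feq = equilibrium(rho, ux, uy)
            for i in range(9):
                # Relax toward equilibrium, then move along velocity i.
                post = f[y][x][i] - (f[y][x][i] - feq[i]) / tau
                xd = (x + CX[i]) % nx
                yd = (y + CY[i]) % ny
                new[yd][xd][i] = post
    return new


def total_mass(f):
    """Sum of all populations over the grid (conserved by step)."""
    return sum(sum(sum(node) for node in row) for row in f)
```

A typical use is to fill the grid with `equilibrium(1.0, 0.0, 0.0)`, perturb the density at one node, and call `step` repeatedly; `total_mass` should stay constant to rounding error. Walls read from an image would add a bounce-back rule in the streaming loop, which this sketch omits.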
From my teaching experience at Rice University and at Virginia Tech I have observed that computationally inclined students benefit from a structured introduction to a simple programming language (such as C) and hands-on experience using straightforward tools (LaTeX, git, gdb, valgrind) in tandem with modern parallel programming methodologies. The CMDA course Computer Science Foundations of Computational Modeling and Data Analytics (CMDA 3634) strives to bring these concepts, to
Most of our finite element codes are built using the Warp & Blend Lagrange elements. The node distribution of these elements is built with an edge warping and interior blending construction. The interior blending is tuned to optimize the interpolation properties of the nodes for elements of polynomial degree up to 15. T. Warburton, An Explicit Construction for Interpolation Nodes on the Simplex, Journal of Engineering Mathematics, Volume 56, Number 3, pp. 247-262, 2006, (paper)
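To give a feel for why warping helps, here is a 1D sketch only. It assumes the edge warp can be viewed as the displacement that moves equispaced nodes onto Legendre-Gauss-Lobatto (LGL) nodes, and it uses the Lebesgue constant as the measure of interpolation quality. The interior blending on triangles and tetrahedra is not shown, and the function names are illustrative rather than taken from our codes.

```python
import math


def legendre_pair(n, x):
    """Return (P_{n-1}(x), P_n(x)) from the three-term recurrence, n >= 1."""
    p_prev, p = 1.0, x
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    return p_prev, p


def gll_nodes(n, iters=50):
    """Degree-n Legendre-Gauss-Lobatto nodes on [-1, 1], in ascending order.

    Newton iteration on x*P_n(x) - P_{n-1}(x), whose derivative is
    (n + 1)*P_n(x), started from Chebyshev-Lobatto points.
    """
    nodes = []
    for j in range(n + 1):
        x = -math.cos(math.pi * j / n)
        for _ in range(iters):
            p_nm1, p_n = legendre_pair(n, x)
            x -= (x * p_n - p_nm1) / ((n + 1) * p_n)
        nodes.append(x)
    return nodes


def equispaced_nodes(n):
    """n + 1 equally spaced nodes on [-1, 1]."""
    return [-1.0 + 2.0 * j / n for j in range(n + 1)]


def edge_warp(n):
    """Displacement that moves each equispaced node to its LGL partner."""
    return [g - e for g, e in zip(gll_nodes(n), equispaced_nodes(n))]


def lebesgue_constant(nodes, samples=2001):
    """Max over sample points of the sum of |Lagrange basis functions|."""
    worst = 0.0
    for s in range(samples):
        x = -1.0 + 2.0 * s / (samples - 1)
        total = 0.0
        for i, xi in enumerate(nodes):
            ell = 1.0
            for j, xj in enumerate(nodes):
                if j != i:
                    ell *= (x - xj) / (xi - xj)
            total += abs(ell)
        worst = max(worst, total)
    return worst
```

Comparing `lebesgue_constant(equispaced_nodes(10))` with `lebesgue_constant(gll_nodes(10))` shows the warped nodes giving a much smaller value. The warp vanishes at the two end points, so element vertices stay fixed.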
The 2018 Argonne Training Program on Extreme Scale Computing is now accepting applications: link. This is an excellent opportunity to gain hands-on experience in the latest programming models and tools for extreme scale computing.
We have just ordered a sample NVIDIA Titan V GPU. This GPU has more than 5000 FP32 compute cores and theoretically delivers more than 600 GB/s of device memory bandwidth. The questions we want to answer: how much of that bandwidth is accessible, and can we tune our compute kernels to fully exploit it? Titan V tech specs here.
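The arithmetic behind "how much of the bandwidth is accessible" can be sketched as follows: count the bytes a kernel reads and writes, divide by its run time, and compare with the theoretical peak. The helper names are hypothetical, the 600 GB/s figure is just the lower bound quoted above, and `time_host_copy` times a host memory copy purely as a stand-in for a real device kernel timing.

```python
import time

# Lower bound on the theoretical peak quoted in the post (GB/s).
PEAK_GBS = 600.0


def effective_bandwidth_gbs(entries, bytes_per_entry, loads, stores, seconds):
    """Achieved bandwidth in GB/s for a kernel that performs `loads` reads
    and `stores` writes of `bytes_per_entry` bytes for each entry."""
    bytes_moved = entries * bytes_per_entry * (loads + stores)
    return bytes_moved / seconds / 1.0e9


def fraction_of_peak(achieved_gbs, peak_gbs=PEAK_GBS):
    """Share of the theoretical peak that the kernel actually reached."""
    return achieved_gbs / peak_gbs


def time_host_copy(n_bytes, repeats=5):
    """Best-of-`repeats` wall time in seconds for copying n_bytes on the host.

    A stand-in for timing a device copy kernel; it measures host memory only.
    """
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # one read and one write per byte
        best = min(best, time.perf_counter() - start)
    return best
```

For example, a copy kernel that moves one million FP32 values (one load and one store each) in one millisecond reaches `effective_bandwidth_gbs(1_000_000, 4, 1, 1, 1.0e-3)` GB/s, and `fraction_of_peak` of that value says how far the kernel is from the hardware limit.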
The Parallel Numerical Algorithms @VT research group relies heavily on the OCCA library to develop truly portable multithreaded code that can target both CPUs and GPUs. The main idea is to write one driver code that liaises with the compute processor through the unified OCCA API. Run-time compilation is used to target the desired threading programming model. See http://libocca.org and the project repo at: https://github.com/libocca/occa