RECENT PUBLICATIONS

See scholar.google for full bibliographic details.

DISCONTINUOUS GALERKIN DISCRETIZATIONS OF THE BOLTZMANN EQUATIONS IN 2D: SEMI-ANALYTIC TIME STEPPING AND ABSORBING BOUNDARY LAYERS

We present an efficient nodal discontinuous Galerkin method for approximating nearly incompressible flows using the Boltzmann equations. The equations are discretized with Hermite polynomials in velocity space, yielding a first order conservation law. A stabilized unsplit perfectly matched layer (PML) formulation is introduced for the resulting nonlinear flow equations. The proposed PML equations exponentially absorb the difference between the nonlinear fluctuation and the prescribed mean flow. We introduce semi-analytic time discretization methods to relax the time step restrictions at small relaxation times. We also introduce a multirate semi-analytic Adams-Bashforth method which preserves efficiency in stiff regimes. Accuracy and performance of the method are tested on distinct cases including an isothermal vortex, flow around a square cylinder, and a wall-mounted square cylinder.
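
The semi-analytic idea can be illustrated on a scalar model problem. The sketch below is not the scheme from the paper: it assumes a single relaxation equation dq/dt = -q/tau + f with f frozen over one step (an exponential Euler step), and the function and variable names are hypothetical. It integrates the stiff relaxation term exactly, so the step size is not limited by tau, whereas a plain forward Euler step becomes unstable once dt exceeds 2 * tau.

```python
import math


def semi_analytic_step(q, forcing, tau, dt):
    """Advance dq/dt = -q/tau + forcing by dt, integrating the relaxation term exactly.

    Assumption: forcing is held constant over the step (exponential Euler).
    """
    decay = math.exp(-dt / tau)
    return decay * q + tau * (1.0 - decay) * forcing


def forward_euler_step(q, forcing, tau, dt):
    """Plain explicit step for comparison; unstable once dt exceeds 2 * tau."""
    return q + dt * (-q / tau + forcing)


if __name__ == "__main__":
    # Illustrative values only: dt is 100 times the relaxation time.
    q0, forcing, tau, dt = 1.0, 2.0, 1.0e-3, 0.1
    print(semi_analytic_step(q0, forcing, tau, dt))  # relaxes toward tau * forcing
    print(forward_euler_step(q0, forcing, tau, dt))  # large overshoot
```

For this model problem the semi-analytic step lands on the equilibrium value tau * forcing even with a step far larger than tau, which is the behaviour a stiff relaxation regime calls for.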

arXiv.org preprint

AN ENTROPY STABLE DISCONTINUOUS GALERKIN METHOD FOR THE SHALLOW WATER EQUATIONS ON CURVILINEAR MESHES WITH WET/DRY FRONTS ACCELERATED BY GPUS

We extend the entropy stable high order nodal discontinuous Galerkin spectral element approximation for the non-linear two dimensional shallow water equations presented by Wintermeyer et al. [Journal of Computational Physics, 340:200-242, 2017] with a shock capturing technique and a positivity preservation capability to handle dry areas. The scheme preserves the entropy inequality, is well-balanced and works on unstructured, possibly curved, quadrilateral meshes. For the shock capturing, we introduce an artificial viscosity to the equations and prove that the numerical scheme remains entropy stable. We add a positivity preserving limiter to guarantee non-negative water heights as long as the mean water height is non-negative. We prove that non-negative mean water heights are guaranteed under a certain additional time step restriction for the entropy stable numerical interface flux.

We implement the method on GPU architectures using the abstract language OCCA, a unified approach to multi-threading languages. We show that the entropy stable scheme is well suited to GPUs as the necessary extra calculations do not negatively impact the runtime up to reasonably high polynomial degrees (around N=7). We provide numerical examples that challenge the shock capturing and positivity properties of our scheme to verify our theoretical findings.
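
A positivity limiter of the kind mentioned above can be sketched as follows. The abstract does not spell out the limiter, so this example assumes the common linear scaling form, in which nodal water heights are squeezed toward the element mean just enough to remove negative values. The function name, the quadrature weights, and the sample data are hypothetical.

```python
def limit_water_heights(heights, weights):
    """Scale nodal water heights toward the element mean so that none is negative.

    Assumption: linear scaling limiter h_i <- mean + theta * (h_i - mean),
    with theta = mean / (mean - min(h)) whenever min(h) < 0.
    The weighted mean is unchanged by this scaling.
    """
    total_weight = sum(weights)
    mean = sum(w * h for w, h in zip(weights, heights)) / total_weight
    if mean < 0.0:
        raise ValueError("mean water height must be non-negative")
    lowest = min(heights)
    if lowest >= 0.0:
        return list(heights)  # nothing to limit
    theta = mean / (mean - lowest)  # lies in [0, 1) because lowest < 0 <= mean
    return [mean + theta * (h - mean) for h in heights]


if __name__ == "__main__":
    # Illustrative nodal values with one negative height and equal weights.
    nodal_heights = [1.0, 0.5, -0.25, 0.75]
    nodal_weights = [1.0, 1.0, 1.0, 1.0]
    print(limit_water_heights(nodal_heights, nodal_weights))
```

The limiter only works when the mean itself is non-negative, which is why the abstract pairs it with a time step restriction that guarantees non-negative mean water heights.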

arXiv.org preprint

GPU ACCELERATION OF A HIGH-ORDER DISCONTINUOUS GALERKIN INCOMPRESSIBLE FLOW SOLVER

We present a GPU-accelerated version of a high-order discontinuous Galerkin discretization of the unsteady incompressible Navier-Stokes equations. The equations are discretized in time using a semi-implicit scheme with explicit treatment of the nonlinear term and implicit treatment of the split Stokes operators. The pressure system is solved with a conjugate gradient method together with a fully GPU-accelerated multigrid preconditioner which is designed to minimize memory requirements and to increase overall performance. A semi-Lagrangian subcycling advection algorithm is used to shift the computational load per timestep away from the pressure Poisson solve by allowing larger timestep sizes in exchange for an increased number of advection steps. Numerical results confirm we achieve the design order accuracy in time and space. We optimize the performance of the most time-consuming kernels by tuning the fine-grain parallelism, memory utilization, and maximizing bandwidth. To assess overall performance we present an empirically calibrated roofline performance model for a target GPU to explain the achieved efficiency. We demonstrate that, in most cases, the kernels used in the solver are close to their empirically predicted roofline performance.

arXiv.org preprint

ACCELERATION OF TENSOR-PRODUCT OPERATIONS FOR HIGH-ORDER FINITE ELEMENT METHODS

This paper is devoted to GPU kernel optimization and performance analysis of three tensor-product operators arising in finite element methods. We provide a mathematical background to these operations and implementation details. Achieving close-to-the-peak performance for these operators requires extensive optimization because of the operators' properties: low arithmetic intensity, tiered structure, and the need to store intermediate results inside the kernel. We give a guided overview of optimization strategies and we present a performance model that allows us to compare the efficacy of these optimizations against an empirically calibrated roofline.
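
The roofline comparison mentioned above can be sketched in a few lines. This is a generic roofline bound, not the calibrated model from the paper: it assumes that attainable throughput is the smaller of the peak compute rate and arithmetic intensity times a measured memory bandwidth. The function name and all numbers are hypothetical.

```python
def roofline_gflops(flops_per_element, bytes_per_element, bandwidth_gbs, peak_gflops):
    """Return the roofline bound on throughput in GFLOP/s.

    Assumption: bandwidth_gbs is a measured (empirical) streaming bandwidth
    in GB/s, and peak_gflops is the device's peak floating point rate.
    """
    intensity = flops_per_element / bytes_per_element  # FLOPs per byte moved
    return min(peak_gflops, intensity * bandwidth_gbs)


if __name__ == "__main__":
    # Illustrative kernel: 100 FLOPs and 50 bytes of traffic per element.
    bound = roofline_gflops(100.0, 50.0, bandwidth_gbs=500.0, peak_gflops=7000.0)
    print(bound)  # memory-bound: limited by intensity * bandwidth
```

A kernel with low arithmetic intensity sits on the bandwidth-limited part of the roofline, so its measured throughput is best judged against intensity times bandwidth rather than against the peak compute rate.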

arXiv.org preprint

WEIGHT-ADJUSTED DISCONTINUOUS GALERKIN METHODS: WAVE PROPAGATION IN HETEROGENEOUS MEDIA

Time-domain discontinuous Galerkin (DG) methods for wave propagation require accounting for the inversion of dense elemental mass matrices, where each mass matrix is computed with respect to a parameter-weighted L2 inner product. In applications where the wavespeed varies spatially at a sub-element scale, these matrices are distinct over each element, necessitating additional storage. In this work, we propose a weight-adjusted DG (WADG) method which reduces storage costs by replacing the weighted L2 inner product with a weight-adjusted inner product. This equivalent inner product results in an energy stable method, but does not increase storage costs for locally varying weights. A-priori error estimates are derived, and numerical examples are given illustrating the application of this method to the acoustic wave equation with heterogeneous wavespeed.
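
The storage argument can be made concrete with one formula. The abstract does not state it, so the following is given as an assumption about how a weight-adjusted inner product is usually written in matrix form, with w denoting the spatially varying weight and M the unweighted elemental mass matrix.

```latex
% M       : unweighted elemental mass matrix, M_{ij}       = \int \phi_j \phi_i
% M_w     : weighted mass matrix,             (M_w)_{ij}   = \int w \, \phi_j \phi_i
% M_{1/w} : mass matrix weighted by 1/w,      (M_{1/w})_{ij} = \int \frac{1}{w} \, \phi_j \phi_i
\[
  M_w^{-1} \;\approx\; M^{-1} \, M_{1/w} \, M^{-1}
\]
```

Under this assumption the inverse of a distinct weighted matrix no longer has to be stored for every element: the action of M_{1/w} can be evaluated with quadrature at run time, and only the unweighted inverse is kept.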

arXiv preprint

WEIGHT-ADJUSTED DISCONTINUOUS GALERKIN METHODS: CURVILINEAR MESHES

Traditional time-domain discontinuous Galerkin (DG) methods result in large storage costs at high orders of approximation due to the storage of dense elemental matrices. In this work, we propose weight-adjusted DG (WADG) methods for curvilinear meshes which reduce storage costs while retaining energy stability. A priori error estimates show that high order accuracy is preserved under sufficient conditions on the mesh, which are illustrated through convergence tests with different sequences of meshes. Numerical and computational experiments verify the accuracy and performance of WADG for a model problem on curved domains.

arXiv preprint

ON THE PENALTY STABILIZATION MECHANISM FOR UPWIND DISCONTINUOUS GALERKIN FORMULATIONS OF FIRST ORDER HYPERBOLIC SYSTEMS

Penalty fluxes are dissipative numerical fluxes for high order discontinuous Galerkin (DG) methods which depend on a penalization parameter. We investigate the dependence of the spectra of high order DG discretizations on this parameter, and show that as its value increases, the spectrum of the DG discretization splits into two disjoint sets of eigenvalues. One set converges to the eigenvalues of a conforming discretization, while the other set corresponds to spurious eigenvalues which are damped proportionally to the parameter. Numerical experiments also demonstrate that undamped spurious modes present in the limits of both zero and large penalization parameters are damped for moderate values of the upwind parameter.

arXiv.org preprint