Concurrent Cloud Computing: installing occaBench for V100
Overview: This week we have been experimenting with instances on Amazon AWS and Paperspace that come equipped with NVIDIA V100 GPUs. These GPUs are hot properties and not widely available, so we had to request special access to V100-equipped instances on both systems. Both AWS and Paperspace responded quickly to our requests. The Paperspace support team was also incredibly responsive, patient, and helpful in getting us through some minor technical issues. Note: this article is not an endorsement of these companies or their products; we are just providing insight into our experience getting started on their systems. Your mileage may vary. In our experience both systems were very similar once the instances were provisioned.
Configuration: On AWS we set up a p3.2xlarge instance and on Paperspace we set up a V100 machine. In both cases we chose Ubuntu 16.04, for no other reason than familiarity with Ubuntu/Linux.
On the Paperspace system we were able to get basic dev tools, the NVIDIA drivers, the NVIDIA CUDA SDK, and some bits and bobs installed with:
# basics
sudo apt-get update
sudo apt-get install -y build-essential gcc make
sudo apt-get install emacs24
# NVIDIA drivers and CUDA SDK:
sudo bash ./NVIDIA-Linux-x86_64-390.12.run
sudo bash ./cuda_9.1.85_387.26_linux
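After the runfile installs finish, it is worth confirming that the driver and toolkit are actually visible before building anything. A minimal sanity-check sketch (the `check_tool` helper is our own, not part of any NVIDIA tooling):

```shell
# Report whether a command is on PATH without aborting the script.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: NOT found"
  fi
}

check_tool nvidia-smi   # driver-side utility installed by the NVIDIA .run file
check_tool nvcc         # CUDA compiler shipped with the CUDA SDK
```

If both are found, `nvidia-smi` should list the V100 and `nvcc --version` should report CUDA 9.1.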
Note: Paperspace includes an in-browser terminal. This is incredibly convenient but eventually we switched to using ssh due to browser pasting issues.
Availability: In general the AWS instance starts almost immediately. On some occasions the Paperspace instance takes a little while to initialize, likely due to a smaller pool of available V100 cards.
Pricing: Paperspace ($2.30/hr) versus AWS ($3.06/hr) as of 02/18. These prices are just for basic configurations with a single V100, and we didn't dwell on spec comparisons. Storage incurs additional charges.
Operation: Once the instances are provisioned, they operate almost identically. For instance, installing OCCA is the same:
# install occa
git clone https://github.com/libocca/occa
cd occa
# set up env variables
# (eventually you should add this to your .bashrc)
export PATH=$PATH:/usr/local/cuda/bin
export OCCA_DIR=`pwd`
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$OCCA_DIR/lib
export PATH=$PATH:$OCCA_DIR/bin
# build OCCA
make -j
# print all available OCCA devices/platforms/thread models
./bin/occainfo
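The exports above only last for the current shell session. One way to persist them, as the comment suggests, is to append them to your .bashrc; a sketch, assuming OCCA was cloned into the home directory (adjust the path if you cloned elsewhere):

```shell
# Append the OCCA environment to ~/.bashrc so new shells pick it up.
# NOTE: $HOME/occa assumes the `git clone` above was run from ~.
cat >> ~/.bashrc <<'EOF'
export PATH=$PATH:/usr/local/cuda/bin
export OCCA_DIR=$HOME/occa
export PATH=$PATH:$OCCA_DIR/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$OCCA_DIR/lib
EOF
```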
On the Paperspace instance we get this summary:
Benchmarking: we ran occaBench on both systems and, unsurprisingly, the results were similar. For details of the hybrid streaming/compute mixbench benchmark, see the references below. For the AWS result, see the previous blog entry.
The NVIDIA V100 PCI-E 16GB on the Paperspace instance has a manufacturer peak spec of 7 TFLOPS (FP64) and 14 TFLOPS (FP32). The occaBench code running on this GPU in OCCA:CUDA mode on a vector of length 10,240,000 achieves the following performance:
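Those peak specs also set the roofline "balance points" that mixbench-style sweeps probe: dividing the peak flop rate by peak memory bandwidth gives the arithmetic intensity at which the GPU shifts from bandwidth-bound to compute-bound. A quick back-of-envelope sketch (the 900 GB/s HBM2 bandwidth figure is NVIDIA's published spec for the V100, not a number measured in our runs):

```shell
# Roofline balance points for the V100 from its published peak specs.
awk 'BEGIN {
  fp64 = 7e12     # peak FP64, flops/s
  fp32 = 14e12    # peak FP32, flops/s
  bw   = 900e9    # peak HBM2 bandwidth, bytes/s
  printf "FP64 balance point: %.1f flops/byte\n", fp64 / bw
  printf "FP32 balance point: %.1f flops/byte\n", fp32 / bw
}'
```

In other words, an FP64 kernel below roughly 7.8 flops/byte is memory-bandwidth-limited on this card, which is why the low-intensity end of an occaBench sweep traces the bandwidth roof.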
Summary: both cloud providers are impressively easy to use and deliver similarly impressive performance. In a future entry we will describe the next steps we are taking to optimize our finite element codes for the specific core architecture of the V100.
Background: for the original mixbench micro-benchmarking that occaBench is based on, see the following papers by Konstantinidis and Cotronis:
E. Konstantinidis, Y. Cotronis, "A quantitative roofline model for GPU kernel performance estimation using micro-benchmarks and hardware metric profiling", Journal of Parallel and Distributed Computing, Volume 107, 2017, pp. 37-56.
E. Konstantinidis, Y. Cotronis, "A Practical Performance Model for Compute and Memory Bound GPU Kernels", 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2015, pp. 651-658.