Concurrent Cloud Computing: installing occaBench for V100

February 7, 2018

Overview: This week we have been experimenting with instances on Amazon AWS and Paperspace that come equipped with NVIDIA V100 GPUs. These GPUs are hot properties and not widely available, so we had to request special access to V100-equipped instances on both systems. Both AWS and Paperspace responded quickly to our requests. The Paperspace support team was also incredibly responsive, patient, and helpful in getting us through some minor technical issues.

Note: this article is not an endorsement of these companies or their products. We are just providing some insight into our experience getting started on their systems. Your mileage may vary. In our experience both systems were very similar once the instances were provisioned.


Configuration: On AWS we set up a p3.2xlarge instance and on Paperspace we set up a V100 machine. In both cases we chose Ubuntu 16.04, for no other reason than familiarity with Ubuntu/Linux.

 

On the Paperspace system I was able to get basic dev tools, NVIDIA drivers, the NVIDIA CUDA SDK, and some bits and bobs installed with:

# basics
sudo apt-get update
sudo apt-get install -y build-essential gcc make
sudo apt-get install emacs24

 

# NVIDIA drivers and CUDA SDK:
# (https://developer.nvidia.com/cuda-downloads)
wget http://us.download.nvidia.com/tesla/390.12/NVIDIA-Linux-x86_64-390.12.run
sudo bash ./NVIDIA-Linux-x86_64-390.12.run

wget https://developer.nvidia.com/compute/cuda/9.1/Prod/local_installers/cuda_9.1.85_387.26_linux
sudo bash ./cuda_9.1.85_387.26_linux
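
Once both installers finish, it can be worth confirming that the driver and toolkit are actually visible before moving on. The following is a minimal sketch and not part of our original setup notes: the helper name check_tool is our own invention, and it assumes the CUDA runfile installed into the default /usr/local/cuda prefix.

```shell
# Assumption: the CUDA runfile installed into the default /usr/local/cuda prefix.
PATH="$PATH:/usr/local/cuda/bin"

# check_tool is a hypothetical helper: report whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

check_tool nvidia-smi   # installed with the NVIDIA driver
check_tool nvcc         # installed with the CUDA toolkit

# If both are found, these print the driver/GPU status and the compiler version:
#   nvidia-smi
#   nvcc --version
```

If either tool is reported missing, re-running the corresponding installer is probably the quickest fix.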

 

Note: Paperspace includes an in-browser terminal. This is incredibly convenient but eventually we switched to using ssh due to browser pasting issues. 
 

Availability: In general the AWS instance starts almost immediately. On some occasions the Paperspace instance takes a little while to start, likely due to a smaller pool of available V100 cards.

 

Pricing: Paperspace ($2.30/hr) versus AWS ($3.06/hr) as of February 2018. These prices are for basic configurations with a single V100; I didn't dwell on spec comparisons. Storage incurs additional charges.
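
To put the hourly rates in perspective, here is a small sketch that turns them into a session cost. The helper name cost_usd and the 10-hour session length are illustrative assumptions, and the calculation ignores storage and any other charges.

```shell
# cost_usd is a hypothetical helper: hourly rate (USD) times hours, two decimals.
cost_usd() {
  awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

hours=10   # assumed session length, for illustration only
echo "Paperspace: \$$(cost_usd 2.30 "$hours")"
echo "AWS:        \$$(cost_usd 3.06 "$hours")"
```

The gap is $0.76 per hour at these rates, so it only becomes significant for long-running jobs.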
 

Operation: Once the instances are provisioned, they operate almost identically. For instance, installing OCCA is the same on both:

# install occa
git clone https://github.com/libocca/occa
cd occa
# set up env variables
# (eventually you should add this to your .bashrc)

export PATH=$PATH:/usr/local/cuda/bin
export OCCA_DIR=`pwd`
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$OCCA_DIR/lib
export PATH=$PATH:$OCCA_DIR/bin

# build OCCA
make -j

# print all available OCCA devices/platforms/thread models
./bin/occainfo
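
The exports above only last for the current shell session. One way to make them permanent is sketched below; the helpers append_once and persist_occa_env are hypothetical names of ours, and the script assumes OCCA was cloned into $HOME/occa, so adjust the path if yours lives elsewhere.

```shell
# append_once is a hypothetical helper: add a line to a file only if that
# exact line is not already present, so re-running the script is harmless.
append_once() {
  file="$1"
  line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# persist_occa_env writes the environment settings from this post to the given
# startup file. Assumption: OCCA was cloned into $HOME/occa.
persist_occa_env() {
  rc="$1"
  append_once "$rc" 'export PATH=$PATH:/usr/local/cuda/bin'
  append_once "$rc" 'export OCCA_DIR=$HOME/occa'
  append_once "$rc" 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$OCCA_DIR/lib'
  append_once "$rc" 'export PATH=$PATH:$OCCA_DIR/bin'
}

# Usage (uncomment to apply to your own shell startup file):
#   persist_occa_env "$HOME/.bashrc"
```

The single quotes matter: they keep the variables unexpanded in the file so that they are evaluated each time a new shell starts.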

 

On the Paperspace instance we get this summary:

 

Benchmarking: we ran occaBench on both systems and, unsurprisingly, the results were similar. For details of the hybrid streaming/compute mixbench benchmark, see the references below. For the AWS result, see the previous blog entry.
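
As rough context for reading a mixed streaming/compute benchmark, the basic roofline bound caps attainable throughput at the smaller of the compute peak and the arithmetic intensity times the memory bandwidth. The sketch below is a simplification of that idea and is not the quantitative model from the referenced papers, nor what occaBench itself computes. The helper name roofline_gflops is hypothetical, the 7 TFLOPS FP64 peak is the manufacturer figure quoted in this post, and the 900 GB/s bandwidth is an assumption taken from NVIDIA's published V100 spec, not something we measured.

```shell
# roofline_gflops is a hypothetical helper implementing the basic roofline bound:
#   attainable GFLOP/s = min(peak GFLOP/s, intensity [FLOP/byte] * bandwidth [GB/s])
roofline_gflops() {
  awk -v peak="$1" -v bw="$2" -v ai="$3" 'BEGIN {
    mem_bound = ai * bw
    printf "%.0f\n", (mem_bound < peak) ? mem_bound : peak
  }'
}

peak_fp64=7000   # GFLOP/s, the FP64 manufacturer peak quoted in this post
bandwidth=900    # GB/s, assumed from NVIDIA's published V100 spec (not measured by us)

for ai in 0.25 1 4 16; do
  echo "intensity $ai FLOP/byte -> at most $(roofline_gflops "$peak_fp64" "$bandwidth" "$ai") GFLOP/s"
done
```

Under these assumed numbers, a kernel needs roughly 7000 / 900, or about 7.8, FP64 operations per byte moved before the compute peak rather than memory bandwidth becomes the limit.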

 

The NVIDIA V100 PCI-E 16GB on the Paperspace instance has a manufacturer peak spec of 7 TFLOPS (FP64) and 14 TFLOPS (FP32). The occaBench code running on this GPU in OCCA:CUDA mode on a vector of length 10,240,000 achieves the following performance:

 

 

Summary: both cloud providers are impressively easy to use and deliver similarly impressive performance. In a future entry we will describe the next steps we are taking to optimize our finite element codes for the specific core architecture of the V100.
 

Background: for the original mixbench micro-benchmarking that occaBench is based on see the following papers by Konstantinidis & Cotronis:

Elias Konstantinidis, Yiannis Cotronis, "A quantitative roofline model for GPU kernel performance estimation using micro-benchmarks and hardware metric profiling", Journal of Parallel and Distributed Computing, Volume 107, 2017, Pages 37-56.

 

Elias Konstantinidis, Yiannis Cotronis, "A Practical Performance Model for Compute and Memory Bound GPU Kernels", 23rd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 2015, Pages 651-658.

 

 

 


 

225 Stanger St
Blacksburg, VA 24061
USA.

©2018 BY THE PARALLEL NUMERICAL ALGORITHMS  RESEARCH GROUP @VT.