Alas, our NVIDIA Titan V order didn't go through. Instead I am gearing up to run on an NVIDIA V100 GPU-equipped server at paperspace.com. [This is not an endorsement: proceed at your own risk, your mileage may vary, and there are other cloud providers with GPU compute nodes.]
The first step after setting up a new machine instance and logging on is to install the necessary compilers and
If everything was set up correctly you should get a report listing the available threading APIs and their associated compute devices.
I haven't yet gained access to a V100 node, but on their NVIDIA Quadro M400 node I get the following:
The next step is to build and run one of the examples bundled with the OCCA distribution.
# find the OCCA addVectors example
# build example
# run example
If everything went OK you should see the available OCCA API and device info as well as extra output from the example. OCCA builds a compute kernel to add two vectors together on a compute device, and the compilation stage outputs some information:
Note how OCCA does runtime compilation: in this case the default build is for "Serial", so it uses gcc to build the kernel and then executes it on the CPU.
Next we install (sigh) emacs, edit the main.cpp file to uncomment the line that selects a CUDA device when running OCCA, and run again:
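To make concrete what a "Serial" build ends up computing, here is a minimal sketch of an element-wise vector add in plain C++. It is not OCCA's kernel source or the code OCCA generates; the function name addVectorsSerial is hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the addVectors kernel: ab[i] = a[i] + b[i].
// Plain C++ only; this is NOT OCCA's kernel language or its generated code.
std::vector<float> addVectorsSerial(const std::vector<float>& a,
                                    const std::vector<float>& b) {
  assert(a.size() == b.size());  // both inputs must have the same length
  std::vector<float> ab(a.size());
  // In Serial mode the work is just a loop executed on the CPU.
  for (std::size_t i = 0; i < a.size(); ++i) {
    ab[i] = a[i] + b[i];
  }
  return ab;
}
```

On a GPU backend the same loop body would instead be spread across many threads, which is why the kernel has to be recompiled when the mode changes.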
# install emacs
sudo apt-get install emacs24
# .. edit main.cpp with emacs
emacs -nw main.cpp
# now rebuild
This time we see output reflecting the need to do runtime compilation for CUDA:
Note how OCCA has detected which architecture to use when compiling the CUDA kernel.
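As a rough illustration of what "detecting the architecture" can amount to, the sketch below turns a device's compute capability into an nvcc -arch flag. This is an assumption about the general approach, not OCCA's actual implementation; the helper name nvccArchFlag is hypothetical. In a real program the major and minor numbers would typically come from the CUDA runtime (for example the major and minor fields filled in by cudaGetDeviceProperties).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: map a compute capability (major.minor) to an
// nvcc architecture flag such as "-arch=sm_70". Not part of OCCA.
std::string nvccArchFlag(int major, int minor) {
  assert(major >= 1 && minor >= 0 && minor <= 9);  // single-digit minor
  return "-arch=sm_" + std::to_string(major) + std::to_string(minor);
}
```

For example, a V100 has compute capability 7.0, so nvccArchFlag(7, 0) yields "-arch=sm_70".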
Exercise: For your own entertainment you can edit main.cpp to select OpenCL mode (being careful to select device 0 on OpenCL platform 0 as indicated by occainfo) and run the code again but this time in OpenCL mode.
Upcoming: we will hopefully be able to provide some performance analysis of the NVIDIA V100 GPU using occaBench.