Monday, December 8, 2014

A type-safe definition for OpenCL's enqueue_kernel function

I want to share with you something that I initially thought wouldn't work... but it does. There is no particular reason behind it, other than to prove (once again) that C++11 is indeed fantastic and can handle (almost) anything you throw at it.

The OpenCL 2.0 standard was ratified about a year ago. The feature I find most exciting is device-side enqueue, which allows a kernel to submit new work directly on the device without any host intervention.

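However, there is something fishy about the way this function is defined, and I am going to explain why. The new enqueue_kernel function (defined in the OpenCL 2.0 C language specification) has four overloads. All of them take a queue_t queue, kernel_enqueue_flags_t flags and const ndrange_t ndrange; the first two then accept a block with no parameters, void (^block)(void), with or without an event wait list (num_events_in_wait_list, event_wait_list, event_ret).

OK, we like it. But wait until you see the other two, which accept a block of type void (^block)(local void *, ...) followed by uint size0, ...

Emh... wait a second... what? How many variadic argument lists are in there?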

The issue here is the two sets of "...". How many arguments should we accept? Funnily enough, the specification does not say much about these additional arguments. However, the only reading that makes sense to me is the following:
"If a closure function (or block) is passed which accepts N OpenCL "local" pointers, then their sizes are given by an equal number of unsigned values (i.e., size0, ..., sizeN-1). It is the responsibility of the runtime to allocate this memory before the nested kernel is executed."
In other words, the lengths of the two variadic argument lists (the block's and the one belonging to enqueue_kernel itself) must match, and nothing in the declaration says so. This means it is the compiler's responsibility to perform this additional check.

I can see many people being happy with this... but couldn't we use the type system to enforce it? Can our beloved metaprogramming fix this? Suppose we were in C++: would the API designer be able to express this constraint (the number of closure parameters equals the number of sizes passed) purely by means of the type system? You will be glad to hear that with C++11, YES WE CAN! ...and I am going to show you how.

For this example, the sizes are simply passed as input to the lambda (rather than used to allocate device local memory, as the real enqueue_kernel is supposed to do). This is just a proof of concept; we are not interested in actually implementing OpenCL's device-side enqueue. Running the program produces the expected output:

> 10
> 20
> 30
> Calling closure
> Computed value: 60

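This highlights the power of the ... pack expansion operator of C++11 variadic templates: because the sizes are declared as an expansion over the closure's own parameter pack, there must be exactly one size per closure parameter. If we try to call this function with the wrong number of sizes, the compiler reports an error: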

@ThinkPad-X1-Carbon:~$ g++ -std=c++11 test.cpp 
test.cpp: In function ‘int main(int, char**)’:
test.cpp:25:10: error: too few arguments to function ‘int enqueue_kernel(std::function<void(Args ...)>, typename to_int<Args>::type ...) [with Args = {int, int, int}]’
    10, 20);
          ^
test.cpp:10:5: note: declared here
 int enqueue_kernel(std::function<void (Args... )> block, typename to_int<Args>::type... sizes)
     ^
And there you have it: a type-safe definition of OpenCL's enqueue_kernel in C++11. Just because in C++11 we can! Hate on that, C lovers! :)

C++ <3