Like many others, I'm trying to make my spare time somewhat productive during the quarantine. I figured it was time to get my head around a subject that I have been curious about for a long time: GPU programming. If you Google this topic, you'll find that there are a few technologies that can help you take advantage of your GPU. I won't go through a comparison of these technologies, as plenty of those are available online. What I can say is why I chose to learn OpenCL instead of its counterparts.
From what I have read, OpenCL often won't give you the best performance compared to alternatives such as CUDA. However, it is an open standard backed by dozens of big companies, so it felt to me like something more general and more openly adopted in the industry.
I'm not, by any means, an expert on the subject. However, I'd like to share what I learned while trying to use OpenCL in one of my projects.
[Khronos Group] Some examples of the extensive list of OpenCL backers. (https://www.khronos.org/opencl/)
OpenCL is not a library
As described by the Khronos Group, which maintains OpenCL and other open standards such as OpenGL:
OpenCL™ (Open Computing Language) is an open, royalty-free standard for cross-platform, parallel programming of diverse accelerators found in supercomputers, cloud servers, personal computers, mobile devices, and embedded platforms.
That reads like a generic commercial description and doesn't explain much, so let's break it down. OpenCL is a tool that enables you to perform parallel general-purpose computing on GPUs or other compliant hardware accelerators. This includes moving data to the device, doing the actual work there, and fetching the results back.
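To make those three steps a bit more concrete, here is a minimal sketch in plain C that only imitates them on the CPU. It does not call OpenCL at all: `run_on_fake_device` and `device_double` are hypothetical names made up for this illustration, and the "device" is just a separate heap allocation. The comments name the real OpenCL host calls that would play each role, assuming a typical host program.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the work done on the device:
 * doubles every element of the "device" buffer in place. */
void device_double(float *dev_buf, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dev_buf[i] *= 2.0f;
}

/* Imitates the host-side flow. Returns 0 on success,
 * -1 if the "device" allocation fails. */
int run_on_fake_device(const float *host_in, float *host_out, size_t n)
{
    if (n == 0)
        return 0; /* nothing to do */

    /* A separate allocation plays the role of device memory
     * (clCreateBuffer in real OpenCL). */
    float *dev_buf = malloc(n * sizeof *dev_buf);
    if (dev_buf == NULL)
        return -1;

    /* 1. Move the data to the device (clEnqueueWriteBuffer). */
    memcpy(dev_buf, host_in, n * sizeof *dev_buf);

    /* 2. Do the actual work (clEnqueueNDRangeKernel). */
    device_double(dev_buf, n);

    /* 3. Fetch the results back (clEnqueueReadBuffer). */
    memcpy(host_out, dev_buf, n * sizeof *dev_buf);

    free(dev_buf);
    return 0;
}
```

The point of the sketch is only that the host and the device have separate memory, so data has to be copied in both directions around the computation. A real program would also need to pick a platform and device and create a context and command queue first.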
OpenCL is an open specification, also called a standard. It defines, among other things, a programming language, a programming model, and a rationale. However, it does not say how to implement any of this in hardware or even in software. That is deliberate, and it is why OpenCL can be considered a cross-platform standard. Application and hardware developers agree on an abstract interface where both their worlds meet; yet another good use of decoupled systems.
Hardware manufacturers only need to build hardware and publish drivers that comply with some version of the OpenCL spec. Meanwhile, application developers write against the software model defined by the spec, and compliant hardware is expected to run their code correctly.
You probably noticed I mentioned drivers previously. Think of a driver as a hardware abstraction layer written in software whose goal is to handle, on behalf of the user or developer, the operational details of a particular device. These drivers are most often implemented by the hardware manufacturers themselves and thus are proprietary. Because of that, different implementations of OpenCL, e.g. AMD vs. Nvidia, can behave and perform differently; some drivers are more polished than others.
Because OpenCL is an abstract standard, GPUs are not the only devices that can be OpenCL compliant. Theoretically, anything that implements the OpenCL spec will run an OpenCL program. Heck, you can even develop your own OpenCL-compliant hardware accelerator on an FPGA. The standard is free after all.
[Khronos Group] How OpenCL applications are organized (https://www.khronos.org/opencl/)
As an application developer, what does that all mean?
OpenCL has its own language, but fear not, as it is essentially a subset of C99. Since OpenCL 2.1, there's also a specification for another dialect based on a C++14 subset. If you know C, you'll be fine.
OpenCL shines in data-parallel, computationally intensive applications. If your code has many data dependencies or is dominated by memory access, OpenCL may not give you a meaningful performance boost.
The same code runs on any OpenCL device.
OpenCL code is compiled at runtime.
You'll have to model your problem to match OpenCL's programming model. Be careful when defining your data parallelism.
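As a rough illustration of the points above, the following plain C sketch imitates how a data-parallel problem is split into independent pieces. It is not OpenCL host code: `vec_add_item`, `launch_vec_add`, and `prefix_sum` are hypothetical names invented for this example, and the "launch" is an ordinary sequential loop. The kernel source in `vec_add_kernel_src` is an assumed example of what the OpenCL C version might look like; it is held as a string because that is the form in which a host program typically hands source to the driver for runtime compilation.

```c
#include <stddef.h>

/* Illustrative OpenCL C source for an element-wise add. In a real program a
 * string like this is handed to the driver at run time (for example through
 * clCreateProgramWithSource and clBuildProgram). Here it is only shown for
 * comparison and is never compiled. */
const char *vec_add_kernel_src =
    "__kernel void vec_add(__global const float *a,\n"
    "                      __global const float *b,\n"
    "                      __global float *out)\n"
    "{\n"
    "    size_t gid = get_global_id(0);\n"
    "    out[gid] = a[gid] + b[gid];\n"
    "}\n";

/* Plain C stand-in for the kernel body: handles exactly ONE element,
 * identified by its global id, and touches no other element. */
void vec_add_item(size_t gid, const float *a, const float *b, float *out)
{
    out[gid] = a[gid] + b[gid];
}

/* Stand-in for launching n work-items. A real device may run them in any
 * order and in parallel; this loop runs them one after another, which gives
 * the same result precisely because the items are independent. */
void launch_vec_add(size_t n, const float *a, const float *b, float *out)
{
    for (size_t gid = 0; gid < n; ++gid)
        vec_add_item(gid, a, b, out);
}

/* Counter-example: a running (prefix) sum. Each output depends on the
 * previous one, so this loop cannot simply be split into independent
 * per-element work-items. */
void prefix_sum(size_t n, const float *in, float *out)
{
    float running = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        running += in[i];
        out[i] = running;
    }
}
```

The element-wise add maps naturally onto one work-item per element. The running sum, written this way, does not, because of the dependency carried from one iteration to the next; a problem like that has to be reformulated before it fits the model.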
Next time we'll discuss OpenCL's programming model. Until then, happy coding!