You’re probably wondering what an FPGA is and what it can offer you. Spending hours reading all the documentation and specifications might sound really boring, even though you’re genuinely interested in the topic… Don’t worry! Today is your lucky day: we’ve already spent all those hours digging into this cutting-edge technology in order to bring you a guide to everything you may want to know about the world of adaptable processing units.
Let’s start with some basic definitions. “FPGA” is an abbreviation for “field-programmable gate array.” FPGAs belong to the same family as ASICs, GPUs, and CPUs: microchips used to compute basic (or advanced) logical and mathematical operations. Let us give you a short comparison of these four technologies.
An ASIC (application-specific integrated circuit) is a microchip built to execute one particular logical or mathematical function. Its configuration, and therefore the operation it is able to execute, is fixed at the moment of its production. This is the cheapest and simplest type of processing unit.
A CPU (central processing unit) is more powerful. It can execute many different and truly advanced operations. However, because these operations are so complex, it can execute only a few of them simultaneously, and each of them requires many logic blocks on the chip. It’s a really nice choice when you want to do many things using only one chip!
A GPU (graphics processing unit) is a microchip that combines many identical units, each of which can simultaneously perform simple operations. It’s really powerful thanks to this ability to process data in parallel.
Finally, an FPGA is a microchip that can execute several logical or mathematical operations. That sounds quite similar to an ASIC, but it’s completely different: the set of operations an FPGA can execute can be redefined using a specific programming language, right from your computer! That means its computational power is not limited to a list of operations fixed during the production process.
In “Figure 1,” you can see diagrams depicting the differences between these processing units.
As you can see, an FPGA can be considered a next generation of processing units: it can run operations in parallel while the list of functions it executes can still be changed.
Anyone who’s tried to find information about FPGAs has surely come across another interesting term: “accelerator card.” Accelerator cards are collections of FPGA blocks that can perform the same or different operations at the same time. They can serve as a replacement for a GPU card, for example.
Xilinx offers various types of accelerator cards for different needs. For example, there are the Zynq, Virtex, and Alveo series, which target edge, embedded, and cloud applications.
Now that we’ve gone over the general idea and shown examples of these cards, let’s move on to the most interesting part: how we actually program the operations. At this point you might get the urge to stop reading, because programming a microchip yourself doesn’t sound like a great idea, but don’t fear! Xilinx experts have already come up with the most popular and most needed configurations, which anyone can use. They are called IPs (intellectual property cores), and each represents a set of operations that boosts the performance of a model created for particular needs when it runs on one of Xilinx’s devices.
A collection of such IPs on a device is called a DPU, which stands for deep-learning processor unit. And isn’t deep learning the reason we are all here? Xilinx predefines many DPU configurations with different names. “Figure 3” shows an example of such a configuration, called DPUCADX8G. You might be wondering what these letters mean; let’s examine that in “Figure 4.” As you can see, Xilinx provides a description of what each letter represents. In our case, the name stands for a deep-learning processing unit optimized for CNNs, which works with Alveo DDR, uses 32-bit to 8-bit quantization, and is designed for general-purpose use.
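As a toy illustration (not an official Xilinx tool), the naming scheme from “Figure 4” can be sketched as a small lookup table that decodes DPUCADX8G letter by letter:

```python
# Toy decoder for DPU names, following the naming scheme described above.
# The lookup tables cover only the codes needed for DPUCADX8G; they are
# illustrative, not an exhaustive or official Xilinx mapping.

APPLICATION = {"C": "CNN"}                                # optimized workload
HARDWARE = {"AD": "Alveo DDR"}                            # target platform
QUANTIZATION = {"X8": "32-bit float to 8-bit integer"}    # data-type conversion
DESIGN = {"G": "general purpose"}                         # design objective

def decode_dpu_name(name: str) -> str:
    """Decode a name like 'DPUCADX8G' into a human-readable description."""
    assert name.startswith("DPU")
    rest = name[3:]                       # e.g. 'CADX8G'
    app = APPLICATION[rest[0]]            # 'C'  -> CNN
    hw = HARDWARE[rest[1:3]]              # 'AD' -> Alveo DDR
    quant = QUANTIZATION[rest[3:5]]       # 'X8' -> 32-bit to 8-bit quantization
    design = DESIGN[rest[5]]              # 'G'  -> general purpose
    return f"DPU for {app} on {hw}, {quant}, {design}"

print(decode_dpu_name("DPUCADX8G"))
```

Extending the tables with the other codes from “Figure 4” would let the same function decode any DPU name.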
Every IP (and consequently every DPU) has its own set of supported operations, which you can check at any time in the documentation (which can be found here). Beware also that not all cards accept all DPUs, so check the documentation for your device!
Now that you’ve found out that Xilinx has already done everything on the hardware side to facilitate running models, you probably want to know how to actually use all this power. The answer is Vitis AI!
Vitis AI is a platform for software (our case) and hardware development, introduced by Xilinx in 2020. It can automatically adapt the hardware configuration and usage to the user’s purposes. It consists of several important parts.
The first one is the model zoo, where popular DL models are implemented in Caffe, PyTorch, or TensorFlow and optimized for use with an FPGA. In the model zoo you can find various models, for example, for computer vision. If you opened the link when we mentioned it, you may be wondering what these nice names mean. The naming scheme is pretty simple and follows this pattern: framework + model + dataset + height + width + pruning ratio + computation + version of Vitis AI.
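To make the pattern concrete, here is a toy parser for names of that shape; the example name below is made up for illustration, so don’t go looking for it in the real model zoo:

```python
# Toy parser for model zoo names following the pattern described above:
# framework_model_dataset_height_width_pruningratio_computation_version.
# The example name is invented for illustration only.

def parse_zoo_name(name: str) -> dict:
    parts = name.split("_")
    return {
        "framework": parts[0],            # e.g. 'cf' (Caffe), 'tf' (TensorFlow), 'pt' (PyTorch)
        "model": parts[1],
        "dataset": parts[2],
        "height": int(parts[3]),
        "width": int(parts[4]),
        "pruning_ratio": float(parts[5]), # fraction of weights pruned away
        "computation": parts[6],          # amount of computation per forward pass
        "version": parts[7],              # Vitis AI version
    }

info = parse_zoo_name("tf_resnet50_imagenet_224_224_0.5_3.8G_1.4")
print(info["framework"], info["model"], info["pruning_ratio"])
```

Real zoo names may use extra underscores inside the model field, so a production parser would need to be more careful; this sketch only shows how the fields line up.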
The second important part is the pipeline that precedes execution on the device. It includes a model quantization step, model optimization (optional), and model compilation.
The quantizer quantizes the weights of a pre-trained model and tunes its activations using a small calibration set. This second part is needed to reach the same level of performance as the full-precision model.
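To show the core idea behind the 32-bit to 8-bit conversion (this is a minimal sketch, not the actual Vitis AI quantizer), here is symmetric per-tensor quantization of a few float weights:

```python
# Minimal sketch of symmetric 8-bit quantization: map float weights onto
# integers in [-128, 127] using a single scale factor, then map back to
# see how little precision is lost. Not the real Vitis AI implementation.

def quantize_int8(weights):
    """Quantize a list of floats to int8 values plus a shared scale."""
    scale = max(abs(w) for w in weights) / 127.0   # largest weight maps to 127
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the integers back to floats for comparison."""
    return [x * scale for x in q]

weights = [0.82, -0.41, 0.05, -1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)         # small integers instead of 32-bit floats
print(restored)  # close to the original weights, up to rounding error
```

The calibration set mentioned above plays a similar role for activations: it lets the quantizer pick good scale factors by observing real intermediate values.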
The optimizer, an optional step, prunes the pre-trained model that comes from the quantizer: it reduces the total number of weights and connections inside the model without drastically changing its performance.
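The core idea behind pruning can be sketched in a few lines (again, a toy version of the concept, not the actual Vitis AI optimizer): drop the weights with the smallest magnitude and keep the rest.

```python
# Minimal sketch of magnitude pruning: zero out the given fraction of
# weights with the smallest absolute value. A toy version of the concept,
# not the real Vitis AI optimizer.

def prune_by_magnitude(weights, ratio):
    """Zero out the `ratio` fraction of weights with the smallest magnitude."""
    n_prune = int(len(weights) * ratio)
    # threshold = magnitude of the n_prune-th smallest weight
    threshold = sorted(abs(w) for w in weights)[n_prune - 1] if n_prune else -1.0
    return [0.0 if abs(w) <= threshold else w for w in weights]

weights = [0.9, -0.02, 0.41, 0.003, -0.75, 0.1]
pruned = prune_by_magnitude(weights, ratio=0.5)
print(pruned)  # the three smallest-magnitude weights are now zero
```

This is also where the “pruning ratio” field in the model zoo names comes from: a ratio of 0.5 means half of the weights were removed.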
The last and most important step is the model’s compilation. It takes a pre-trained model written in one of the three popular frameworks and converts it into an xmodel (a model that uses the Xilinx runtime library). This is the most important step, as this is where the magic happens and the usage of the FPGA begins: after compilation, the model is ready for execution on the device.
If, after execution, you want to check the load on the device and how much time and memory each operation took, you should definitely check out the Vitis AI Profiler, as it gives exactly this info!
This is all we have for now about using FPGAs for everyday purposes! We hope that you now understand a bit more about the fascinating world of accelerator cards. If you have any further questions, we’ll be glad to answer them, so just drop us a line.