Hardware Accelerators for CNNs. Application to FPGA-Based Smart-Cameras

Research group on Embedded Architecture for Multisensing

Deep Convolutional Neural Networks (CNNs) have enjoyed huge success over the last decade and have become the de facto standard in image classification and machine vision.

This success comes at the price of a large computational cost, as CNNs may require up to 30 billion operations to classify a single frame. As a result, implementing CNNs under real-time constraints is a challenging task. This challenge is addressed by exploiting the large amount of parallelism that CNNs exhibit, which usually requires dedicated hardware, such as high-end GPUs, to be effective.
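To give a sense of where such operation counts come from, the sketch below counts the multiply-accumulate operations (MACs) of a single convolutional layer. The layer shape (3 input channels, 64 filters of size 3x3, a 224x224 output) is an illustrative assumption and is not taken from a specific network discussed here; the function name is hypothetical.

```python
def conv_layer_macs(in_channels, out_channels, kernel_size, out_height, out_width):
    """Count multiply-accumulates for one convolutional layer.

    Each output pixel of each output channel needs one MAC per input
    channel and per kernel element.
    """
    return (out_channels * out_height * out_width
            * in_channels * kernel_size * kernel_size)


# Illustrative layer shape (an assumption, chosen only as an example).
macs = conv_layer_macs(in_channels=3, out_channels=64, kernel_size=3,
                       out_height=224, out_width=224)
ops = 2 * macs  # counting each MAC as one multiply plus one add

print(f"{macs:,} MACs, {ops:,} operations")
```

Every output value can be computed independently of the others, which is the parallelism that dedicated hardware exploits.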

However, GPUs are power-hungry devices that require several hundred watts to support CNNs, which prevents their use in low-energy embedded devices. As an alternative, FPGA platforms combine energy efficiency with hardware flexibility to support the fine-grain parallelism exhibited by CNNs. While FPGAs have delivered better energy efficiency (performance per watt) than GPUs, they have not been known for offering top performance.

Nonetheless, FPGA technologies are evolving rapidly. The upcoming generation of 14nm FPGAs embeds a higher density of logic elements and natively supports computationally intensive workloads thanks to hardwired DSP blocks. Moreover, recent advances in neural network development, such as dynamic fixed-point arithmetic, binarization, neuron pruning and Singular Value Decomposition, add more sparsity to CNN computations and introduce irregular parallelism into the CNN template. As FPGAs excel at irregular parallelism, these devices are more efficient than ever at processing CNNs.
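As an illustration of one of these techniques, here is a minimal sketch of dynamic fixed-point quantization, assuming one shared fractional length per group of values (for instance, per layer) and an 8-bit word. The function names and sample values are hypothetical and do not describe the implementation used in this work.

```python
import math


def choose_frac_bits(values, word_bits=8):
    """Pick the fractional length so the largest magnitude in the group fits."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return word_bits - 1
    int_bits = math.floor(math.log2(max_abs)) + 1  # bits left of the binary point
    return word_bits - 1 - int_bits                # one bit is reserved for the sign


def quantize(values, word_bits=8):
    """Return integer codes and the fractional length shared by the group."""
    frac_bits = choose_frac_bits(values, word_bits)
    scale = 2.0 ** frac_bits
    q_min = -(1 << (word_bits - 1))
    q_max = (1 << (word_bits - 1)) - 1
    # Round to the nearest code and saturate to the representable range.
    codes = [min(q_max, max(q_min, round(v * scale))) for v in values]
    return codes, frac_bits


def dequantize(codes, frac_bits):
    """Map integer codes back to real values."""
    return [c / 2.0 ** frac_bits for c in codes]


# Small weights get many fractional bits; larger activations get fewer.
weight_codes, weight_frac = quantize([0.5, -0.25, 0.125, 0.9])
act_codes, act_frac = quantize([3.0, 10.5, -7.25])
```

The point of the "dynamic" part is that each group chooses its own binary point, so narrow integer datapaths can still cover value ranges that differ widely between layers.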


In this context, algorithmic optimizations, fine-grain tuning of the hardware architecture and exploration of the design space of FPGA parameters are jointly required to derive energy-efficient FPGA-based accelerators for CNNs. In addition, the computer vision community needs dedicated software solutions to automatically deploy CNNs on FPGAs while fully abstracting away the hardware layer.


These four aspects constitute the main research topics I have been investigating in my PhD thesis since September 2015, under the supervision of Prof. François Berry and Ass. Prof. Maxime Pelcat.