ProPruNN
The ProPruNN project aims to use structured pruning in a hardware-algorithm co-design methodology to improve hardware implementations of Convolutional Neural Networks (CNNs). The project's first sub-hypothesis is that designing hardware architectures that take advantage of structured pruning leads to significant gains in latency, throughput, and power. This may, however, be harder to achieve in more complex networks such as ResNets and DenseNets, which require rearrangements after filter removal. The second sub-hypothesis is that the performance of pruned networks can be predicted, both in terms of accuracy and in terms of power, throughput, and latency. The final sub-hypothesis is that such prediction models can be used to design state-of-the-art networks and push the Pareto frontier of accuracy versus implementation performance beyond the current literature. The project aims to improve on existing results by using a finer and more efficient pruning approach.
Project framework
Deep Neural Networks (DNNs) have revolutionized image classification, object detection, and machine translation, but at the cost of increased complexity, power consumption, and greenhouse gas emissions. In real-world settings, DNN inference often has to run on embedded systems with limited resources. To address these issues, two approaches are possible: hardware-independent and hardware-aware design. Hardware-independent design modifies the structure of the network to reduce the number of parameters and computations, while hardware-aware design uses quantization and pruning to reduce the number of bits used to represent parameters and data. Structured pruning, which removes larger structures such as whole neurons or convolution filters, is the pruning technique of interest in this project.
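As an illustration of structured pruning (a minimal sketch, not the project's actual tooling; the function name, the L1 saliency criterion, and the toy tensor shapes are all illustrative assumptions), removing whole convolution filters from one layer also removes the corresponding input channels of the following layer:

```python
import numpy as np

def prune_filters(conv_w, next_w, keep_ratio=0.5):
    """Structured pruning sketch: drop the convolution filters with the
    smallest L1 norms, then remove the matching input channels from the
    next layer's weights so the two layers stay consistent."""
    # conv_w: (out_ch, in_ch, k, k); next_w: (next_out, out_ch, k, k)
    scores = np.abs(conv_w).sum(axis=(1, 2, 3))      # one L1 score per filter
    n_keep = max(1, int(conv_w.shape[0] * keep_ratio))
    keep = np.sort(np.argsort(scores)[-n_keep:])     # kept filters, layer order preserved
    return conv_w[keep], next_w[:, keep]

# Toy example: a layer with 8 filters pruned down to 4.
w1 = np.random.randn(8, 3, 3, 3)
w2 = np.random.randn(16, 8, 3, 3)
p1, p2 = prune_filters(w1, w2)
print(p1.shape, p2.shape)  # (4, 3, 3, 3) (16, 4, 3, 3)
```

This channel bookkeeping is exactly what becomes non-trivial in ResNets and DenseNets, where skip connections force filter removals to be propagated across several layers at once.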
Objective
The ProPruNN project aims to design CNN accelerators on FPGAs that benefit from structured pruning to achieve significant gains in throughput, latency, and energy efficiency. After designing hardware architectures that support structured pruning, the project aims to measure the effect of pruning on accuracy, latency, and power consumption, and to develop predictive models that link structured pruning with these metrics. The project also aims to compete with the literature by designing CNN-hardware architecture pairs that achieve low latency (<1 ms) while maintaining acceptable accuracy. Finally, the project aims to design efficient neural networks that have a very low environmental impact and that can be oriented, during training, towards a particular goal such as energy consumption, using the previously developed predictors. The potential gains in energy consumption could be as large as one order of magnitude for comparable accuracy.
Used method
The idea behind the ProPruNN project is to use structured pruning to perform Neural Architecture Search (NAS). The advantage over classical approaches is that the network can decide its own structure during training. By incorporating the predictors we plan to create into the loss function, the network will optimize its structure itself according to a dual objective of accuracy and hardware performance (i.e., latency or power consumption).
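The dual objective described above can be sketched as follows (an illustrative assumption, not the project's actual formulation: the function name, the hinge-style penalty, and the default weights are hypothetical, and in practice the task loss and the latency predictor would both be differentiable so the penalty can shape the network's structure during training):

```python
def dual_objective_loss(task_loss, predicted_latency_ms, lam=0.1, target_ms=1.0):
    """Sketch of a dual-objective training loss: the usual task loss plus
    a penalty from a latency predictor, pushing the network towards
    structures that meet the latency budget."""
    # Penalize only latency above the target budget (hinge-style penalty).
    hw_penalty = max(0.0, predicted_latency_ms - target_ms)
    return task_loss + lam * hw_penalty

# Under budget: no penalty. Over budget: the loss grows with the excess.
print(dual_objective_loss(0.8, 0.5))  # 0.8
```

The weighting factor lam trades accuracy against hardware performance, which is how a single training run can be steered towards a point on the accuracy-versus-latency Pareto frontier.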
To do this, several steps are necessary. First, we need to develop our own hardware architectures of CNN accelerators. As mentioned in the literature review, two types of architectures are possible. On the one hand, recursive architectures share their hardware resources between the different convolutional layers. Their advantage is that they can support any size of neural network, provided they offer a sufficient level of programmability, i.e. a scheduler can tell them at inference time how their processing elements should be used and on what data. On the other hand, in pipelined architectures the hardware structure of the accelerator depends on the hyperparameters of the network, so each CNN-hardware architecture pair would have to be synthesized, and this synthesis automated. In the framework of the ProPruNN project, such architectures are not practical. Indeed, we want to create datasets linking CNNs with their hardware performance metrics (latency, energy consumption), on which we intend to perform learning tasks, and learning requires a large enough dataset (hundreds of samples at least). Creating such a dataset from a small number of programmable recursive architectures is feasible: each synthesis takes a few days, after which the execution of a large number of different CNNs can be evaluated by simulation. Generating several hundred pipelined CNN-hardware architecture pairs instead would take an unmanageable amount of time (several hundred days of synthesis). A second reason for focusing on recursive architectures is that the most advanced works in the literature on the ImageNet dataset also use them. Finally, recursive architectures are simpler to size so as to make maximum use of a given FPGA, unlike pipelined architectures, whose structure depends on the considered CNN.
The second step is the creation of predictive models of the latency and energy consumption of the circuit. To do so, we will use the products of the previous step, i.e. the hardware architectures and the tools allowing us to execute CNNs on them, to create a dataset associating each structurally pruned CNN with its hardware performance (latency, power consumption). We will then compare two prediction approaches on this dataset: an expert one, based on hardware metrics, and a learning-based one. The goal is ultimately for the network to shape its own structure during training without degrading its accuracy.
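A minimal version of the learning-based approach could look like the following sketch (all numbers are synthetic placeholders, not project measurements; the chosen features and the linear model are illustrative assumptions, since the real dataset would come from simulated executions on the recursive accelerators):

```python
import numpy as np

# Synthetic dataset: one row per (pruned) CNN, with simple network-level
# features. Features: [MACs (G), parameters (M), number of layers].
X = np.array([[1.8, 11.7, 18],
              [3.6, 21.8, 34],
              [4.1, 25.6, 50],
              [0.9,  5.3, 18]])   # e.g. a pruned variant of the first network
y = np.array([2.1, 3.9, 4.6, 1.2])  # latency in ms (synthetic, for illustration)

# Fit a linear model latency ~ X_aug @ w with least squares.
X_aug = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

def predict_latency(macs_g, params_m, layers):
    """Predict latency (ms) for an unseen network from its features."""
    return float(np.array([macs_g, params_m, layers, 1.0]) @ w)
```

In practice such a learned predictor would be compared, on the same dataset, against the expert model built from hardware metrics; a richer regressor could replace the linear fit once enough samples are collected.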
The expected outcomes
The ProPruNN project addresses the societal problem of machine learning's ecological impact by reducing the complexity of networks through pruning and designing neural networks with low energy consumption goals. The proposed technique involves a single training session that leads to an efficient network structure, unlike the CO2-intensive approach in the hardware-aware NAS literature. The industrial demand for machine learning solutions requires trade-offs between accuracy, latency, and energy consumption. The ProPruNN project aligns with this demand by integrating energy consumption and latency criteria into the network's design and hardware architecture.
School's role
IMT Atlantique is a well-established research center, which facilitates interdisciplinary collaborations and partnerships with other institutions and organizations. It will provide the infrastructure, resources, and expertise needed for the researchers to carry out their work.
Partners
Various existing industry collaborations with companies such as Stellantis, OSO AI, and GoodFloow may benefit from the expertise developed in the ProPruNN project to build neural networks suited for their hardware targets. FPGAs are a competitive target for low latency and energy consumption.
The ProPruNN project will also strengthen my international academic collaborations. First, I am in contact with Yvon Savaria from Polytechnique Montreal, who works with the IVADO institute, for which he develops solutions for hardware implementations of neural networks. As such, the original approach of the ProPruNN project is of great interest to him, since it faces the same challenges of neural network embeddability. Another collaboration is planned with the University of Tampere in Finland, with whom we already exchange students on this kind of subject.