
ProPruNN

Pruning neural networks
ANR JCJC project
Grant no 22-CE25-0006-01
Start: 2023
End: 2026

The ProPruNN project aims to use structured pruning in a hardware-algorithm co-design methodology to improve hardware implementations of Convolutional Neural Networks (CNNs). The project's first sub-hypothesis is that designing hardware architectures that take advantage of structured pruning leads to significant gains in latency, throughput, and power. This may, however, be harder to achieve in more complex networks such as ResNets and DenseNets, which require rearrangements after filter removal. The second sub-hypothesis is that the performance of pruned networks can be predicted, both in terms of accuracy and in terms of power, throughput, and latency. The final sub-hypothesis is that these prediction models can be used to design state-of-the-art networks and, using a finer and more efficient pruning approach, push the accuracy-vs-implementation-performance Pareto frontier beyond the current literature.

Project framework

Deep Neural Networks (DNNs) have revolutionized image classification, object detection, and machine translation, but at the cost of increased complexity, power consumption, and greenhouse gas emissions. In real-world applications, DNN inference must often run on embedded systems with limited resources. To address these issues, two approaches are possible: hardware-independent and hardware-aware design. Hardware-independent design modifies the structure of the network to reduce the number of parameters and computations, while hardware-aware design uses quantization and pruning to reduce the number of bits used to represent parameters and data. Structured pruning, which removes larger structures such as whole neurons or convolution filters, is the pruning technique of interest in this project.
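To make the notion of structured pruning concrete, here is a minimal sketch of one common variant: removing the convolution filters with the smallest L1 norms. The function name, the L1 saliency criterion, and the toy layer shape are illustrative assumptions, not the project's actual pipeline.

```python
import numpy as np

def prune_filters_l1(weights, keep_ratio):
    """Structured pruning sketch: drop whole convolution filters
    with the smallest L1 norms.

    weights:    array of shape (out_channels, in_channels, kH, kW)
    keep_ratio: fraction of filters to keep, in (0, 1].
    """
    n_filters = weights.shape[0]
    n_keep = max(1, int(round(n_filters * keep_ratio)))
    # L1 norm of each filter: a common saliency score for structured pruning.
    scores = np.abs(weights).reshape(n_filters, -1).sum(axis=1)
    # Keep the highest-scoring filters, preserving their original order.
    keep = np.sort(np.argsort(scores)[-n_keep:])
    return weights[keep], keep

# Toy layer: 8 filters of shape 3x3 over 4 input channels.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4, 3, 3))
pruned, kept = prune_filters_l1(w, keep_ratio=0.5)
print(pruned.shape)  # (4, 4, 3, 3)
```

Because whole filters disappear, the next layer's input channels shrink accordingly, which is exactly the kind of rearrangement that becomes delicate in ResNets and DenseNets, where skip connections constrain which channels may be removed.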

Objective

The ProPruNN project aims to design CNN accelerators on FPGAs that benefit from structured pruning to achieve significant gains in throughput, latency, and energy efficiency. After designing hardware architectures that support structured pruning, the project aims to measure the effect of pruning on accuracy, latency, and power consumption, and to develop predictive models linking structured pruning with these metrics. The project also aims to compete with the literature by designing CNN-hardware architecture pairs that achieve low latency (<1 ms) while maintaining acceptable accuracy. Finally, the project aims to design efficient neural networks with a very low environmental impact, which can be steered towards a particular goal, such as energy consumption, during training using the previously developed predictors. The potential gains in energy consumption could be as large as one order of magnitude for comparable accuracy.

Used method

The idea behind the ProPruNN project is to use structured pruning to perform Neural Architecture Search (NAS). The advantage over classical approaches is that the network can decide its own structure during learning. By incorporating the predictors we plan to create into the loss function, the network will optimize its structure according to a dual objective of accuracy and hardware performance (i.e., latency or power consumption).
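One simple way such a dual objective could be written is a task loss plus a penalty from a latency predictor. The names, the budget value, and the hinge-style penalty below are illustrative assumptions, not the project's actual formulation.

```python
def dual_objective_loss(task_loss, predicted_latency_ms,
                        latency_budget_ms=1.0, weight=0.1):
    """Sketch of a dual objective: the usual task loss plus a
    penalty derived from a (hypothetical) latency predictor."""
    # Penalize only latency above the budget, so accuracy dominates
    # once the hardware target is met.
    overshoot = max(0.0, predicted_latency_ms - latency_budget_ms)
    return task_loss + weight * overshoot

# A network predicted at 1.8 ms pays a penalty; one at 0.5 ms does not.
print(dual_objective_loss(0.42, predicted_latency_ms=1.8))
print(dual_objective_loss(0.42, predicted_latency_ms=0.5))
```

In practice the predictor would have to be differentiable (or approximated by a differentiable surrogate) with respect to the pruning decisions for gradients to flow through it during training.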

To do this, several steps are necessary. First, we will need to develop our own hardware architectures for CNN accelerators. As mentioned in the literature review, two types of architecture are possible.

On the one hand, there are recursive architectures, in which hardware resources are shared between the different convolutional layers. Their advantage is that they can support any size of neural network, provided they offer a sufficient level of programmability, i.e. a scheduler can tell them at inference time how their processing elements should be used and on which data. On the other hand, there are pipelined architectures, whose hardware structure depends on the hyperparameters of the network; these would require CNN-hardware architecture pairs whose synthesis must be automated.

Within the framework of the ProPruNN project, pipelined architectures are not practical. We want to create datasets linking CNNs with their hardware performance metrics (latency, energy consumption), on which we intend to perform learning tasks, and learning requires a sufficiently large dataset (hundreds of samples at least). Building such a dataset from a small number of programmable recursive architectures is feasible: synthesizing each architecture takes a few days, after which the execution of a large number of different CNNs can be evaluated by simulation. If, on the contrary, several hundred pipelined CNN-hardware architecture pairs had to be generated, the time needed would be unmanageable (several hundred days of synthesis). A second reason for focusing on recursive architectures is that the most advanced works in the literature on the ImageNet dataset also use them. Lastly, recursive architectures are simpler to size so as to make maximum use of a given FPGA, unlike pipelined architectures, whose structure depends on the considered CNN.
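To illustrate why a recursive (layer-shared) architecture lends itself to fast simulation-based evaluation, here is a deliberately simplified cycle model: total multiply-accumulate operations of a convolutional layer divided by the number of processing elements, assuming perfect utilization. This back-of-the-envelope model is an illustrative assumption, not the project's scheduler.

```python
def conv_layer_cycles(out_h, out_w, out_c, in_c, k, n_pe):
    """Rough latency model for a recursive accelerator:
    MACs / processing elements, with ceiling division."""
    # One output pixel of one output channel costs in_c * k * k MACs.
    macs = out_h * out_w * out_c * in_c * k * k
    # Ceiling division: at best, n_pe MACs are issued per cycle.
    return -(-macs // n_pe)

# A 3x3 convolution, 32 -> 64 channels on a 56x56 map, 512 PEs.
print(conv_layer_cycles(56, 56, 64, 32, 3, 512))  # 112896 cycles

# Structured pruning halves the output channels: cycles halve too.
print(conv_layer_cycles(56, 56, 32, 32, 3, 512))  # 56448 cycles
```

The model also makes the co-design argument visible: removing whole filters reduces `out_c` in one layer and `in_c` in the next, so the cycle count of a recursive accelerator shrinks directly with structured pruning, whereas unstructured (weight-level) sparsity would leave this dense-compute model unchanged.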

The second step is the creation of predictive models of the circuit's latency and energy consumption. To do so, we will use the products of the previous step, i.e. the hardware architectures and the tools for executing CNNs on them, to create a dataset associating each structurally pruned CNN with its hardware performance (latency, power consumption). We will then compare two prediction approaches on this dataset: an expert one, based on hardware metrics, to predict the latency or power consumption of a CNN on given hardware, and a learning-based one. The goal is then for the network, during training, to shape its own structure without degrading its accuracy.
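A minimal sketch of the learning-based approach could be ordinary least squares mapping per-network features to measured latency. The choice of features (here just the MAC count), the linear model, and the synthetic data are all illustrative assumptions; the real dataset would come from the simulated accelerators described above.

```python
import numpy as np

def fit_latency_predictor(features, latencies):
    """Fit a linear latency predictor from per-CNN features
    (e.g. MAC count, parameter count, filters left after pruning)."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # add bias
    coef, *_ = np.linalg.lstsq(X, latencies, rcond=None)
    return coef

def predict_latency(coef, features):
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ coef

# Synthetic dataset: latency (ms) grows linearly with MAC count.
rng = np.random.default_rng(1)
macs = rng.uniform(1e6, 1e8, size=(200, 1))
lat = 0.5 + 2e-8 * macs[:, 0]  # noiseless toy relation
coef = fit_latency_predictor(macs, lat)
print(predict_latency(coef, np.array([[5e7]])))  # ~1.5 ms
```

The expert approach mentioned above would instead hand-build such a formula from hardware knowledge (PE count, memory bandwidth, scheduling overheads); comparing the two on the same dataset is precisely the point of this step.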

The expected outcomes

The ProPruNN project addresses the societal problem of machine learning's ecological impact by reducing the complexity of networks through pruning and designing neural networks with low energy consumption goals. The proposed technique involves a single training session that leads to an efficient network structure, unlike the CO2-intensive approach in the hardware-aware NAS literature. The industrial demand for machine learning solutions requires trade-offs between accuracy, latency, and energy consumption. The ProPruNN project aligns with this demand by integrating energy consumption and latency criteria into the network's design and hardware architecture.

School's role

IMT Atlantique is a well-established research center, which facilitates interdisciplinary collaborations and partnerships with other institutions and organizations. It will provide the infrastructure, resources, and expertise needed for the researchers to carry out their work.

Partners

Various existing industry collaborations, with companies such as Stellantis, OSO AI, and GoodFloow, may benefit from the expertise developed in the ProPruNN project to build neural networks suited to their hardware targets. FPGAs are a competitive target for low latency and energy consumption.

The ProPruNN project will also strengthen my international academic collaborations. First, I am in contact with Yvon Savaria from Polytechnique Montreal, who works with the IVADO institute, for which he proposes solutions for hardware implementations of neural networks. As such, the original approach of the ProPruNN project is of great interest to him, since it faces the same challenges of neural network embeddability. Another collaboration is planned with the University of Tampere in Finland, with whom we exchange students on these kinds of subjects.