Project Description
This project aims to improve the energy efficiency of neural network models by systematically identifying and pruning the most energy-intensive layers without significantly sacrificing model accuracy. The specific objectives are three-fold:

1. Measure the energy consumption of each layer during inference using hardware-level monitoring tools.
2. Analyze the relationship between layer complexity, accuracy contribution, and energy cost.
3. Develop a pruning strategy that balances accuracy drop against energy reduction.
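The balance in objective 3 can be framed as a simple constrained selection: among candidate pruning configurations, keep only those whose accuracy drop stays within a budget, then pick the one with the largest energy saving. A minimal sketch of that rule (the candidate names and measurements below are hypothetical, not project results):

```python
def best_config(candidates, max_acc_drop):
    # Keep only configurations whose accuracy drop fits the budget,
    # then choose the one with the largest energy saving.
    feasible = [c for c in candidates if c["acc_drop"] <= max_acc_drop]
    return max(feasible, key=lambda c: c["energy_saving"], default=None)

# Hypothetical measurements: accuracy drop and energy saving, both in %.
candidates = [
    {"name": "filters-20%", "acc_drop": 0.5, "energy_saving": 30.0},
    {"name": "filters-50%", "acc_drop": 2.0, "energy_saving": 45.0},
    {"name": "weights-80%", "acc_drop": 0.9, "energy_saving": 38.0},
]
print(best_config(candidates, max_acc_drop=1.0)["name"])  # weights-80%
```

More sophisticated strategies (e.g., Pareto-front analysis) follow the same pattern of trading one axis against the other.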
The objective of this research is to investigate how model pruning techniques can reduce the energy cost of deep neural networks (DNNs) deployed on GPU platforms. While pruning traditionally focuses on lowering computational complexity and memory usage, its direct impact on energy consumption remains insufficiently quantified. This project aims to establish a measurement-driven understanding of how specific pruning decisions—such as removing weights, filters, or entire blocks—affect GPU power usage during both inference and training. Ultimately, the research seeks to identify pruning strategies that achieve meaningful reductions in energy consumption while preserving model accuracy.

The methodology combines systematic pruning experiments with accurate GPU energy profiling. Candidate DNN architectures (e.g., ResNet, MobileNet, and Transformer-style models) will be pruned using state-of-the-art structured and unstructured techniques. Each pruned model will be evaluated using a controlled benchmarking pipeline that records power draw, execution time, and energy per layer or module. GPU energy consumption will be measured using NVIDIA's power telemetry interfaces and validated through external measurement when possible. By correlating pruning patterns with component-wise energy costs, the study will provide actionable insights into energy-efficient neural network design and build a dataset enabling further optimization research.
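Since energy is the integral of power over time, the per-layer energy figures described above can be derived by integrating sampled power draw over each layer's execution window. A minimal sketch of that integration, assuming samples of (timestamp in seconds, power in watts); NVML's `nvmlDeviceGetPowerUsage` reports milliwatts, so real readings would be divided by 1000 first:

```python
def energy_joules(samples):
    """Trapezoidal integration of (time_s, power_w) samples into joules."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total

# Hypothetical power trace sampled every 100 ms during one inference pass.
trace = [(0.0, 120.0), (0.1, 180.0), (0.2, 175.0), (0.3, 130.0)]
print(energy_joules(trace))  # 48.0 J over 0.3 s
```

In practice the sampling rate of the telemetry interface bounds how finely energy can be attributed to individual layers, which is one reason external measurement is used for validation.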
Tasks and Responsibilities
The student researcher will be responsible for carrying out both the experimental and analytical components of the project. Their first task will be to survey existing pruning techniques and energy-aware model optimization strategies, summarizing key approaches and identifying methods most suitable for GPU-based evaluation. The student will then implement or adapt pruning algorithms—such as magnitude pruning, structured filter pruning, and low-rank approximation—using frameworks like PyTorch.

A major responsibility will be building a reliable experimental pipeline for GPU energy measurement. This includes setting up controlled benchmarking scripts, integrating power-monitoring tools (e.g., NVIDIA Management Library), and validating measurements across repeated runs. The student will execute pruning experiments on selected neural network architectures, systematically collecting data on accuracy, inference latency, power draw, and layer-level energy usage.

The student will also analyze the collected data to identify patterns linking pruning decisions to energy savings. This will involve applying statistical methods, visualizing results, and comparing performance across different pruning strategies. Finally, the student will document all procedures, maintain organized code and datasets, and contribute to interim reports and a final research summary. Through these responsibilities, the student will gain hands-on experience in deep learning, performance benchmarking, and experimental research methodology.
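Magnitude pruning, one of the techniques the student would implement, removes the weights with the smallest absolute values. In PyTorch this is typically done with `torch.nn.utils.prune.l1_unstructured`, but the core idea can be shown framework-free; the function below is an illustrative sketch, not project code:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    Ties at the threshold are also zeroed, so the realized sparsity
    can slightly exceed the requested fraction.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

print(magnitude_prune([0.5, -0.1, 0.05, 2.0], sparsity=0.5))  # [0.5, 0.0, 0.0, 2.0]
```

Structured variants follow the same principle but rank and remove whole filters or channels, which is what makes the resulting speedups and energy savings visible on standard GPU kernels.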
Desired Qualifications
None Listed.