Introduction
Why did you come?
The course name sounds cool. I should change the course name.
Real introduction
How can we run AI efficiently (on a GPU)?
- What is an AI accelerator chip?
- How can we design an AI accelerator chip?
- How can we run neural networks on an AI accelerator chip (or GPU)?
- How can we run a CNN/LLM on an AI accelerator chip (or GPU)?
- (If possible) Why is HBM (High Bandwidth Memory) becoming more important?
Lecture goal
Understand hardware/software system design issues and methods through a real system design example
Hardware design
Matrix-matrix (MM) multiplication accelerator
Previously we used Verilog, but these days there is a Python-based hardware description language: Amaranth!
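To make the target of the hardware design concrete, here is a minimal behavioral sketch of what an MM accelerator computes. It is plain Python, not Amaranth, and the function name `matmul_reference` is an illustrative assumption. It shows that the work reduces to repeated multiply-accumulate (MAC) operations, which is the operation a hardware design would implement in parallel.

```python
# Behavioral reference model (plain Python, not Amaranth) of C = A x B.
# The name matmul_reference is hypothetical, chosen for this sketch.

def matmul_reference(a, b):
    """Return (C, mac_count) for C = A x B, with A of size n x k and B of size k x m."""
    n, k, m = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    c = [[0] * m for _ in range(n)]
    mac_count = 0
    for i in range(n):
        for j in range(m):
            acc = 0  # one accumulator per output element
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one multiply-accumulate (MAC)
                mac_count += 1
            c[i][j] = acc
    return c, mac_count


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c, macs = matmul_reference(a, b)
print(c, macs)  # [[19, 22], [43, 50]] 8
```

A model like this can also serve as a golden reference when checking the output of a hardware simulation.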
Software design
Neural network code running on CPU
It must communicate with the hardware MM accelerator.
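The split between CPU code and the accelerator can be sketched as follows. The notes do not specify the actual interface, so everything here is an assumption: `MockMMAccelerator` is a software stand-in for the hardware, and `dense_layer` is a hypothetical neural-network layer that offloads only its matrix multiplication.

```python
# Hypothetical software/hardware split; all names are illustrative assumptions.

class MockMMAccelerator:
    """Software stand-in for the hardware MM accelerator (assumed interface)."""

    def __init__(self):
        self.calls = 0  # number of times software offloaded a multiplication

    def matmul(self, a, b):
        # Real hardware would receive the operands and return the product;
        # this mock simply computes the product on the CPU.
        self.calls += 1
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                for row in a]


def dense_layer(x, weights, bias, accel):
    """Fully connected layer: ReLU(x @ weights + bias), matmul offloaded to accel."""
    y = accel.matmul(x, weights)  # heavy part goes to the accelerator
    # Bias add and ReLU stay on the CPU in this sketch
    return [[max(0, v + bv) for v, bv in zip(row, bias)] for row in y]


accel = MockMMAccelerator()
x = [[1, 2]]            # one input sample with two features
w = [[3, 1], [2, 4]]    # 2 x 2 weight matrix
out = dense_layer(x, w, [0, -10], accel)
print(out, accel.calls)  # [[7, 0]] 1
```

Keeping the accelerator behind a single `matmul` call means the neural-network code does not change when the mock is swapped for a real device driver.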
Optimizing software/hardware design
- Tiling a.k.a. blocking
- Reduced precision (e.g. 8-bit computation)
- Zero skipping in matrix-matrix multiplication
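Tiling (blocking) from the list above can be sketched as follows. The function name `matmul_tiled` and the tile size are assumptions for illustration. The idea is to finish all work on one small block of A, B and C before moving on, so that the block can stay in fast local memory; the numerical result is the same as for the untiled product.

```python
# Tiled (blocked) matrix multiplication; names and tile size are illustrative.

def matmul_tiled(a, b, tile=2):
    """C = A x B computed tile by tile; the result equals the untiled product."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Work on one tile-sized block of A, B and C at a time.
                # min(...) handles matrices whose size is not a multiple of tile.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]  # continue the partial sum for this element
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
print(matmul_tiled(a, b, tile=2))  # [[10, 2, 5], [22, 5, 14], [34, 8, 23]]
```

In pure Python the tiling brings no speedup; the sketch only shows the loop structure that a cache-aware or accelerator-aware implementation would use.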
Optimization includes runtime optimization and energy optimization.
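Reduced precision and zero skipping can be sketched together. The quantization scheme (symmetric, scaled by the largest absolute value) and the names `quantize_int8` and `matmul_zero_skip` are assumptions chosen for this example, not the course's method. The count of MACs actually performed is used here only as a rough proxy for runtime and energy.

```python
# Sketch of 8-bit quantization plus zero skipping; names and scheme are assumptions.

def quantize_int8(matrix):
    """Symmetric 8-bit quantization: returns (integer matrix, scale)."""
    max_abs = max(abs(v) for row in matrix for v in row)
    scale = max_abs / 127 if max_abs else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in matrix]
    return q, scale


def matmul_zero_skip(a, b):
    """C = A x B that skips every MAC with a zero operand; returns (C, macs_done)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    macs_done = 0
    for i in range(n):
        for p in range(k):
            if a[i][p] == 0:
                continue  # a zero in A contributes nothing to row i of C
            for j in range(m):
                if b[p][j] == 0:
                    continue  # a zero in B contributes nothing either
                c[i][j] += a[i][p] * b[p][j]
                macs_done += 1
    return c, macs_done


activations = [[0.0, 2.54], [1.0, 0.0]]   # floating-point inputs
weights = [[3, 0], [0, 4]]                # already small integers
a_q, scale = quantize_int8(activations)
c, macs_done = matmul_zero_skip(a_q, weights)
print(a_q)           # [[0, 127], [50, 0]]
print(c, macs_done)  # [[0, 508], [150, 0]] 2
```

A dense 2 x 2 product needs 8 MACs, so skipping zeros leaves 2 here. Multiplying `c` by `scale` gives an approximation of the floating-point result.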
Q. How do we optimize power with Amaranth (simulation)?
A. Even Verilog-based flows rely on power estimation. We estimate power usage at each step of the design process.
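As a rough illustration of what a power estimate can look like, the sketch below uses the common first-order model for dynamic power in CMOS logic, P = activity x C x V^2 x f. The notes do not say which estimation method the course uses, so this model, the function name `dynamic_power_watts` and all numbers are assumptions.

```python
# First-order dynamic power model; an assumed stand-in for a real estimation flow.

def dynamic_power_watts(activity, capacitance_f, voltage_v, freq_hz):
    """Estimate dynamic power as activity * C * V^2 * f.

    activity      -- average fraction of the capacitance switched per cycle (0..1)
    capacitance_f -- total switched capacitance in farads
    voltage_v     -- supply voltage in volts
    freq_hz       -- clock frequency in hertz
    """
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


# Illustrative numbers only: 10% activity, 1 nF, 1.0 V, 1 GHz.
baseline = dynamic_power_watts(0.1, 1e-9, 1.0, 1e9)
# Halving the switching activity (e.g. by skipping zero operands) halves the estimate.
reduced = dynamic_power_watts(0.05, 1e-9, 1.0, 1e9)
print(baseline, reduced)
```

In a model like this, a simulation would supply the switching activity, which is why power can be estimated before any chip exists.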