V Explore GPGPU-SIM and GEMM
In this part, you will tune GEMM and learn the basics of using GPGPU-SIM.

GPGPU-SIM is a simulator for CUDA programs. It is somewhat dated compared with gem5, but it is still recognized in academia.
Links about GPGPU-SIM
Build GPGPU-SIM
Choose either of the two ways below to prepare the build environment.
Setup on Ubuntu 18.04
sudo apt-get install -y wget build-essential xutils-dev bison zlib1g-dev flex \
      libglu1-mesa-dev git g++ libssl-dev libxml2-dev libboost-all-dev \
      vim python-setuptools python-dev python-pip
pip3 install pyyaml plotly psutil
wget http://developer.download.nvidia.com/compute/cuda/11.0.1/local_installers/cuda_11.0.1_450.36.06_linux.run
sh cuda_11.0.1_450.36.06_linux.run --silent --toolkit
rm cuda_11.0.1_450.36.06_linux.run
Using Docker (accel-sim)
To get the Docker image
docker pull accelsim/ubuntu-18.04_cuda-11
Build
To get GPGPU-SIM
git clone git@github.com:accel-sim/gpgpu-sim_distribution.git
To build
# at <gpgpu-sim dir>
source setup_environment
make -j
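As far as we know, setup_environment expects CUDA_INSTALL_PATH to point at the CUDA toolkit before it is sourced. The sketch below checks that precondition first. The function name check_cuda_env is hypothetical (it is not part of GPGPU-SIM), and /usr/local/cuda-11.0 in the usage comment is an assumption based on the runfile installer's usual default, so adjust it to your install.

```shell
# check_cuda_env: hypothetical helper, not part of GPGPU-SIM.
# Returns 0 if CUDA_INSTALL_PATH is set and contains an executable bin/nvcc,
# 1 if the variable is unset or empty, 2 if nvcc is missing under it.
check_cuda_env() {
    if [ -z "${CUDA_INSTALL_PATH:-}" ]; then
        echo "CUDA_INSTALL_PATH is not set" >&2
        return 1
    fi
    if [ ! -x "$CUDA_INSTALL_PATH/bin/nvcc" ]; then
        echo "nvcc not found under $CUDA_INSTALL_PATH/bin" >&2
        return 2
    fi
    echo "CUDA toolkit found at $CUDA_INSTALL_PATH"
}

# Typical use at <gpgpu-sim dir> (the path below is an assumption):
#   export CUDA_INSTALL_PATH=/usr/local/cuda-11.0
#   export PATH="$CUDA_INSTALL_PATH/bin:$PATH"
#   check_cuda_env && source setup_environment && make -j"$(nproc)"
```

Passing "$(nproc)" to make -j caps the number of parallel jobs at the number of cores, which can help on machines with limited memory.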
Use GPGPU-SIM
The following steps are all necessary.
Compile CUDA source file
Add the -lcudart flag when you compile with nvcc:
nvcc -lcudart <source-file> -o <binary-file>
Set up environment
# at <gpgpu-sim dir>
. setup_environment
Copy config and just run
First, choose a config from <gpgpu-sim dir>/configs/tested-cfgs. Copy all the files under <gpgpu-sim dir>/configs/tested-cfgs/<selected configs> to the directory that contains the binary file. Then change to that directory and run the binary.
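The copy-and-run step above can be sketched as a small shell helper. The function name stage_config and the log file name sim.log are hypothetical, and the placeholders in angle brackets are the same ones used in the text. The ldd check in the usage comment assumes that, after setup_environment has been sourced, libcudart resolves to a library inside the GPGPU-SIM directory.

```shell
# stage_config: hypothetical helper, not part of GPGPU-SIM.
# Copies every file from a chosen config directory next to the binary.
#   $1 = config directory (<gpgpu-sim dir>/configs/tested-cfgs/<selected configs>)
#   $2 = directory that holds the compiled binary
stage_config() {
    cfg_dir=$1
    bin_dir=$2
    if [ ! -d "$cfg_dir" ]; then
        echo "config directory not found: $cfg_dir" >&2
        return 1
    fi
    # "dir/." copies the directory's contents, including hidden files.
    mkdir -p "$bin_dir" && cp -R "$cfg_dir"/. "$bin_dir"/
}

# Typical use, in the same shell that sourced setup_environment:
#   stage_config <gpgpu-sim dir>/configs/tested-cfgs/<selected configs> .
#   ldd ./<binary-file> | grep cudart   # expected to point into <gpgpu-sim dir>
#   ./<binary-file> | tee sim.log       # sim.log is an arbitrary file name
```

Saving the output with tee keeps the full simulation log on disk so that it can be inspected after the run.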
GEMM
General Matrix Multiply (GEMM) is a common algorithm in linear algebra, machine learning, statistics, and many other domains. It offers an interesting trade-off space because there are many ways to break up the computation, including blocking, inner products, outer products, and systolic array techniques.
In this part of the lab, we provide a CUDA GEMM template. Your tasks are as follows:
- simulate the GEMM template code in GPGPU-SIM and find its weaknesses
- modify the code however you like, except for the basic test framework, to improve GEMM performance
Hint: you can simulate the modified code in GPGPU-SIM to validate the performance improvement.
a. What parameters do you think should be used to evaluate GEMM performance? Why? (Try to look through the simulation output)
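One way to start looking through the simulation output is to pull a few counters out of a saved log. This is a minimal sketch, not a complete answer to the question: the function name summarize_log is hypothetical, and the counter names (gpu_tot_sim_cycle, gpu_tot_sim_insn, gpu_tot_ipc) are assumptions based on typical GPGPU-SIM output, so check them against your own log.

```shell
# summarize_log: hypothetical helper, not part of GPGPU-SIM.
# Prints the last reported value of a few counters from a saved simulation log.
# Counter names are assumptions; your GPGPU-SIM version may print others.
summarize_log() {
    log=$1
    for key in gpu_tot_sim_cycle gpu_tot_sim_insn gpu_tot_ipc; do
        # Match lines such as "gpu_tot_sim_cycle = 12345" and keep the last one,
        # since the counters are printed again after each kernel.
        grep -E "^${key}[[:space:]]*=" "$log" | tail -n 1
    done
}

# Typical use, assuming the run was saved to sim.log:
#   summarize_log sim.log
```

Comparing these values before and after a code change gives a first impression of the effect. The log also contains cache and memory statistics that are worth reading directly.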