- Links about GPGPU-SIM
- Build GPGPU-SIM
- Use GPGPU-SIM
In this part, you will tune GEMM and learn the basics of using GPGPU-SIM.
GPGPU-SIM is a simulator for CUDA programs. It is somewhat dated compared with gem5, but it is still widely accepted in the academic community.
Choose either of the two ways below to prepare the build environment.
sudo apt-get install -y wget build-essential xutils-dev bison zlib1g-dev flex \
    libglu1-mesa-dev git g++ libssl-dev libxml2-dev libboost-all-dev git g++ \
    libxml2-dev vim python-setuptools python-dev build-essential python-pip
pip3 install pyyaml plotly psutil
wget http://developer.download.nvidia.com/compute/cuda/11.0.1/local_installers/cuda_11.0.1_450.36.06_linux.run
sh cuda_11.0.1_450.36.06_linux.run --silent --toolkit
rm cuda_11.0.1_450.36.06_linux.run
To get the Docker image:
docker pull accelsim/ubuntu-18.04_cuda-11
To get GPGPU-SIM:
git clone git@github.com:accel-sim/gpgpu-sim_distribution.git
# at <gpgpu-sim dir>
source setup_environment
make -j
The following steps are all necessary.
Add the -lcudart flag when you compile with nvcc:
nvcc -lcudart <source-file> -o <binary-file>
# at <gpgpu-sim dir>
. setup_environment
First, choose a config from <gpgpu-sim dir>/configs/tested-cfgs. Copy all the files under <selected configs> to the directory that contains the binary file. Then go to that directory and run the binary.
General Matrix Multiply (GEMM) is a common algorithm in linear algebra, machine learning, statistics, and many other domains. It offers an interesting trade-off space because there are many ways to break up the computation, including blocking, inner products, outer products, and systolic array techniques.
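To make the blocking idea concrete, here is a minimal host-side C++ sketch, not the template code of this lab. It compares a naive inner-product GEMM with a blocked variant. The function names `gemm_naive` and `gemm_blocked`, the square row-major layout, and the `tile` parameter are assumptions made for illustration. On a GPU, the same tiling idea is usually expressed by staging tiles in shared memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Naive reference (inner-product form): C = A * B for row-major n x n matrices.
void gemm_naive(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                acc += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = acc;
        }
    }
}

// Blocked variant: the three loops are split into tile x tile x tile chunks so
// that each chunk of A and B is reused while it is still close to the core.
// `tile` is a hypothetical tuning knob; n does not need to be a multiple of it.
void gemm_blocked(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t n, std::size_t tile) {
    assert(tile > 0);
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    std::fill(C.begin(), C.end(), 0.0f);
    for (std::size_t i0 = 0; i0 < n; i0 += tile) {
        for (std::size_t j0 = 0; j0 < n; j0 += tile) {
            for (std::size_t k0 = 0; k0 < n; k0 += tile) {
                const std::size_t i1 = std::min(i0 + tile, n);
                const std::size_t j1 = std::min(j0 + tile, n);
                const std::size_t k1 = std::min(k0 + tile, n);
                // Accumulate the contribution of one A tile and one B tile.
                for (std::size_t i = i0; i < i1; ++i) {
                    for (std::size_t k = k0; k < k1; ++k) {
                        const float a = A[i * n + k];
                        for (std::size_t j = j0; j < j1; ++j) {
                            C[i * n + j] += a * B[k * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

Both functions compute the same product. Only the order in which the multiply-accumulate operations are visited changes, and that order is what determines data reuse.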
- Simulate the GEMM template code in GPGPU-SIM and find its weaknesses.
- Improve the performance of the GEMM. You may change anything in the code except the basic test framework.
- Hint: you can simulate the modified code in GPGPU-SIM to validate the performance improvement.
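One way to compare a baseline run with a modified run is to save each simulation output to a file and pull out the counters you care about. The sketch below assumes the output contains lines of the form `name = value` (for example `gpu_tot_sim_cycle = 12345`). Check this against your own output before relying on it. The helper names `parse_counters` and `speedup` are hypothetical, and using a cycle count as the default counter is only one possible choice.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Collect numeric "name = value" counters from simulator output text.
// Later occurrences overwrite earlier ones, so the map ends up holding the
// last value reported for each counter. Non-numeric values are skipped.
std::map<std::string, double> parse_counters(const std::string& log) {
    std::map<std::string, double> counters;
    std::istringstream lines(log);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string name, eq;
        double value = 0.0;
        if (fields >> name >> eq >> value && eq == "=") {
            counters[name] = value;
        }
    }
    return counters;
}

// Ratio baseline / tuned for one counter; a value above 1 means the tuned run
// needed less of that counter (for a cycle count, it ran faster).
// Throws std::out_of_range if the counter is missing from either log.
double speedup(const std::string& baseline_log, const std::string& tuned_log,
               const std::string& counter = "gpu_tot_sim_cycle") {
    const double base = parse_counters(baseline_log).at(counter);
    const double tuned = parse_counters(tuned_log).at(counter);
    if (tuned <= 0.0) {
        throw std::invalid_argument("tuned counter must be positive");
    }
    return base / tuned;
}
```

To use it, read the two saved output files into strings and pass them to `speedup`. Pass a different counter name as the third argument to compare something other than total cycles.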
a. What parameters do you think should be used to evaluate GEMM performance? Why? (Try to look through the simulation output)