V Explore GPGPU-SIM and GEMM

Table of contents

  1. Links about GPGPU-SIM
  2. Build GPGPU-SIM
    1. Setup on Ubuntu 18.04
    2. Using Docker (accel-sim)
    3. Build
  3. Use GPGPU-SIM
    1. Compile CUDA source file
    2. Set up environment
    3. Copy config and just run
  4. GEMM

In this part, you will tune GEMM and learn the basics of using GPGPU-SIM.


Links about GPGPU-SIM

GPGPU-SIM is a simulator for CUDA programs. It is somewhat dated compared with gem5, but it is still widely accepted in the academic community.


Build GPGPU-SIM

Choose either of the two ways below to prepare the build environment.

Setup on Ubuntu 18.04

sudo apt-get install -y wget build-essential xutils-dev bison zlib1g-dev flex \
      libglu1-mesa-dev git g++ libssl-dev libxml2-dev libboost-all-dev \
      vim python-setuptools python-dev python-pip
pip3 install pyyaml plotly psutil
sh --silent --toolkit

Using Docker (accel-sim)

To get the Docker image:

docker pull accelsim/ubuntu-18.04_cuda-11



Build

git clone

To build

# at <gpgpu-sim dir>
source setup_environment
make -j


Use GPGPU-SIM

The following steps are all necessary.

Compile CUDA source file

Add the -lcudart flag when you compile with nvcc:

nvcc -lcudart <source-file> -o <binary-file>

Set up environment

# at <gpgpu-sim dir>
. setup_environment
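
After sourcing setup_environment, a common sanity check is to confirm that the compiled binary resolves libcudart to a copy under the simulator directory rather than the regular CUDA toolkit. The sketch below is one way to do that. It assumes the simulator's libcudart lives somewhere under <gpgpu-sim dir>, and the function name cudart_from_sim is hypothetical, not part of GPGPU-SIM.

```shell
#!/bin/sh
# Read `ldd` output on stdin and succeed only if libcudart resolves to a
# path that starts with the given simulator directory.
# Assumption: the simulator's libcudart is installed under that directory.
cudart_from_sim() {
    sim_dir=$1
    [ -n "$sim_dir" ] || return 1
    # ldd lines look like:
    #   libcudart.so.11.0 => /some/path/libcudart.so.11.0 (0x...)
    awk -v root="$sim_dir" '
        $1 ~ /^libcudart\.so/ && $2 == "=>" { found = (index($3, root) == 1) }
        END { exit (found ? 0 : 1) }
    '
}

# Usage (placeholders, replace with your own paths):
#   ldd "./<binary-file>" | cudart_from_sim "<gpgpu-sim dir>" && echo "using simulator"
```

If the check fails, the binary would most likely run on the real CUDA runtime (or fail to start) instead of inside the simulator, so it is worth re-sourcing setup_environment in the same shell before running.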

Copy config and just run

First, choose a config from <gpgpu-sim dir>/configs/tested-cfgs. Copy all the files under <gpgpu-sim dir>/configs/tested-cfgs/<selected config> into the directory that contains the binary file. Then change to that directory and run the binary.
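
The copy-and-run step above can be sketched as a small shell helper. This is only an illustration: stage_config is a hypothetical name, and both arguments are placeholders that you replace with your own config directory and binary directory.

```shell
#!/bin/sh
# Copy every file from a chosen config directory into the directory that
# holds the compiled binary. Both paths are supplied by the caller.
stage_config() {
    cfg_dir=$1   # e.g. "<gpgpu-sim dir>/configs/tested-cfgs/<selected config>"
    bin_dir=$2   # directory containing <binary-file>
    [ -d "$cfg_dir" ] || { echo "no such config dir: $cfg_dir" >&2; return 1; }
    [ -d "$bin_dir" ] || { echo "no such binary dir: $bin_dir" >&2; return 1; }
    # "dir/." copies the contents of the directory, not the directory itself.
    cp -r "$cfg_dir"/. "$bin_dir"/
}

# Usage (placeholders, replace with your own paths):
#   stage_config "<gpgpu-sim dir>/configs/tested-cfgs/<selected config>" "<binary dir>"
#   cd "<binary dir>" && "./<binary-file>"
```

Copying the config files next to the binary, rather than running from the config directory, keeps each experiment's configuration and output together, which helps when you compare several GEMM variants.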


GEMM

General Matrix Multiply (GEMM) is a common algorithm in linear algebra, machine learning, statistics, and many other domains. It provides a more interesting trade-off space, as there are many ways to break up the computation. This includes using blocking, inner products, outer products, and systolic array techniques.

In this part of the lab, we provide a GEMM template written in CUDA. Your task is as follows:

  • Simulate the GEMM template code in GPGPU-SIM and identify its weaknesses.

To improve the performance of the GEMM, you may change anything in the code except the basic test framework.

Hint: you can simulate the modified code in GPGPU-SIM to validate the performance improvement.

a. What parameters do you think should be used to evaluate GEMM performance? Why? (Try to look through the simulation output)
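
To look through the simulation output more easily, it can help to save it to a file and pull out individual statistics. The sketch below assumes the log contains lines of the form "name = value" (for example counters such as gpu_sim_cycle or gpu_ipc, as GPGPU-SIM typically prints them). Check the names against your own output, since they may differ between versions. The helper name sim_stat is hypothetical.

```shell
#!/bin/sh
# Print the last reported value of one statistic from a saved simulation log.
# Assumption: statistics appear on lines shaped like "name = value".
sim_stat() {
    name=$1   # statistic name exactly as printed in the log
    log=$2    # file holding the captured simulator output
    awk -v key="$name" '
        $1 == key && $2 == "=" { val = $3 }   # keep overwriting: last match wins
        END { if (val != "") print val }
    ' "$log"
}

# Usage (file and binary names are placeholders):
#   "./<binary-file>" > run.log
#   sim_stat gpu_sim_cycle run.log
```

Taking the last occurrence is a design choice: if a statistic is printed once per kernel launch, the final line reflects the most recent kernel. Comparing the same statistic before and after a code change gives a quick, repeatable way to judge whether a modification actually helped.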