Post

CUDA Basics

CUDA basics

CUDA Basic Syntax for defining a kernel with a grid size and thread blocks

Host Code: Code that runs on CPU

Device Code: Code that runs on GPU

Grid: collection of all threads launched for a kernel - distributed into thread blocks

Thread blocks: a set of threads in a grid grouped together

Syntax given for a 32 thread :

1
2
3
dim3 block(4,1,1);
or 
dim3 block(4); ( it will be implicit that the number of threads in the other two dimensions, y and z will be 1 and 1 respectively)

So, in the above code - it means that we want each thread block to have four threads each.

1
2
3
dim3 grid(8, 1, 1); 
or 
dim3 block(8); (so it is implicit similar to the block declaration)

and that the grid defined for the kernel has 8 thread-blocks

The first keyword dim3 is used to denote the number of dimensions of the thread block and the grid size. It is a cuda structure defined for storing the grid and the block dimensions

So. lets say we have a 64 core gpu (each core able to run one single thread), then we can define a kernel with the following thread-block and grid dimensions ( using all the three dimensions).

1
2
3
4
5
6
7
8
9
dim3 block(4, 4, 4); 
dim3 grid(1, 1, 1);


or 

dim3 block(8, 2, 1);
dim3 grid(4, 1, 1); # meaning there are four thread blocks in grid.x and 
then each thread block: thread.blk.x - 8 threads in x and thread.blk.y - 2 threads in y 

Also, there are limitations to the total number of threads in a thread block based on the x, y, z dimensions as follows:

  • x: 1024
  • y: 1024
  • z: 64 and for the grid with the totla number of thread blocks as:
  • x: 2^32 - 1
  • y: 65536
  • z: 65536 So, once we define the sizes of the - we can use them in writing kernel functions as follows: Simple function declaration and definition ``` Simple GPU function declaration and definition ———————————————————-

print_helloworld«<grid, block »>(); # declaration - generally in the main function for simple programs

it is essentially to say function_name«< num of blocks, num of threads per block»>();


global void print_helloworld(){ printf(“Hello World!”); }

1
2
3
4
The function definition is similar to that of a C/C++ function defintion with a return type before the function name and then the function with the argument list in the parenthesis.

We also have the following modifiers - we use the `__global__` modifier for the above example

device host global ```

This post is licensed under CC BY 4.0 by the author.