Final Exam Prep: Code
CGMA Calculation
Compute to Global Memory Access (CGMA) ratio: The number of FP calculations performed for each access to the global memory within a region in a CUDA program.
A good CGMA is on the order of 20 to 30, high enough to offset slow global memory access.
2 Elements per Thread: C[i] = A[i] + B[i]
Index of the first element:
i = threadIdx.x + (blockIdx.x * blockDim.x) * 2;
Index of the second element:
i + blockDim.x
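The two index formulas above can be put together into a complete kernel. The following is a minimal sketch, not a reference solution: the names `vecAdd2` and `gridSizeFor`, the vector length of 1000, and the block size of 256 are all assumptions chosen for illustration. Since each block covers 2 * blockDim.x consecutive elements, the grid needs only half as many blocks as a one-element-per-thread launch, and both indices need a bounds check.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread adds two elements that sit blockDim.x apart,
// so one block covers 2 * blockDim.x consecutive elements.
__global__ void vecAdd2(const float *A, const float *B, float *C, int n) {
    int i = threadIdx.x + (blockIdx.x * blockDim.x) * 2;  // first element
    int j = i + blockDim.x;                               // second element
    if (i < n) C[i] = A[i] + B[i];
    if (j < n) C[j] = A[j] + B[j];
}

// Hypothetical helper: number of blocks needed when every thread
// handles two elements (ceiling of n / (2 * threads)).
int gridSizeFor(int n, int threads) {
    return (n + 2 * threads - 1) / (2 * threads);
}

int main() {
    const int n = 1000;       // assumed size, deliberately not a multiple of 512
    const int threads = 256;  // assumed block size
    const int blocks = gridSizeFor(n, threads);
    const size_t bytes = n * sizeof(float);

    // Host buffers with values small enough to add exactly in float.
    float *hA = (float *)malloc(bytes);
    float *hB = (float *)malloc(bytes);
    float *hC = (float *)malloc(bytes);
    for (int k = 0; k < n; ++k) {
        hA[k] = (float)k;
        hB[k] = (float)(2 * k);
    }

    // Device buffers and host-to-device copies.
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    vecAdd2<<<blocks, threads>>>(dA, dB, dC, n);
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);

    // Compare against a host-side sum.
    int mismatches = 0;
    for (int k = 0; k < n; ++k) {
        if (hC[k] != hA[k] + hB[k]) ++mismatches;
    }
    printf("mismatches: %d\n", mismatches);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    free(hA); free(hB); free(hC);
    return mismatches != 0;
}
```

For the CGMA of this kernel, each output element costs one FP addition and two global loads (A and B), which gives 1/2 = 0.5. If the store to C is also counted as a global memory access, the ratio is 1/3. Handling two elements per thread does not change this ratio, because the calculations and the memory accesses both double.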