Enhancing GPU Efficiency: Understanding Global Memory Access in CUDA