Enhancing CUDA Kernel Performance with Shared Memory Register Spilling