Optimizing Large Language Models with NVIDIA’s TensorRT: Pruning and Distillation Explained