Microsoft Bing Visual Search Enhanced by NVIDIA’s Accelerated Libraries
The post Microsoft Bing Visual Search Enhanced by NVIDIA’s Accelerated Libraries appeared on BitcoinEthereumNews.com.
Tony Kim Oct 08, 2024 06:23

Microsoft Bing Visual Search achieves a 5.13x speedup using NVIDIA’s TensorRT, CV-CUDA, and nvImageCodec, enhancing efficiency and reducing costs.

Microsoft Bing Visual Search, a tool that lets users worldwide search using photographs, has been optimized through a collaboration with NVIDIA. According to the NVIDIA Technical Blog, integrating NVIDIA’s TensorRT, CV-CUDA, and nvImageCodec into Bing’s TuringMM visual embedding model led to a 5.13x increase in throughput for offline indexing pipelines, reducing both energy consumption and costs.

Multimodal AI and Visual Search

Multimodal AI technologies, like Microsoft’s TuringMM, are essential for applications that require seamless interaction between different data types such as text and images. A popular model for joint image-text understanding is CLIP, which uses a dual encoder architecture trained on hundreds of millions of image-caption pairs. Such models are critical for tasks such as text-based visual search, zero-shot image classification, and image captioning.

Optimization Efforts

The optimization of Bing’s visual embedding pipeline relied on NVIDIA’s GPU acceleration technologies. The effort focused on the TuringMM pipeline: TensorRT was used for model execution, which improved the efficiency of computationally expensive layers in transformer architectures, while nvImageCodec and CV-CUDA accelerated the image decoding and preprocessing stages, significantly reducing latency for image processing tasks.

Implementation and Results

Prior to optimization, Bing’s visual embedding model operated on a GPU server cluster that handled inference tasks for various deep learning services across Microsoft.
The original implementation, which used ONNX Runtime with the CUDA Execution Provider, was bottlenecked by image decoding handled by OpenCV. By integrating NVIDIA’s libraries, the pipeline’s throughput increased from 88 queries per second (QPS) to 452 QPS, a 5.13x speedup. These enhancements not…
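
As a quick check on the reported numbers, the short Python sketch below recomputes the speedup from the two QPS figures and estimates what it could mean for an offline indexing job. The QPS values come from the article; the workload size (`NUM_IMAGES`) and the helper names are hypothetical and chosen only for illustration, and the estimate assumes a single pipeline instance running at steady throughput.

```python
import math

# Throughput figures reported in the article, in queries per second (QPS).
BASELINE_QPS = 88    # ONNX Runtime + CUDA Execution Provider, OpenCV decoding
OPTIMIZED_QPS = 452  # TensorRT + CV-CUDA + nvImageCodec

# HYPOTHETICAL workload size, chosen only for illustration (not from the article).
NUM_IMAGES = 1_000_000_000


def speedup(baseline_qps: float, optimized_qps: float) -> float:
    """Throughput ratio of the optimized pipeline over the baseline."""
    return optimized_qps / baseline_qps


def processing_hours(num_images: int, qps: float) -> float:
    """Hours one pipeline instance needs to embed num_images at a steady qps."""
    return num_images / qps / 3600


ratio = speedup(BASELINE_QPS, OPTIMIZED_QPS)
truncated = math.floor(ratio * 100) / 100  # two decimals, without rounding up

before = processing_hours(NUM_IMAGES, BASELINE_QPS)
after = processing_hours(NUM_IMAGES, OPTIMIZED_QPS)
time_saved = 1 - after / before  # fraction of processing time eliminated

print(f"speedup:    {ratio:.3f}x (truncated: {truncated:.2f}x)")
print(f"baseline:   {before:,.0f} h")
print(f"optimized:  {after:,.0f} h")
print(f"time saved: {time_saved:.1%}")
```

The ratio 452 / 88 is about 5.136, which truncates to the 5.13x quoted in the headline. The fraction of processing time saved depends only on that ratio, not on the assumed workload size, so changing `NUM_IMAGES` scales the hour estimates but leaves the percentage unchanged.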
Filed under: News - @ October 8, 2024 6:22 am