VMware Cracks: How Is VMware Doing with GPU Support in Virtualized Environments?

Customers often ask me: "… by the way, what is VMware doing about GPU-accelerated workloads like HPC (high-performance computing), K8s, and AI/ML?"

Well… it's a long story, so I put together a list of blog posts on the topic.

Enjoy

Cheers, Dudi

Broadcom Delivers Near Bare-Metal Performance: MLPerf Inference 5.0 (Apr 2025)
vSphere with NVIDIA vGPUs delivers 95%–100% of bare-metal performance on MLPerf Inference 5.0 while using only 28.5%–67% of the CPU cores and 50%–83% of the available memory, leaving headroom for other workloads ->> https://blogs.vmware.com/cloud-foundation/2025/04/17/broadcom-delivers-near-bare-metal-performance-for-virtualized-ai-ml/

Wizards Behind the Curtain: MLPerf Inference 4.0 with Broadcom, NVIDIA, and Dell (May 2024)
Tests on a Dell XE9680 with 8x virtualized NVIDIA H100 SXM GPUs showed vSphere achieving 95%–104% of bare-metal performance across the Offline and Server benchmark scenarios ->> https://blogs.vmware.com/performance/2024/05/magic-of-virtualized-ml-ai.html

VMware Private AI Foundation with NVIDIA: Aria Automation Services (Nov 2024)
VCF is the core infrastructure platform for VMware Private AI Foundation with NVIDIA, providing a secure, cloud-native AI platform for deep learning, ML, and HPC workloads using NVIDIA NGC containers validated on vSphere ESXi hosts with NVIDIA GPUs ->> https://blogs.vmware.com/cloud-foundation/2024/11/06/aria-automation-march-2024-8-16-2-private-ai-automation-services-for-nvidia/

ML and AI Performance of NVIDIA GPUs on VCF (Sep 2024)
Mark Achtemichuk and Uday Kurkure discuss virtualized ML/AI performance results on different NVIDIA GPUs, highlighting performance that is near or better than bare metal ->> https://blogs.vmware.com/cloud-foundation/2024/09/19/extreme-performance-series-2024-ml-and-ai-performance-of-nvidia-gpus-on-vcf/

Boost Throughput by Scaling VMs While Keeping GPUs to a Minimum (Aug 2024)
Tests with LLAMA2-7b and LLAMA2-13b on vSphere 8.0 U3 showed that two VMs sharing a single physical GPU through vGPU/MIG configurations delivered significantly higher aggregate throughput than one VM using the full GPU, and three VMs sharing one GPU delivered even more ->> https://blogs.vmware.com/cloud-foundation/2024/08/27/boost-throughput-scaling-vms-minimal-gpus/

Automated Testing with Virtualized GPUs for ML/AI Workloads (Sep 2024)
This episode discusses the test automation framework VMware put in place for ML and AI workloads on virtualized GPUs, and the key insights gained from large-scale automated testing ->> https://blogs.vmware.com/cloud-foundation/2024/09/13/extreme-performance-series-2024-automated-testing-with-virtualized-gpus-for-ml-ai-workloads/

vSphere Scaling and High Performance (Aug 2024)
Todd Muirhead and Mark Achtemichuk discuss how the vSphere platform scales to large configurations, which is relevant to GPU-heavy HPC clusters ->> https://blogs.vmware.com/cloud-foundation/2024/08/26/extreme-performance-series-2024-vsphere-scale-and-high-performance/

MLPerf Inference 5.1: VCF 9.0 with NVIDIA B200 and H200 GPUs (Dec 2025, most recent)
VCF 9.0 supports both DirectPath I/O and NVIDIA vGPU, and it achieved performance on par with bare metal for LLMs (Llama 3.1 405B), speech-to-text (Whisper), and text-to-image (Stable Diffusion XL) on NVIDIA HGX B200 and H200 GPUs ->> https://blogs.vmware.com/cloud-foundation/2025/12/15/mlperf-5-1-confirms-vcf-future-of-ai-ml-performance/

No Virtualization Tax: MLPerf Inference v3.0 with NVIDIA Hopper and Ampere vGPUs (2023)
vSphere with NVIDIA vGPUs delivers 94.4% to 105% of bare-metal performance in the Offline and Server scenarios of the MLPerf Inference 3.0 benchmarks, using H100 and A100 GPUs on vSphere 8.0 U1 ->> https://blogs.vmware.com/performance/2023/04/no-virtualization-tax-for-mlperf-inference-v3-0-using-nvidia-hopper-and-ampere-vgpus-and-nvidia-ai-software-with-vsphere-8-0-1.html

vSphere 8 Performance in the “Goldilocks Zone”: MLPerf Training and Inference (Performance Study PDF, Jun 2024)
This official study compares MLPerf Training v3.0 results for vGPU 4x A100-80c against bare metal and also covers MLPerf Inference with 2x H100; it includes detailed throughput tables in queries per second ->> https://www.vmware.com/docs/vmware-ml-training-and-inference-perf

HPC in Finance: Monte Carlo with Virtual GPUs on vSphere (Mar 2023)
This article covers testing of a Monte Carlo simulation (used to calculate financial "Greeks") on VMware vSphere with virtualized A100 GPUs, showing better performance than a bare-metal deployment for a highly parallel CUDA-based application that uses partial differential equations and linear algebra ->> https://blogs.vmware.com/cloud-foundation/2023/03/20/a-high-performance-computing-application-in-finance-with-virtual-gpus-on-vmware-vsphere/

Exploring GPU Architecture and Why You Need It for HPC on vSphere
This foundational blog post explains why GPUs are well suited to HPC workloads, covering the architectural difference between CPUs and GPUs (latency-optimized vs. throughput-optimized) and how running HPC workloads on vSphere ESXi makes GPU resource allocation flexible and dynamic ->> https://blogs.vmware.com/cloud-foundation/2019/03/08/exploring-the-gpu-architecture-and-why-we-need-it/