GRID 5.0 – Pascal Support and More

This morning, NVIDIA announced the latest version of the graphics virtualization stack – NVIDIA GRID 5.0.  This latest release continues the trend that NVIDIA started two years ago when they separated the GRID software stack from the Tesla data center GPUs in the GRID 2.0 release.

GRID 5.0 adds several new key features to the GRID product line.  Along with these new features, NVIDIA is also adding a new Tesla card and rebranding the Virtual Workstation license SKU.

Quadro Virtual Data Center Workstation

Previous versions of GRID contained profiles designed for workstations and high-end professional applications.  These profiles, which ended in a Q, provided Quadro level features for the most demanding applications.  They also required the GRID Virtual Workstation license.

NVIDIA has decided to rebrand the professional series capabilities of GRID to better align with their professional visualization series of products.  The GRID Virtual Workstation license will now be called the Quadro Virtual Data Center Workstation license.  This change helps differentiate the Virtual PC and Virtual Apps features, which are geared towards knowledge workers, from the professional series capabilities.

Tesla P6

The Tesla P6 is the Pascal-generation successor to the Maxwell-generation M6 GPU.  It provides a GPU purpose-built for blade servers.  In addition to using a Pascal-generation GPU, the P6 also increases the amount of framebuffer to 16GB.  The P6 can now support up to 16 users per blade, which provides more value to customers who want to adopt GRID for VDI on their blade platform.

Pascal Support for GRID

The next generation GRID software adds support for the Pascal-generation Tesla cards.  The new cards that are supported in GRID 5.0 are the Tesla P4, P6, P40, and P100.

The P40 is the designated successor to the M60 card.  It is a board with a single GPU and 24GB of framebuffer.  The increased framebuffer allows for a 50% increase in density: the P40 can handle up to 24 users per board, compared to 16 users on the M60.

Edit for Clarification – The comparison between the M60 and the P40 was done using the 1GB GRID profiles.  The M60 can support up to 32 users per board when assigning each VM 512MB of framebuffer, but this option is not available in GRID 5.0.  

On the other end of the scale is the P4.  This is a small form factor Pascal GPU with 8GB of framebuffer.  Unlike the larger Tesla boards, it can run on 75W, so it doesn't require supplemental power connectors.  This makes it suitable for cloud and rack-dense computing environments.

In addition to better performance, the Pascal cards have a few key advantages over the previous-generation Maxwell cards.  First, there is no need to use the gpumodeswitch utility to convert the board from compute mode to graphics mode.  There is, however, a manual step required to disable ECC memory on the Pascal boards, and this is handled through the NVIDIA-SMI utility.  This change streamlines the GRID deployment process for Pascal boards.
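For those curious what that step looks like, here's a minimal sketch that disables ECC on each GPU in a host using nvidia-smi's ECC flag.  The four-GPU count is an assumption for illustration; adjust the indexes for your boards.

```python
import subprocess

# Sketch: disable ECC on every GPU in the host before using the boards
# for vGPU.  Assumes nvidia-smi is on the PATH and the account has
# administrative rights.  "-i" targets a GPU index; "-e 0" disables ECC.
NUM_GPUS = 4  # assumption: adjust for the actual number of GPUs

for gpu_index in range(NUM_GPUS):
    subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index), "-e", "0"],
        check=True,
    )

# Note: ECC changes don't take effect until the GPU is reset or the
# host reboots, so plan this for a maintenance window.
```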

The second advantage involves hardware-level preemption support.  In previous generations of GRID, CUDA support was only available when using the 8Q profile.  This dedicated an entire GPU to a single VM.  Hardware preemption support enables Pascal cards to support CUDA on all profiles.

To understand why hardware preemption is required, we have to look at how GRID shares GPU resources.  GRID uses round-robin time slicing to share GPU resources amongst multiple VMs, and each VM gets a set amount of time on the GPU.  When the time slice expires, the GPU moves on to the next VM.  When the GPU is rendering graphics to be displayed on the screen, the round-robin method works well because the GPU can typically complete all the work in the allotted time slice.  CUDA jobs, however, pose a challenge because they can take hours to complete.  Without the ability to preempt running jobs, a CUDA job could fail when the time slice expired.
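To make that concrete, here's a toy simulation of the problem.  None of this is NVIDIA's code; it just models a time-sliced GPU with and without the ability to pause and resume a job.

```python
# Toy model of GPU time slicing -- not NVIDIA's code.  Each VM's job
# needs some amount of GPU time; the GPU grants TIME_SLICE units per turn.
TIME_SLICE = 2

def run_round_robin(jobs, preemption):
    """jobs maps a VM name to the GPU time its current job still needs."""
    completed, failed = [], []
    while jobs:
        for vm in list(jobs):
            if jobs[vm] <= TIME_SLICE:      # job finishes within its slice
                completed.append(vm)
                del jobs[vm]
            elif preemption:                # save progress, resume next turn
                jobs[vm] -= TIME_SLICE
            else:                           # slice expired mid-job: job fails
                failed.append(vm)
                del jobs[vm]
    return completed, failed

# A quick render job and a long CUDA job sharing one GPU:
print(run_round_robin({"render": 1, "cuda": 9}, preemption=False))
# -> (['render'], ['cuda'])   the CUDA job dies at the slice boundary
print(run_round_robin({"render": 1, "cuda": 9}, preemption=True))
# -> (['render', 'cuda'], []) the CUDA job finishes across five turns
```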

Preemption support on Pascal cards allows VMs with any virtual workstation profile to have access to CUDA features.  This enables high-end applications to use smaller Quadro vDWS profiles instead of having to have an entire GPU dedicated to that specific user.

Fixed Share Round Robin Scheduling

As mentioned above, GRID uses round-robin time slicing to share the GPU across multiple VMs.  One disadvantage of this method is that if a VM doesn't have anything for the GPU to do, it is skipped and the time slice is given to the next VM in line.  This keeps the GPU from sitting idle when there are VMs that can utilize it, but it also means that some VMs may get more access to the GPU than others.

NVIDIA is adding a new scheduler option in GRID 5.0.  This option is called the Fixed Share Scheduler.  The Fixed Share Scheduler grants each VM that is placed on the GPU an equal share of resources.  Time slices are still used in the Fixed Share Scheduler, and if a VM does not have any jobs for the GPU to execute, the GPU will sit idle during that time slice.

As VMs are placed onto, or removed from, a GPU, the share of resources available to each VM is recalculated, and shares are redistributed to ensure that all VMs get equal access.
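As a toy model of that bookkeeping (illustrative only, not NVIDIA's implementation), the recalculation looks something like this:

```python
# Toy model of the Fixed Share Scheduler's bookkeeping -- illustrative
# only.  Shares are recalculated whenever a VM is placed or removed.
class FixedShareGPU:
    def __init__(self):
        self.vms = []

    def place(self, vm):
        self.vms.append(vm)

    def remove(self, vm):
        self.vms.remove(vm)

    def shares(self):
        # Every resident VM gets an equal slice of the GPU, whether or
        # not it has work queued; an unused slice leaves the GPU idle.
        return {vm: 1.0 / len(self.vms) for vm in self.vms}

gpu = FixedShareGPU()
gpu.place("vm1"); gpu.place("vm2")
print(gpu.shares())   # {'vm1': 0.5, 'vm2': 0.5}
gpu.place("vm3")
print(gpu.shares())   # each VM now gets a third of the GPU
```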

Enhanced Monitoring

GRID 5.0 adds new monitoring capabilities to the GRID platform.  One of the new features is per-application monitoring.  Administrators can now view GPU utilization on a per-application basis using the NVIDIA-SMI tool.  This new feature allows administrators to see exactly how much of the GPU resources each application is using.
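nvidia-smi's process-monitoring mode is one way to get at this data from the hypervisor.  Here's a quick sketch; the exact column layout varies by driver version, so treat the output format as an assumption and check it against your installation.

```python
import subprocess

# Sketch: take a single sample of per-process GPU utilization with
# nvidia-smi's process monitor.  Column layout varies by driver
# version, so the raw rows are printed instead of parsing fields.
result = subprocess.run(
    ["nvidia-smi", "pmon", "-c", "1"],   # -c 1 = one sample, then exit
    capture_output=True, text=True, check=True,
)
for line in result.stdout.splitlines():
    if not line.startswith("#"):         # "#" lines are column headers
        print(line)
```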

License Enforcement

In previous versions of GRID, the licensing server basically acted as an auditing tool.  A license was required for GRID, but the GRID features would continue to function even if the licensed quantity was exceeded.  GRID 5.0 changes that.  Licensing is now enforced, and if a license is not available, the GRID drivers will not function at full capability – users will get a reduced-quality experience when they sign into their desktops.

Because licensing is now enforced, the license server has built-in HA functionality.  A secondary licensing server can be specified in the configuration of both the license server and the driver, and if the primary is unavailable, the driver will fall back to the secondary.
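The client-side behavior boils down to a simple failover loop.  The sketch below is purely illustrative; request_license() and the server names are placeholders, not NVIDIA's licensing API or configuration syntax.

```python
# Illustrative failover loop only -- request_license() and the server
# names are hypothetical placeholders, not NVIDIA's API or config syntax.
PRIMARY = "license1.example.com"
SECONDARY = "license2.example.com"

def request_license(server: str) -> bool:
    """Placeholder: ask the named license server for a GRID license."""
    raise NotImplementedError

def checkout_license():
    for server in (PRIMARY, SECONDARY):
        try:
            if request_license(server):
                return server            # licensed against this server
        except OSError:
            continue                     # unreachable: try the secondary
    return None                          # no license: degraded experience
```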

Other Announced Features

Two GRID 5.0 features were announced at Citrix Synergy back in May.  The first was Citrix Director support for monitoring GRID.  The second was beta Live Migration support for XenServer.

What’s New in NVIDIA GRID August 2016

Over the last year, the great folks over at NVIDIA have been very busy.  Last year at this time, they announced the M6 and M60 cards, bringing the Maxwell architecture to GRID, adding support for blade server architectures, and introducing the software licensing model for the drivers.  In March, GRID 3.0 was announced, primarily to address issues with the new licensing model.

Today, NVIDIA announced the August 2016 release of GRID.  This is the latest edition of the GRID software stack, and it coincides with the general availability of the high-density M10 card that supports up to 64 users.

So aside from the hardware, what’s new in this release?

The big addition to the GRID product line is monitoring.  In previous versions of GRID, there was a limited amount of performance data that any of the NVIDIA monitoring tools could see.  NVIDIA SMI, the hypervisor component, could only really report on the GPU core temperature and wattage, and the NVIDIA WMI counters on Windows VMs could only see framebuffer utilization.

The GRID software now exposes more performance metrics from the host and the guest VM level.  These metrics include discovery of the vGPU types currently in use on the physical card as well as utilization statistics for 3D, encode, and decode engines from the hypervisor and guest VM levels.  These stats can be viewed using the NVIDIA-SMI tool in the hypervisor or by using NVIDIA WMI in the guest OS.  This will enable 3rd-party monitoring tools, like Liquidware Stratusphere UX, to extract and analyze the performance data.  The NVIDIA SDK has been updated to provide API access to this data.
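For programmatic access, the NVML library (exposed in Python through the pynvml bindings) can pull these kinds of counters.  A minimal sketch, assuming pynvml is installed and at least one NVIDIA GPU is present:

```python
from pynvml import (
    nvmlInit, nvmlShutdown, nvmlDeviceGetHandleByIndex,
    nvmlDeviceGetUtilizationRates, nvmlDeviceGetEncoderUtilization,
    nvmlDeviceGetDecoderUtilization,
)

nvmlInit()
try:
    handle = nvmlDeviceGetHandleByIndex(0)        # first physical GPU
    util = nvmlDeviceGetUtilizationRates(handle)  # 3D/compute + memory
    enc, _period = nvmlDeviceGetEncoderUtilization(handle)
    dec, _period = nvmlDeviceGetDecoderUtilization(handle)
    print(f"3D: {util.gpu}%  memory: {util.memory}%  "
          f"encode: {enc}%  decode: {dec}%")
finally:
    nvmlShutdown()
```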

Monitoring was one of the missing pieces in the GRID stack, and the latest release addresses this.  It's now possible to see how the GPU's resources are being utilized and whether the correct profiles are being used.

The latest GRID release supports the M6, M60, and M10 cards for customers who have an active software support contract with NVIDIA.  Unfortunately, the 1st-generation K1 and K2 cards are not supported.

#GRIDDays Followup – Understanding NVIDIA GRID vGPU Part 1

Author Note: This post has been a few months in the making.  While GRIDDays was back in March, I've had a few other projects that have kept this on the sidelines until now.  This is Part 1.  Part 2 will be coming at some point in the future.  I figured 1200 words on this was good enough for one chunk.

The general rule of thumb is that if a virtual desktop requires some dedicated hardware – examples include serial devices, hardware license dongles, and physical cards – it's probably not a good fit to be virtualized.  This was especially true of workloads that required high-end 3D acceleration.  If a virtual workload required 3D graphics, multiple high-end Quadro cards had to be installed in the server and then passed through to the virtual machines that required them.

Since pass-through GPUs can't be shared amongst VMs, this design doesn't scale well.  There is a limit to the number of cards I can install in a host, and that limits the number of 3D workloads I can run.  If I need more, I have to add hosts.  It also limits flexibility in the environment, as VMs with pass-through hardware can't easily be moved to a new host if maintenance is needed or a hardware failure occurs.

NVIDIA created the GRID products to address the challenges of GPU virtualization.  GRID technology combines purpose-built graphics hardware, software, and drivers to allow multiple virtual machines to access a GPU. 

I’ve always wondered how it worked, and how it ensured that all configured VMs had equal access to the GPU.  I had the opportunity to learn about the technology and the underlying concepts a few weeks ago at NVIDIA GRID Days. 

Disclosure: NVIDIA paid for my travel, lodging, and some of my meals while I was out in Santa Clara.  This has not influenced the content of this post.

Note:  All graphics in this post are courtesy of NVIDIA.

How it Works – Hardware Layer

So how does a GRID card work?  In order to understand it, we have to start with the hardware.  A GRID card is a PCIe card with multiple GPUs on the board.  The hardware includes the same features that many other NVIDIA products have, including framebuffer (often referred to as video memory), graphics compute cores, and dedicated hardware for video encode and decode.

Interactions between an operating system and a PCIe hardware device happen through the base address register.  Base address registers are used to hold memory addresses used by a physical device.  Virtual machines don’t have full access to the GPU hardware, so they are allocated a subset of the GPU’s base address registers for communication with the hardware.  This is called a virtual BAR. 

Access to the GPU Base Address Registers, and by extension the Virtual BAR, is handled through the CPU’s Memory Management Unit.  The MMU handles the translation of the virtual BAR memory addresses into the corresponding physical memory addresses used by the GPU’s BAR.  The translation is facilitated by page tables managed by the hypervisor.

The benefit of the virtual BAR and hardware-assisted translation is security.  VMs can only access the registers that they are assigned, and they cannot reach any locations outside of their virtual BAR.
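Conceptually, the MMU's role here is a bounds-checked table lookup.  Here's a toy model in a few lines of Python (illustrative only, not real driver code):

```python
# Toy model of virtual BAR translation -- not real driver code.  The
# hypervisor's page tables only map the pages a given VM was assigned.
class VirtualBAR:
    def __init__(self, page_table):
        self.page_table = page_table     # virtual page -> physical page

    def translate(self, virtual_page):
        if virtual_page not in self.page_table:
            # Accesses outside the VM's assigned registers fault instead
            # of ever reaching the GPU -- this is the security benefit.
            raise MemoryError(f"page {virtual_page} is not mapped for this VM")
        return self.page_table[virtual_page]

vm1 = VirtualBAR({0: 128, 1: 129})       # VM 1 owns physical pages 128-129
print(vm1.translate(0))                  # -> 128
vm1.translate(5)                         # raises MemoryError
```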

The architecture described above – assigning each VM a virtual base address register space that corresponds to a subset of the physical base address registers – allows multiple VMs to securely share one physical hardware device.  But that's only one part of the story.  How does work actually get from the guest OS driver to the GPU?  And how does the GPU actually manage workloads from multiple VMs?

When the NVIDIA driver submits a job or workload to the GPU, it gets placed into a channel.  A channel is essentially a queue or a line that is exposed through each VM's virtual BAR.  Each GPU has a fixed number of channels available, and channels are allocated to each VM by dividing the total number of channels by the maximum number of VMs the profile supports.  So if I'm using a profile that can support 16 VMs per GPU, each VM would get 1/16th of the channels.
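The arithmetic is simple enough to show in a couple of lines.  The total channel count below is a made-up placeholder, since the real per-GPU figure wasn't given:

```python
# The per-GPU channel total below is a made-up placeholder -- the real
# figure wasn't given -- but the division works the same either way.
TOTAL_CHANNELS = 4096

def channels_per_vm(max_vms_per_gpu: int) -> int:
    return TOTAL_CHANNELS // max_vms_per_gpu

print(channels_per_vm(16))   # a 16-VM profile gives each VM 256 channels
```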

When a virtual desktop user opens an application that requires resources on the GPU, the NVIDIA driver in the VM will dedicate a channel to that application.  When that application needs the GPU to do something, the NVIDIA driver will submit that job through the virtual BAR to the channels allocated to that application.

So now that the application's work is queued up, something needs to get it into the GPU for execution.  That job is handled by the scheduler.  The scheduler moves work from active channels onto the GPU engines.  The GPU has four engines handling different tasks – graphics/compute, video encode, video decode, and a copy engine.  The GPU engines are timeshared (more on that below), and they execute jobs in parallel.

When active jobs are placed on an engine, they are executed sequentially.  When a job is completed, the NVIDIA driver is signaled that the work has been completed, and the scheduler loads the next job onto the engine to begin processing.

Scheduling

There are two types of scheduling in the computing world – sequential and parallel.  When sequential scheduling is used, a single processor executes each job that it receives in order.  When it completes that job, it moves onto the next.  This can allow a single fast processor to quickly move through jobs, but complex jobs can cause a backup and delay the execution of waiting jobs.

Parallel scheduling uses multiple processors to execute jobs at the same time.  When a job on one processor completes, the next job in line moves onto that processor.  Individually, these processors are too slow to handle a complex job, but they prevent a single job from clogging the pipeline.

A good analogy to this would be the checkout lane at a department store.  The cashier (and register) is the processor, and each customer is a job that needs to be executed.  Customers are queued up in line, and as the cashier finishes checking out one customer, the next customer in the queue is moved up.  The cashier can usually process customers efficiently and keep the line moving, but if a customer with 60 items walks into the 20-items-or-less lane, it would back up the line and prevent others from checking out.

This example works for parallel execution as well.  Imagine that same department store at Christmas.  Every cash register is open, and there is a person at the front of the line directing where people go.  This person is the scheduler, and they are placing customers (jobs) on registers (GPU engines) as soon as each register has finished with its previous customer.

Graphics Scheduling

So how does GRID ensure that all VMs have equal access to the GPU engines?  How does it prevent one VM from hogging all the resources on a particular engine?

The answer comes in the way that the scheduler works.  The scheduler uses a method called round-robin time slicing.  Round-robin time slicing works by giving each channel a small amount of time on a GPU engine.  The channel has exclusive access to the GPU engine until the timeslice expires or until there are no more work items in the channel.

If all of the work in a channel is completed before the timeslice expires, any spare cycles are redistributed to other channels or VMs.  This ensures that the GPU isn’t sitting idle while jobs are queued in other channels.
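Putting both rules together, exclusive access until the slice expires and an immediate hand-off when a channel runs dry, a toy version of the scheduler loop looks like this (illustrative only, not NVIDIA's code):

```python
from collections import deque

TIME_SLICE = 2  # arbitrary units of GPU time per turn

def schedule(channels):
    """channels maps a channel id to a queue of job durations."""
    order, clock, timeline = deque(channels), 0, []
    while any(channels.values()):
        ch = order[0]; order.rotate(-1)     # round-robin through channels
        if not channels[ch]:
            continue                        # empty channel: hand off at once
        budget = TIME_SLICE                 # exclusive access for one slice
        while channels[ch] and budget > 0:
            spent = min(channels[ch][0], budget)
            budget -= spent; clock += spent
            channels[ch][0] -= spent
            if channels[ch][0] == 0:
                channels[ch].popleft()      # job done before slice expired
        timeline.append((ch, clock))
    return timeline

# vm1 has one short job; vm2 has two jobs.  Once vm1's channel is empty,
# its unused cycles go to vm2 instead of leaving the GPU idle.
print(schedule({"vm1": deque([1]), "vm2": deque([3, 1])}))
# -> [('vm1', 1), ('vm2', 3), ('vm2', 5)]
```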

The next part of the Understanding vGPU series will cover memory management on the GRID cards.

It’s Time To Reconsider My Thoughts on GPUs in VDI…

Last year, I wrote that it was too early to consider GPUs for general VDI use and that they should be reserved only for VDI use cases where they are absolutely required.  There were a number of reasons for this, including user density per GPU, lack of monitoring and vMotion, and economics.  That led to a Frontline Chatter podcast discussing this topic in more depth with industry expert Thomas Poppelgaard.

When I wrote that post, I said that there would be a day when GPUs would make sense for all VDI deployments.  That day is coming soon.  There is a killer app that will greatly benefit all users (in certain cases) who have access to a GPU.

Last week, I got to spend some time out at NVIDIA's headquarters in Santa Clara taking part in NVIDIA GRID Days.  GRID Days was a two-day event of interacting with the senior management of NVIDIA's GRID product line, along with briefings on the current and future technology in GRID.

Disclosure: NVIDIA paid for my travel, lodging, and some of my meals while I was out in Santa Clara.  This has not influenced the content of this post.

The killer app that will drive GPU adoption in VDI environments is Blast Extreme.  Blast Extreme is the new protocol being introduced in VMware Horizon 7 that utilizes H.264 as the codec for the desktop experience.  The benefit of using H.264 over other codecs is that many devices include hardware for encoding and decoding H.264 streams.  This includes almost every video card made in the last decade.

So what does this have to do with VDI?

When a user is logged into a virtual desktop or is using a published application on an RDSH server, the desktop and applications that they're interacting with are rendered, captured, and encoded into a stream of data that is then transported over the network to the client.  Normally, this encoding happens in software and uses CPU cycles.  (PCoIP has hardware offload in the form of APEX cards, but these only handle the encoding phase; rendering happens somewhere else.)

When GPUs are available to virtual desktops or RDSH/XenApp servers, the rendering and encoding tasks can be pushed into the GPU, where dedicated and optimized hardware can take over these tasks.  This reduces the amount of CPU overhead on each desktop, and it can lead to a snappier user experience.  NVIDIA's testing has also shown that Blast Extreme with GPU offload uses less bandwidth and has lower latency compared to PCoIP.

Note: These aren’t my numbers, and I haven’t had a chance to validate these finding in my lab.  When Horizon 7 is released, I plan to do similar testing of my own comparing PCoIP and Blast Extreme in both LAN and WAN environment.

If I use Blast Extreme, and I install GRID cards in my hosts, I gain two tangible user experience benefits.  Users now have access to a GPU, which many applications, especially Microsoft Office and most web browsers, tap into for processing and rendering.  And they gain the benefits of using that same GPU to encode the H.264 streams that Blast Extreme uses, potentially lowering the bandwidth and latency of their session.  This, overall, translates into significant improvements in their virtual desktop and published applications experience*.

Many of the limitations of vGPU still exist.  There is no vMotion support, and performance analytics are not fully exposed to the guest OS.  But density has improved significantly with the new M6 and M60 cards.  So while it may not be cost effective to retrofit GPUs into existing Horizon deployments, GPUs are now worth considering for new Horizon 7 deployments.

*Caveat: If users are on a high latency network connection, or if the connection has a lot of contention, you may have different results.

EUC5404 – Deliver High Performance Desktops with VMware Horizon and NVIDIA GRID vGPU

Notes from EUC5405.

Reasons for 3D Graphics

  • Distributed Workforces with Large Datasets – harder to share
  • Contractors/3rd party workers that need revocable access – worried about data leakage and corporate security

Engineering firm gained 70% productivity improvements for CATIA users by implementing VDI – slide only shows 20%

Windows 7 drives 3D graphics, Aero needs 3D.  Newer versions of Windows and new web browsers do even more.

History of 3D Graphics in Horizon

  • Soft3D was first
  • vSGA – shared a graphics card amongst VMs, limited to productivity and lightweight use
  • vDGA – hardwires a card to a virtual machine
  • GRID vGPU – Mediated Pass-thru, covers the middle space between vSGA and vDGA

vGPU defined – Shared access to physical GPU on a GRID card, gets access to native NVIDIA drivers

vGPU has official support statements from application vendors

Product Announcement – 3D graphics on RDSH

vGPU does not support vMotion, but it does support HA and DRS placement

Upgrade Path to Horizon vGPU

If you already have GRID cards and are using vDGA or vSGA, there is an upgrade path to vGPU.

Steps:

  • Upgrade to vSphere 6.0
  • Upgrade Horizon to 6.1 or newer
  • Install NVIDIA VIBs on host (see the sketch after this list)
  • Upgrade VMs to version 11
  • Set vGPU profiles
  • Install drivers in VMs
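As a sketch of the VIB step above, the host-side install is a single esxcli command per ESXi host.  The .vib path below is a hypothetical placeholder; use the vGPU manager bundle that ships with your GRID release.

```python
import subprocess

# Sketch: install the NVIDIA vGPU manager VIB on an ESXi host.  The
# path is a hypothetical placeholder; the host should be in maintenance
# mode, and it needs a reboot after the install completes.
VIB_PATH = "/tmp/NVIDIA-vGPU-VMware_ESXi_6.0_Host_Driver.vib"  # placeholder

subprocess.run(
    ["esxcli", "software", "vib", "install", "-v", VIB_PATH],
    check=True,
)
```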

vGPU has Composer Support

GRID Profiles set in vCenter

Two settings to configure – one in vCenter (vGPU Profiles) and one in Horizon

GRID 2.0 – bringing Maxwell to GRID

More users, Linux Support

Moving to Platform – software on top of hardware instead of dedicated product line for GRID

GRID 2.0 is hardware plus software.  Changing from being a driver into a platform and software with additional features

Licensing is changing. Licensed user groups.

Grid Editions

vMotion not coming today – much more complicated problem to solve

GRID editions

GRID Use Cases

Virtual PC – business users who expect great perf, AutoCAD, PhotoShop

Virtual Workstation – Siemens, Solidworks, CATIA, REVIT

Virtual Workstation Extended – Very high end.  Autodesk Maya

 

High-Perf VDI is not the same as your regular VDI

  • Density goes down, CPU/Memory/IOPS/Rich Graphics capabilities go up
  • Workloads are different than traditional VDI

Hardware Recommendations

  • vSphere 6.0 Required
  • VM must be HW version 11
  • 2-8 vCPUs, at least 4 for Power Users
  • Minimum 4GB RAM
  • 64-bit OS

Required Components in VMs:

  • VM Tools
  • View Agent
  • NVIDIA Driver

Use the VMware OS Optimization Tool fling.  Users can see up to 40% in resource savings.

Sizing Rich Graphics – Storage

Storage still critical factor in performance

CAD users can demand more than 1TB of storage per desktop

Size and performance matter now

Storage Options:

  • Virtual SAN – SSD based local storage
  • Or All-Flash based SANs

Bringing Rich 3D into Production

  • Establish End-User Acceptance Criteria to verify that User Experience is acceptable
  • Have end users test applications and daily tasks
  • Time how long it takes to complete tasks

The Next Generation of Virtual Graphics – NVIDIA GRID 2.0

When vGPU was released for Horizon View 6.1 back in March 2015, it was an exciting addition to the product line.  It addressed many of the problems that plagued 3D graphics acceleration and 3D workloads in virtual desktop environments running on the VMware platform.

vGPU, running on NVIDIA GRID cards, bridged the gap between vDGA, which is dedicating a GPU to a specific virtual machine, and vSGA, which is sharing the graphics card between multiple virtual machines through the use of a driver installed in the hypervisor.  The physical cores of the GRID card’s GPU could be shared between desktops, but there was no hypervisor-based driver between the virtual desktop and the GPU.  The hypervisor-based component merely acted as a GPU scheduler to ensure that each virtual desktop received the resources that it was guaranteed.

While vGPU improved performance and application compatibility for virtual desktops and applications, there were certain limitations.  Chief amongst these limitations was the lack of support for blade servers and virtual machines running Linux.  There were also hard capacity limits – GRID cards could only support so many virtual desktops that had vGPU enabled.

Introducing GRID 2.0

Today, NVIDIA is announcing the next generation of virtual Graphics Acceleration – GRID 2.0.

GRID 2.0 offers a number of benefits over the previous generation of GRID cards and vGPU software.  The benefits include:

  • Higher Densities – A GRID card with 2 high-end GPUs can now support up to 32 users.
  • Blade Server Support – GRID 2.0 will support blade servers, bringing virtual desktop graphics acceleration to high-density compute environments.
  • Linux Desktop Support – GRID 2.0 will support vGPU on Linux desktops.  This will bring vGPU to a number of use cases such as oil and gas.

GRID 2.0 will also offer better performance over previous generations of GRID and vGPU.

Unfortunately, these new features and improvements aren’t supported on today’s GRID K1 and K2 cards, so that means…

New Maxwell-based GRID Cards

NVIDIA is announcing two new graphics cards alongside GRID 2.0.  These cards, which are built on the Maxwell architecture, are the M60 – a double-height PCI-Express card with two high-end Maxwell cores, and the M6 – a GPU with a single high-end Maxwell core that is designed to fit in blades and other rack-dense infrastructure.  The M6 is designed to have approximately half of the performance of the M60.  Both cards double the amount of memory available to the GPU.  The M6 and M60 each have 8GB of RAM per GPU compared to the 4GB per GPU on the GRID K1 and K2.

Both the M6 and the M60 will be branded under NVIDIA's Tesla line of data center graphics products, and the GRID 2.0 software will bring graphics virtualization capabilities to these new Maxwell-based Tesla cards.  The M60 is slated to be the direct replacement for both the GRID K1 and GRID K2 cards.  The K1 and K2 cards will not be discontinued, though – they will still be sold and supported for the foreseeable future.