Getting Started With UEM Part 2: Laying The Foundation – File Services

In my last post on UEM, I discussed the components and key considerations that go into deploying VMware UEM.  UEM is made up of multiple components that rely on a common infrastructure of file shares and Group Policy to manage the user environment, and in this post, we will cover how to deploy the file share infrastructure.

There are two file shares that we will be deploying.  These file shares are:

  • UEM Configuration File Share
  • UEM User Data Share

Configuration File Share

The first of the two UEM file shares is the configuration file share.  This file share holds the configuration data used by the UEM agent that is installed in the virtual desktops or RDSH servers.

The UEM configuration share contains a few important subfolders.  These subfolders are created by the UEM management console during its initial setup, and they align with various tabs in the UEM Management Console.  We will discuss this more in a future article on using the UEM Management Console.

  • General – This is the primary subfolder on the configuration share, and it contains the main configuration files for the agent.
  • FlexRepository – This subfolder under General contains all of the settings configured on the “User Environment” tab.  The settings in this folder tell the UEM agent how to configure policies such as Application Blocking, Horizon Smart Policies, and ADMX-based settings.

Administrators can create their own subfolders for organizing application and Windows personalization.  These are created in the Personalization tab, and when a folder is created in the UEM Management Console, it is also created on the configuration share.  Some folders that I use in my environment are:

  • Applications – This is the first subfolder underneath the General folder.  This folder contains the INI files that tell the UEM agent how to manage application personalization.  The Applications folder makes up one part of the “Personalization” tab.
  • Windows Settings – This folder contains the INI files that tell the UEM agent how to manage the Windows environment personalization.  The Windows Settings folder makes up the other part of the Personalization tab.

Some environments are a little more complex, and they require additional configuration sets for different use cases.  UEM can create a silo for specific settings that should only be applied to certain users or groups of machines.  A silo can have any folder structure you choose to set up – it can be a single application configuration file or it can be an entire set of configurations with multiple sub-folders.  Each silo also requires its own Group Policy configuration.

User Data File Share

The second UEM file share is the user data file share.  This file share holds the user data that is managed by UEM.  This is where any captured application profiles are stored. It can also contain other user data that may not be managed by UEM such as folders managed by Windows Folder Redirection.  I’ve seen instances where the UEM User Data Share also contained other data to provide a single location where all user data is stored.

The key thing to remember about this share is that it is a user data share.  These folders belong to the user, and they should be secured so that other users cannot access them.  IT administrators, system processes such as antivirus and backup engines, and, if allowed by policy, the helpdesk should also have access to these folders to support the environment.

User application settings data is stored on the share.  This consists of registry keys, files, and folders from the local user profile.  When this data is captured by the UEM agent, it is compressed into a ZIP file before being written out to the network.  The user data folder can also contain backup copies of user settings, so if an application’s settings become corrupted, the helpdesk or the users themselves can easily roll back to the last good configuration.

UEM also allows log data to be stored on the user data share.  The log contains information about activities that the UEM agent performs during logon, application launch and close, and logoff, and it provides a wealth of troubleshooting information for administrators.

UEM Shared Folder Replication

VMware UEM is well suited to multi-site end-user computing environments because it only reads settings and data at logon and writes back to the share at user logoff.  If FlexDirect is enabled for an application, it will also read during application launch and write back when the last instance of the application is closed.  This means it is possible to replicate UEM data to other file shares with minimal risk of file corruption, since files are rarely locked.

Both the UEM Configuration Share and the UEM User Data share can be replicated using various file replication technologies.

DFS Namespaces

As environments grow or servers are retired, this UEM data may need to be moved to new locations.  Or it may need to exist in multiple locations to support multiple sites.  In order to simplify the configuration of UEM and minimize the number of changes that are required to Group Policy or other configurations, I recommend using DFS Namespaces to provide a single namespace for the file shares.  This allows users to use a single path to access the file shares regardless of their location or the servers that the data is located on.
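
If your file servers are Windows-based, the namespace can be built with the DFSN PowerShell module.  Here is a minimal sketch, assuming a domain-based namespace and placeholder domain, server, and share names (corp.local, fs01, fs02) – adjust it for your own environment:

    # Minimal sketch - assumes the DFS Namespace role and PowerShell module are installed,
    # and uses placeholder names (corp.local, fs01, fs02).

    # Create a domain-based namespace root for UEM
    New-DfsnRoot -Path '\\corp.local\UEM' -TargetPath '\\fs01\UEMRoot' -Type DomainV2

    # Publish the configuration and user data shares under the namespace
    New-DfsnFolder -Path '\\corp.local\UEM\Config' -TargetPath '\\fs01\UEMConfiguration'
    New-DfsnFolder -Path '\\corp.local\UEM\UserData' -TargetPath '\\fs01\UserData'

    # Add a second folder target at another site; DFS referrals will point users
    # at the closest available copy
    New-DfsnFolderTarget -Path '\\corp.local\UEM\Config' -TargetPath '\\fs02\UEMConfiguration'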

UEM Share Permissions

It’s not safe to assume that everyone is using Windows-based file servers to provide file services in their environment.  Because of that, setting up network shares is beyond the scope of this post.  The process of creating the share and applying security varies based on the device hosting the share.

The required share and NTFS/file permissions are listed below.  These are the basic permissions required to use UEM; the additional permissions required for the HelpDesk tool are not included.  A sample PowerShell configuration for a Windows file server follows the permissions list.

UEMConfiguration share

  • Share permissions:
      Administrators: Full Control
      UEM Admins: Change
      Authenticated Users: Read
  • NTFS permissions:
      Administrators: Full Control
      UEM Admins: Full Control
      Authenticated Users: Read and Execute

UserData share

  • Share permissions:
      Administrators: Full Control
      UEM Admins: Full Control
      Authenticated Users: Change
  • NTFS permissions:
      Administrators: Full Control
      UEM Admins: Full Control
      Authenticated Users (this folder only): Read and Execute, Create Folders/Append Data
      Creator Owner (subfolders and files only): Full Control
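
If you are hosting these shares on a Windows file server, the permissions above can be applied with PowerShell.  This is a minimal sketch, assuming the folders live under D:\Shares and that a “UEM Admins” security group already exists in your domain – substitute your own paths, domain, and group names:

    # Minimal sketch - assumes a Windows file server, folders under D:\Shares,
    # and an existing DOMAIN\UEM Admins group. Adjust names and paths as needed.

    New-Item -ItemType Directory -Path 'D:\Shares\UEMConfiguration','D:\Shares\UserData' -Force | Out-Null

    # Share permissions from the list above
    New-SmbShare -Name 'UEMConfiguration' -Path 'D:\Shares\UEMConfiguration' `
        -FullAccess 'BUILTIN\Administrators' -ChangeAccess 'DOMAIN\UEM Admins' -ReadAccess 'Authenticated Users'
    New-SmbShare -Name 'UserData' -Path 'D:\Shares\UserData' `
        -FullAccess 'BUILTIN\Administrators','DOMAIN\UEM Admins' -ChangeAccess 'Authenticated Users'

    # NTFS permissions - configuration share
    icacls 'D:\Shares\UEMConfiguration' /inheritance:r
    icacls 'D:\Shares\UEMConfiguration' /grant 'Administrators:(OI)(CI)F' 'DOMAIN\UEM Admins:(OI)(CI)F' 'Authenticated Users:(OI)(CI)RX'

    # NTFS permissions - user data share
    icacls 'D:\Shares\UserData' /inheritance:r
    icacls 'D:\Shares\UserData' /grant 'Administrators:(OI)(CI)F' 'DOMAIN\UEM Admins:(OI)(CI)F'
    # Authenticated Users: this folder only - Read & Execute plus Create Folders/Append Data
    icacls 'D:\Shares\UserData' /grant 'Authenticated Users:(RX,AD)'
    # Creator Owner: subfolders and files only - Full Control
    icacls 'D:\Shares\UserData' /grant 'CREATOR OWNER:(OI)(CI)(IO)F'

The NTFS grants mirror the permissions list; the (OI)(CI)/(IO) flags control whether an entry applies to the folder itself, to subfolders and files, or both.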

Wrapup and Next Steps

This post provided a basic overview of the required UEM file shares and permissions.  If you’re planning a multi-site environment or have multiple file servers, this would be a good time to configure replication.

The next post in this series will cover the setup and initial configuration of the UEM management infrastructure.  This includes setting up the management console and configuring Group Policy.

Moving to the Cloud? Don’t Forget End-User Experience

The cloud has a lot to offer IT departments.  It provides the benefits of virtualization in a consumption-based model, and it allows new applications to quickly be deployed while waiting for, or even completely forgoing, on-premises infrastructure.  This can provide a better time-to-value and greater flexibility for the business.  It can help organizations reduce, or eliminate, their on-premises data center footprint.

But while the cloud has a lot of potential to disrupt how IT manages applications in the data center, it also has the potential to disrupt how IT delivers services to end users.

In order to understand how cloud will disrupt end-user computing, we first need to look at how organizations are adopting the cloud.  We also need to look at how the cloud can change application development patterns, and how that will change how IT delivers services to end users.

The Current State of Cloud

When people talk about cloud, they’re usually talking about three different types of services.  These services, and their definitions, are:

  • Infrastructure-as-a-Service: Running virtual machines in a hosted, multi-tenant virtual data center.
  • Platform-as-a-Service: A subscription offering that allows developers to build applications without having to build the supporting infrastructure.  The platform can include some combination of web services, application runtime services (like .Net or Java), databases, message bus services, and other managed components.
  • Software-as-a-Service: Subscription to a vendor hosted and managed application.

The best analogy for explaining this is comparing the different cloud offerings to different types of pizza restaurants, as shown in the graphic below from episerver.com:

[Image: Pizza as a Service comparison, retrieved from http://www.episerver.com/learn/resources/blog/fred-bals/pizza-as-a-service/]

So what does this have to do with End-User Computing?

Today, it seems like enterprises that are adopting cloud are going in one of two directions.  The first is migrating their data centers into infrastructure-as-a-service offerings with some platform-as-a-service mixed in.  The other direction is replacing applications with software-as-a-service options.  The former means migrating your applications to Azure or AWS EC2; the latter means replacing on-premises services with options like ServiceNow or Microsoft Office 365.

Both options can present challenges to how enterprises deliver applications to end-users.  And the choices made when migrating on-premises applications to the cloud can greatly impact end-user experience.

The challenges around software-as-a-service deal more with identity management, so this post will focus on migrating on-premises applications to the cloud.

Know Thy Applications – Infrastructure-As-A-Service and EUC Challenges

Infrastructure-as-a-Service offerings provide IT organizations with virtual machines running in a cloud service.  These offerings provide different virtual machines optimized for different tasks, and they provide the flexibility to meet the various needs of an enterprise IT organization.  They allow IT organizations to bring their on-premises business applications into the cloud.

The lifeblood of many businesses is Win32 applications.  Whether they are commercial or developed in house, these applications are often critical to some portion of a business process.  Many of these applications were never designed with high availability or the cloud in mind, and the developer and/or the source code may be long gone.  Or they might not be easily replaced because they are deeply integrated into critical processes or other enterprise systems.

Many Win32 applications have clients that expect to connect to local servers.  But when you move those servers to a remote data center, including the cloud, it can introduce problems that make the application nearly unusable.  Common problems that users encounter are longer application load times, increased transaction times, and reports taking longer to preview and/or print.

These problems make employees less productive, and that has an impact on the efficiency and profitability of the business.

A few jobs ago, I was working for a company that had its headquarters, local office, and data center co-located in the same building.  They also had a number of other regional offices scattered across our state and the country.  The company had grown to the point where they were running out of space, and they decided to split the corporate and local offices.  The corporate team moved to a new building a few miles away, but the data center remained in the building.

Many of the corporate employees were users of a two-tier business application, and the application client connected directly to the database server.  Moving users of a fat client application a few miles down the road from the database server had a significant impact on application performance and user experience.  Application response suffered, and user complaints rose.  Critical business processes took longer, and productivity suffered as a result.

More bandwidth was procured. That didn’t solve the issue, and IT was sent scrambling for a new solution.  Eventually, these issues were addressed with a solution that was already in use for other areas of the business – placing the core applications on Windows Terminal Services and providing users at the corporate office with a published desktop that contained their required applications.

This solution solved their user experience and application performance problems.  But it required other adjustments to the server environment, business process workflows, and how users interact with the technology that enables them to work.  It took time for users to adjust to the changes.  Many of the issues were addressed when the business moved everything to a colocation facility a hundred miles away a few months later.

Ensuring Success When Migrating Applications to the Cloud

The business has said it’s time to move some applications to the cloud.  How do you ensure it’s a success and meets the business and technical requirements of that application while making sure an angry mob of users doesn’t show up at your office with torches and pitchforks?

The first thing is to understand your application portfolio.  That understanding goes beyond having visibility into what applications you have in your environment and how those applications work from a technical perspective.  You need a holistic view of your applications, keeping the following questions in mind:

  • Who uses the application?
  • What do the users do in the application?
  • How do the users access the application?
  • Where does it fit into business processes and workflows?
  • What other business systems does the application integrate with?
  • How is that integration handled?

Applications rarely exist in a vacuum, and making changes to one not only impacts the users, but it can impact other applications and business processes as well.

By understanding your applications, you will be able to build a roadmap of when applications should migrate to the cloud and effectively mitigate any impacts to both user experience and enterprise integrations.

The second thing is to test it extensively.  The testing needs to be more extensive than functional testing to ensure that the application will run on the server images built by the cloud providers, and it needs to include extensive user experience and user acceptance testing.  This may include spending time with users measuring tasks with a stop-watch to compare how long tasks take in cloud-hosted systems versus on-premises systems.

If application performance isn’t up to user standards and has a significant impact on productivity, you may need to start investigating solutions for bringing users closer to the cloud-hosted applications.  This includes solutions like Citrix, VMware Horizon Cloud, or Amazon WorkSpaces and AppStream.  These solutions bring users closer to the applications, and they can give users an on-premises-like experience in the cloud.

The third thing is to plan ahead.  Having a roadmap and knowing your application portfolio enables you to plan for when you need capacity or specific features to support users, and it can guide your architecture and product selection.  You don’t want to get three years into a five-year migration and find out that the solution you selected doesn’t have the features you require for a use case or that the environment wasn’t architected to support the number of users.

When planning to migrate applications from your on-premises data centers to an infrastructure-as-a-service offering, it’s important to know your applications and take end-user experience into account.  It’s important to test, and understand, how these applications perform when the application servers and databases are remote to the application client.  If you don’t, you not only anger your users, but you also make them less productive and the business less profitable overall.

 

VDI in the Time of Frequent Windows 10 Upgrades

The longevity of Windows 7, and Windows XP before that, has spoiled many customers and enterprises.  It provided IT organizations with a stable base to build their end-user computing infrastructures and applications on, and users were provided with a consistent experience.  The update model was fairly well known – a major service pack with all updates and feature enhancements would come out after about one year.

Whether this stability was good for organizations is debatable.  It certainly came with trade-offs, security of the endpoint being the primary one.

The introduction of Windows 10 has changed that model, and Microsoft is continuing to refine it.  Microsoft is now releasing two major “feature updates” for Windows 10 each year, and each of these updates is only supported for about 18 months.  Microsoft calls this the “Windows as a Service” model, and it consists of two production-ready semi-annual release channels – a targeted deployment that is used with pilot users to test applications, and a broad deployment that replaces the “Current Branch for Business” option for enterprises.

Gone are the days where the end user’s desktop will have the same operating system for its entire life cycle.

(Note: While there is still a long-term servicing branch, Microsoft has repeatedly stated that this branch is suited for appliances and “machinery” that should not receive frequent feature updates such as ATMs and medical equipment.)

In order to facilitate this new delivery model, Microsoft has refined their in-place operating system upgrade technology.  While it has been possible to do this for years with previous versions of Windows, it was often flaky.  Settings wouldn’t port over properly, applications would refuse to run, and other weird errors would crop up.  That’s mostly a thing of the past when working with physical Windows 10 endpoints.

Virtual desktops, however, don’t seem to handle in-place upgrades well.  Virtual desktops often utilize additional agents to deliver desktops remotely to users, and the in-place upgrade process can break these agents or cause otherwise unexpected behavior.  Upgrades also have a tendency to reinstall Windows Modern Applications that had been removed or to reset settings (although Microsoft is supposed to be working on those items).

If Windows 10 feature release upgrades can break, or at least require significant rework of, existing VDI images, what is the best method for handling them in a VDI environment?

I see two main options.  The first is to manually uninstall the VDI agents from the parent VMs, take a snapshot, and then do an in-place upgrade.  After the upgrade is complete, the VDI agents would need to be reinstalled on the machine.  In my opinion, this option has a couple of drawbacks.

First, it requires a significant amount of time.  While there are a number of steps that could be automated, validating the image after the upgrade would still require an administrator.  Someone would have to log in to validate that all settings were carried over properly and that Modern Applications were not reinstalled.  This may become a significant time sink if I have multiple parent desktop images.

Second, this process wouldn’t scale well.  If I have a large number of parent images, or a large estate of persistent desktops, I have to build a workflow to remove agents, upgrade Windows, and reinstall agents after the upgrade.  Not only do I have to test this workflow significantly, but I still have to test my desktops to ensure that the upgrade didn’t break any applications.

The second option, in my view, is to rebuild the desktop image when each new version of Windows 10 is released.  This ensures that you have a clean OS and application installation with every new release, and it would require less testing to validate because I don’t have to check to see what broke during the upgrade process.

One of the main drawbacks to this approach is that image building is a time-consuming process.  This is where automated deployments can be helpful.  Tools like the Microsoft Deployment Toolkit can help administrators build their parent images, including any agents and required applications, automatically as part of a task sequence.  With this type of toolkit, an administrator can automate their build process so that when a new version of Windows 10 is released, or a core desktop component like the Horizon or XenDesktop agent is updated, the image will have the latest software the next time a new build is started.

(Note: MDT is not the only tool in this category.  It is, however, the one I’m most familiar with.  It’s also the tool that Trond Haavarstein, @XenAppBlog, used for his Automation Framework Tool.)

Let’s take this one step further.  As an administrator, I would be doing a new Windows 10 build every 6 months to a year to ensure that my virtual desktop images remain on a supported version of Windows.  At some point, I’ll want to do more than just automate the Windows installation so that my end result, a fully configured virtual desktop that is deployment ready, is available at the push of a button.  This can include things like bringing it into Citrix Provisioning Services or shutting it down and taking a snapshot for VMware Horizon.

Virtualization has allowed for significant automation in the data center.  Tools like VMware PowerCLI and the Nutanix REST API make it easy for administrators to deploy and manage virtual machines using a few lines of PowerShell.   Using these same tools, I can also take details from this virtual machine shell, such as the name and MAC address, and inject them into my MDT database along with a Task Sequence and role.  When I power the VM on, it will automatically boot to MDT and start the task sequence that has been defined.
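
To give a rough idea of what that looks like, here is a sketch that creates an empty VM shell with PowerCLI and registers its MAC address in the MDT database.  The server names, datastore, network, and task sequence ID are placeholders, and the MDT database cmdlets come from Michael Niehaus’ community MDTDB module – treat those cmdlet names and parameters as assumptions rather than a tested recipe:

    # Rough sketch - assumes VMware PowerCLI and the community MDTDB module are installed.
    # All names, paths, and MDTDB cmdlet parameters below are placeholders/assumptions.
    Import-Module VMware.PowerCLI
    Import-Module MDTDB

    Connect-VIServer -Server 'vcenter.corp.local'

    # Create the empty VM shell that MDT will build
    $vm = New-VM -Name 'W10-GOLD' -VMHost 'esxi01.corp.local' -Datastore 'Datastore01' `
                 -NumCpu 2 -MemoryGB 4 -DiskGB 60 -NetworkName 'VDI-Build' -GuestId 'windows9_64Guest'

    # Grab the MAC address from the new VM's network adapter
    $mac = ($vm | Get-NetworkAdapter).MacAddress

    # Register the machine in the MDT database so it picks up the right task sequence on PXE boot
    # (assigning a role would be an additional call in the same module)
    Connect-MDTDatabase -SqlServer 'mdt01.corp.local' -Database 'MDT'
    New-MDTComputer -MacAddress $mac -Description $vm.Name -Settings @{
        OSInstall      = 'YES'
        TaskSequenceID = 'WIN10-BUILD'   # placeholder task sequence ID
    }

    # Power on the VM - it PXE boots into MDT and starts the assigned task sequence
    Start-VM -VM $vm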

This is bringing “Infrastructure as Code” concepts to end-user computing, and the results should make it easier for administrators to test and deploy the latest versions of Windows 10 while reducing their management overhead.

I’m in the process of working through the last bits to automate the VM creation and integration with MDT, and I hope to have something to show in the next couple of weeks.

 

Getting Started with VMware UEM

One of the most important aspects of any end-user computing environment is user experience, and a big part of user experience is managing the user’s Windows and application preferences.  This is especially true in non-persistent environments and published application environments where the user may not log into the same machine each time.

So why is this important?  A big part of a user’s experience on any desktop is maintaining their customizations.  Users invest time into personalizing their environment by setting a desktop background, creating an Outlook signature, or configuring applications to connect to the correct datasets, and the ability to retain these settings makes users more productive because they don’t have to recreate them every time they log in or open the application.

User settings portability is nothing new.  Microsoft Roaming Profiles have been around for a long time.  But Roaming Profiles also have limitations, such as casting a large net by moving the entire profile (or the roaming AppData folder on newer versions of Windows) or being tied to specific versions of Windows.

VMware User Environment Manager, or UEM for short, is one of a few third-party user environment management tools that can provide a lighter-weight solution than Roaming Profiles.  UEM can both manage the user’s personalization of the environment by capturing Windows and application settings and apply settings to the desktop or RDSH session based on the user’s context.  This can include things like setting up network drives and printers, applying Horizon Smart Policies to control various Horizon features, and acting as a Group Policy replacement for per-user settings.

UEM Components

There are four main components for VMware UEM.  The components are:

  • UEM Management Console – The central console for managing the UEM configuration
  • UEM Agent – The local agent installed on the virtual desktop, RDSH server, or physical machine
  • Configuration File Share – Network File Share where UEM configuration data is stored
  • User Data File Share – Network File Share where user data is stored.  Depending on the environment and the options used, this can be multiple file shares.

The UEM Console is the central management tool for UEM.  The console does not require a database; anything that is configured in the console is saved as a text file on the configuration file share.  The agent consumes these configuration files from the configuration share during logon and logoff.  It saves application or Windows settings when the application is closed or when the user logs off, storing them on the user data share as ZIP files.

The UEM Agent also includes a few other optional tools.  These are a Self-Service Tool, which allows users to restore application configurations from a backup, and an Application Migration Tool.  The Application Migration Tool allows UEM to convert settings from one version of an application to another when the vendor uses different registry keys and AppData folders for different versions.  Microsoft Office is the primary use case for this feature, although other applications may require it as well.

UEM also includes a couple of additional tools to assist administrators with maintaining the environment.  The first of these is the Application Profiler tool.  This tool runs on a desktop or an RDSH server in lieu of the UEM Agent.  Administrators can use it to create UEM profiles for applications by running the application and tracking where it writes its settings.  It can also be used to create default settings that are applied to an application when a user launches it, which can reduce the amount of time it takes to get users’ applications configured for the first time.

The other support tool is the Helpdesk Support Tool, which allows helpdesk agents or other IT support staff to restore a backup of a user’s settings archive.

Planning for a UEM Deployment

There are a couple of questions you need to ask when deploying UEM.

  1. How many configuration shares will I have, and where will they be placed? – In multisite environments, I may need multiple configuration shares so the configs are placed near the desktop environments.
  2. How many user data shares will I need, and where will they be placed?  – This is another factor in multi-site environments.  It is also a factor in how I design my overall user data file structure if I’m using other features like folder redirection.  Do I want to keep all my user data together to make it easier to manage and back up, or do I want to place it on multiple file shares?
  3. Will I be using file replication technology? What replication technology will be used? – A third consideration for multi-site environments.  How am I replicating my data between sites?
  4. What URL/Name will be used to access the shares? – Will some sort of global namespace, like a DFS Namespace, be used to provide a single name for accessing the shares?  Or will each server be accessed individually?  This can have some implications around configuring Group Policy and how users are referred to the nearest file server.
  5. Where will I run the management console?  Who will have access to it?
  6. Will I configure UEM to create backup copies of user settings?  How many backup copies will be created?

These are the main questions that come up from an infrastructure and architecture perspective, and they influence how the UEM file shares and Group Policy objects will be configured.

Since UEM does not require a database, and it does not hold files on the network shares open during a session, planning for multisite deployments is relatively straightforward.

In the next post, I’ll talk about deploying the UEM supporting infrastructure.

Configuring a Headless CentOS Virtual Machine for NVIDIA GRID vGPU #blogtober

When IT administrators think of GPUs, the first thing that comes to mind for many is gaming.  But GPUs also have business applications.  They’re mainly found in high-end workstations to support graphics-intensive applications like 3D CAD and medical imaging.

But GPUs will have other uses in the enterprise.  Many of the emerging technologies, such as artificial intelligence and deep learning, utilize GPUs to perform compute operations.  These will start finding their way into the data center, either as part of line-of-business applications or as part of IT operations tools.  This could also allow the business to utilize GRID environments after hours for other forms of data processing.

This guide will show you how to build headless virtual machines that can take advantage of NVIDIA GRID vGPU for GPU compute and CUDA.  In order to do this, you will need to have a Pascal Series NVIDIA Tesla card such as the P4, P40, or P100 and the GRID 5.0 drivers.  The GRID components will also need to be configured in your hypervisor, and you will need to have the GRID drivers for Linux.

I’ll be using CentOS 7.x for this guide.  My base CentOS configuration is a minimal install with no graphical shell and a few additional packages like Nano and Open VM Tools.  I use Bob Plankers’ guide for preparing my VM as a template.

The steps for setting up a headless CentOS VM with GRID are:

  1. Deploy your CentOS VM.  This can be from an existing template or installed from scratch.  This VM should not have a graphical shell installed, or it should be in a run mode that does not execute the GUI.
  2. Attach a GRID profile to the virtual machine by adding a shared PCI device in vCenter.  The selected profile will need to be one of the Virtual Workstation profiles, and these all end with a Q.
  3. GRID requires a 100% memory reservation.  When you add an NVIDIA GRID shared PCI device, there will be an associated prompt to reserve all system memory.
  4. Update the VM to ensure all applications and components are the latest version using the following command:
    yum update -y
  5. In order to build the GRID driver for Linux, you will need to install a few additional packages.  Install these packages with the following command:
    yum install -y epel-release dkms libstdc++.i686 gcc kernel-devel 
  6. Copy the Linux GRID drivers to your VM using a tool like WinSCP.  I generally place the files in /tmp.
  7. Make the driver package executable with the following command:
    chmod +x NVIDIA-Linux-x86_64-384.73-grid.run
  8. Execute the driver package.  When we execute this, we will also be adding the --dkms flag to support Dynamic Kernel Module Support.  This will enable the system to automatically recompile the driver whenever a kernel update is installed.  The command to run the driver install is:
    bash ./NVIDIA-Linux-x86_64-384.73-grid.run --dkms
  9. When prompted to register the kernel module sources with DKMS, select Yes and press Enter.
  10. You may receive an error about the installer not being able to locate the X Server path.  It is safe to ignore this error; select OK to continue.
  11. Install the 32-bit compatibility libraries by selecting Yes and pressing Enter.
  12. At this point, the installer will build the DKMS module and install the driver.
  13. After the install completes, you will be prompted to use the nvidia-xconfig utility to update your X Server configuration.  X Server should not be installed because this is a headless machine, so select No and press Enter.
  14. The install is complete.  Press Enter to exit the installer.
  15. To validate that the NVIDIA drivers are installed and running properly, run nvidia-smi to get the status of the video card.
  16. Next, we’ll need to configure GRID licensing.  We’ll need to create the GRID licensing file from a template supplied by NVIDIA with the following command:
    cp  /etc/nvidia/gridd.conf.template  /etc/nvidia/gridd.conf
  17. Edit the GRID licensing file using the text editor of your choice.  I prefer Nano, so the command I would use is:
    nano  /etc/nvidia/gridd.conf
  18. Fill in the ServerAddress and BackupServerAddress fields with the fully-qualified domain name or IP addresses of your licensing servers.
  19. Set the FeatureType to 2 to configure the system to retrieve a Virtual Workstation license.  The Virtual Workstation license is required to support the CUDA features for GPU Compute.
  20. Save the license file.
  21. Restart the GRID Service with the following command:
    service nvidia-gridd restart
  22. Validate that the machine retrieved a license with the following command:
    grep gridd /var/log/messages
  23. Download the NVIDIA CUDA Toolkit.
    wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
  24. Make the toolkit installer executable.
    chmod +x cuda_9.0.176_384.81_linux-run
  25. Execute the CUDA Toolkit installer.
    bash cuda_9.0.176_384.81_linux-run
  26. Accept the EULA.
  27. You will be prompted to download the CUDA Driver.  Press N to decline the new driver. This driver does not match the NVIDIA GRID driver version, and it will break the NVIDIA setup.  The GRID driver in the VM has to match the GRID software that is installed in the hypervisor.
  28. When prompted to install the CUDA 9.0 toolkit, press Y.
  29. Accept the Default Location for the CUDA toolkit.
  30. When prompted to create a symlink at /usr/local/cuda, press Y.
  31. When prompted to install the CUDA 9.0 samples, press Y.
  32. Accept the default location for the samples.
  33. Reboot the virtual machine.
  34. Log in and run nvidia-smi again.  Validate that you get the table output similar to step 15.  If you do not receive this, and you get an error, it means that you likely installed the driver that is included with the CUDA toolkit.  If that happens, you will need to start over.

At this point, you have a headless VM with the NVIDIA Drivers and CUDA Toolkit installed.  So what can you do with this?  Just about anything that requires CUDA.  You can experiment with deep learning frameworks like Tensorflow, build virtual render nodes for tools like Blender, or even use Matlab for GPU compute.

Nutanix Xtract for VMs #blogtober

One of the defining features of the Nutanix platform is simplicity.  Innovations like the Prism interface for infrastructure management and One-Click Upgrades for both the Nutanix software-defined storage platform and supported hypervisors have lowered the management burden of on-premises infrastructure.

Nutanix is now looking to bring that same level of simplicity to migrating virtual machines to a new hypervisor.  Nutanix has released a new tool today called Xtract for VMs.  This tool, which is free to all Nutanix customers, brings the same one-click simplicity that Nutanix is known for to migrating workloads from ESXi to AHV.

So how does Xtract for VMs differ from other migration tools?  First, it is an agentless migration tool.  Xtract will communicate with vCenter to get a list of VMs in the ESXi infrastructure, and it will build a migration plan and synchronize the VM data from ESXi to AHV.

During data synchronization and migration, Xtract will insert the AHV device drivers into the virtual machine.  It will also capture and preserve the network configuration, so the VM will not lose connectivity or require administrator intervention after the migration is complete.


By injecting the AHV drivers and preserving the network configuration during the data synchronization and cutover, Xtract is able to perform cross-hypervisor migrations with minimal downtime.  And since the original VM is not touched during the migration, rollback is as easy as shutting down the AHV VM and powering the ESXi VM back on, which significantly reduces the risk of cross-hypervisor migrations.

Analysis

The datacenter is clearly changing, and we now live in a multi-hypervisor world.  While many customers will still run VMware for their on-premises environments, there are many that are looking to reduce their spend on hypervisor products.  Xtract for VMs provides a tool to help reduce that cost while providing the simplicity that Nutanix is known for.

While Xtract is currently at version 1.0, I can see this technology being pivotal in helping customers move workloads between on-premises and cloud infrastructures.

To learn more about this new tool, you can check out the Xtract for VMs page on Nutanix’s website.

Coming Soon – The Virtual Horizon Podcast #blogtober

I’ve been blogging now for about seven or so years, and The Virtual Horizon has existed in its current form for about two or three years.

So what’s next?  Besides more blogging, that is…

It’s time to go multimedia.  In the next few weeks, I will be launching The Virtual Horizon Podcast.  The podcast will only partially focus on the latest in end-user computing, and I hope to cover other topics such as career development, community involvement, and even other technologies that exist outside of the EUC space.

I’m still working out some of the logistics and workflow, but the first episode has already been recorded.  It should post in the next couple of weeks.

So keep an eye out here.  I’ll be adding a new section to the page once we’re a bit closer to go-live.

Announcing Rubrik 4.1 – The Microsoft Release

Rubrik has made significant enhancements to their platform since they came out of stealth just over two years ago, and their platform has grown from an innovative way to bring together software and hardware to solve virtualization backup challenges to a robust data protection platform due to their extremely aggressive release schedule.

Yesterday, Rubrik announced version 4.1.  The latest version builds on the already strong offerings in the Alta release that came out just a few months ago.  This release, in particular, is heavily focused on the Microsoft stack, and there is also a heavy focus on cloud.

So what’s new in Rubrik 4.1?

Multi-Tenancy

The major enhancement is multi-tenancy support.  Rubrik 4.1 will now support dividing up a single physical Rubrik cluster into multiple Organizations.  Organizations are logical management units inside a physical Rubrik cluster, and each organization can manage their own logical objects such as users, protected objects, SLA domains, and replication targets.  This new multi-tenancy model is designed to meet the needs of service provider organizations, where multiple customers may use Rubrik as a backup target, as well as large enterprises that have multiple IT organizations.

In order to support the new multi-tenancy feature, Rubrik is adding role-based access control with multiple levels of access.  This will allow application owners and administrators to get limited access to Rubrik to manage their particular resources.

Azure, Azure Stack, and Hyper-V

One of the big foci of the Rubrik 4.1 release is Microsoft, and Rubrik has enhanced their Microsoft platform support.

The first major enhancement to Rubrik’s Microsoft platform offering is Azure Stack support.  Rubrik will be able to integrate with Azure Stack and provide protection to customer workloads running on this platform.

The second major enhancement is to the CloudOn App Instantiation feature.  CloudOn was released in Alta, and it enables customers to power-on VM snapshots in the public cloud.  The initial release supported AWS, and Rubrik is now adding support for Azure.

SQL Server Always-On Support

Rubrik is expanding its agent-based SQL Server backup support to Always-On Availability Groups.  In the current release, Rubrik will detect if a SQL Server is part of an availability group, but it requires an administrator to manually apply an SLA policy to databases.  If there is a failover in the availability group, manual intervention is required to change the replica that is being protected.  This can be an issue with 2-node availability groups, as a node failure or server reboot would cause a failover that could impact SLAs on the protected databases.

Rubrik 4.1 will now detect the configuration of a SQL Server, including availability groups.  Based on the configuration, Rubrik will dynamically select the replica to back up.  If a failover occurs, Rubrik will select a different replica in the availability group to use as a backup source.  This feature is only supported on synchronous commit availability groups.

Google Cloud Storage Support

Google Cloud is now supported as a cloud archive target, and all Google Cloud storage tiers are supported.

AWS Glacier and GovCloud Support

One feature that has been requested many times since Rubrik launched is support for AWS Glacier for long-term storage retention.  Rubrik 4.1 now adds support for Glacier as an archive location.

Also in the 4.1 release is support for AWS GovCloud.  This will allow government entities running Rubrik to utilize AWS as a cloud archive.

Thoughts

Rubrik has had an aggressive release schedule since Day 1.  And they don’t seem to be letting up on quickly adding features.  The 4.1 release does not disappoint in this category.

The feature I’m most excited about is the enhanced support for SQL Always-On Availability Groups.  While Rubrik can detect if a database is part of an AG today, the ability to dynamically select the instance to back up is key for organizations that have smaller AGs or utilize the basic 2-node AG feature in SQL Server 2016.

 

vMotion Support for NVIDIA vGPU is Coming…And It’ll Be Bigger than you Think

One of the cooler tech announcements at VMworld 2017 was on display at the NVIDIA booth.  It wasn’t really an announcement, per se, but more of a demonstration of a long awaited solution to a very difficult challenge in the virtualization space.

NVIDIA displayed a tech demo of vMotion support for VMs with GRID vGPU running on ESXi.  Along with this demo was news that they had also solved the problem of suspend and resume on vGPU enabled machines, and these solutions would be included in future product releases.  NVIDIA announced live migration support for XenServer earlier this year.

Rob Beekmans (Twitter: @robbeekmans) also wrote about this recently, and his blog has video showing the tech demos in action.

I want to clarify that these are tech demos, not tech previews.  A Tech Preview, in VMware EUC terms, usually means a feature that is in beta or pre-release and is intended to gather real-world feedback.  These demos likely occurred on a development version of a future ESXi release, and there is no projected timeline as to when they will be released as part of a product.

Challenges to Enabling vMotion Support for vGPU

So you’re probably thinking “What’s the big deal? vMotion is old hat now.”  But when vGPU is enabled on a virtual machine, it requires that VM to have direct, but shared, access to physical hardware on the system – in this case, a GPU.  And vMotion never worked if a VM had direct access to hardware – be it a PCI device that was passed through or something plugged into a USB port.

If we look at how vGPU works, each VM has a shared PCI device added to it.  This shared PCI device provides shared access to a physical card.  To facilitate this access, each VM gets a portion of the GPU’s Base Address Register (BAR), or the hardware level interface between the machine and the PCI card.  In order to make this portable, there has to be some method of virtualizing the BAR.  A VM that migrates may not get the same address space on the BAR when it moves to a new host, and any changes to that would likely cause issues to Windows or any jobs that the VM has placed on the GPU.

There is another challenge to enabling vMotion support for vGPU.  Think about what a GPU is – it’s a (massively parallel) processor with dedicated RAM.  When you add a GPU into a VM, you’re essentially attaching a 2nd system to the VM, and the data that is in the GPU framebuffer and processor queues needs to be migrated along with the CPU, system RAM, and system state.  So this requires extra coordination to ensure that the GPU releases things so they can be migrated to the new host, and it has to be done in a way that doesn’t impact performance for other users or applications that may be sharing the GPU.

Suspend and Resume is another challenge that is very similar to vMotion support.  Suspending a VM basically hibernates the VM.  All current state information about the VM is saved to disk, and the hardware resources are released.  Instead of sending data to another machine, it needs to be written to a state file on disk.  This includes the GPU state.  When the VM is resumed, it may not get placed on the same host and/or GPU, but all the saved state needs to be restored.

Hardware Preemption and CUDA Support on Pascal

The August 2016 GRID release included support for the Pascal-series cards.  Pascal series cards include hardware support for preemption.  This is important for GRID because it uses time-slicing to share access to the GPU across multiple VMs.  When a time-slice expires, it moves onto the next VM.

This can cause issues when using GRID to run CUDA jobs.  CUDA jobs can be very long running, and the job is stopped when the time-slice is expired.  Hardware preemption enables long-running CUDA tasks to be interrupted and paused when the time-slice expires, and those jobs are resumed when that VM gets a new time-slice.

So why is this important?  In previous versions of GRID, CUDA was only available and supported on the largest profiles.  So to support applications that required CUDA in a virtual environment, an entire GPU would need to be dedicated to the VM.  This could be a significant overallocation of resources, and it significantly reduced density on a host.  If a customer was using M60s, which have two GPUs per card, they may have been limited to 4 machines with GPU access if those machines needed CUDA support.

With Pascal cards and the latest GRID software, CUDA support is enabled on all vDWS profiles (the ones that end with a Q).  Now customers can provide CUDA-enabled vGPU profiles to virtual machines without having to dedicate an entire GPU to one machine.

This has two benefits.  First, it enables more features in the high-end 3D applications that run on virtual workstations.  Not only can these machines be used for design, they can now utilize the GPU to run models or simulations.

The second benefit has nothing to do with virtual desktops or applications.  It actually allows GPU-enabled server applications to be fully virtualized.  This potentially means things like render farms or, looking further ahead, virtualized AI inference engines for business applications or infrastructure support services.  One potentially interesting use case for this is running MapD, a database that runs entirely in the GPU, on a virtual machine.

Analysis

GPUs have the ability to revolutionize enterprise applications in the data center.  They can potentially bring artificial intelligence, deep learning, and massively parallel computing to business apps.

vMotion support is critical in enabling enterprise applications in virtual environments.  The ability to move applications and servers around is important to keeping services available.

By enabling hardware preemption and vMotion support, it now becomes possible to virtualize the next generation of business applications.  These applications will require a GPU and CUDA support to improve performance or utilize deep learning algorithms.  Applications that require a GPU and CUDA support can be moved around in the datacenter without impacting the running workloads, maintaining availability and keeping active jobs running so they do not have to be restarted.

This also opens up new opportunities to better utilize data center resources.  If I have a large VDI footprint that utilizes GRID, I can’t vMotion any running desktops today to consolidate them on particular hosts.  If I can use vMotion to consolidate those desktops, I can free up the remaining GPU-equipped hosts for other work, such as turning them into render farms or running after-hours data processing jobs on the GPUs.

This may not seem important now.  But I believe that deep learning/artificial intelligence will become a critical feature in business applications, and the ability to turn my VDI hosts into something else after-hours will help enable these next generation applications.