Configuring a Headless CentOS Virtual Machine for NVIDIA GRID vGPU #blogtober

When IT administrators think of GPUs, the first thing that comes to mind for many is gaming.  But GPUs also have business applications.  They're mainly found in high-end workstations to support graphics-intensive applications like 3D CAD and medical imaging.

But GPUs will have other uses in the enterprise.  Many of the emerging technologies, such as artificial intelligence and deep learning, utilize GPUs to perform compute operations.  These will start finding their way into the data center, either as part of line-of-business applications or as part of IT operations tools.  This could also allow the business to utilize GRID environments after hours for other forms of data processing.

This guide will show you how to build headless virtual machines that can take advantage of NVIDIA GRID vGPU for GPU compute and CUDA.  In order to do this, you will need to have a Pascal Series NVIDIA Tesla card such as the P4, P40, or P100 and the GRID 5.0 drivers.  The GRID components will also need to be configured in your hypervisor, and you will need to have the GRID drivers for Linux.

I'll be using CentOS 7.x for this guide.  My base CentOS configuration is a minimal install with no graphical shell and a few additional packages like Nano and Open VM Tools.  I use Bob Plankers' guide for preparing my VM as a template.

The steps for setting up a headless CentOS VM with GRID are:

  1. Deploy your CentOS VM.  This can be from an existing template or installed from scratch.  This VM should not have a graphical shell installed, or it should be in a run mode that does not execute the GUI.
  2. Attach a GRID profile to the virtual machine by adding a shared PCI device in vCenter.  The selected profile will need to be one of the Virtual Workstation profiles, and these all end with a Q.
  3. GRID requires a 100% memory reservation.  When you add an NVIDIA GRID shared PCI device, there will be an associated prompt to reserve all system memory.
  4. Update the VM to ensure all applications and components are the latest version using the following command:
    yum update -y
  5. In order to build the GRID driver for Linux, you will need to install a few additional packages.  Install these packages with the following command:
    yum install -y epel-release dkms libstdc++.i686 gcc kernel-devel 
  6. Copy the Linux GRID drivers to your VM using a tool like WinSCP.  I generally place the files in /tmp.
  7. Make the driver package executable with the following command:
    chmod +x NVIDIA-Linux-x86_64-384.73-grid.run
  8. Execute the driver package.  When we execute this, we will also be adding the --dkms flag to enable Dynamic Kernel Module Support.  This will allow the system to automatically recompile the driver whenever a kernel update is installed; a quick way to verify the DKMS registration is shown just after this list.  The command to run the driver install is:
    bash ./NVIDIA-Linux-x86_64-384.73-grid.run --dkms
  9. When prompted, register the kernel module sources with DKMS by selecting Yes and pressing Enter.
  10. You may receive an error about the installer not being able to locate the X Server path.  Select OK.  It is safe to ignore this error.
  11. Install the 32-bit Compatibility Libraries by selecting Yes and pressing Enter.
  12. At this point, the installer will start to build the DKMS module and install the driver.
  13. After the install completes, you will be prompted to use the nvidia-xconfig utility to update your X Server configuration.  X Server should not be installed because this is a headless machine, so select No and press Enter.
  14. The install is complete.  Press Enter to exit the installer.
  15. To validate that the NVIDIA drivers are installed and running properly, run nvidia-smi to get the status of the video card.
  16. Next, we’ll need to configure GRID licensing.  We’ll need to create the GRID licensing file from a template supplied by NVIDIA with the following command:
    cp  /etc/nvidia/gridd.conf.template  /etc/nvidia/gridd.conf
  17. Edit the GRID licensing file using the text editor of your choice.  I prefer Nano, so the command I would use is:
    nano  /etc/nvidia/gridd.conf
  18. Fill in the ServerAddress and BackupServerAddress fields with the fully-qualified domain names or IP addresses of your licensing servers (see the sample gridd.conf excerpt just after this list).
  19. Set the FeatureType to 2 to configure the system to retrieve a Virtual Workstation license.  The Virtual Workstation license is required to support the CUDA features for GPU Compute.
  20. Save the license file.
  21. Restart the GRID Service with the following command:
    service nvidia-gridd restart
  22. Validate that the machine retrieved a license with the following command:
    grep gridd /var/log/messages
  23. Download the NVIDIA CUDA Toolkit.
    wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda_9.0.176_384.81_linux-run
  24. Make the toolkit installer executable.
    chmod +x cuda_9.0.176_384.81_linux-run
  25. Execute the CUDA Toolkit installer.
    bash cuda_9.0.176_384.81_linux-run
  26. Accept the EULA.
  27. You will be prompted to download the CUDA Driver.  Press N to decline the new driver. This driver does not match the NVIDIA GRID driver version, and it will break the NVIDIA setup.  The GRID driver in the VM has to match the GRID software that is installed in the hypervisor.
  28. When prompted to install the CUDA 9.0 toolkit, press Y.
  29. Accept the Default Location for the CUDA toolkit.
  30. When prompted to create a symlink at /usr/local/cuda, press Y.
  31. When prompted to install the CUDA 9.0 samples, press Y.
  32. Accept the default location for the samples.
  33. Reboot the virtual machine.
  34. Log in and run nvidia-smi again.  Validate that you get the table output similar to step 15.  If you do not receive this, and you get an error, it means that you likely installed the driver that is included with the CUDA toolkit.  If that happens, you will need to start over.
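
Because the driver was installed with the --dkms flag in step 8, the NVIDIA module should be rebuilt automatically after each kernel update.  If you want to verify that the module was actually registered with DKMS, a quick check looks something like this (the driver and kernel versions in the output will vary with your environment):

    dkms status    # should show an nvidia module entry built against the running kernel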
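
For reference, here is roughly what the relevant portion of a completed /etc/nvidia/gridd.conf might look like after steps 18 and 19.  The server names below are placeholders, not values from NVIDIA's template; substitute your own license servers.

    # License servers (placeholder names - use your own FQDNs or IP addresses)
    ServerAddress=grid-license-01.example.com
    BackupServerAddress=grid-license-02.example.com

    # FeatureType 2 = Virtual Workstation license (required for CUDA/GPU compute)
    FeatureType=2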

At this point, you have a headless VM with the NVIDIA drivers and CUDA Toolkit installed.  So what can you do with this?  Just about anything that requires CUDA.  You can experiment with deep learning frameworks like TensorFlow, build virtual render nodes for tools like Blender, or even use MATLAB for GPU compute.
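
If you want a quick end-to-end test before diving into one of those frameworks, one simple option is to build and run the deviceQuery sample that the CUDA Toolkit installer copied to the VM.  This sketch assumes the samples went to the default location in your home directory; deviceQuery also needs a C++ compiler, which may not be present on a minimal install since it was not part of the package list in step 5.

    # Install a C++ compiler if it is not already present
    yum install -y gcc-c++

    # Build and run the deviceQuery sample (default samples location assumed)
    cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery
    make
    ./deviceQuery    # should list the vGPU and finish with "Result = PASS"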

Nutanix Xtract for VMs #blogtober

One of the defining features of the Nutanix platform is simplicity.  Innovations like the Prism interface for infrastructure management and One-Click Upgrades for both the Nutanix software-defined storage platform and supported hypervisors have lowered the management burden of on-premises infrastructure.

Nutanix is now looking to bring that same level of simplicity to migrating virtual machines to a new hypervisor.  Nutanix has released a new tool today called Xtract for VMs.  This tool, which is free to all Nutanix customers, brings the same one-click simplicity that Nutanix is known for to migrating workloads from ESXi to AHV.

So how does Xtract for VMs differentiate itself from other migration tools?  First, it is an agentless migration tool.  Xtract will communicate with vCenter to get a list of VMs in the ESXi infrastructure, build a migration plan, and synchronize the VM data from ESXi to AHV.

During data synchronization and migration, Xtract will insert the AHV device drivers into the virtual machine.  It will also capture and preserve the network configuration, so the VM will not lose connectivity or require administrator intervention after the migration is complete.


By injecting the AHV drivers and preserving the network configuration during the data synchronization and cutover, Xtract is able to perform cross-hypervisor migrations with minimal downtime.  And since the original VM is not touched during the migration, rollback is as easy as shutting down the AHV VM and powering the ESXi VM back on, which significantly reduces the risk of cross-hypervisor migrations.

Analysis

The datacenter is clearly changing, and we now live in a multi-hypervisor world.  While many customers will still run VMware for their on-premises environments, there are many that are looking to reduce their spend on hypervisor products.  Xtract for VMs provides a tool to help reduce that cost while providing the simplicity that Nutanix is known for.

While Xtract is currently at version 1.0, I can see this technology becoming pivotal in helping customers move workloads between on-premises and cloud infrastructures.

To learn more about this new tool, you can check out the Xtract for VMs page on Nutanix's website.

Coming Soon – The Virtual Horizon Podcast #blogtober

I've been blogging now for about seven or so years, and The Virtual Horizon has existed in its current form for about two or three years.

So what's next?  Besides more blogging, that is…

It’s time to go multimedia.  In the next few weeks, I will be launching The Virtual Horizon Podcast.  The podcast will only partially focus on the latest in end-user computing, and I hope to cover other topics such as career development, community involvement, and even other technologies that exist outside of the EUC space.

I’m still working out some of the logistics and workflow, but the first episode has already been recorded.  It should post in the next couple of weeks.

So keep an eye out here.  I’ll be adding a new section to the page once we’re a bit closer to go-live.

Announcing Rubrik 4.1 – The Microsoft Release

Rubrik has made significant enhancements to its platform since coming out of stealth just over two years ago.  Thanks to an extremely aggressive release schedule, the platform has grown from an innovative way to bring software and hardware together for virtualization backup into a robust data protection platform.

Yesterday, Rubrik announced version 4.1.  The latest version builds on the already strong offerings in the Alta release that came out just a few months ago.  This release, in particular, is heavily focused on the Microsoft stack, and there is also a heavy focus on cloud.

So what’s new in Rubrik 4.1?

Multi-Tenancy

The major enhancement is multi-tenancy support.  Rubrik 4.1 will now support dividing up a single physical Rubrik cluster into multiple Organizations.  Organizations are logical management units inside a physical Rubrik cluster, and each organization can manage their own logical objects such as users, protected objects, SLA domains, and replication targets.  This new multi-tenancy model is designed to meet the needs of service provider organizations, where multiple customers may use Rubrik as a backup target, as well as large enterprises that have multiple IT organizations.

In order to support the new multi-tenancy feature, Rubrik is adding role-based access control with multiple levels of access.  This will allow application owners and administrators to get limited access to Rubrik to manage their particular resources.

Azure, Azure Stack, and Hyper-V

One of the big foci of the Rubrik 4.1 release is Microsoft, and Rubrik has enhanced their Microsoft platform support.

The first major enhancement to Rubrik’s Microsoft platform offering is Azure Stack support.  Rubrik will be able to integrate with Azure Stack and provide protection to customer workloads running on this platform.

The second major enhancement is to the CloudOn App Instantiation feature.  CloudOn was released in Alta, and it enables customers to power-on VM snapshots in the public cloud.  The initial release supported AWS, and Rubrik is now adding support for Azure.

SQL Server Always-On Support

Rubrik is expanding its agent-based SQL Server backup support to Always-On Availability Groups.  In the current release, Rubrik will detect if a SQL Server is part of an availability group, but it requires an administrator to manually apply an SLA policy to databases.  If there is a failover in the availability group, manual intervention is required to change the replica that is being protected.  This could be an issue with 2-node availability groups, as a node failure or server reboot would cause a failover that could impact SLAs on the protected databases.

Rubrik 4.1 will now detect the configuration of a SQL Server, including availability groups.  Based on the configuration, Rubrik will dynamically select the replica to back up.  If a failover occurs, Rubrik will select a different replica in the availability group to use as a backup source.  This feature is only supported on synchronous commit availability groups.

Google Cloud Storage Support

Google Cloud is now supported as a cloud archive target, and all Google Cloud storage tiers are supported.

AWS Glacier and GovCloud Support

One feature that has been requested many times since Rubrik's initial release is support for AWS Glacier for long-term retention.  Rubrik 4.1 now adds support for Glacier as an archive location.

Also in the 4.1 release is support for AWS GovCloud.  This will allow government entities running Rubrik to utilize AWS as a cloud archive.

Thoughts

Rubrik has had an aggressive release schedule since Day 1.  And they don’t seem to be letting up on quickly adding features.  The 4.1 release does not disappoint in this category.

The feature I’m most excited about is the enhanced support for SQL Always-On Availability Groups.  While Rubrik can detect if a database is part of an AG today, the ability to dynamically select the instance to back up is key for organizations that have smaller AGs or utilize the basic 2-node AG feature in SQL Server 2016.

 

vMotion Support for NVIDIA vGPU is Coming…And It’ll Be Bigger than you Think

One of the cooler tech announcements at VMworld 2017 was on display at the NVIDIA booth.  It wasn’t really an announcement, per se, but more of a demonstration of a long awaited solution to a very difficult challenge in the virtualization space.

NVIDIA displayed a tech demo of vMotion support for VMs with GRID vGPU running on ESXi.  Along with this demo was news that they had also solved the problem of suspend and resume on vGPU enabled machines, and these solutions would be included in future product releases.  NVIDIA announced live migration support for XenServer earlier this year.

Rob Beekmans (Twitter: @robbeekmans) also wrote about this recently, and his blog has video showing the tech demos in action.

I want to clarify that these are tech demos, not tech previews.  Tech Previews, in VMware EUC terms, usually means a feature that is in beta or pre-release to get real-world feedback.  These demos likely occurred on a development version of a future ESXi release, and there is no projected timeline as to when they will be released as part of a product.

Challenges to Enabling vMotion Support for vGPU

So you’re probably thinking “What’s the big deal? vMotion is old hat now.”  But when vGPU is enabled on a virtual machine, it requires that VM to have direct, but shared, access to physical hardware on the system – in this case, a GPU.  And vMotion never worked if a VM had direct access to hardware – be it a PCI device that was passed through or something plugged into a USB port.

If we look at how vGPU works, each VM has a shared PCI device added to it.  This shared PCI device provides shared access to a physical card.  To facilitate this access, each VM gets a portion of the GPU’s Base Address Register (BAR), or the hardware level interface between the machine and the PCI card.  In order to make this portable, there has to be some method of virtualizing the BAR.  A VM that migrates may not get the same address space on the BAR when it moves to a new host, and any changes to that would likely cause issues to Windows or any jobs that the VM has placed on the GPU.

There is another challenge to enabling vMotion support for vGPU.  Think about what a GPU is – it's a (massively parallel) processor with dedicated RAM.  When you add a GPU into a VM, you're essentially attaching a second system to the VM, and the data that is in the GPU framebuffer and processor queues needs to be migrated along with the CPU, system RAM, and system state.  So this requires extra coordination to ensure that the GPU releases the VM's resources so they can be migrated to the new host, and it has to be done in a way that doesn't impact performance for other users or applications that may be sharing the GPU.

Suspend and Resume is another challenge that is very similar to vMotion support.  Suspending a VM basically hibernates the VM.  All current state information about the VM is saved to disk, and the hardware resources are released.  Instead of sending data to another machine, it needs to be written to a state file on disk.  This includes the GPU state.  When the VM is resumed, it may not get placed on the same host and/or GPU, but all the saved state needs to be restored.

Hardware Preemption and CUDA Support on Pascal

The August 2016 GRID release included support for the Pascal-series cards.  Pascal series cards include hardware support for preemption.  This is important for GRID because it uses time-slicing to share access to the GPU across multiple VMs.  When a time-slice expires, it moves onto the next VM.

This can cause issues when using GRID to run CUDA jobs.  CUDA jobs can be very long running, and the job is stopped when the time-slice is expired.  Hardware preemption enables long-running CUDA tasks to be interrupted and paused when the time-slice expires, and those jobs are resumed when that VM gets a new time-slice.

So why is this important?  In previous versions of GRID, CUDA was only available and supported on the largest profiles.  So to support applications that required CUDA in a virtual environment, an entire GPU would need to be dedicated to the VM.  This could be a significant overallocation of resources, and it significantly reduced the density on a host.  If a customer was using M60s, which have two GPUs per card, then they may have been limited to 4 machines with GPU access if they needed CUDA support.

With Pascal cards and the latest GRID software, CUDA support is enabled on all vDWS profiles (the ones that end with a Q).  Now customers can provide CUDA-enabled vGPU profiles to virtual machines without having to dedicate an entire GPU to one machine.

This has two benefits.  First, it enables more features in the high-end 3D applications that run on virtual workstations.  Not only can these machines be used for design, they can now utilize the GPU to run models or simulations.

The second benefit has nothing to do with virtual desktops or applications.  It actually allows GPU-enabled server applications to be fully virtualized.  This potentially means things like render farms or, looking further ahead, virtualized AI inference engines for business applications or infrastructure support services.  One potentially interesting use case for this is running MapD, a database that runs entirely on the GPU, in a virtual machine.

Analysis

GPUs have the ability to revolutionize enterprise applications in the data center.  They can potentially bring artificial intelligence, deep learning, and massively parallel computing to business apps.

vMotion support is critical in enabling enterprise applications in virtual environments.  The ability to move applications and servers around is important to keeping services available.

By enabling hardware preemption and vMotion support, it now becomes possible to virtualize the next generation of business applications.  These applications will require a GPU and CUDA support to improve performance or utilize deep learning algorithms.  Applications that require a GPU and CUDA support can be moved around in the datacenter without impacting the running workloads, maintaining availability and keeping active jobs running so they do not have to be restarted.

This also opens up new opportunities to better utilize data center resources.  If I have a large VDI footprint that utilizes GRID, I can't vMotion any running desktops today to consolidate them on particular hosts.  If I can use vMotion to consolidate those desktops, I can repurpose the remaining GPU hosts for other work, such as render farms or after-hours data processing.

This may not seem important now.  But I believe that deep learning/artificial intelligence will become a critical feature in business applications, and the ability to turn my VDI hosts into something else after-hours will help enable these next generation applications.

 

#VMworld EUC Showcase Keynote – #EDW7002KU Live Blog

Good afternoon from Las Vegas.  The EUC Showcase keynote is about to start.  During this session, VMware EUC CTO Shawn Bass and GM of End User Computing Sumit Dhawan will showcase the latest advancements in VMware’s EUC technology portfolio.  So sit tight as we bring you the latest news here shortly.

3:30 PM – Session is about to begin.  They’re reading off the disclaimers.

3:30 PM – Follow hashtag #EUCShowcase on Twitter for real-time updates from EUC Champions like @vhojan and @youngtech

3:31 PM – The intro video covers some of the next generation technologies like AI and Machine Learning, and how people are the power behind these technologies.  EUC is fundamentally about people and using technology to improve how people get work done.

3:34 PM – “Most of you are here to learn how to redefine work.” Sumit Dhawan

3:38 PM – Marginal costs of endpoint management will continue to increase due to the proliferation of devices and applications.  IoT will only make this worse.

3:39 PM – VMware is leveraging public APIs to build a platform to manage devices and applications.  The APIs provide a context of the device along with the identity that allow the device to receive the proper level of security and management.  Workspace One combines identity and context seamlessly to deliver this experience to mobile devices.

3:42 PM – There is a tug of war between the needs of the business, such as security and application management, and the needs of the end user, such as privacy and personal data management.  VMware is using the Workspace One platform to deliver a balance between the needs of the employer and the needs of the end user without increasing marginal costs of management.

3:45 PM – Shawn Bass is now onstage.  He's going to be showing a lot of demos.  Demos will include endpoint management of Windows 10, MacOS, and Chromebook, BYO, and delivering Windows as a Service.

3:47 PM – Legacy Windows management is complex.  Imaging has a number of challenges, and delivering legacy applications brings even more complex challenges.  Workspace One can provide the same experience for delivering applications to Windows 10 as users get with mobile devices.  The process allows users to self-enroll their devices by just entering their email and joining the device to an AirWatch-integrated Azure AD.

Application delivery is simplified and performance is improved by using Adaptiva.  This removes the need for local distribution points.  Integration with Workspace One also allows users to self-service enroll in applications without having to open a ticket with IT or manually install software.

3:54 PM – MacOS support is enabled in Workspace One.  The user experience is similar to what users experience on Windows 10 devices and mobile devices – both for enrollment and application experience.  A new Workspace One app experience is being delivered for MacOS.

3:57 PM – Chromebook integration can be configured out of the box and have devices joined to the Workspace One environment.  It also supports the Android Google Play store integration and allows users to get a curated app-store experience.

3:59 PM – The core message of Workspace One is that one solution can manage mobile devices, tablets, and desktop machines, removing the need for point solutions and management silos.

4:01 PM – Capital One and DXC are on stage to talk about their experience around digital workspace.  The key message is that the workplace is changing from one where everyone is an employee to a gig economy where employees are temporary and come and go.  Bring-Your-Own helps solve this challenge, but it raises new challenges around security and access.

Capital One sees major benefits of using Workspace One to manage Windows 10.  Key features include the ability to apply an MDM framework to manage devices and removing the need for application deployment and imaging.

4:10 PM – The discussion has now moved into BYO and privacy.

4:11 PM – And that’s it for me folks.  I need to jet.

NVIDIA GRID Community Advisors Program Inaugural Class

This morning, NVIDIA announced the inaugural class of the GRID Community Advisors program.  As described in the announcement blog, the program “brings together the talents of individuals who have invested significant time and resources to become experts in NVIDIA products and solutions. Together, they give the entire NVIDIA GRID ecosystem access to product management, architects and support managers to help ensure we build the right products.”

I’m honored, and excited, to be a part of the inaugural class of the GRID Community Advisors Program along with several big names in the end-user computing and graphics virtualization fields.  The other members of this 20-person class are:

  • Durukan Artik – Dell, Turkey
  • Barry Coombs – ComputerWorld, UK
  • Tony Foster – EMC, USA, @wonder_nerd
  • Ronald Grass – Citrix, Germany
  • Richard Hoffman – Entisys, USA, @Rich_T_Hoffman
  • Magnar Johnson – Independent Consultant, Norway, @magnarjohnsen
  • Ben Jones – Ebb3, UK, @_BenWJones
  • Philip Jones – Independent Consultant, USA, @P2Vme
  • Arash Keissami – dRaster, USA, @akeissami
  • Tobias Kreidl – Northern Arizona University, USA, @tkreidl
  • Andrew Morgan – Zinopy/ControlUp, Ireland, @andyjmorgan
  • Rasmus Raun-Nielsen – Conecto A/S, Denmark, @RBRConecto
  • Soeren Reinertsen – Siemens Wind Power, Denmark
  • Marius Sandbu – BigTec / Exclusive Networks, Norway, @msandbu
  • Barry Schiffer – SLTN Inter Access, Netherlands, @barryschiffer
  • Kanishk Sethi – Koenig Solutions, India, @kanishksethi
  • Ruben Spruijt – Atlantis Computing, Netherlands, @rspruijt
  • Roy Textor – Textor IT, Germany, @RoyTextor
  • Bernhard (Benny) Tritsch – Independent Consultant, Germany, @drtritsch

Thank you to Rachel Berry for organizing this program and NVIDIA for inviting me to participate.

Full-Stack Engineering Starts With A Full-Stack Education

A few weeks ago, my colleague Eric Shanks wrote a good piece called “So You Wanna Be A Full Stack Engineer…”  In the post, Eric talks about some of the things you can expect when you get into a full-stack engineering role.

You may be wondering what exactly a full-stack engineer is.  It’s definitely a hot buzzword.  But what does someone in a full-stack engineering role do?  What skills do they need to have? 

The closest definition I can find for someone who isn’t strictly a developer is “an engineer who is responsible for an entire application stack including databases, application servers, and hardware.”  This is an overly broad definition, and it may include things like load balancers, networking, and storage.  It can also include software development.  The specifics of the role will vary by environment – one company’s full-stack engineer may be more of an architect while another may have a hands-on role in managing all aspects of an application or environment.  The size and scope of the environment play a role too – a sysadmin or developer in a smaller company may be a full-stack engineer by default because the team is small.

However that role shakes out, Eric’s post gives a good idea of what someone in a full-stack role can expect.

Sounds fun, doesn’t it?  But how do you get there? 

In order to be a successful full-stack engineer, you need to get a full-stack education.  A full-stack education is a broad, but shallow, education in multiple areas of IT.    Someone in a full-stack engineering role needs to be able to communicate effectively with subject-matter experts in their environments, and that requires knowing the fundamentals of each area and how to identify, isolate, and troubleshoot issues. 

A full-stack engineer would be expected to have some knowledge of the following areas:

  • Virtualization: Understanding the concepts of a hypervisor, how it interacts with networking and storage, and how to troubleshoot using some basic tools for the platform
  • Storage: Understanding the basic storage concepts for local and remote (SAN/NAS) storage and how it integrates with the environment.  How to troubleshoot basic storage issues.
  • Network: Understand the fundamentals of TCP/IP, how the network interacts with the environment, and the components that exist on the network.  How to identify different types of networking issues using various built-in tools like ping, traceroute, and telnet (a short example of this kind of first-pass triage follows this list).
    Networking, in particular, is a very broad topic that has many different layers where problems can occur.  Large enterprise networks can also be very complex when VLANs, routing, load balancers, and firewalls are considered.  A full-stack engineer wouldn't need to be an expert on all of the different technologies that make up the network, but one should have enough networking experience to identify and isolate where potential problems can occur so they can communicate with the various networking subject matter experts to resolve the issue.
  • Databases: Understand how the selected database technology works, how to query the database, and how to troubleshoot performance issues using built-in or 3rd-party database tools.  The database skillset can be domain-specific, as there are different types of databases (SQL vs. No-SQL vs. Column-Based vs. Proprietary), differences in the query languages between SQL database vendors (Oracle, MS SQL Server, and PostgreSQL), and differences in how applications interact with the database.
  • Application Servers: Understand how to deploy, configure, and troubleshoot the application servers.  Two of the most important things to know are the location of log files and how the application interacts with the database (i.e., can the application automatically reconnect to the database, or does it need to be restarted).  This can also include a middleware tier that sits between the application servers and the database.  The skills required for managing the middleware tier will greatly depend on the application, the underlying operating system, and how it is designed.
  • Operating Systems: Understand the basics of the operating system, where to find the logs, and how to measure performance of the system.
  • Programming: Understand the basics of software development and tools that developers use for maintaining code and tracking bugs.  Not all full-stack engineers develop code, but one might need to check out production ready code to deploy it into development and production environments.  Programming skills are also useful for automating tasks, and this can lead to faster operations and fewer mistakes.

Getting A Full-Stack Education

So how does one get exposure and experience in all of the areas above?

There is no magic bullet, and much of the knowledge required for a full-stack engineering role is gained through time and experience.  There are a few ways you can help yourself, though.

  • Get a theory-based education: Get into an education program that covers the fundamental IT skills from a theory level.  Many IT degree programs are hands-on with today’s hot technologies, and they may not cover the business and technical theory that will be valuable across different technology silos.  Find ways to apply that theory outside of the classroom with side projects or in the lab.
  • Create learning opportunities: Find ways to get exposure to new technology.  This can be anything from building a home lab and learning technology hands-on to shadowing co-workers when they perform maintenance.
  • Be Curious: Don’t be afraid to ask questions, pick up a book, or read a blog on something you are interested in and want to learn.

Horizon 7.0 Part 4–Active Directory Design Considerations

Traditionally, virtual desktops run Windows, and the servers that provide the virtual desktop infrastructure services also run on Windows.  Because of the heavy reliance on Windows, Active Directory plays a huge role in Horizon environments.  Even Linux desktops, which are a new option, can be configured for Active Directory and utilize the Horizon user’s AD credentials for Single Sign-On. 

When you’re planning a new Horizon deployment, or re-architecting an existing deployment, the design of your Active Directory environment is a critical element that needs to be considered.  How you organize your virtual desktops, templates, and security groups impacts Group Policy, helpdesk delegation rights, and Horizon Composer.

Some Active Directory objects need to be configured before any Horizon View components are installed.  Some of these objects require special configuration either in Active Directory or inside vCenter.  The Active Directory objects that need to be set up are:

  • An organizational unit structure for Horizon Desktops and desktop templates 
  • Basic Group Policy Objects for the different organizational units
  • An organizational unit for Microsoft RDS servers if published apps or RDSH desktops are deployed

Optionally, you may want to set up an organizational unit for any security groups that might be used for entitling access to the Horizon View desktop pools.  This can be useful for organizing those groups and/or delegating access to Help Desk or other staff who don’t need Account Operator or Domain Administrator rights.

CREATING AN ORGANIZATIONAL UNIT FOR HORIZON DESKTOPS

The first thing that we need to do to prepare Active Directory for a Horizon deployment is to create an organizational unit structure for Horizon View desktops.  This OU structure will hold all of the desktops created and used by Horizon View.  A separate OU structure within your Active Directory environment is important because you will want to apply different group policies to your Horizon desktops than you would to your regular desktops.  There are also specific permissions that you will need to delegate to the Horizon Composer and/or Instant Clones Administrator service account.

There are a lot of ways that you can set up an Active Directory OU structure for Horizon.  My preferred organizational method looks like this:

View Desktops (top-level OU)
├── Persistent Desktops
├── Non-Persistent Desktops
└── Desktop Templates

View Desktops is a top-level OU (i.e., one that sits in the root of the domain).  I like to set up this OU for two reasons.  One is that it completely segregates my VDI desktops from my non-VDI desktops and servers.  The other is that it gives me one place to apply group policy that should apply to all VDI desktops.

I create three child OUs under the View Desktops OU to separate persistent desktops, non-persistent desktops, and desktop templates.  This allows me to apply different group policies to the different types of desktops.  For instance, you may want to disable Windows Updates and use Persona Management on non-persistent desktops but allow Windows Updates on the desktop templates.

You don’t need to create all three OUs.  If your environment consists entirely of Persistent desktops, you don’t need an OU for non-persistent desktops.  The opposite is true as well.

Finally, I tend to create use-case specific OUs for pools that require additional Group Policy options above and beyond the top-level. These grandchild OUs are completely optional.  If there is no need to set any custom policy for a specific use case, then they don’t need to be created.  However, if a grandchild OU is needed, then an entire pool will need to be created as desktop pools are assigned to OUs.  Adding additional pools can add management overhead to a VDI environment.

I've found that there is less of a need for these use-case specific OUs as I've learned more about modern UEM tools like RES and VMware User Environment Manager.  These tools can act as a scalpel, allowing administrators to dynamically apply context-aware policies and settings to specific users or groups without having to create additional pools or OUs for Group Policy configurations.

Creating an Organizational Unit for RDS Servers

Horizon 6.0 added PCoIP support for multi-user desktops running on Windows Server with the Remote Desktop Session Host role.  These new abilities also added support for remote application publishing.

RDSH servers need to be handled differently than virtual desktops.  They're managed differently, and some features, such as Persona Management, are not available on RDS servers.

If application remoting or multi-user desktops are going to be deployed, an organizational unit for RDS servers should be created underneath your base servers organizational unit.  Since RDSH servers are often heavily locked down through Group Policy, I also recommend creating an RDSH Maintenance Mode OU.  This OU is where RDSH servers can be placed when administrators need to lift restrictive group policies, such as those blocking the command prompt or MSI installers, in order to perform maintenance on the server.

Horizon Group Policy Objects

Horizon contains a number of custom group policy objects that can be used for configuring features like Persona Management and optimizing the PCoIP protocol.  The number of Group Policy objects and templates is the same as what was available in Horizon 6.

Unfortunately, most of the Group Policy templates are distributed as ADM files.  There are a number of drawbacks to ADM files in modern Active Directory environments.  The main one is that you cannot store the Group Policy files in the Central Store.

If you plan on using the Group Policy templates, it’s a good idea to convert them into the ADMX format.  I had previously written about converting the View Group Policy templates into the ADMX format and the reasons for converting here.

Horizon Service Accounts

Horizon requires a service account for accessing vCenter to provision new virtual machines.  If Horizon Composer or Instant Clones are used, a second service account will be needed to create and manage computer accounts in Active Directory for the clones.  I will cover setting up those accounts in a future section.

In the next section, I’ll cover SSL certificates for Horizon servers.