Cloudy with a Chance of Desktops – Episode 1: NVIDIA GTC Recap with Stephen Wagner

Back in 2021, I tried to launch a YouTube channel and podcast focused on End-User Computing. That lasted for about 1 week before I realized I was completely burnt out and took a break to play Pokemon for a year.

After taking a four-year break from this (and most content creation in general), I decided to try again as I start up my own consulting business.

In my first podcast episode, Stephen Wagner joins me to talk about NVIDIA GTC, AI workstations, and World of EUC.

Keep an eye out for more, as I plan on doing more podcast-format episodes and technical videos soon.

Adding or Replacing Disks in Nutanix Community Edition 2.1

In my last home lab blog post, I talked about working with Nutanix Community Edition (CE).  I’ve been spending a lot of time with CE while migrating my home lab workloads over to it.

One of the things I’ve been working on is increasing the storage in my lab cluster. When I originally built my lab cluster, I used the drives that I had on hand.  These were a mixed bag of consumer-grade 1TB SATA and NVME SSDs that worked with my PowerEdge R630s.  It worked great, and I was able to get my cluster built.  

Then I stumbled onto some refurbished Compellent SAS SSDs on eBay, and I wanted to upgrade my cluster.  

While there are some blog posts that cover adding or replacing drives in CE, those posts are for older versions.  The current version of CE at the time of this post is CE 2.1, and it has some significant changes compared to the older versions.  Those directions would work to a point: I could get a new disk imported into my CE cluster, but it would disappear if I shut down or rebooted the host.

You’re probably asking yourself why this is happening.  It’s Nutanix. I should be able to plug in a new disk and have it added into my cluster automatically, right?

Technical Differences Between CE and Commercial Nutanix   

There are some differences between the commercial Nutanix products and CE.  While CE is using the same bits as the commercial AOS offering, it is designed to run on uncertified hardware.  CE can run on NUCs or even as a nested virtual machine as long as you meet the minimum storage requirements.

Commercial Nutanix products will use PCI Passthrough to present the storage controller to the CVM, and the CVM will directly manage any SATA or SAS disks that are installed. This only works with certified SAS HBAs that are found in enterprise-grade servers. CE does not pass the storage controller through to the CVM virtual appliance. Most hardware used in a home lab will not have a supported storage controller, and even if one is present, there is a good chance that the hypervisor boot disk will also be attached to this storage controller.

CE gets around this by passing each SATA or SAS disk through to the CVM individually, referencing values like the WWN and Device ID.  This means that you can move the drives to different slots and they will still mount in the same place on the CVM.  CE will also map the disk’s device name to its drive slot: if a disk’s device name is /sda, for example, CE will map it to slot 1.  This is a little different from a commercial Nutanix system, where the disk slot is mapped to a physical slot on the server or HBA.

The disk’s Device ID, UUID and WWN are also provided to the CVM, and this can cause some fun errors if the mounting order or device name changes.  If an existing disk mounted as /sda changes to /sdb, CE will still try to assign it to Slot 1. You may see errors in the Hades log that say “2 disks in one slot” that prevent you from using the new disk.

Adding or replacing disks should be pretty simple then. We just need to edit the CVM’s definition file and update it with the new drive information. Right?  

You’d think that, but that leads right into the second problem I mentioned earlier: the new drives would disappear when the host was rebooted. And to make this issue a little worse, I would see errors about unmounted or missing drives in Prism after the reboot.

So what’s happening here?

After looking into this and talking with the CE lead at Nutanix, I learned that the CVM definition file is regenerated on every host reboot. So if I make a change to the CVM definition, it will be rolled back to its “known configuration” the next time the host is powered on.  Generally speaking, this is a great idea, but in this case, we want to change our CVM configuration and don’t want it to revert on a reboot.

We’ll need to edit a file called cvm_config.json on our host to make our changes permanent. 

There is just one other little thing. The CVM’s disk device names are determined by the order they are listed in the cvm_config.json file. The first disk in the list will be named /sda, the second /sdb, and so on. And as I said above, the Hades service that manages storage equates a disk’s device name with its disk slot, so if the mounting order changes, an existing disk may be assigned to the same slot as the new disk. This can prevent you from using the new disk and generate “two disks in one slot” error messages in the Hades log.

Adding and/or Replacing Disks

So how do we add or replace a disk on Nutanix CE?  Here are the steps that I took in my lab to replace SATA SSDs with SAS SSDs.  NVME drives are PCI Passthrough devices, and I won’t be covering them in this post.

This process will be making changes to your lab environment. Make sure you have backed up any important data before taking these steps, and remember that you are doing these at your own risk.

Removing an Old Disk

The first step I like to take when replacing a disk is to remove the disk I am replacing.  This step isn’t required, but I like to remove the disk first to evacuate any data from it before I make any changes.  If you’re just adding a new disk and not replacing an old one, you can skip to the next section.

To remove our disk, we need to start in Prism. Go to the dropdown menu located at the top of the screen and select Hardware.

Click Table (1) and then Disks (2)

Locate the disk you want to remove. The easiest way to do this is by searching for the disk’s serial number.  Right click on the disk and select Remove.  I will also write down the serial number because I will need it later.

This will start the process of evacuating data from the disk, and it may take some time to complete. After the data has been evacuated from the disk, it will be unmounted and removed from the list.  

Entering Maintenance Mode

Before we can shut down the host to replace our disks, we need to prepare the host and/or cluster for maintenance.  There are two ways to do this, and the option you select will depend on how many hosts are in your CE cluster.

If you have a 3 or 4 node CE cluster, then you can place your host into maintenance mode.  This will migrate all running VMs to another host and shut down the CVM for you.  The steps to put a host into maintenance mode are:

Click on Table -> Hosts.

Right click on the host you want to put into maintenance mode and select Enter Maintenance Mode.

When you enter maintenance mode, all VMs will be migrated to another host in the cluster before that host’s CVM is shut down.  It will stay in maintenance mode until you take it out of maintenance mode.

If you only have a single host in your cluster, you will need to shut down all powered-on VMs, manually stop the cluster from the CVM’s command line, and then shut down the CVM.
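For reference, here is a minimal sketch of that single-node sequence, run over SSH as the nutanix user on the CVM after all guest VMs have been powered off. Double-check the exact commands against the documentation for your CE release.

# Stop all Nutanix services on the single-node cluster
cluster stop

# Shut down the CVM once the cluster has stopped
cvm_shutdown -P now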

Once our CVM is shut down, we need to shut down our host to install our new drive or drives. AHV will not detect any new drives without a reboot. So we will need to reboot the server even if we are using enterprise-grade hardware with a storage controller that supports hot-adding drives.

Configuring AHV to Use New Disks

Once our new drives are installed, we can power our host back on.  When it has finished booting, we need to SSH into AHV.  We will need root privileges for the next couple of steps, so you will need to either use the root account or use an account with sudo privileges.  When I was doing this in my lab, I logged in as the admin user and ran sudo su - root to switch to the root user to perform the next steps.

Before we do anything else, we need to back up our CVM configuration file.  This is the file that AHV will use to regenerate the CVM’s virtual machine definition.  The command to back up this file is below.

Root user: cp /etc/nutanix/config/cvm_config.json /etc/nutanix/config/cvm_config.backup
Admin user: sudo cp /etc/nutanix/config/cvm_config.json /etc/nutanix/config/cvm_config.backup

Next, we need to validate that our new drives are showing up and mounted properly.  We also want to get the device name for the new drive because we will need that to get other information we need in the next step.  The command we need to run is lsblk or sudo lsblk if we’re not the root user.

Next, we need to retrieve our drive’s device identifiers.  These include the SATA or SCSI name and the WWN.  We will use these when we update our CVM config file to identify the drive that needs to be passed through to the CVM.  The command to retrieve this is below.

Root User: ls -al /dev/disk/by-id/ | grep <device name>
Admin User: sudo ls -al /dev/disk/by-id/ | grep <device name>

Next, we will need to generate a unique UUID for the drive we will be installing.  The command to generate a UUID is uuidgen.

Finally, we will need to retrieve the drive’s vendor, product ID, and serial number. We can use the smartctl command to retrieve this. Smartctl works with both SATA and SAS drives.  The command to get this information is below.  If you are planning to install a SATA drive, you’ll want to mark the vendor down as ATA as this is how Nutanix appears to consume these drives.

Root user: smartctl -i /dev/<device name>
Admin user: sudo smartctl -i /dev/<device name>

So now that we’ve retrieved our new disk’s serial number, WWN, and other identifiers, we need to update our CVM definition file.  We do this by editing the cvm_config.json file we backed up in an earlier step.

Root user: nano /etc/nutanix/config/cvm_config.json
Admin user: sudo nano /etc/nutanix/config/cvm_config.json

The cvm_config.json file is a large file, so we will want to navigate down to the VM_Config -> Devices -> Disks section.  If we are replacing a disk, we’ll want to locate the entry for the serial number of the disk we’re replacing and update the values to match the new disk. 

If we are adding a new disk to expand capacity, we need to create a new entry for the disk.  The easiest way to do this, in my experience, is to copy an existing disk entry, paste it at the end of the disk block, and then replace the values in each field with the ones for the new disk.

There are a few  things that I want to call out here.  

  • If your new disk is a SATA disk, the vendor should be ATA. If you are adding or replacing a SAS drive, you should use the vendor data you copied when you ran the smartctl command.
  • The UUID field needs to start with a “ua-”, so you would just add this to the front of the UUID you generated earlier using the uuidgen command.  The UUID line should look like this: “uuid”: “ua-<uuid generated by uuidgen command>”,

Once we have all of our disk data entered into the file, we can save it and reboot our host.  During the reboot process, our CVM definition file will be regenerated and the new disks will be attached to the CVM virtual machine.  Once our host is online, we can SSH back into it to validate that our CVM still exists before powering it back on.  The commands that we need to run from the host to perform these tasks are:

virsh list --all
virsh start <CVM VM>

You can also view the updated CVM config to verify that the new disks are present with the following command:

virsh edit <CVM VM>

Once our CVM is powered on, we need to SSH into it and switch to the nutanix user.  The nutanix user account is the account that all the nutanix services run under, and it is the main account that has access to any logs that may be needed for troubleshooting.  

The main thing we want to do is verify that the CVM can see our disks, and we will use the lsblk command for that.  Our new disks should be showing up at this point, so you can take the host out of maintenance mode or restart the cluster.
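As a quick sketch of that verification, assuming a single-node cluster that was stopped earlier, the commands from the CVM look something like this:

# Run as the nutanix user on the CVM
lsblk

# Single-node clusters only: start the cluster services back up
cluster start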

The new disk should automatically be adopted and added into the storage pool after the Nutanix services have started on the host.  

Troubleshooting

Additional troubleshooting will be needed if the disk isn’t added to the storage pool automatically, and/or you start seeing disk-related errors in Prism.  This post is already over 2000 words, so I’ll keep this brief.

The first thing to do is check the hades service logs.  This should give you some insight into any disk issues or errors that Nutanix is seeing.  You can check the Hades log by SSHing into the CVM and running the following command as the nutanix user: cat data/logs/hades.out

You may also need to add your new disk to the hcl.json file.  This file contains a list of supported disks and their features.  The hcl.json file is found in /etc/nutanix/.  Due to this post’s length, how to edit the file and add a new disk is also beyond the scope of this post.  I’ll have to follow this up with another post.

I would also recommend joining the Community Edition forums on the Nutanix .Next community.  Nutanix staff and community members can help troubleshoot any issues that you have.

Omnissa Horizon Load Balancing Overview

I’ve been spending a lot of time in the “VMware Horizon” sub-Reddit [1] lately, where I’ve been trying to help others with their Horizon questions. One common theme that keeps popping up is load balancing, and I decided that it would be easier to write a blog post addressing the common load balancing scenarios and use cases than to rewrite or paste a long-winded reply in each thread.

Load balancing is an important part of designing and deploying a Horizon environment. It is an important consideration for service availability and scalability, and there are multiple Techzone articles that talk about load balancing. I have links to some of these articles at the end of this post.

How load balancing fits into a Horizon deployment

Horizon can utilize a load balancer in three different ways. These are:

  • load balancing the Horizon 8 Connection Servers 
  • load balancing the Unified Access Gateways in Horizon 8 deployments
  • load balancing App Volumes Managers.

While App Volumes can also benefit from a load balancer, I won’t be covering that topic in this post. I also won’t be covering global load balancing or multi-site load balancing, and I won’t be doing a comparison of different load balancers. If your question is “which load balancer should I use?” my answer is “yes” followed by “what are your requirements?”

Load balancing for Horizon Connection Servers (CS) and Unified Access Gateways (UAG) seems pretty simple. At least, it seems simple on the surface. But every load balancer or load balancer-as-a-service is implemented differently, and this may require different architectures to achieve the same outcome.

This post is going to focus on load balancing Internet-facing Unified Access Gateways in external access scenarios as this is what most people seem to struggle with.

Why Deploy A Load Balancer With Horizon

So let’s get the first question out of the way.  Why should you load balance your Horizon deployment?  What benefits does it provide?  

There are two reasons to deploy a load balancer, and they are somewhat related.  The first reason is to improve service availability, and the second reason is to support additional user sessions. Both of these are accomplished the same way – horizontally scaling the environment by adding additional CSes or UAGs. 

Adding additional CSs and UAGs increases the number of concurrent sessions that our environment can support, and it increases the availability of the Horizon service. With proper health checks enabled, you can maintain service availability even if a CS or UAG goes offline because the load balancer will just direct new sessions to other components that are available.

We can also provide a consistent user experience by using a single URL to access the service so users do not need to know the URL for each Connection Server or UAG.

Some load balancers can do more than just load balancing between components. Many load balancers can provide SSL offloading services. Some load balancers add security features, real-time analytics, web application catalogs, or other features. Those are out-of-scope for this guide, but it is important to understand what capabilities your load balancer solution can provide as this can shape your desired outcome.

Understanding Horizon Connectivity Flows

Before we talk about load balancing options for Horizon, it’s important to understand traffic flows between the Horizon Client, Unified Access Gateway, Connection Server, and the agent that is deployed on the virtual desktop or RDSH server.

Horizon uses two main protocols.  These are:

  • XML-API over HTTPS: This is the protocol used for authentication and session management. The documentation considers this the “primary protocol.”
  • Session Protocol Traffic: This is the protocol used for communication between the Horizon Client and Agent. Horizon has two protocols, PCoIP and Blast, and can use an HTTPS tunnel for side channel traffic like Client Drive Redirection (CDR) and Multimedia Redirection (MMR). The documentation refers to these protocols as “secondary protocols.”

Horizon also requires session affinity. When connecting to an environment, a load balancer will direct the user to a UAG or Connection Server to authenticate.  All subsequent primary and secondary traffic must be with that UAG or Connection Server. If you do not have session affinity, then a user will be required to reauthenticate when the load balancer directs their session to a new UAG or Connection Server, and it can interrupt their access to their sessions.

There are multiple ways to set up session affinity including Source IP Persistence, using multiple VIPs with one VIP mapped to each UAG, and providing a public IP for each UAG.

So what does session traffic flow look like?  The Omnissa Techzone page has a really good explanation that you can read here.

Figure 1: Horizon Traffic Flow with UAGs and Load Balancers (Retrieved from https://techzone.omnissa.com/resource/understand-and-troubleshoot-horizon-connections#external-connections)

High-Level Horizon Load Balancer Architectures

There are two deployment architectures I’ve regularly encountered when designing for external access.  

The first, which I will refer to as N+1, is to just use the load balancer for XML-API over HTTPS traffic.  In this scenario, the XML-API over HTTPS traffic will be sent through the load balancer, and any session protocol (or secondary) traffic will occur directly with the UAG.  When configuring your UAGs in an N+1 scenario, you need to provide a unique URL for the Blast Secure Gateway or a unique public IP address for the PCoIP Secure Gateway, and your SSL certificate needs to contain subject alternative names for the load balanced URL and the UAG’s unique URL. 

(The Unified Access Gateway also supports HTTP Host Redirection for Horizon environments, but this is only used in some specific load balancer scenarios.)

The second deployment architecture is having all Horizon traffic pass through the load balancer. This includes both the XML-API over HTTPS and session protocol traffic.  This deployment option is typically used in environments where there is a limited number of public IP addresses.  

Session Affinity and throughput are the biggest concerns when using this approach.  The load balancer appliance can become a traffic bottleneck, and it needs to be sized to handle the number of concurrent sessions.  Session affinity is also a concern as an improperly configured load balancer can result in disconnects or failure to launch a session. 

Load Balancing Options for Horizon

At a high level, there are three categories of load balancers that can be used with Horizon.  These are:

  • 3rd-Party external load balancers like NSX Advanced Load Balancer (Avi), F5, Netscaler, Kemp and others. This can also include open-source solutions like Nginx or HAProxy.
  • Cloud Load Balancer Services
  • Unified Access Gateway High-Availability

Unified Access Gateway High Availability

I want to talk about the Unified Access Gateway High-Availability feature first.  This is probably the most misunderstood option, and while it can be a great solution for some customers, it will not be a good fit for many customers. It’s worth reading the documentation on this feature if you’re considering this option.  

When deployed for Horizon, UAG HA uses Round Robin with Source IP Affinity for directing traffic between UAGs. But unlike the other options, it only provides high availability for the XML-API over HTTPS traffic.  It does not provide high availability for session protocol traffic like Blast or PCoIP.

If you are looking to use this feature in an Internet-facing scenario, you would need N+1 public IP addresses and DNS names, where N is the number of UAGs you are deploying or plan to deploy, plus one for the load-balanced VIP shared by all of the UAGs. This is because the Horizon Client needs to be able to reach the specific UAG that it authenticated against for session traffic.

Unified Access Gateway High Availability may also not work in some public cloud scenarios where you are deploying into a native public cloud.

External 3rd-Party Load Balancers

The next option is the 3rd-party external load balancer. This is your traditional load balancer.  It can provide the most deployment flexibility, and most vendors have a guide for deploying their solution with Horizon.  

Third-party load balancers may also provide their own value-added features on top of basic load balancing. F5, for example, can integrate into Horizon when using their iApp, and the Avi documentation contains deployment guides for multiple customer deployment scenarios.

There are also open-source options here, NGINX and HAProxy being the two most common in my experience, but there may be some tradeoffs. Open-source HAProxy only supports TCP traffic, with UDP load balancing included in their paid enterprise product. Open-source NGINX can support TCP and UDP traffic, but active health checks are part of the paid product (although there are ways to work around that; I just haven’t tested any of them).

Cloud Native Load Balancer-as-a-Service

The final option to consider is cloud-native Load Balancer-as-a-Service offerings.  These are useful if you are deploying into a cloud-based VMware SDDC service like Google Cloud VMware Engine, Azure VMware Solution, or Oracle Cloud VMware Solution, or into a native public cloud like Amazon Web Services for Horizon on Amazon WorkSpaces Core and EC2.

There are many varieties of cloud-native load balancer services. These come with different feature sets, supported network topologies, and price points. Some load balancing services only support HTTP and HTTPS, while others can support all TCP and UDP traffic. Some only work with services that are in the same VPC or VNet as the load balancer, while others can provide load balancing to services in other networks or even endpoints in remote data centers over a WAN or VPN connection.

Public cloud scenarios are usually good for the N+1 IP deployment model.  Public cloud providers have large pools of IPv4 addresses that you can borrow from for a very small monthly fee.

Do I Need a Load Balancer Between My Unified Access Gateways and Connection Servers?

One of the benefits the UAG had over the old Horizon Security Server was that you didn’t need to map each UAG to a Connection Server. You could point them at a load balanced internal URL, and if a Connection Server went offline, the internal load balancer would just direct new sessions to a different Connection Server.

This was much easier than trying to load balance Security Servers, where complicated health check rules were required to detect when a Connection Server was down and take the Security Server offline.

But do you need a load balancer between the UAGs and Connection Servers?

Surprisingly, the answer is no.  While this is a supported deployment, it isn’t required. And it doesn’t require any complex health check setups.  

When configuring a load balancer health check for the UAGs, you should point it at favicon.ico. The UAG is a reverse proxy, and it proxies the favicon.ico file from the Connection Server (or load-balanced set of Connection Servers) behind it.  If the Connection Server goes offline, the UAG’s health check will fail and the load balancer will mark that UAG as down.
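If you want to sanity-check this behavior before wiring it into a health monitor, you can probe a UAG the same way the load balancer would. This is just an illustrative check, and uag1.example.com is a placeholder for your UAG’s address.

# Expect a 200 response when the UAG can reach a healthy Connection Server
curl -sk -o /dev/null -w "%{http_code}\n" https://uag1.example.com/favicon.ico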

Questions to Ask When Getting Started with Horizon Load Balancers

Before we can start architecting a load balancer solution for Horizon, we have to define what our requirements and outcomes are. These should be defined during your discovery or design phase by asking the following questions:

  1. Where are you deploying or hosting the environment?
  2. What load balancers do you have in place today for other services? What sort of traffic types do your existing load balancers support? How much throughput can they handle?
  3. What do your internal and external user traffic flows look like (or what do you want them to look like)? Are you currently or planning on sending both internal and external user sessions through UAGs or just external users?
  4. Do you have any requirements around multi-factor authentication, Smart Card support, or TrueSSO?
  5. What requirements does your security team have?
  6. What is your budget?
  7. How many public IP addresses do you have access to or are available to you for external access?

The answers to these questions will help define the load balancer and external access architecture. 

Learning More

If you want to learn more about load balancing Horizon environments, you can check out the following resources from Omnissa and VMware by Broadcom.  You should also check with your preferred load balancer vendor to see if they have any Horizon configuration guides or reference architectures.

  1. Yes…I am aware that the new name will be Omnissa Horizon, but this is the name of the channel until someone with admin rights changes it. I don’t want to hear it, Rob…

How to Configure the NVIDIA vGPU Drivers, CUDA Toolkit and Container Toolkit on Debian 12

As I’ve started building more GPU-enabled workloads in my home lab, I’ve found myself repeating a few steps to get the required software installed. It involved multiple tools, and I was referencing multiple sources in the vendor documentation.

I wanted to pull everything together into one document – both to document my process so I can automate it and also to share so I can help others who are looking at the same thing.

So this post covers the steps for installing and configuring the NVIDIA drivers, CUDA toolkit, and/or the Container Toolkit on vSphere virtual machines.

Install NVIDIA Driver Prerequisites

There are a few prerequisites required before installing the NVIDIA drivers.  This includes installing kernel headers, the programs required to compile the NVIDIA drivers, and disabling Nouveau. We will also install the NVIDIA CUDA Repo.

#Install Prerequisites
sudo apt-get install xfsprogs wget git python3 python3-venv python3-pip p7zip-full build-essential -y
sudo apt-get install linux-headers-$(uname -r) -y

#Disable Nouveau
lsmod | grep nouveau

cat <<EOF | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
sudo update-initramfs -u

Reboot the system after the initramfs build completes.

sudo reboot
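After the reboot, it’s worth confirming that Nouveau is no longer loaded before running the NVIDIA installer. If this command returns any output, the blacklist didn’t take effect.

# Should return nothing if Nouveau is disabled
lsmod | grep nouveau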

Install the NVIDIA Drivers

NVIDIA includes .run and .deb installer options for Debian-based operating systems.  I use the .run option because that is what I am most familiar with.  The run file will need to be made executable as it does not have these permissions by default. I also install using the --dkms flag so the driver will be recompiled automatically if the kernel is updated.

The vGPU drivers are distributed through the NVIDIA Enterprise Software Licensing portal under the NVIDIA Virtual GPU or AI Enterprise product sets, and they require a license to use.  If you are using PCI Passthrough instead of GRID, you can download the NVIDIA Data Center/Tesla drivers from the data center driver download page.

I am using the NVAIE product set for some of my testing, so I will be installing a vGPU driver. The steps to install the Driver, CUDA Toolkit, and Container Toolkit are the same whether you are using a regular data center driver or the vGPU driver. You will not need to configure any licensing when using PCI Passthrough.

The drivers need to be downloaded, copied over to the virtual machine, and have the executable flag set on the file.

sudo chmod +x NVIDIA-Linux-x86_64-550.54.15-grid.run
sudo bash ./NVIDIA-Linux-x86_64-550.54.15-grid.run --dkms

Click OK for any messages that are displayed during install.  Once the installation is complete, reboot the server.

After the install completes, type the following command to verify that the driver is installed properly.

nvidia-smi

You should receive an output similar to the following: 

Installing the CUDA Toolkit

Like the GRID driver installer, NVIDIA distributes the CUDA Toolkit as both a .run and a .deb installer. For this step, I’ll be using the .deb installer as it works with Debian’s built-in package management, can handle upgrades when new CUDA versions are released, and offers multiple meta-package installation options that are documented in the CUDA installation documentation.

By default, the CUDA toolkit installer will try to install an NVIDIA driver.  Since this deployment is using a vGPU driver, we don’t want to use the driver included with CUDA.  NVIDIA is very prescriptive about which driver versions work with vGPU, and installing a different driver, even if it is the same version, will result in errors.  

The first step is to install the CUDA keyring and enable the contrib repository.  The keyring file contains the repository information and the GPG signing key.  Use the following commands to complete this step:

wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo add-apt-repository contrib

The next step is to update our Apt-Get repos and install the CUDA Toolkit. The CUDA toolkit requires a number of additional packages that will be installed alongside the main application.

sudo apt-get update && sudo apt-get -y install cuda-toolkit-12-5

The package installer does not add CUDA to the system PATH variable, so we need to do this manually.  The way I’ve done this is to create a login script that applies to all users using the following commands.  The CUDA folder path is versioned, so this script that sets the PATH variable will need to be updated when the CUDA version changes.

cat <<EOF | sudo tee /etc/profile.d/nvidia.sh
export PATH="/usr/local/cuda-12.5/bin${PATH:+:${PATH}}"
EOF
sudo chmod +x /etc/profile.d/nvidia.sh

Once our script is created, we need to apply the updated PATH variable and test our CUDA Toolkit installation to make sure it is working properly.  

source /etc/profile.d/nvidia.sh
nvcc --version

You should receive the following output if the PATH variable is updated properly.

If you receive a command not found error, then the PATH variable has not been set properly, and you need to review and rerun the script that contains your EXPORT command.

NVIDIA Container Toolkit

If you are planning to use container workloads with your GPU, you will need to install the NVIDIA Container Toolkit.  The Container Toolkit provides a container runtime library and utilities to configure containers to utilize NVIDIA GPUs.  The Container Toolkit is distributed from an apt repository.

Note: The CUDA toolkit is not required if you are planning to only use container workloads with the GPU.  An NVIDIA driver is still required on the host or VM.

The first step for installing the NVIDIA Container Toolkit on Debian is to import the Container Toolkit apt repository.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

Update the apt repository packages lists and install the container toolkit.

sudo apt-get update && sudo apt-get install nvidia-container-toolkit

Docker needs to be configured and restarted after the container toolkit is installed.  

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Note: Other container runtimes are supported.  Please see the documentation to see the supported container runtimes and their configuration instructions.

After restarting your container runtime, you can run a test workload to make sure the container toolkit is installed properly.

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Using NVIDIA GPUs with Docker Compose

GPUs can be utilized with container workloads managed by Docker Compose.  You will need to add the following lines, modified to fit your environment, to the container definition in your Docker Compose file.  Please see the Docker Compose documentation for more details.

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities:
            - gpu
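As a quick usage sketch, assuming a Compose file with a service named app that includes the reservation above, you can bring the stack up and confirm the GPU is visible inside the container. The service name is just a placeholder for this example.

docker compose up -d
docker compose exec app nvidia-smi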

Configuring NVIDIA vGPU Licensed Features

Your machine will need to check out a license if NVIDIA vGPU or NVAIE are being used, and the NVIDIA vGPU driver will need to be configured with a license server.  The steps for setting up a cloud or local instance of the NVIDIA License System are beyond the scope of this post, but they can be found in the NVIDIA License System documentation.

Note: You do not need to complete these steps if you are using the Data Center Driver with PCI Passthrough. Licensing is only required if you are using vGPU or NVAIE features.

A client configuration token will need to be generated once the license server instance has been set up.  The steps for downloading the client configuration token can be found here for CLS (cloud-hosted) instances and here for DLS (on-premises) instances.

After generating and downloading the client configuration token, it will need to be placed onto your virtual machine. The file needs to be copied from your local machine to the /etc/nvidia/ClientConfigToken directory.  This directory is locked down by default, and it requires root or sudo access to perform any file operations here. So you may need to copy the token file to your local home directory and use sudo to copy it into the ClientConfigToken directory.  Or you can place the token file on a local web server and use wget/cURL to download it.

In my lab, I did the following:

sudo wget https://web-server-placeholder-url/NVIDIA/License/client_configuration_token_05-22-2024-22-41-58.tok

The token file needs to be made readable by all users after downloading it into the /etc/nvidia/ClientConfigToken directory.

sudo chmod 744 /etc/nvidia/ClientConfigToken/client_configuration_token_*.tok

The final step is to configure vGPU features.  This is done by editing the gridd.conf file and enabling vGPU.  The first step is to copy the gridd.conf.template file using the following command.

sudo cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf

The next step is to edit the file, find the line called FeatureType, and change the value from 0 to 1.

sudo nano /etc/nvidia/gridd.conf
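If you prefer a one-liner over editing the file by hand, something like the following should work, assuming the default FeatureType=0 line from the template:

# Enable vGPU licensing by setting FeatureType to 1
sudo sed -i 's/^FeatureType=0/FeatureType=1/' /etc/nvidia/gridd.conf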

Finally, restart the NVIDIA GRID daemon.

sudo systemctl restart nvidia-gridd

You can check the service status with the sudo systemctl status nvidia-gridd command to see if a license was successfully checked out.  You can also log into your license service portal and review the logs to see licensing activity.
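A quick way to confirm the checkout from inside the guest is to query the licensing section of nvidia-smi:

# Look for a "Licensed" status in the vGPU software licensing section
nvidia-smi -q | grep -i license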

Sources

While creating this post, I pulled from the following links and sources.

https://docs.nvidia.com/cuda/cuda-installation-guide-linux

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#meta-packages

https://docs.nvidia.com/grid/17.0/grid-vgpu-user-guide/index.html#installing-vgpu-drivers-linux-from-run-file

https://docs.nvidia.com/grid/17.0/grid-vgpu-user-guide/index.html#installing-vgpu-drivers-linux-from-debian-package

https://docs.nvidia.com/ai-enterprise/deployment-guide-vmware/0.1.0/nouveau.html

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html

https://docs.docker.com/compose/gpu-support/

https://docs.nvidia.com/license-system/latest/index.html

The Home Lab Update 2024

Back in August 2023, I had the pleasure of presenting two VMware Explore sessions about home labs.  While preparing for those sessions, I realized that I hadn’t done a home lab update post in a long time.  In fact, my last update post was four years ago in February 2020.

And a lot has changed in my lab. The use cases, architecture, the hardware, and even my areas of focus have changed significantly in the last four years.  With VMware being acquired by Broadcom and my desire to retool and expand my skillsets, my home lab will be more important than ever.  I will be using it as a tool to help achieve my goals and find my next path.

And while I was originally going to write about the lab infrastructure changes, I decided that my original post just wasn’t right.  My home lab is practically a private cloud, and the tone of the post unintentionally came off as bragging.

That didn’t sit right with me, so I decided to scrap that post and start over. I want to focus on workloads and applications instead of hardware and infrastructure solutions, and I want to elevate some of the open-source projects that I’m using as learning tools.

And when I talk about hardware or infrastructure, it’s going to be about how that hardware supports the specific application or workload.

Home Lab Use Cases

I think it’s very important to talk about my lab use cases. 

I had a slide that I used in two of my VMware Explore sessions that summed up my home lab use cases:

I really want to focus on the last two use cases on that slide: self-hosting open-source solutions and Minecraft.  The latter has really driven the “roadmap” for my lab.  I don’t have a budget for this, so I’ve been forced to look at open-source solutions to support my kids’ Minecraft servers.

Minecraft isn’t the only thing I’m self-hosting, though.  I’ve found some awesome tools thanks to the r/self-hosted sub-Reddit, and I’ve used some of the tools there to fill in the gaps in my infrastructure.

Most of these solutions are containerized or offer a container-based option.  I’m using containers whenever possible because it makes deploying and maintaining the application and its dependencies much easier than managing binary installs. Each application stack gets its own Debian-based VM, and I am using Traefik as my reverse proxy and SSL offload manager of choice.

I haven’t jumped into Kubernetes yet as I’m still getting comfortable with containers, and self-hosting Kubernetes adds another layer of complexity to my lab.  It is on my to-do list.

All the solutions I’m using would be deserving of their own posts, but in the interest of time and wordcount, I’ll keep it fairly high level today.

Vaultwarden

There was a time, a long time ago, when I was a Lastpass family customer.  It got harder to justify the yearly cost of Lastpass when self-hosted alternatives like Bitwarden were available (and…if I’m being honest…my family was not using the service). The Lastpass breach and security issues came to light about six months after I had cancelled my subscription and migrated my vault out, but it only justified my decision to move on.

I was originally using the self-hosted Bitwarden container.  But I recently switched to Vaultwarden so I could start offering password vaults to the rest of my family as they are seeing the need for a password vault service.

Vaultwarden is one of the most important services in my lab.  This service contains critical data, and I need to make sure it is backed up. I’m using a combination of this Vaultwarden  backup container and Restic to protect the data in this application.

MinIO

MinIO is one of the few applications that I’ve deployed with infrastructure dependencies.  I originally deployed MinIO in my lab when I was testing VMware Data Services Manager (DSM) as that product required S3-compatible storage. 

I have a 3-node MinIO cluster in my lab.  Each MinIO node has two data volumes, so I have a total of 6 data disks across my 3 nodes. 

This is one of the few applications in my lab that is tied to specific hosts and hardware.  Each MinIO node data volume is sitting on a dedicated local SSD, so each node is tied to an ESXi host in a workload cluster.  This setup allows me to use erasure coding and provides some degree of data redundancy, but it makes host management operations a little more complex because I must shut down the MinIO node on a host before I can perform any maintenance operations. 

Even though I’m no longer testing DSM in my lab, I still have MinIO deployed.  I’m using it as the backend for other services in my lab that I will be talking about later in this post. 

Wiki.JS

Home labs are rarely documented.  This is something I’m trying to improve on with my lab as I’ve had a few processes that I’ve had to figure out or reverse engineer from looking at my shell/command history.  I used to use Confluence SaaS free tier for documenting my home network, but SSO was a $30 per month add-on. 

I also wanted a self-hosted option.  I looked at a few wiki options, including BookStack, DokuWiki, and a few others.  But I’m also kind of picky about my requirements and wanted something that supported SSO out of the box and used PostgreSQL.

So I settled on wiki.js as my solution because it is open source, met all my technical requirements, and it fit in my budget.

I’m not taking full advantage of WikiJS yet.  My focus has been importing content from Confluence and testing out the SSO features.  But I plan to add more lab documentation and use it for some of my programming and lab side projects in the future. 

Grafana Loki and Promtail

I’ve needed a log management solution for my fleet of Minecraft servers for a while now.  Log management has been an issue on those, and some method to easily search the logs is kind of a requirement before I let my kids share the servers with their friends.

There are a lot of open-source solutions in this space, but I am settling on the Grafana stack.  I’m starting with this stack because it seems to be a well-integrated stack for performance monitoring, log aggregation, and creating dashboards. Time will tell on that as I am just getting started with Grafana Loki.  I have a small instance deployed today to get my Promtail config ironed out, and I will be redeploying it as I roll it out to the rest of my lab.

One thing I like about some of the newer log management systems is that they can use S3-compatible storage for log data.  Loki isn’t the only solution that can do this but being a part of the Grafana stack set it apart in my mind and helped make it my first choice.

I’m using the Promtail binary for my Minecraft servers, and getting that config set up properly has been a pain.  The documentation is very high level and, as far as I can tell, doesn’t include very many example configs to start from.  Some of the issues I had to work through were scraping the systemd journal, which required adding the promtail service user to the systemd-journal group, and getting the hostname and IP address added to all forwarded logs.  The documentation covered some of what I needed, but there were some significant gaps in my opinion.  It took a lot of trial, error, and experimentation to get where I wanted to be.

I need to write a longer blog post to talk about my setup and how I worked around some of the issues I faced once I get this rolled out into “production” and get some dashboards built.  I will also be looking at Grafana’s version of Prometheus for performance monitoring in a later phase.

OwnCloud Infinite Scale

Have you ever exceeded the limits of the free tiers that Microsoft and Google offer on OneDrive or Google Drive?  Or needed a Dropbox-like service that was self-hosted to meet data sovereignty or compliance requirements? 

OwnCloud Infinite Scale (OCIS) is an open-source ground-up rewrite of OwnCloud Server using the Go programming language.  It is a drop-in replacement for OneDrive, Google Drive, Dropbox and similar solutions.  The client app supports files-on-demand (although this feature is experimental on MacOS).  The server supports integration with multiple web-based office suites, OpenID Connect for SSO, and S3-compatible storage.

I use it for some of my file storage needs, especially the stuff that I don’t want to put on OneDrive, and for transferring data from my laptop to my lab. I expect to use the Spaces feature to replace some of my lab file servers and QNAP virtual appliances in my lab.

DDNS-Route53

DDNS-Route53 is a Go application that allows you to build your own Dynamic DNS service using AWS Route53. I was getting tired of having multiple dynamic DNS services tied to different domains, and I’ve started to standardize all my domains on Route53 and use this service to replace the few dynamic DNS services that I currently use.

Conclusion

These are just a few of the open-source projects I’ve been using in my lab.  I have a few more that I’ve been testing out that I will talk about in future posts. 

Open-source solutions are a great way to get more utilization out of your home lab while building or enhancing your technical skills.  I’ll be talking more about this topic at the Wisconsin VMUG Usercon in April 2024.  If you’re going to be there, please stop by my session.

How I Automated Minecraft Server Builds

If you have kids that are old enough to game on any sort of device with a screen, you’ve probably been asked about virtual Lego kits. And I don’t mean the various branded video games like LEGO Worlds or the LEGO Star Wars games. No, I’m talking about something far more addictive – Minecraft.

My kids are Minecraft fanatics. They could play for hours on end while creative how-tos and “Let’s Play” YouTube videos loop non-stop in the background. And they claim they want to play Minecraft together, although that’s more theory than actual practice in the end. They also like to experiment and try to build the different things they see on YouTube. They wanted multiple worlds to use as playgrounds for their different ideas.

And they even got me to play a few times.

So during the summer of 2020, I started looking into how I could build Minecraft server appliances. I had built a few Minecraft servers by hand before that, but they were difficult to maintain and keep up-to-date with Minecraft Server dot releases and general operating system maintenance.

I thought a virtual appliance would be the best way to do this, and this is my opinionated way of building a Minecraft server.

TL;DR: Here is the link to my GitHub repo with the Packer files and scripts that I use.

A little bit of history

The initial version of the virtual appliance was built on Photon. Photon is a stripped down version of Linux created by my employer for virtual appliances and running container workloads. William Lam has some great content on how to create a Photon-based virtual appliance using Packer.

This setup worked pretty well until Minecraft released version 1.17, also known as the Caves and Cliffs version, in the summer of 2021.

There are a couple of versions of Minecraft. The two main ones are Bedrock, which is geared towards Windows, mobile devices, and video game consoles, and Java, which uses Java and only runs on Windows, Linux, and Mac.

My kids play Java edition, and up until this point, Minecraft Java edition servers used the Java8 JDK. Minecraft 1.17, however, required the Java16 JDK. And that led to a second problem. The only JDK in the Photon repositories at the time was for Java8.

Now this doesn’t seem like a problem, or at least it isn’t on a small scale. There are a few open-source OpenJDK implementations that I could adopt. I ended up going with Adoptium’s Temurin OpenJDK. But after building a server or two, I didn’t really feel like maintaining a manual install process. I wanted the ease of use that comes with installing and updating from a package repository, and that wasn’t available for Photon.

So I needed a different Linux distribution. CentOS would have been my first choice, but I didn’t want something that was basically a rolling release candidate. My colleague Timo Sugliani spoke very highly of Debian, and he released a set of Packer templates for building lightweight Debian virtual appliances on GitHub. I modified these templates to use the Packer vSphere-ISO plugin and started porting over my appliance build process.

Customizing the Minecraft Experience

Do you want a flat world or something without mob spawns? Or try out a custom world seed? You can set that during the appliance deployment. I wanted the appliance to be self-configuring so I spent some time extending William Lam’s OVF properties XML file to include all of the Minecraft server attributes that you can configure in the Server.Properties file. This allows you to deploy the appliance and configure the Minecraft environment without having to SSH into it to manually edit the file.

One day, I may trust my kids enough to give them limited access to vCenter to deploy their own servers. This would make it easier for them.

Unfortunately, that day is not today. But this still makes my life easier.

Installing and Configuring Minecraft

The OVF file does not contain the Minecraft Server binaries. The server actually gets installed during the appliance’s first boot. There are a few reasons for this. First, the Minecraft EULA does not allow you to distribute the binaries. At least that was my understanding of it.

Second, and more importantly, you may not always want the latest and greatest server version, especially if you’re planning to develop or use mods. Mods are often developed against specific Minecraft versions, and they have a lengthy interoperability chart.

The appliance is not built to utilize mods out of the box, but there is nothing stopping someone from installing Forge, Fabric, or other modified binaries. I just don’t feel like taking on that level of effort, and my kids have so far resisted learning important life skills like the Bash CLI.

And finally, there isn’t much difference between downloading and installing the server binary on first boot and downloading and installing an updated binary. Minecraft Java edition is distributed as a JAR file, so I only really need to download it and place it in the correct folder.

I have a pair of PowerShell scripts that make these processes pretty easy. Both scripts have the same core function – query an online version manifest that is used by the Minecraft client and download the specified version to the local machine. The update script also has some extra logic in it to check if the service is running and gracefully stop it before downloading the updated server.jar file.

You can find these scripts in the files directory on GitHub.
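My scripts are written in PowerShell, but the core idea is easy to sketch in shell as well. The example below is a simplified, hedged equivalent that uses curl and jq against Mojang’s public version manifest; the version number and destination path are just examples.

# Look up and download the server.jar for a specific Minecraft version
VERSION="1.20.4"
MANIFEST="https://launchermeta.mojang.com/mc/game/version_manifest.json"
VERSION_URL=$(curl -s "$MANIFEST" | jq -r --arg v "$VERSION" '.versions[] | select(.id == $v) | .url')
SERVER_URL=$(curl -s "$VERSION_URL" | jq -r '.downloads.server.url')
curl -o /opt/Minecraft/bin/server.jar "$SERVER_URL"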

Running Minecraft as a systemd Service

Finally, I didn’t want to have to deal with manually starting or restarting the Minecraft service. So I Googled around, and I found a bunch of systemd sample files. I did a lot of testing with these samples (and I apologize, I did not keep track of the links I used when creating my service file) to cobble together one of my own.

My service file has an external dependency. The MCRCON tool is required to shut down the service. While I was testing this, I ran into a number of issues where I could stop Minecraft, but it wouldn’t kill the Java process that spawned with it. It also didn’t guarantee that the world was properly saved or that users were alerted to the shutdown.

By using MCRCON, we can alert users to the shutdown, save the world, and gracefully exit all of the processes through a server shutdown command.
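As a rough illustration of what MCRCON makes possible on shutdown, a command along these lines warns players, saves the world, and stops the server gracefully. The RCON port and password are placeholders that need to match what is configured in server.properties.

mcrcon -H 127.0.0.1 -P 25575 -p "<rcon password>" "say Server is shutting down" "save-all" "stop"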

I also have the Minecraft service set to restart on failure. My kids have a tendency to crash the server by blowing up large stacks of TNT in a cave or other crazy things they see on YouTube, and that tends to crash the binary. This saves me a little headache by restarting the process.

Prerequisites

Before we begin, you’ll want to have a couple of prerequisites. These are:

  • The latest copy of HashiCorp’s Packer tool installed on your build machine
  • The latest copy of the Debian 11 NetInstall ISO
  • OVFTool

There are a couple of files that you should edit to match your environment before you attempt the build process.  These are:

  • Debian.auto.pkrvars.hcl – variables for the build process
  • debian-minecraft.pkr.hcl file – the iso_paths line includes part of a hard-coded path that may not reflect your environment, and you may want to change the CPUs or RAM allocated to the VM.
  • Preseed.cfg file located in the HTTP folder: localization information and root password

This build process uses the Packer vsphere-iso build process, so it talks to vCenter. It does not use the older vmware-iso build process.

The Appliance Build Process

As I mentioned above, I use Packer to orchestrate this build process. There is a Linux shell script in the public GitHub repo called build.sh that will kick off this build process.
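If you would rather run Packer directly instead of using build.sh, the invocation is roughly the following. This is a hedged sketch based on the file names listed in the prerequisites, and the actual script in the repo may wrap additional steps.

# Install the required plugins, then run the vsphere-iso build against vCenter
packer init .
packer build -var-file=Debian.auto.pkrvars.hcl debian-minecraft.pkr.hcl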

The first step, obviously, is to install Debian. This step is fully automated and controlled by the preseed.cfg file that is referenced in the packer file.

Once Debian is installed, we copy over a default Bash configuration and our init-script that will run when the appliance boots for the first time to configure the hostname and networking stack.

After these files are copied over, the Packer build begins to configure the appliance. The steps that it takes are:

  • Run apt-get update && apt-get upgrade to upgrade any outdated installed packages
  • Install our system utilities, including UFW
  • Configure UFW to allow SSH and enable it
  • Install VMware Tools
  • Set up the Repos for and install PowerShell and the Temurin OpenJDK
  • Configure the rc.local file that runs on first boot
  • Disable IPv6 because Java will default to communicating over IPv6 if it is enabled

After this, we do our basic Minecraft setup. This step does the following:

  • creates our Minecraft service user and group
  • sets up our basic folder structure in /opt/Minecraft
  • downloads MCRCON into the /opt/Minecraft/tools/mcrcon directory.
  • copies over the service file and scripts that will run on first boot

The last three steps of the build are to run a cleanup script, export the appliance to OVF, and create the OVA file with the configurable OVF properties. The cleanup script cleans out the local apt cache and log files and zeroes out the free space to reduce the size of the disks on export.

The configurable OVF properties include all of the networking settings, the root password and SSH key, and, as mentioned above, the configurable options in the Minecraft server.properties file. OVFTool and William Lam’s script are required to create the OVA file and inject the OVF properties, and the process is outlined in this blog post.

The XML file with the OVF Properties is located in the postprocess-ova-properties folder in my GitHub repo.

The outcome of this process is a ready-to-deploy OVA file that can be uploaded to a content library.

First Boot

So what happens after you deploy the appliance and boot it for the first time?

First, the debian-init.py script will run to configure the basic system identity. This includes the IP address and network settings, root password, and SSH public key for passwordless login.

Second, we will regenerate the host SSH keys so each appliance will have a unique key. If we don’t do this step, every appliance we deploy will have the same SSH host keys as the original template. This is handled by the debian-regeneratesshkeys.sh script that is based on various scripts that I found on other sites.

Our third step is to install and configure the Minecraft server using the debian-minecraftinstall.sh script. This has a number of sub-steps (a sketch of a few of them follows the list). These are:

  • Retrieve our Minecraft-specific OVF Properties
  • Call our PowerShell script to download the correct Minecraft server version to /opt/Minecraft/bin
  • Initialize the Minecraft server to create all of the required folders and files
  • Edit eula.txt to accept the EULA. The server will not run and let users connect without this step
  • Edit the server.properties file and replace any default values with the OVFProperties values
  • Edit the systemd file and configure the firewall to use the Minecraft and RCON ports
  • Reset permissions and ownership on the /opt/Minecraft folders
  • Enable and start Minecraft
  • Configure our Cron job to automatically install system and Minecraft service updates
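To illustrate a few of those sub-steps, here is roughly what the EULA, server.properties, and firewall pieces look like in shell. The ports shown are the Minecraft and RCON defaults, and MINECRAFT_MOTD stands in for a value the real script would pull from the OVF Properties:

  #!/bin/bash
  # Illustrative excerpts; the real script substitutes values retrieved from the OVF Properties

  # Accept the EULA so the server will actually start and accept connections
  sed -i 's/^eula=false/eula=true/' /opt/Minecraft/bin/eula.txt

  # Replace a default with a deployment-specific value (MOTD shown as an example)
  sed -i "s/^motd=.*/motd=${MINECRAFT_MOTD}/" /opt/Minecraft/bin/server.properties

  # Open the default Minecraft (25565) and RCON (25575) ports
  ufw allow 25565/tcp
  ufw allow 25575/tcp

  # Fix ownership, then enable and start the service
  chown -R minecraft:minecraft /opt/Minecraft
  systemctl enable --now minecraft.service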

The end result is a ready-to-play Minecraft VM.

All of the Packer files and scripts are available in my GitHub repository. Feel free to check it out and adapt it to your needs.

What’s In The Studio – Pivoting Community Involvement to Video

As we all start off 2021, I wanted to talk a little about video.

As we all know, 2020 put the kibosh on large, in-person events. This included all of the vendor conferences, internal conferences, and community events like the VMware User Group UserCons and other user groups. Most of these events transitioned to online events with presenters delivering recorded sessions. It also meant more webinars, Zoom meetings, and video conferences.

And it doesn’t look like this will be changing for at least the first half of 2021.

I’ve seen a number of blog and Twitter posts recently about home studios (for example, this great post by Johan van Amersfoort or this Twitter thread from John Nicholson), and I thought I would share my setup.

Background

I was not entirely unprepared to transition to video last year. I had been a photographer since high school, and I made the jump to digital photography in college when Canon released the Digital Rebel. I mainly focused on sports that are played in venues that are a step or two above dimly lit caves. After college, I kind of put the camera down (except for a couple of vacations and trying my hand at a wedding or two, which was not my thing). At the beginning of 2020, I decided it was time to get back into photography since I figured I would be traveling ( 😂 ), and I picked up a used Canon 6D that was opportunistically priced. It also could record video in 1080p.

Slideshow: Some of my photos from years past.

Video was new ground for me, and it resulted in quite a lot of experimentation and purchasing in order to get things right. This was also happening at the beginning of the lockdowns, when my whole family was at home all day and almost everything I needed was delayed or backordered. Some of this was driven by equipment limitations, which I will cover below, and some of it was driven by other factors.

And as I went through this, I spent a lot of time learning what worked and what didn’t work for me. For example, I found that sitting in front of my laptop trying to record in Zoom didn’t work for me. When recording for a VMUG or VMworld, I wanted to stand and have room to move around because that was what felt natural to me.

Before I go into my setup, I want to echo one point that Johan made in his post. The audio and video gear is there to support the message and enable remote delivery. If you are new to presenting, spend some time learning the craft of storytelling and presentation design. Johan recommended two books by Nancy Duarte – resonate and Slide:ology. I highly recommend these books as well. If you’re new to presenting in general, I also recommend finding a mentor and learning how to use PowerPoint as the graphics capabilities are powerful but intimidating. There are a number of good YouTube videos, for example, on how to do different things in PowerPoint.

Requirements and Constraints

I have primarily used my video gear in two different ways. The first was for video conferencing. Whether it was Zoom, Teams, or “Other,” video became a major part of the meetings and workshops that replaced in-person gatherings. The second use case was the one I probably focused on more – producing recorded content for user groups and conferences, and my goal here was to try and replicate some of the feel of presenting live while taking advantage of the capabilities that video offers.

Most of the recorded video content was for VMUG UserCons. These sessions were 40 minutes long, and they wanted to have presenters on camera along with the slides.

There is a third use case, which didn’t really apply for 2020. This use case was live events such as webinars and video podcast recordings, although my studio kit can be used for this.

I had a few things I needed to consider when planning out my setup. The first was space. I had a few limiting factors when it came to having a space to record. My office was not laid out for keeping the gear set up permanently, and the furniture arrangement was dictated by where the one outlet was located. (I have since installed additional outlets in my office and rearranged.) I also wanted a space where I could record while standing. Both of these factors meant that I would be using common areas to record, so my gear had to be portable and easy to assemble.

Most of my recording was originally done in my kid’s playroom in my basement.

The second consideration was trying to keep this budget friendly. The key word here is trying. I may have failed there.

I already had a lot of Canon gear from my photography days, so I wanted to reuse it as much as possible. I already had a Canon EOS 6D, and that could record 1080p HD video. Although I did upgrade my camera bodies by trading in old gear, I stayed in the Canon ecosystem as I didn’t want to invest in all new lenses.

I had a copy of Camtasia for screen recording, but combining the Camtasia capture with video recorded in camera would require additional workflow to get the final video together. This would require some sort of video editing software. And I would also need audio and lighting gear. This gear had to fit the requirements and constraints laid out above and be both cost effective and portable.

Studio Gear

My studio setup in my office.

Note: I will be linking to Amazon, Adorama, and other sites in this section. These are NOT affiliate links. I have not monetized my site, and I make no money off of any purchases you choose to make.

Cameras and Lenses

Canon EOS R6 with Canon EF 50mm F/1.4 USM Lens and DC Adapter – Primary Camera

Canon EOS Rebel SL3 with Canon EF 40mm F/2.8 STM Lens and DC Adapter – Secondary/Backup Camera

Note: Both cameras use DC Adapters when set up in the studio because these cameras will eat through their batteries when doing video. Yes, I’ve lost a few hours while waiting for all of my battery packs to recharge.

Audio

Synco Audio WMic-T1 Wireless Lavalier Microphone System (x2) – primary audio

Comica CVM-V30 Pro Shotgun Microphone – Secondary audio

Blue Yeti USB Mic (Note: This is at my desk, but I only use it for recording voiceovers or while on Zoom/Teams/etc calls. If I ever restart my podcast, I will use this for that as well.)

Lighting

Neewer 288 Large LED Video Light Panel (x2)

Viltrox VL-162T Video Light (x2)

Amazon Basics Light Stands (x2)

Other Hardware and Software

Blackmagic Design ATEM Mini Pro ISO – See Below

DaVinci Resolve (Note: DaVinci Resolve is a free, full-featured video editing suite. There is also a paid version, DaVinci Resolve Studio, that has a one-time cost of $299. Yes, it's a perpetual license.)

Camtasia

A note on why I’m using the ATEM Mini Pro ISO

When I started, I was using Camtasia to record my screen while I recorded my presentation using my camera. Creating the final output required a lot of post-processing work to line up the audio and video across multiple sources.

The ATEM Mini Pro ISO allows me to bring together all my audio and video sources into a single device and record each input. So I can bring both cameras, my microphones, and any computers that I’m displaying content on (such as slides or demos) and record all of these inputs to disk. This allows me to record everything on one disk, so I don’t have to worry about managing data on multiple memory cards, and it simplifies my post-production workflow because I don’t have to synchronize everything manually.

There is a second benefit that I haven’t covered. It also allows me to get around a video recording limit built into modern cameras.

Most DSLRs and mirrorless cameras have a video recording time limit when recording to internal cards. Video segments are limited to approximately 29 minutes and 59 seconds. This limit isn't due to file size or hardware limitations (although some cameras have shorter time limits due to heat dissipation issues). It's an artificial limit due to import-duty restrictions that the European Union put on video cameras.

VMUG UserCon sessions are 40 minutes, and I was burned by the 30 minute time limit on a couple of occasions.

That recording time limit only applies when recording to the internal card, though. It does not apply to external devices like the ATEM Mini. In order to use this with a DSLR or mirrorless camera, you need one that supports sending a clean video feed over HDMI (Clean HDMI Out). Canon has a good video that explains it here. (Note: There are also USB webcam drivers for many modern DSLR and mirrorless cameras that allow you to do the same type of thing with tools like OBS.)

Horizon 8.0 Part 10: Deploying the Unified Access Gateway

And we’re back…this week with the final part of deploying a Horizon 2006 environment – deploying the Unified Access Gateway to enable remote access to desktops.

Before we go into the deployment process, let’s dive into the background on the appliance.

The Unified Access Gateway (also abbreviated as UAG) is a purpose built virtual appliance that is designed to be the remote access component for VMware Horizon and Workspace One.  The appliance is hardened for deployment in a DMZ scenario, and it is designed to only pass authorized traffic from authenticated users into a secure network.

As of Horizon 2006, the UAG is the primary remote access component for Horizon.  This wasn't always the case – previous Horizon releases used the Horizon Security Server.  The Security Server was a Windows Server running a stripped-down version of the Horizon Connection Server, and this component was deprecated and removed in Horizon 2006.

The UAG has some benefits over the Security Server.  First, it does not require a Windows license.  The UAG is built on Photon, VMware’s lightweight Linux distribution, and it is distributed as an appliance.  Second, the UAG is not tightly coupled to a connection server, so you can use a load balancer between the UAG and the Connection Server to eliminate single points of failure.

And finally, multifactor authentication is validated on the UAG in the DMZ.  When multi-factor authentication is enabled, users are prompted for that second factor first, and they are only prompted for their Active Directory credentials if this authentication is successful.  The UAG can utilize multiple forms of MFA, including RSA, RADIUS, and SAML-based solutions, and setting up MFA on the UAG does not require any changes to the connection servers.

There have also been a couple of 3rd-party options that could be used with Horizon. I won’t be covering any of the other options in this post.

If you want to learn more about the Unified Access Gateway, including a deeper dive on its capabilities, sizing, and deployment architectures, please check out the Unified Access Gateway Architecture guide on VMware Techzone.

Deploying the Unified Access Gateway

There are two main ways to deploy the UAG.  The first is a manual deployment where the UAG’s OVA file is manually deployed through vCenter, and then the appliance is configured through the built-in Admin interface.  The second option is the PowerShell deployment method, where a PowerShell script and OVFTool are used to automatically deploy the OVA file, and the appliance’s configuration is injected from an INI file during deployment.

Typically, I prefer the PowerShell deployment method.  This method consists of a PowerShell deployment script and an INI file that contains the configuration for each appliance that you're deploying.  I like the PowerShell script over deploying the appliance through vCenter because the appliance is ready to use on first boot. It also allows administrators to track all configurations in a source control system such as GitHub, which provides both documentation for the configuration and change tracking.  This method makes it easy to redeploy or upgrade the Unified Access Gateway because I can simply rerun the script with my config file and the new OVA file.

The PowerShell script requires the OVF Tool to be installed on the server or desktop where the PowerShell script will be executed.  The latest version of the OVF Tool can be downloaded from MyVMware.  PowerCLI is not required when deploying the UAG as OVF Tool will be deploying the appliance and injecting the configuration.

The zip file that contains the PowerShell scripts includes sample templates for different use cases.  This includes Horizon use cases with RADIUS and RSA-based multifactor authentication.  You can also find the reference guide for all options here.

If you haven’t deployed a UAG before, are implementing a new feature on the UAG, or you’re not comfortable creating the INI configuration file from scratch, then you can use the manual deployment method to configure your appliance and then export the configuration in the INI file format that the PowerShell deployment method can consume.  This exported configuration only contains the appliance’s Workspace ONE or Horizon configuration – you would still have to add in your vSphere and SSL Certificate configuration.

You can export the configuration from the UAG admin interface.  It is the last item in the Support Settings section.

UAG-Config-Export

One other thing that can trip people up when creating their first UAG deployment file is the deployment path used by OVFTool.  This is not always straightforward, and vCenter has some “hidden” objects that need to be included in the path.  OVFTool can be used to discover the path where the appliance will be deployed.

You can use OVFTool to connect to your vCenter with a partial path, and then view the objects in that location.  It may require multiple connection attempts with OVFTool to build out the path.  You can see an example of this over at the VMwareArena blog on how to export a VM with OVFTool or in question 8 in the troubleshooting section of the Using PowerShell to Deploy the Unified Access Gateway guide.
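As a quick example of that technique, you can walk the inventory one level at a time. The hostname and object names below are placeholders for your own environment, and OVF Tool will prompt for credentials if they aren't included in the locator:

  # Point OVF Tool at vCenter with a partial path; it will list the objects available at that level
  ovftool vi://vcenter.lab.local/

  # Add the datacenter name and the hidden "host" folder to see the clusters and hosts
  ovftool "vi://vcenter.lab.local/Lab-Datacenter/host/"

  # The full deployment path ends up looking something like:
  #   vi://user:PASSWORD@vcenter.lab.local/Lab-Datacenter/host/Lab-Cluster/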

Before deploying the UAG, we need to get some prerequisites in place.  These are:

  1. Download the Unified Access Gateway OVA file, PowerShell deployment script zip file, and the latest version of OVFTool from MyVMware.
  2. Right click on the PowerShell zip file and select Properties.
  3. Click Unblock.  This step is required because files downloaded from the Internet are untrusted by default, which can prevent the scripts from executing after we unzip them.
  4. Extract the contents of the downloaded ZIP file to a folder on the system where the deployment script will be run.  The ZIP file contains multiple files, but we will only be using the uagdeploy.ps1 script file and the uagdeploy.psm1 module file.  The other scripts are used to deploy the UAG to Hyper-V, Azure, and AWS EC2.  The zip file also contains a number of default templates.  When deploying the access points for Horizon, I recommend starting with the UAG2-Advanced.ini template.  This template provides the most options for configuring Horizon remote access and networking.  Once you have the UAG deployed successfully, I recommend copying the relevant portions of the SecurID or RADIUS auth templates into your working access point template.  This allows you to test remote access and your DMZ networking and routing before adding in MFA.
  5. Before we start filling out the template for our first access point, there are some things we’ll need to do to ensure a successful deployment. These steps are:
    1. Ensure that the OVF Tool is installed on your deployment machine.
    2. Locate the UAG’s OVA file and record the full file path.  The OVA file can be placed on a network share.
    3. We will need a copy of the certificate, including any intermediate and root CA certificates, and the private key in PFX or PEM format.  Place these files into a local folder or network share and record the full path.  If you are using PEM files, the certificate files should be concatenated so that the certificate and any CA certificates in the chain are in one file, and the private key should not have a password on it.  If you are using PFX files, you will be prompted for a password when deploying the UAG.
    4. We need to create the path to the vSphere resources that OVF Tool will use when deploying the appliance.  This path looks like: vi://user:PASSWORD@vcenter.fqdn.or.ip/DataCenter Name/host/Host or Cluster Name/.  OVF Tool is case sensitive, so make sure that the datacenter name and host or cluster names are entered as they are displayed in vCenter.

      The uppercase PASSWORD in the OVFTool string is a variable that prompts the user for a password before deploying the appliance.  If you are automating your deployment, you can replace this with the password for the service account that will be used for deploying the UAG.

      Note: I don’t recommend saving the service account password in the INI files. If you plan to do this, remember best practices around saving passwords in plaintext files and ensure that your service account only has the required permissions for deploying the UAG appliances.


    5. Generate the passwords that you will use for the appliance's root and admin accounts.
    6. Get the SSL Thumbprint for the certificate on your Connection Server or load balancer that is in front of the connection servers.
  6. Fill out the template file.  The file has comments for documentation, so it should be pretty easy to fill out. You will need to have a valid port group for all three networks, even if you are only using the OneNic deployment option.
  7. Save your INI file as <UAGName>.ini in the same directory as the deployment scripts.

There is one change that we will need to configure on our Connection Servers before we deploy the UAGs – disabling the Blast and PCoIP secure gateways.  If these are not disabled, the UAG will attempt to tunnel the user protocol session traffic through the Connection Server, and users will get a black screen instead of a desktop.

The steps for disabling the gateways are:

  1. Log into your Connection Server admin interface.
  2. Go to Settings -> Servers -> Connection Servers
  3. Select your Connection Server and then click Edit.
  4. Uncheck the following options:
    1. Use Secure Tunnel to Connect to machine
    2. Use PCoIP Secure Gateway for PCoIP connections to machine
  5. Under Blast Secure Gateway, select Use Blast Secure Gateway for only HTML Access connections to machine.  This option may reduce the number of certificate prompts that users receive if using the HTML5 client to access their desktop.
  6. Click OK.

Connection Server Settings

Once all of these tasks are done, we can start deploying the UAGs.  The steps are:

  1. Open PowerShell and change to the directory where the deployment scripts are stored.
  2. Run the deployment script.  The syntax is .\uagdeploy.ps1 -iniFile <apname>.ini
  3. Enter the appliance root password twice.
  4. Enter the admin password twice.  This password is optional, however, if one is not configured, the REST API and Admin interface will not be available.
    Note: The UAG Deploy script has parameters for the root and admin passwords.  These can be used to reduce the number of prompts after running the script.
  5. If RADIUS is configured in the INI file, you will be prompted for the RADIUS shared secret.
  6. After the script opens the OVA and validates the manifest, it will prompt you for the password for accessing vCenter.  Enter it here.
  7. If a UAG with the same name is already deployed, it will be powered off and deleted.
  8. The appliance OVA will be deployed.  When the deployment is complete, the appliance will be powered on and get an IP address from DHCP.
  9. The appliance configuration defined in the INI file will be injected into the appliance and applied during the bootup.  It may take a few minutes for configuration to be completed.


Testing the Unified Access Gateway

Once the appliance has finished its deployment and self-configuration, it needs to be tested to ensure that it is operating properly. The best way that I've found to do this is to use a mobile device, such as a smartphone or cellular-enabled tablet, to access the environment using the Horizon mobile app.  If everything is working properly, you should be prompted to sign in, and desktop pool connections should be successful.

If you are not able to sign in, or you can sign in but not connect to a desktop pool, the first thing to check is your firewall rules.  Validate that TCP and UDP ports 443, 8443 and 4172 are open between the Internet and your Unified Access Gateway.  You may also want to check your Connection Server configuration and ensure that HTTP Secure Gateway, PCoIP Secure Gateway, and Blast Secure Gateway are disabled.
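If you want a quick sanity check of the TCP side from outside the firewall before involving a client, something like the following works from any Linux or macOS machine with netcat installed. The hostname is a placeholder, and this does not validate the UDP side of Blast and PCoIP:

  # Test the UAG's public TCP ports from an external network
  for port in 443 8443 4172; do
    nc -zv -w 3 uag.example.com "$port"
  done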

If you’re deploying your UAGs with multiple NICs and your desktops live in a different subnet than your UAGs and/or your Connection Servers, you may need to statically define routes.  The UAG typically has the default route set on the Internet or external interface, so it may not have routes to the desktop subnets unless they are statically defined.  An example of a route configuration may look like the following:

routes1 = 192.168.2.0/24 192.168.1.1,192.168.3.0/24 192.168.1.1

If you need to make a routing change, the best way to handle it is to update the ini file and then redeploy the appliance.

Once deployed and tested, your Horizon infrastructure is configured, and you’re ready to start having users connect to the environment.

Horizon 8.0 Part 9: Creating Your First Desktop Pool

This week, we’re going to talk about desktop pools and how to create your first desktop pool in your new Horizon environment.

Desktop Pools – Explained

So what is a desktop pool?

Desktop pools are a logical grouping of virtual machines that users can access, and these groupings control specific settings about the pool. This includes how the desktops are provisioned and named, protocols that are available for connectivity, and what physical infrastructure they are deployed on.

Horizon has a few different types of desktop pools.  Each pool handles desktops in different ways, and they each have different purposes.  The type of pool that you select will be determined by a number of factors including the use case, the storage infrastructure and application requirements.

The type of desktop pools are:

  • Full Clone Pools – Each virtual desktop is a full virtual machine cloned from a template in vCenter.  The virtual machines require a desktop management tool for post-deployment management.  VMs are customized using existing Guest Customization Specifications. These desktops usually persist after the user logs out.
  • Linked Clone Pools – Each virtual desktop is based on a parent VM snapshot and shares its disk with the parent virtual machine.  Changes to the linked clone are written to a delta disk.  The virtual machines are managed by View Composer.   Linked Clone desktops can be Floating or Dedicated assignment, and they can be configured to be refreshed (or rolled back to a known good snapshot) or deleted on logoff. Linked Clone desktops are officially deprecated in Horizon 2006, and they will be removed in a future release.
  • Instant Clone Pools – Each virtual desktop is based on a parent VM snapshot. The snapshot is cloned to a VM that is deployed to each host, powered up, and then stunned. All guest VMs are then “forked” from this VM and quickly customized. Guest VMs share virtual disks and initial memory maps with the parent VMs.  VMs are managed by vCenter and a “next generation” Composer that is built into the Connection Servers.
  • Manual Pools – The machines that make up the manual pool consist of virtual and/or physical machines that have had the View Agent installed.  These machines are not managed by Horizon.
  • Remote Desktop Session Host Pool – The machines that make up these pools are Windows Servers with the Remote Desktop Session Host Role installed.  They can be provisioned as linked clones or manually, and they are used for published desktops and published applications.

There is one other choice that needs to be selected when creating a desktop pool, and that is the desktop assignment type.  There are two desktop assignment types:

  • Floating Assignment – Desktops are assigned to users at login and are returned to the pool of available desktops when the user signs out.
  • Dedicated Assignment – Desktops are assigned to a user, and the user gets the same desktop at each login.  Desktops can be assigned automatically at first login or manually by an administrator.

Creating Your Desktop Image

Before you can create a desktop pool, you need to have configured a desktop virtual machine with all of your applications and optimizations in place.  This virtual machine will be the template, or gold image, for all of the virtual machines that Horizon will deploy as part of the pool.

The virtual desktop template details, including the virtual machine specifications and installed applications, will depend on the results of any use case definition and desktop assessment exercises that are performed during the project’s design phase.  

I won’t cover how to create a desktop or RDSH template in this series.  Instead, I recommend you check out the Building an Optimized Windows Image guide on VMware Techzone or Graeme Gordon‘s session from VMworld – DWHV1823 Creating and Optimizing a Windows Image for VDI and Published Applications.

Creating A Desktop Pool

For this walkthrough, I will be doing an Automatic Floating Assignment Instant-Clone desktop pool.  These are otherwise known as Non-Persistent desktops because the desktop is destroyed when the user signs out.

If you’re familiar with previous versions of the series, you’ll notice that there are more screens and the order that some steps are performed in has changed.  Please note that some of the menu options will change depending on the type of desktop pool you’re provisioning.

1. Log into the Horizon Console.  Under Inventory, select Desktops.

2.  Click Add to add a new pool.

3. Select the Pool Type that you want to create.  For this, we’ll select Automated Pool and click Next.

Note: In some environments, you may see an error at this step if you're using Instant Clones while View Storage Accelerator is disabled.

4.  Choose the type of virtual machines that will be deployed in the environment. For this walkthrough, select Instant Clone. If you have multiple vCenter Servers in your environment, select the vCenter where the desktops will be deployed. Click Next.

5. Select whether you want to have Floating or Dedicated Desktops. For this walkthrough, we’ll select Floating and click Next.

Note: The Enable Automatic Assignment option is only available if you select Dedicated. If this option is selected, View automatically assigns a desktop to a user when they log in to the dedicated pool for the first time.

6. Select whether VSAN will be used to store desktops that are provisioned by Horizon.  If VSAN is not being used, select the second option – “Do Not Use VSAN.”

If you want to store the Instant Clone replica disks that all VMs are provisioned from on different datastores than the VMs, and you are not using VSAN, select the Use Separate Datastores for Replica and Data Disks option.

7. Each desktop pool needs an ID and, optionally, a Display Name.  The ID field is the official name of the pool, and it cannot contain any spaces.  The Display Name is the “friendly” name that users will see when they select a desktop pool to log into.  You can also add a description to the pool.

8. Configure the provisioning settings for the pool.  This screen allows you to control provisioning behavior, computer names, and the number of desktops provisioned in the pool.

9. After configuring the pool’s provisioning settings, you need to configure the pool’s vCenter settings.  This covers the Parent VM and the snapshot that the Instant Clones will be based on, the folder that they will be stored in within vCenter, and the cluster, datastores, and, optionally, the networks that will be used when the desktops are deployed.

In order to configure each setting, you will need to click the Browse button on the right hand side of the screen.  These steps must be completed in order.

9-A. First, select the parent VM that the Instant Clone desktops will be based on.  Select the VM that you want to use and click Submit.

9-B. The next step is to select the Parent VM snapshot that the Instant Clone desktops will be based on.  Select the snapshot that you want to use and click OK.

9-C. After you have selected a Parent VM and a snapshot, you need to configure the vCenter folder in the VMs and Templates view that the VMs will be placed in.  Select the folder and click OK.

9-D. The next step is to place the pool on a vSphere cluster.  The virtual machines that make up the desktop pool will be run on this cluster, and the remaining choices will be based on this selection.  Select the cluster that they should be run on and click OK.

9-E. The next step is to place the desktops into a Resource Pool.  In this example, I have no resource pools configured, so the desktops would be placed in the Cluster Root.

9-F. Next, you will need to pick the datastores that the desktops will be stored on. 

9-G. When using Instant Clone desktops, you will have the option to configure the network or networks that the desktops are deployed onto. By default, all desktops are deployed to the same network as the parent VM, but administrators have the option to deploy virtual desktops to different networks.

10. After configuring the vCenter settings, you need to configure the Desktop Pool settings. These settings include:

  • Desktop Pool State – Enabled or Disabled
  • Connection Server Restrictions
  • Pool Session Types – Desktop only, Published Applications, or Both
  • Disconnect Policy
  • Cloud Management – Enable the pool to be consumed by the Universal Broker service and entitled from the Horizon Cloud Service

11. Configure the remote display settings. This includes choosing the default display protocol, allowing users to select a different protocol, and configuring the 3D rendering settings such as enabling the pool to use NVIDIA GRID vGPU. Administrators can also choose to enable Session Collaboration on the pool.

12. Configure Guest Customization settings by selecting the domain that the provisioned desktops will join, the OU where the accounts will be placed, and any scripts that will be run after provisioning.

13. Review the settings for the pool and verify that everything is correct.  Before you click Finish, check the Entitle Users checkbox in the upper right.  This will allow you to select the users and/or groups who have permission to log into the desktops.

14. After you click Finish, you will need to grant access to the pool.  View allows you to entitle Active Directory users and groups.  Click Add to entitle users and groups.

15. Search for the user or group that you want to entitle.  If you are in a multi-domain environment, you can change domains by selecting the domain from the Domains box.  Click on the users or groups that you want to grant access to and click OK.

Note:  I recommend that you create Active Directory security groups and entitle those to desktop pools.  This makes it easier to manage a user’s pool assignments without having to log into View Administrator whenever you want to make a change.

16. Review the users or groups that will be entitled to the pool, and click OK.

17. You can check the status of your desktop pool creation in vCenter.  If this is a new pool, it will need to complete the Instant Clone provisioning process. To learn more about the parent VMs that are provisioned when Instant Clone pools are created, please see this article for traditional instant clones or this video for Instant Clone pools using Smart Provisioning.

Once the desktops have finished deploying, you will be able to log into them through the Horizon HTML5 Client or the Horizon Client for your endpoint’s platform.

I realize that there are a lot of steps in the process of creating a desktop pool.  It doesn’t take nearly as long as it seems once you get the hang of it, and you will be able to fly through it pretty quickly.