vSphere has been a core component of my home lab for at least 15 years. I’ve run all sorts of workloads on it, ranging from Microsoft Exchange in the earliest days of my lab to full-stack VDI environments to building my own cloud for self-hosting Minecraft servers and replacing paid SaaS apps with open-source alternatives.
So last year, I made the hard decision to migrate my home lab off of VMware products and onto other platforms. Several factors drove this decision:
- Simplifying my lab infrastructure
- Removing licensing dependencies, like community/influencer program membership, from my lab’s core infrastructure
- Supporting EUC solutions, like Dizzion Frame, that were not available on my platform
- Avoiding a rebuild of everything from scratch
I realized early on that this would need to be a migration rather than a rebuild. I had too many things in my lab that were “production-ish” in the sense that my family would miss them if they went away. There were also budget, hardware, and space constraints, so I couldn’t have run two full stacks side by side during this process even if I had wanted to rebuild services in parallel.
I also wanted to use the migration to rationalize aspects of the lab. Some parts of my lab had been operating for over a decade and had built up a lot of cruft. This would be a good opportunity to retire those parts of the lab and rebuild those services later if they were still needed.
Approach to Migration and Defining Requirements
Since my lab had taken on a “production-ish” quality, I wanted to approach the migration like I would have approached a customer or partner’s project. I evaluated my environment; documented my requirements, constraints, risks, and future state architectures; and developed an implementation and migration plan.
Some of the key requirements and constraints that I identified for my main workload cluster were:
- Open-Source or community licensing not constrained by NFRs or program membership
- Live migration support
- Hyperconverged storage to simplify lab dependencies
- Hypervisor-level backup integration, particularly with Veeam, as I was running Veeam Community Edition in my lab
- HashiCorp Packer support
- Migration tool or path for vSphere workloads
- I was out of 10 GbE ports, so I could only add 1 new server at a time after retiring older hardware
The leading candidates for this project were narrowed down to Nutanix Community Edition, Proxmox, and XCP-NG. I selected Nutanix Community Edition because it checked the most boxes: it was the easiest to deploy, it had Nutanix Move for migrating from vSphere, and when I started this project in 2024, it was the only option Veeam supported that also fit my licensing requirements.
You’ll notice that EUC vendor support is not on the list. While trying out other EUC tools in my lab was a driver for changing my lab, I didn’t want to make that a limiting factor for my main workload cluster. This cluster would also be running my self-hosted applications, Minecraft servers, and other server workloads. Licensing and backup software support were bigger factors. I could always stand up a single node of a hypervisor to test out solutions if I needed to, although this became a moot point when I selected Nutanix.
I identified two major outcomes that I wanted to achieve. The first, as noted in the requirements, was to remove licensing dependencies for my lab. I didn’t want to be forced to migrate to a new platform in the future because of NFR licensing or community program changes. The lab needed to stand on its own.
The second outcome was to reduce my lab’s complexity. My lab had evolved into a small-scale mirror of the partners I used to cover, and that led to a lot of inherent complexity. Since I was no longer covering cloud provider partners, I could remove some of this complexity to make lab and workload management easier.
I ended up buying one new host for my lab. At the time I started this, my lab was a mix of Dell PowerEdge R430s, R620s, and R630s for virtual workloads, and I wanted to run Nutanix on the newest hardware I had available. Since Nutanix CE does not support 2-node clusters, and I wanted matching hardware configs in that cluster to simplify management, one more host was needed to round out the cluster.
Workload Migration
After deploying Nutanix CE, Prism Central, and Nutanix Move, I started planning my workload migration.
Migrating my workloads proved to be the easiest part of this process. Nutanix Move just worked. I had a few minor challenges with my test workloads, but I was able to address them with some preplanning.
What challenges did I encounter? The biggest challenges were with my Debian servers, and they came in two forms. The first was that the main network interface was renamed from ens192 to ens3 when changing hypervisors, and I worked around this by renaming the interface in the networking config before shutting down the servers and migrating off of vSphere.
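For anyone hitting the same rename, the pre-migration fix on my Debian guests amounted to a one-line edit. This is a minimal sketch assuming the default ifupdown configuration in /etc/network/interfaces; if your guests use systemd-networkd or Netplan, the file to edit will be different.

```
# Swap the old vSphere interface name for the one AHV presents, then power
# the VM off for migration. (ens3 is what my guests ended up with; verify
# the new name on a test VM before doing this in bulk.)
sudo sed -i 's/ens192/ens3/g' /etc/network/interfaces
sudo poweroff
```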
The second challenge was due to how I deployed my Debian VMs. I built my Debian templates as OVA files and used OVF properties to configure my servers at deployment time. Scripts that ran on boot would read the OVF properties and configure the default user, network stack, and some application settings like Minecraft server properties. Nutanix does not support OVF properties, and those attributes are stripped from the VM during migration, so these scripts needed to be disabled or removed prior to migration or they would error out with nothing to read.
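As a rough illustration of that cleanup, the commands below show how to inspect the OVF environment on vSphere and disable the unit that consumes it before migrating. The vmtoolsd command is standard open-vm-tools; the service name is a placeholder, not my actual unit.

```
# On vSphere, the OVF environment is exposed through VMware Tools:
vmtoolsd --cmd "info-get guestinfo.ovfEnv"   # prints the OVF property XML

# Disable whatever parses those properties at boot so it can't fail on AHV,
# where the data no longer exists. "ovf-firstboot.service" is a placeholder.
sudo systemctl disable --now ovf-firstboot.service
```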
Once I worked around these issues, I was able to accelerate the migration timeline and move all of my server workloads in a few days.
Impact on Workflows and Processes
Major infrastructure changes will impact your workflows and processes. And my lab migration was no exception to this.
The largest impact was how I built and deployed VMs. I use Packer to build my Debian and Windows templates. I reused most of my Packer builds, but I had to make a few small adjustments to the build jobs for them to function with Nutanix. There were also some functionality gaps that I resolved with two pull requests covering the Debian and Windows OS builds.
My Debian build and deploy process changed in two ways. First, I had to change how I consumed Debian. On vSphere, I would build each image from scratch using the latest Debian ISO and a custom installer configuration file pulled down from a local web server. The Nutanix Packer plugin could not send keystrokes to the VM, so I was unable to load that custom installer configuration.
I switched to using the prebuilt Debian generic cloud images, but this change had two further impacts. First, these images assume that the VM has a serial port for console debugging, and the Nutanix Packer plugin did not support adding a serial port to a VM, so I submitted a pull request to add this feature. The second impact was that I needed to learn cloud-init to inject a local user account and network configuration into the build VM. This was a good change, since I also needed cloud-init to configure any VMs I deployed from these Debian templates.
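For reference, this is roughly the shape of the cloud-init user-data that replaced those OVF properties. The values are illustrative, not my actual configuration; on AHV it can be supplied through the guest customization option when the VM is created.

```yaml
#cloud-config
# Illustrative values only: a local admin account and hostname, the kind of
# settings the old OVF properties used to carry. Network settings go in a
# separate cloud-init network-config document, not shown here.
hostname: debian-svc-01
users:
  - name: labadmin
    groups: [sudo]
    shell: /bin/bash
    sudo: "ALL=(ALL) NOPASSWD:ALL"
    ssh_authorized_keys:
      - ssh-ed25519 AAAA...example labadmin@lab
```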
I faced two small, but easily fixed, challenges with my Windows build process. The first was that the Nutanix Packer plugin did not support changing the boot order when the VM used UEFI. The Nutanix API supported it, but the Packer plugin had never been updated to take advantage of that, which resulted in my second pull request. The other challenge was that Nutanix does not support virtual floppy disk drives, which I had been using to present the Windows installer configuration files and initial configuration scripts; this was easily solved by having Packer create an ISO for these files using xorriso or a similar tool, as sketched below.
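Here is one way to build that ISO, assuming the answer file and scripts live in a local ./winconfig/ directory (a made-up path for this sketch). Newer Packer builders can also do this automatically through the common cd_files option, which uses a tool like xorriso or oscdimg under the hood; check whether your plugin version supports it before relying on it.

```
# Package the unattend file and first-run scripts as a small ISO that gets
# attached to the build VM in place of a floppy. The contents of ./winconfig/
# land in the root of the ISO, where Windows Setup looks for autounattend.xml.
xorriso -as mkisofs -J -R -V "PROVISION" -o windows-config.iso ./winconfig/
```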
I also ran into an issue with Veeam that delayed my implementation, but it had more to do with the network design choices I made for my self-hosted applications than with anything specific to Veeam or AHV. The culprit was a legacy network design carried over from VMware Cloud Director that I should have ditched when I removed VCD from my lab, but kept around because I was being lazy.
In an enterprise environment, these minor issues would have been found and addressed during an evaluation or very early in the migration process. But since I run my lab as a one-person operation, these issues were discovered after the migration and took longer to resolve than expected because I was juggling multiple tasks.
Lessons Learned
A hypervisor or cloud migration is a large project, and there were some key lessons that I learned from it. For my workloads and environment, the workload migration was the easy part. VMs are designed to be portable, and tools exist to make this process easy.
The hard part was everything around and supporting my workloads. Automation, backup, monitoring…those pieces are impacted the most by platform changes. The APIs are different, and the capabilities of your tooling may change as a result. I’ve spent more time rebuilding automation that I built up over years than I did actually moving my workloads.
Those changes are also an opportunity to introduce new tools or capabilities. A migration may expose gaps that were papered over by manual processes or quick-fix scripts, and you can use this opportunity to replace them with more robust solutions. As I talked about above, I had been using OVF properties to configure Linux VMs instead of good configuration management practices. This change not only forced me to adopt cloud-init, but it also pushed me to start introducing Ansible for configuration management.
Here is what I would recommend to organizations that are considering a platform change or migration:
1. Do your homework up front to make sure you understand your workloads, your business and technical requirements, and your business IT ecosystem.
2. Get a partner involved if you don’t have a dedicated architecture team or the manpower to manage a migration while putting out fires. They can facilitate workshops, get vendors involved to answer questions, and act as a force multiplier for your IT team.
3. Evaluate multiple options. One size does not fit all organizations or use cases, and you may find the need to run multiple platforms for business or technical reasons.
4. Test and update any integrations and automation before you start migrating workloads. Putting the work in up front will ensure that you can mark the project complete as soon as that last workload is migrated over.
In my case, I didn’t do #4. I was under a time crunch due to space and budget limitations, and because I was running my lab on NFR licenses, I wanted to move my workloads before those licenses expired.
If you have questions about hypervisor migrations or want to share your story, please use the contact me link. I would love to hear from you.