I have a 3-node HA cluster of enterprise hardware which runs a small home business and a bunch of personal VMs backed by iSCSI ZFS LUNs. It’s extremely underutilized at this point, but I’m looking to expand my usage.
I have a bit of money to spend on upgrades, and I am looking into the feasibility of procuring some GPUs. My budget is in the neighborhood of $600-1000. The main complicating factor is that I have 3 hosts. Because I have HA (and have experimented with ProxLB, which I am not using right now), all of my VMs are built as if they can freely migrate, and that’s been fine so far, but all my reading about adding GPUs has left me more confused than when I started. As for starting goals, I want to add GPU transcoding to my Jellyfin instance, dedicate a GPU to Immich for doing ML on my library, add a GPU to a VDI VM I use as a virtual workspace, and potentially run local language models for some development work that isn’t defined yet.
I’ve read a ton on using NVIDIA GPUs with vGPU, but I have a lot of concerns. I do not have the funds to buy an actual vGPU license, and the fastapi-dls workaround feels a little too unreliable to justify a large spend. A P40 would be in the budget range, but without tensor cores, and with the licensing complexity, I’m not sure it’s a good investment. Additionally, the documentation on how that would work with HA hasn’t clicked; I don’t even see HA mentioned here: https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE . Most threads on using non-NVIDIA GPUs have led me to believe it isn’t really worth it.
Another option I thought of was using multiple lower-cost GPUs (like a Tesla P4) and just assigning full cards to the VMs; my reading, though, is that I’m going to break things with HA, and I’ll have to use resource mapping, which is more complex and outside my comfort zone: https://pve.proxmox.com/wiki/QEMU/KVM_Virtual_Machines#resource_mapping . I also wouldn’t have tensor cores with this option.
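For reference, my rough understanding from the wiki is that on current Proxmox the setup would look something like the sketch below (the node names, PCI addresses, device ID, and VM ID are all made-up placeholders, and I haven’t actually tried this), and it’s this layer I’m not confident about:

```
# Define a cluster-wide PCI mapping so "the P4" resolves to the right card on each node
pvesh create /cluster/mapping/pci --id p4-gpu \
  --map node=pve1,path=0000:81:00.0,id=10de:1bb3 \
  --map node=pve2,path=0000:81:00.0,id=10de:1bb3 \
  --map node=pve3,path=0000:81:00.0,id=10de:1bb3

# The VM then references the mapping instead of a raw PCI address
# (pcie=1 needs a q35 machine type)
qm set 201 -hostpci0 mapping=p4-gpu,pcie=1
```

The GUI under Datacenter → Resource Mappings looks like it does the same thing, just with a different front end.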
It seems like I have a lot of options, but at this point I’m not sure what the best way to proceed is. I don’t want to spend $600 and end up with something that causes more headaches than it brings value. I’d like to hear people’s advice, experience, and recommendations, especially if there’s something I’m missing that my reading hasn’t uncovered.


Yes, you have noticed correctly: HA with GPUs is very, very difficult. My understanding is that most people have given up on it and instead use something like Kubernetes to kill and restart the application on another machine, rather than truly failing over a virtual machine/container.
Now, you’ve stated that you can’t afford enterprise-grade GPUs. That’s okay. You should know that there are projects that unlock vGPU features on non-enterprise/server-grade NVIDIA GPUs.
This is the main one: https://github.com/DualCoder/vgpu_unlock — but it only supports NVIDIA GPUs from the 2080 (Turing) generation and below. It looks like there is some work on getting 30/40-series GPUs working, but I cannot find a public guide. It should be noted that the 20-series and newer do have tensor cores, unlike the Pascal cards (P40/P4) you’re considering!
Now, even if you cannot get vGPU working, you should also know that although you cannot share a GPU between virtual machines, you can share a GPU between LXC containers, which Proxmox supports managing.
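As a rough sketch of what that usually looks like (the container ID, card, and render node are placeholders; check /dev/dri on the host for yours), the classic approach is to allow the DRM devices and bind-mount them into the container’s config:

```
# /etc/pve/lxc/110.conf — allow the DRM devices (major 226) and bind them in
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri/card0 dev/dri/card0 none bind,optional,create=file
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file
```

Several containers can point at the same render node at once, which is what makes this nicer than full passthrough for Jellyfin-style workloads; newer Proxmox releases also expose a per-container device passthrough option in the GUI that accomplishes the same thing, if I recall correctly.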
I also want to clarify something about the way HA in Proxmox works. Live migration only works when the relevant Proxmox hosts are online. Failover, which is a form of high availability, happens when a host goes offline, and the virtual machine is restarted on another host from the shared storage in use.
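So, as I understand it, a passthrough-GPU VM can still be registered as an HA resource; it just can’t live-migrate, and a failover restart only succeeds on a node that actually has the device it references (which is exactly the problem the resource mappings you linked are meant to solve). Registering it is just something like this (VM ID is a placeholder):

```
# Make the VM an HA resource; on host failure it gets restarted on another node
ha-manager add vm:201 --state started --max_restart 1
ha-manager status
```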
My honest recommendation is to give up on HA for this. Going for minimal cost, I would buy one cheaper GPU (Intel Arc offers the best bang for the buck right now, iirc) dedicated to VDI and Jellyfin hardware transcoding (but this only works if your VDI and Jellyfin are in LXC containers, since there is also no vGPU there), and one more expensive NVIDIA GPU with tensor cores for machine learning and AI workflows.
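Once the container has the device, a quick sanity check from inside it (assuming a VAAPI-capable card like an Arc and the libva-utils package installed) is:

```
# Inside the Jellyfin/VDI container: confirm the render node is visible
ls -l /dev/dri
# and that VAAPI actually reports decode/encode profiles for it
vainfo --display drm --device /dev/dri/renderD128
```
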
A big part of my HA strategy was for evaluation and dev of ProxLB as a DRS equivalent, since we plan to offboard from VMware at work and this lab is my test case for Proxmox as our landing spot. All of my apps are VMs, not containers. I have 3 open PCIe slots per host, and the hosts are R450s, which I find tricky for adding cards that need external power; that’s a point for the P4 even though I lose the tensor cores.
I just have a hard time committing a bunch of money to something that depends on a hack for hardware enablement. I’m kind of at the same conclusion: tossing a P4 in each host and assigning those 3 VMs as non-HA is probably the best path forward. I can wait, though, to see if some new solution comes forward.