Deploying with OpenTofu on LINSTOR DRBD: the beginning of problems
After choosing LINSTOR DRBD as the distributed storage solution for my Proxmox HA cluster, it was time to move on to deployment automation with OpenTofu (open-source fork of Terraform). Spoiler: it didn't go as planned.
Context Recap
In my previous article about choosing a distributed storage technology, I opted for LINSTOR DRBD over Ceph for several reasons:
- Superior performance: LINSTOR DRBD uses synchronous block-level replication via DRBD, offering better performance than Ceph on a 1 Gbps network
- Simpler architecture: No need for monitors, managers, and OSDs like with Ceph
- Resource consumption: Lighter on RAM and CPU
My infrastructure consists of:
- 3 Proxmox nodes: acemagician (192.168.100.10), elitedesk (192.168.100.20), thinkpad (192.168.100.30)
- LINSTOR configuration: thinkpad = controller, acemagician/elitedesk = satellites
- Storage pools: linstor_storage (DRBD-replicated) and local-lvm (local storage); see the quick check below
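Before automating anything, the topology and pools can be double-checked from the LINSTOR controller with the standard CLI (output omitted here):
# On the controller node (thinkpad)
linstor node list
linstor storage-pool list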
The Problem: LINSTOR Cannot Dynamically Provision VMs
When I attempted to deploy my K3s VMs with OpenTofu using the Proxmox provider, I ran into the same error every time:
Error: error creating VM: error cloning vm: 500 Internal Server Error:
unable to create VM 1000 - lvcreate 'linstor_storage/vm-1000-disk-0' error:
Exited with error code 1: Resource definition 'vm-1000-disk-0' not found.
My OpenTofu configuration was standard:
resource "proxmox_vm_qemu" "k3s_server_1" {
name = "k3s-server-1"
target_node = "acemagician"
clone = "ubuntu-2404-cloudinit"
vmid = 1000
disk {
storage = "linstor_storage"
size = "100G"
}
}
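For completeness, the error surfaced during the usual OpenTofu workflow, nothing exotic (the project directory name below is hypothetical):
cd infra/k3s   # hypothetical project directory
tofu init      # installs the Proxmox provider
tofu plan      # previews the VMs to create
tofu apply     # fails at clone time with the 500 error shown above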
Why Does It Fail?
LINSTOR uses a strict hierarchical object model:
- Resource Definition: Template for a storage resource
- Volume Definition: Volume size and properties
- Resource: Actual storage instance on a node
The normal LINSTOR workflow requires manually creating these objects:
linstor resource-definition create mydata
linstor volume-definition create mydata 100G
linstor resource create node1 mydata --storage-pool linstor_storage
linstor resource create node2 mydata --storage-pool linstor_storage
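After these manual steps, the resulting objects can be inspected from the controller (standard LINSTOR CLI, shown here for completeness):
linstor resource-definition list   # lists 'mydata'
linstor volume-definition list     # lists its 100G volume
linstor resource list              # shows the resource placed on node1 and node2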
The problem: When Proxmox attempts to clone a VM to LINSTOR:
- Proxmox calls lvcreate to create the new disk
- The LINSTOR plugin looks for the resource definition vm-XXX-disk-0
- The resource definition doesn't exist (Proxmox assumes dynamic creation)
- The operation fails
LINSTOR cannot dynamically create resource definitions during Proxmox clone operations. This is a fundamental architectural limitation.
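The same failure can be reproduced directly on a Proxmox node, without OpenTofu in the loop (a hedged sketch: 9000 is a hypothetical VMID for the ubuntu-2404-cloudinit template):
# On acemagician, assuming the template has VMID 9000
qm clone 9000 1000 --name k3s-server-1 --full 1 --storage linstor_storage
# Fails the same way: nothing has created the 'vm-1000-disk-0'
# resource definition in LINSTOR beforehand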
Workaround Attempts
I tried several approaches:
1. Explicit Full Clone
resource "proxmox_vm_qemu" "k3s_server_2" {
name = "k3s-server-2"
target_node = "elitedesk"
clone = "ubuntu-2404-cloudinit"
vmid = 1001
full_clone = true # Explicit full clone
disk {
storage = "linstor_storage"
size = "100G"
}
}
Result: Same error - "Resource definition not found"
2. Linked Clone
full_clone = false # Attempt a linked clone
Result:
Error: 400 Bad Request:
Parameter verification failed. (400)
storage: linked clone feature is not supported for 'linstor_storage'
This makes sense: DRBD doesn't support snapshots, which are required for linked clones.
Considered Solutions
Faced with this problem, I'm torn between several options:
Option 1: Pre-creation Script for Resource Definitions
Create a bash or Python script that runs before OpenTofu to pre-create the resource definitions:
#!/usr/bin/env python3
import subprocess


def create_linstor_resource(vm_id, size_gb, nodes):
    """Creates a LINSTOR resource definition for a VM"""
    resource_name = f"vm-{vm_id}-disk-0"

    # Create the resource definition
    subprocess.run([
        "linstor", "resource-definition", "create", resource_name
    ], check=True)

    # Create the volume definition
    subprocess.run([
        "linstor", "volume-definition", "create",
        resource_name, f"{size_gb}G"
    ], check=True)

    # Create the resource on each node
    for node in nodes:
        subprocess.run([
            "linstor", "resource", "create",
            node, resource_name,
            "--storage-pool", "linstor_storage"
        ], check=True)


# Create resources for the K3s VMs
vms = [
    {"id": 1000, "size": 100, "nodes": ["acemagician", "elitedesk"]},
    {"id": 1001, "size": 100, "nodes": ["elitedesk", "thinkpad"]},
    {"id": 1002, "size": 20, "nodes": ["thinkpad", "acemagician"]},
]

for vm in vms:
    create_linstor_resource(vm["id"], vm["size"], vm["nodes"])
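In practice, this script would simply be wired in front of OpenTofu, for example in a CI/CD job (a minimal sketch; the script filename is an assumption):
# Pre-create the LINSTOR resource definitions, then deploy
python3 create_linstor_resources.py
tofu apply -auto-approve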
Advantages:
- Keeps LINSTOR DRBD usage
- Allows automation via CI/CD
- Maintains DRBD's superior performance
- Preserves distributed storage for VMs
Disadvantages:
- Additional complexity in the pipeline
- Risk of desynchronization between script and OpenTofu configuration
- Requires rigorous management of VMIDs and their allocation
Option 2: Manual Creation of Resource Definitions
Manually create LINSTOR resources before each deployment:
# For k3s-server-1 (VMID 1000)
linstor resource-definition create vm-1000-disk-0
linstor volume-definition create vm-1000-disk-0 100G
linstor resource create acemagician vm-1000-disk-0 --storage-pool linstor_storage
linstor resource create elitedesk vm-1000-disk-0 --storage-pool linstor_storage
Advantages:
- Simple and straightforward solution
- Full control over LINSTOR resources
Disadvantages:
- Loss of Infrastructure as Code: Configuration drift guaranteed
- Loss of GitOps: No traceability in git
- Not automatable: Manual intervention for each deployment
- Not scalable: Impossible for frequent deployments
This option completely contradicts my automation goals, so I'm ruling it out.
Option 3: Partition NVMe Drives (Local Storage + LINSTOR)
Partition the NVMe drives on each node into two parts:
- One partition for local LVM storage (local-lvm)
- One partition for the LINSTOR DRBD pool (linstor_storage)
Then use local-lvm for VM disks (simple provisioning) and linstor_storage for other needs requiring replication.
Important note for my Kubernetes use case: using local-lvm (without Proxmox-level replication) is viable for a Kubernetes cluster because Kubernetes, not Proxmox, handles high availability. With etcd distributed across 3 nodes and a replicated control plane, losing a VM doesn't take down the cluster: Kubernetes keeps running on the remaining nodes. The VMs become "cattle" (replaceable via Infrastructure as Code), while the precious "pets" data lives in application-level storage solutions.
Advantages:
- Simple and fast VM provisioning on local-lvm
- Preservation of LINSTOR DRBD for distributed storage needs
- Optimal use of available hardware
- Maximum performance for VMs (direct local access)
- HA ensured at the right level: Kubernetes, not Proxmox
Disadvantages:
- Setup complexity: Disk repartitioning required
- Risk of data loss: Invasive operation on existing disks
- Capacity planning: Need to determine partition size in advance
- Less flexibility: Fixed partition sizes, difficult to modify
- No HA at Proxmox level: VMs no longer benefit from replication (acceptable if HA at Kubernetes level)
Option 4: Migrate to Ceph with Network Upgrade
Abandon LINSTOR DRBD and migrate to Ceph, upgrading the network to 5 Gbps (or 10 Gbps if budget allows):
Advantages:
- Native support for dynamic provisioning in Proxmox
- Perfect integration with OpenTofu/Terraform
- Mature and well-documented ecosystem
- Snapshots and clones natively supported
- Acceptable performance with a 5 Gbps NIC
Disadvantages:
- Hardware cost: Purchase of 5 Gbps (or 10 Gbps) network cards for the 3 nodes
- Increased complexity: Monitors, Managers, OSDs to manage
- Resource consumption: More demanding on RAM and CPU than LINSTOR
- Complete migration: Reconstruction of existing storage
- Still inferior performance: Even with 5 Gbps, greater overhead than DRBD
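For scale, the Proxmox-side bootstrap of Ceph is well documented; a rough sketch of the steps (device name and pool name are assumptions) would look like:
# On every node
pveceph install
# On the first node, point Ceph at the storage network
pveceph init --network 192.168.100.0/24
# On every node
pveceph mon create
pveceph osd create /dev/nvme0n1        # hypothetical NVMe device
# Once the OSDs are up, create a pool and expose it as Proxmox storage
pveceph pool create vm_pool --add_storages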
My Current Thinking
I'm currently torn between these options:
Option 1 (Script) appeals to me because it preserves LINSTOR and automates everything. With fixed VMIDs (1000, 1001, 1002), the script would be relatively simple to maintain. I just need to ensure it runs before OpenTofu in the CI/CD pipeline.
Option 3 (Partitioning) is technically interesting but very invasive. Repartitioning NVMe drives in production is risky, and I lose high availability at the Proxmox level for the VMs themselves. However, in my Kubernetes context, this isn't necessarily a problem since HA is managed at the K3s cluster level, not at the individual VM level. If a VM goes down, Kubernetes continues to function with the other nodes.
Option 4 (Ceph + network upgrade) solves all technical problems but involves a hardware investment. A 5 Gbps switch + 3 network cards represents a significant budget for a homelab. On the other hand, it opens the door to other future possibilities.
Key Takeaways
LINSTOR ≠ General-purpose Storage for Proxmox
LINSTOR excels for certain use cases, but dynamic VM provisioning via Proxmox cloning is not one of them. LINSTOR documentation is heavily focused on resource-group and application storage, not Proxmox integration.
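For comparison, the workflow LINSTOR is actually built around uses resource groups as provisioning templates (standard LINSTOR CLI; the group and resource names below are just examples):
# Define placement rules once...
linstor resource-group create k3s_rg --storage-pool linstor_storage --place-count 2
linstor volume-group create k3s_rg
# ...then spawn resources from the group on demand
linstor resource-group spawn-resources k3s_rg mydata 100G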
The Limitation Is Architectural, Not a Bug
This isn't a configuration problem or my mistake: LINSTOR is designed with an explicit resource management model. On-the-fly dynamic provisioning simply isn't in its philosophy.
HA Can Be Delegated to a Higher Layer
For a Kubernetes cluster, losing HA at the Proxmox level (VMs on local storage) isn't necessarily problematic. Kubernetes is designed to handle node failures - that's actually its main role. With distributed etcd and a replicated control plane, the cluster survives the loss of one or more nodes.
Each Solution Has Its Cost
- Script → Software complexity
- Partitioning → Operational complexity and loss of HA at Proxmox level
- Ceph → System complexity and hardware cost
There's no silver bullet. I must choose which type of complexity I'm willing to accept.
Next Steps
I'll probably test Option 1 (pre-creation script) first, as it allows me to:
- Keep LINSTOR DRBD and its performance
- Fully automate deployment
- Avoid immediate hardware investment
- Learn to better manage LINSTOR programmatically
If this approach proves too complex or fragile, I'll reconsider either Option 3 (partitioning, acceptable in a Kubernetes context) or Option 4 (Ceph + network upgrade), which is the most "standard" and best-documented solution in the Proxmox ecosystem.
I'll document my final decision and its implementation in a future article.
Update: The Final Decision
After testing Option 1 with a Python script for LINSTOR resource management, I found that this approach, while functional, added too much complexity and too much synchronization risk for production use.
The final decision: Partition the NVMe drives on each Proxmox node according to the following strategy:
- 300 GB allocated to LINSTOR DRBD (linstor_storage) for:
  - VMs and LXC containers requiring high availability at the Proxmox level
  - The LXC container hosting the NFS server (see the zfs-sync-nfs-ha project)
  - Any distributed storage managed by Proxmox HA
- 200 GB allocated to local LVM storage (local-lvm) for:
  - K3s cluster VMs that don't need HA at the Proxmox level
  - High availability ensured by the Kubernetes cluster itself
  - Simple and fast provisioning via OpenTofu
This architecture allows using the right tool for the right purpose: LINSTOR DRBD for what truly requires synchronous replication at the infrastructure level, and performant local storage for workloads where HA is managed by the application layer (Kubernetes).
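For reference, the partitioning side of this decision would look roughly like the sketch below on each node (assumed device and volume-group names, destructive commands, existing data migrated or backed up first):
# WARNING: destructive; /dev/nvme0n1 is an assumed device name
sgdisk --new=1:0:+300G --typecode=1:8E00 /dev/nvme0n1   # LINSTOR partition
sgdisk --new=2:0:+200G --typecode=2:8E00 /dev/nvme0n1   # local-lvm partition

# local-lvm side: LVM thin pool used directly by Proxmox
pvcreate /dev/nvme0n1p2
vgcreate pve-local /dev/nvme0n1p2
lvcreate -l 100%FREE --thinpool data pve-local

# LINSTOR side: backing volume group registered as a storage pool
pvcreate /dev/nvme0n1p1
vgcreate vg_linstor /dev/nvme0n1p1
linstor storage-pool create lvm acemagician linstor_storage vg_linstor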
A detailed article on this implementation and the HA NFS container will follow soon.