Proxmox cloud-image templates for EduKit VmService

This doc describes how OPS prepares and maintains the Proxmox templates that EduKit VmService clones from. It’s the contract between the infrastructure team and the backend team (VmService).

Strategy: full-clone from a Proxmox template

EduKit does not import cloud-image .qcow2 files at provision time. Instead, OPS prepares a VM from a cloud-image, converts it to a Proxmox template (qm template <vmid>), and VmService full-clones it each time a student VM is created. Rationale:

  • 1 API call per VM instead of 10+. Clone is atomic; import-from requires a storage upload + disk import + config wiring.
  • OsImage.ProxmoxTemplateId maps directly to Proxmox. Its value is the template’s VMID as a string (e.g. "9999").
  • Setup is OPS-simple. One template per OS; add more when we need more distros.
  • Future-proof. Pre-baked templates (e.g. a Python-dev image, a data-science image) slot in without any code change on the backend.
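
For orientation, the single call mentioned above corresponds to Proxmox’s clone endpoint, POST /api2/json/nodes/{node}/qemu/{vmid}/clone. The sketch below only prints the request shape and sends nothing; how VmService actually builds the call is not shown in this doc, and the target VMID (1201) and VM name are hypothetical placeholders. Note that full=1 is needed because Proxmox defaults to a linked clone when the source is a template.

```shell
# Sketch only: prints the request for a full clone of a template.
# Arguments: node, template VMID, new VMID, new VM name.
clone_request() (
  node=$1; template_id=$2; new_id=$3; name=$4
  printf 'POST /api2/json/nodes/%s/qemu/%s/clone\n' "$node" "$template_id"
  # full=1 forces a full copy; templates are linked-cloned by default.
  printf 'newid=%s&full=1&name=%s\n' "$new_id" "$name"
)

# Dry run with the Sprint 3 template (1201 and the name are made up):
clone_request pve4 9999 1201 student-vm-1201
```

The same operation from a shell on the node is qm clone 9999 1201 --full 1 --name student-vm-1201.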

Current inventory (Sprint 3)

| VMID | OS | Node | Storage | Status | Owner |
|------|----|------|---------|--------|-------|
| 9999 | Debian 12 (cloud-init) | pve4 | local-lvm | ✅ template | OPS |

Ubuntu is not available yet. To add it, OPS must create a new VM from an Ubuntu cloud-image, convert it to a template, and share the VMID with the backend team.

How to prepare a new cloud-image template

Run on the Proxmox node (here pve4). Replace <VMID> with the next free id, <URL> with the cloud-image URL, and <STORAGE> with the target storage (typically local-lvm).

# 1. Download the cloud-image
cd /var/lib/vz/template/iso
wget <URL> -O <os>-cloudimg.qcow2

# 2. Create the holder VM
qm create <VMID> \
    --name <os>-cloud-template \
    --memory 2048 --cores 2 \
    --net0 virtio,bridge=vmbr0 \
    --scsihw virtio-scsi-pci \
    --agent enabled=1,fstrim_cloned_disks=1 \
    --serial0 socket --vga serial0

# 3. Import the disk
qm importdisk <VMID> /var/lib/vz/template/iso/<os>-cloudimg.qcow2 <STORAGE>
qm set <VMID> --scsi0 <STORAGE>:vm-<VMID>-disk-0

# 4. Attach a cloud-init drive (ide2 slot)
qm set <VMID> --ide2 <STORAGE>:cloudinit

# 5. Set the boot order and default the network to DHCP (the serial console was already configured in step 2; cloud-init needs it on some images)
qm set <VMID> --boot order=scsi0
qm set <VMID> --ipconfig0 ip=dhcp

# 6. Convert to a Proxmox template
qm template <VMID>

# 7. Verify
qm config <VMID> | grep -E '^(template|agent|scsi0|ide2|boot):'
# Expected: template: 1, agent: enabled=1,fstrim_cloned_disks=1, scsi0: <storage>:..., ide2: <storage>:<cloud-init volume>, boot: order=scsi0

After conversion the VM cannot be started - it’s now a clone source only.
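
The manual check in step 7 can also be scripted. The following is a minimal sketch, not an official OPS tool: it reads qm config output on stdin and reports any of the five expected keys that are missing. The function name is hypothetical and the sample values in the heredoc are illustrative, not captured from pve4.

```shell
# Reads `qm config <VMID>` output on stdin; prints each missing key and
# returns non-zero if a key is absent or the VM is not marked as a template.
check_template_config() (
  cfg=$(cat)
  status=0
  for key in template agent scsi0 ide2 boot; do
    if ! printf '%s\n' "$cfg" | grep -q "^$key:"; then
      echo "missing: $key"
      status=1
    fi
  done
  # A converted template must report exactly "template: 1".
  if ! printf '%s\n' "$cfg" | grep -q '^template: 1$'; then
    echo "not a template"
    status=1
  fi
  exit "$status"
)

# Illustrative input; on the node, run: qm config <VMID> | check_template_config
check_template_config <<'EOF'
agent: enabled=1,fstrim_cloned_disks=1
boot: order=scsi0
ide2: local-lvm:vm-9999-cloudinit,media=cdrom
scsi0: local-lvm:base-9999-disk-0,size=2G
template: 1
EOF
```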

Adding the template to VmService

Once OPS has confirmed a new template, seed a corresponding OsImage record in VmService:

new OsImage(
    name: "Ubuntu 24.04 (cloud-init)",
    proxmoxTemplateId: "9998",            // ← VMID stringified
    architecture: CpuArchitecture.X86_64,
    minDiskSizeGb: 4,
    minRamMb: 512,
    description: "Ubuntu 24.04 LTS cloud-init template")

See Edukit-vmservice/VmService/Infrastructure/Persistence/Seed/ApplicationDbContextSeed.cs.
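
Because proxmoxTemplateId is a string, a typo only surfaces when a clone fails. As a hedged sketch (the helper name is hypothetical and this check is not part of VmService), the value can be sanity-checked before seeding. It assumes the usual Proxmox VE VMID range of 100 to 999999999 and rejects leading zeros, since "0100" would not match VMID 100 as a string.

```shell
# Returns 0 if the argument looks like a usable Proxmox VMID string.
is_valid_vmid() {
  case $1 in
    ''|0*|*[!0-9]*) return 1 ;;   # empty, leading zero, or non-digit
  esac
  # Assumed range: 100..999999999 (at most 9 digits, at least 100).
  [ "${#1}" -le 9 ] && [ "$1" -ge 100 ]
}

# Try a few candidate values, including malformed ones.
for id in 9999 9998 "99 98" 42; do
  if is_valid_vmid "$id"; then echo "$id: ok"; else echo "$id: rejected"; fi
done
```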

Access & credentials

VmService interacts with Proxmox via two channels. Both are documented in the Sprint 3 plan and live in Vaultwarden.

| Channel | Purpose | Credential | Vaultwarden entry |
|---------|---------|------------|-------------------|
| HTTPS API | Clone / config / resize / start / delete | API token edukit-api-dev@pve!vm-service | vmservice-proxmox-api-token |
| SSH / SFTP | Drop cloud-init snippets into /var/lib/vz/snippets/ | ed25519 key for service account edukit-deploy on port 2244 | vmservice-proxmox-ssh |

The SSH key is service-only: no human login, no shared shell. Anyone who needs SSH access to the Proxmox node uses their personal key.
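
When checking either credential by hand, it helps to know the shape of what is sent. The sketch below builds the Authorization header Proxmox expects for API tokens (PVEAPIToken=user@realm!token-name=secret). The helper name, the host pve4.example.internal, and the key path are assumptions for illustration; the real secret and key stay in Vaultwarden.

```shell
# Builds the Authorization header for a Proxmox API token.
# Arguments: full token id (user@realm!token-name), token secret.
pve_auth_header() {
  printf 'Authorization: PVEAPIToken=%s=%s\n' "$1" "$2"
}

# Placeholder secret for display only.
pve_auth_header 'edukit-api-dev@pve!vm-service' '<secret>'

# Manual smoke tests (hypothetical host and key path, not run here):
#   curl -sS -H "$(pve_auth_header 'edukit-api-dev@pve!vm-service' "$SECRET")" \
#        https://pve4.example.internal:8006/api2/json/version
#   sftp -P 2244 -i ~/.ssh/edukit-deploy_ed25519 edukit-deploy@pve4.example.internal
```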

TLS certificate

VmService talks to the Proxmox API over HTTPS. Certificate validation is controlled by the Proxmox:VerifySsl flag:

| Environment | Proxmox cert | Proxmox:VerifySsl | Rationale |
|-------------|--------------|-------------------|-----------|
| dev / lab | Self-signed (Proxmox default) | false | Lab clusters don’t justify a real CA. VmService bypasses validation only because the flag says so. |
| staging / prod | Signed by a trusted CA (Let’s Encrypt, internal CA, etc.) | true (default) | Production talks to Proxmox with a real trust chain - no exception. |

Setting up prod: OPS must provision a valid TLS certificate on the Proxmox node(s) VmService connects to (via pvenode cert set or the Proxmox UI → Datacenter → ACME). The cert’s CN/SAN must match the hostname VmService uses in Proxmox:BaseUrl.
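
One way to confirm the certificate before pointing VmService at the node is to let curl validate it, since curl rejects both an untrusted chain and a hostname mismatch. This is a sketch under assumptions: the function name and the CURL override (used only to swap in a stub) are hypothetical, and a failure can also mean plain connectivity trouble.

```shell
# Succeeds if an HTTPS request to the given URL completes with full
# certificate validation (no -k). Any HTTP status counts as success,
# because the TLS handshake has already passed by then.
check_proxmox_tls() {
  if "${CURL:-curl}" -sS -o /dev/null --max-time 10 "$1"; then
    echo "TLS ok: $1"
  else
    echo "check failed (TLS or connectivity): $1"
    return 1
  fi
}

# Usage, with the same hostname as Proxmox:BaseUrl (placeholder shown):
#   check_proxmox_tls https://<proxmox-host>:8006/
```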

Proxmox:VerifySsl=false is intended only for lab / self-signed setups. Never set it to false in production: doing so silently disables all certificate checks. As a defense-in-depth guard, the VmService DI layer also refuses to install the bypass handler in Production environments.

ASPNETCORE_ENVIRONMENT - the defense-in-depth guard above reads IHostEnvironment.IsProduction(), which is driven by the ASPNETCORE_ENVIRONMENT environment variable at runtime:

  • K3s / prod deployments must run with ASPNETCORE_ENVIRONMENT=Production (or leave it unset - the .NET default is already Production). Never inherit Development from another environment’s manifest.
  • Dev / staging containers can set Development or Staging as appropriate.
  • The value is logged on boot; check the pod logs for Running in environment: Production to confirm.
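
The rules above can be expressed as a pre-deploy check. This is a minimal sketch, not something VmService ships: the function name is hypothetical, and it assumes the flag reaches the container as the environment variable Proxmox__VerifySsl (the standard .NET mapping for the Proxmox:VerifySsl key), which this doc does not confirm. It is deliberately stricter than the DI guard, refusing the bypass in Staging as well, to match the certificate table.

```shell
# Arguments: ASPNETCORE_ENVIRONMENT value, Proxmox:VerifySsl value.
# Empty values fall back to the defaults: Production and true.
check_tls_policy() (
  env_name=${1:-Production}
  verify_ssl=${2:-true}
  # .NET compares environment names case-insensitively; do the same.
  env_lc=$(printf '%s' "$env_name" | tr '[:upper:]' '[:lower:]')
  ssl_lc=$(printf '%s' "$verify_ssl" | tr '[:upper:]' '[:lower:]')
  case $env_lc in
    production|staging)
      if [ "$ssl_lc" = false ]; then
        echo "refusing: VerifySsl=false in $env_name"
        exit 1
      fi ;;
  esac
  echo "ok: environment=$env_name VerifySsl=$verify_ssl"
)

# Check whatever the current shell (or manifest) would hand to the pod.
check_tls_policy "${ASPNETCORE_ENVIRONMENT:-}" "${Proxmox__VerifySsl:-}"
```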

Changes that break VmService

Coordinate with the backend team before doing any of these:

  • Deleting or re-numbering a template whose VMID is referenced by an OsImage row (VmService will fail to clone).
  • Disabling the cloudinit drive on an existing template.
  • Revoking the edukit-api-dev@pve!vm-service API token or rotating its secret without updating the K3s secret.
  • Changing the SSH port, username, or authorized_keys for edukit-deploy.
  • Moving /var/lib/vz/snippets/ to a different path or removing write permission for edukit-deploy.
