home / skills / basher83 / lunar-claude / proxmox-infrastructure

This skill manages a three-node Proxmox VE cluster with CEPH storage, VLAN networking, and cloud-init automation using Python, Ansible, and Terraform.

npx playbooks add skill basher83/lunar-claude --skill proxmox-infrastructure

Review the files below or copy the command above to add this skill to your agents.

Files (16)
SKILL.md
3.2 KB
---
name: proxmox-infrastructure
description: Manage the Matrix/Virgo-Core Proxmox VE cluster (nodes Foxtrot, Golf, Hotel - MINISFORUM MS-A2 with AMD Ryzen 9 9955HX). Covers VM provisioning, VLAN networking, CEPH storage, and automation via Python (proxmoxer), Ansible (community.general.proxmox), and Terraform (Telmate/proxmox).
---

# Proxmox Infrastructure: Matrix/Virgo-Core Cluster

Manage 3-node Proxmox VE cluster with CEPH storage, VLAN networking, and cloud-init automation.

## Trigger Phrases

Use this skill when you see:
- "Clone the Ubuntu template to create a new VM"
- "Check CEPH storage health on the cluster"
- "Configure VLAN 30 networking for a Proxmox VM"
- "Create a cloud-init template from Ubuntu noble image"
- "What's the status of nodes Foxtrot, Golf, or Hotel?"
- "Deploy VMs using Terraform and the Proxmox provider"
- "Configure MTU 9000 for CEPH storage networks"
- "Troubleshoot VM deployment or networking issues"
- "Use Ansible community.general.proxmox modules"
- "Validate template configuration via Proxmox API"

## Available Tools

### Python Scripts (uv-based)
- **validate_template.py** - Validate template health via Proxmox API
- **cluster_status.py** - Cluster health metrics and node status
- **check_ceph_health.py** - CEPH storage pool health monitoring
- **check_cluster_health.py** - Comprehensive cluster diagnostics

All scripts support `--help` for usage. Run with: `uv run tools/<script>.py`

### Automation Examples
- **Ansible playbooks** - Template creation, VLAN bridging setup
- **Terraform modules** - VM cloning, multi-node deployments with dual NICs

See examples/ and workflows/ for working configurations.

## Core Capabilities

**Template Management:**
- Ubuntu/Debian cloud-init templates with virtio-scsi
- Serial console configuration for cloud images
- Proper boot order and cloud-init CD-ROM (ide2)

**Network Infrastructure:**
- vmbr0: Management (VLAN 9 for Corosync)
- vmbr1: CEPH Public (192.168.5.0/24, MTU 9000)
- vmbr2: CEPH Private (192.168.7.0/24, MTU 9000)
- VLAN-aware bridging (supports VLANs 2-4094)
- 802.3ad LACP bonding

**CEPH Storage:**
- Multi-OSD configuration (2× 4TB NVMe per node)
- MTU 9000 for storage networks
- Public/private network separation
- Health monitoring and diagnostics

**API Automation:**
- Python via proxmoxer library
- Ansible via community.general.proxmox_* modules
- Terraform via Telmate/proxmox provider

## Cluster Architecture (Matrix)

**Hardware:** 3× MINISFORUM MS-A2
- AMD Ryzen 9 9955HX (16C/32T)
- 64GB DDR5 @ 5600 MT/s
- 3× NVMe: 1× 1TB boot, 2× 4TB CEPH OSDs
- 4× NICs: 2× 10GbE SFP+, 2× 2.5GbE

**Nodes:** Foxtrot, Golf, Hotel

## Quick Examples

**Clone template to VM:**
```bash
qm clone 9000 101 --name web-01
qm set 101 --ipconfig0 ip=192.168.3.100/24,gw=192.168.3.1
qm set 101 --net0 virtio,bridge=vmbr0,tag=30
qm start 101
```

**Check cluster health:**
```bash
uv run tools/cluster_status.py
uv run tools/check_ceph_health.py
```

## For Details

- **reference/** - Cloud-init, networking, API, storage, QEMU guest agent
- **workflows/** - Cluster formation, CEPH deployment automation
- **examples/** - Terraform configs, Ansible playbooks
- **anti-patterns/** - Common mistakes from real deployments

Overview

This skill manages the Matrix/Virgo-Core Proxmox VE cluster (Foxtrot, Golf, Hotel) including VM provisioning, VLAN networking, CEPH storage, and automation. It provides tools and examples for cloud-init templates, MTU-9000 storage networks, and API-driven workflows using Python, Ansible, and Terraform. Use it to validate templates, monitor cluster and CEPH health, and automate multi-node deployments.

How this skill works

The skill exposes Python uv-run scripts that query the Proxmox API (proxmoxer) to validate templates, report cluster health, and check CEPH status. It also includes Ansible playbooks and Terraform modules to create cloud-init templates, configure VLAN-aware bridges and LACP bonds, and clone or provision VMs with dual NICs. Network and storage best practices (separate CEPH public/private networks, MTU 9000, tagged VLANs) are codified in examples and workflows.

When to use it

  • Create or clone Ubuntu/Debian cloud-init templates and validate them via API
  • Provision VMs with VLAN tagging, dual NICs, or serial console/cloud-init settings
  • Check overall cluster or CEPH pool health and run diagnostics
  • Automate repeatable deployments with Ansible or Terraform using provided modules
  • Configure or verify MTU 9000 and VLAN-aware bridges for CEPH and VM traffic

Best practices

  • Keep CEPH public and private networks on dedicated vmbr interfaces with MTU 9000
  • Use virtio-scsi and cloud-init CD-ROM (ide2) for cloud images and set serial console
  • Validate templates with validate_template.py before mass cloning
  • Use VLAN tags on vmbr bridges for guest segmentation and keep Corosync on VLAN 9
  • Automate via Ansible/Terraform for repeatability; prefer API checks in CI

Example use cases

  • Clone an Ubuntu template to create web-01, assign VLAN 30 and static IP, then start the VM
  • Run cluster health scripts to get node status for Foxtrot, Golf, and Hotel before maintenance
  • Deploy a multi-VM application stack with Terraform using the Telmate/proxmox provider
  • Configure CEPH OSD networks, verify MTU settings, and monitor pool health with check_ceph_health.py
  • Use an Ansible playbook to create VLAN bridging and apply consistent network config across nodes

FAQ

Which tools are provided to check cluster and CEPH health?

Use cluster_status.py for node and cluster metrics and check_ceph_health.py for CEPH pool diagnostics. Both run with uv run tools/<script>.py.

How do I automate VM provisioning?

Use the Terraform modules (Telmate/proxmox) for repeatable deployments or Ansible community.general.proxmox modules and the included examples/workflows for playbook-driven automation.