
This skill provides operational tooling and guidance for managing Omni + Proxmox infrastructure: provider control, cluster setup, and CEL storage selectors.

```bash
npx playbooks add skill basher83/lunar-claude --skill omni-talos
```

---
name: omni-talos
description: This skill provides operational tooling and guidance for Omni Proxmox
  infrastructure. Use when the user asks to "check provider status", "restart provider",
  "view provider logs", "debug provider registration", "create a machine class",
  "configure Proxmox provider", "set up CEL storage selectors", "create a Talos cluster",
  or needs guidance on Omni + Proxmox infrastructure integration for Talos Kubernetes
  clusters.
---

# Omni + Proxmox Infrastructure Provider

Operational tooling for Talos Linux Kubernetes clusters managed via Sidero Omni with the Proxmox infrastructure provider.

## Provider Operations

Use `${CLAUDE_PLUGIN_ROOT}/skills/omni-talos/scripts/provider-ctl.py` for provider management:

| Task | Command |
|------|---------|
| View logs | `${CLAUDE_PLUGIN_ROOT}/skills/omni-talos/scripts/provider-ctl.py --logs 50` |
| Raw JSON logs | `${CLAUDE_PLUGIN_ROOT}/skills/omni-talos/scripts/provider-ctl.py --logs 50 --raw` |
| Restart provider | `${CLAUDE_PLUGIN_ROOT}/skills/omni-talos/scripts/provider-ctl.py --restart` |

The provider runs in the Foxtrot LXC (CT 200); the script handles SSH automatically.
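
When triaging a failure, it can help to filter the raw JSON stream instead of scanning plain logs. A minimal sketch, assuming the provider emits JSON-lines entries with `level` and `msg` fields; inspect the `--raw` output first and adjust the filter to the actual schema:

```bash
# Surface only error-level entries from the raw JSON log stream.
# Field names ("level", "msg") are assumptions about the log schema.
${CLAUDE_PLUGIN_ROOT}/skills/omni-talos/scripts/provider-ctl.py --logs 50 --raw \
  | jq -r 'select(.level == "error") | .msg'
```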

## Current Deployment

| Component | Location | IP | Endpoint / Notes |
|-----------|----------|-----|----------|
| Omni | Holly (VMID 101, Quantum) | 192.168.10.20 | omni.spaceships.work |
| Proxmox Provider | Foxtrot LXC (CT 200, Matrix) | 192.168.3.10 | L2 adjacent to Talos VMs |
| Target Cluster | Matrix (Foxtrot/Golf/Hotel) | 192.168.3.{5,6,7} | Proxmox API |
| Storage | CEPH RBD | — | `vm_ssd` pool |

## Quick Reference

**omnictl commands:**

```bash
omnictl cluster status <cluster-name>
omnictl get machines -l omni.sidero.dev/cluster=<cluster-name>
omnictl get machineclasses
omnictl apply -f machine-classes/<name>.yaml
omnictl cluster template sync -f clusters/<name>.yaml
```

Note: `--cluster` flag does not exist. Use label selector `-l` instead.
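
These commands chain into a typical provisioning flow. The file and cluster names below are hypothetical placeholders:

```bash
# Register the machine class the cluster template will reference.
omnictl apply -f machine-classes/matrix-worker.yaml

# Create or update the cluster from its template.
omnictl cluster template sync -f clusters/matrix-test.yaml

# Watch the cluster converge, then list its machines by label.
omnictl cluster status matrix-test
omnictl get machines -l omni.sidero.dev/cluster=matrix-test
```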

**MachineClass minimal example:**

```yaml
metadata:
  namespace: default
  type: MachineClasses.omni.sidero.dev
  id: matrix-worker
spec:
  autoprovision:
    providerid: matrix-cluster
    providerdata: |
      cores: 4
      memory: 16384
      disk_size: 100
      storage_selector: name == "vm_ssd"
      node: foxtrot
```

See `references/machine-classes.md` for full field reference.
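
The `storage_selector` field takes a CEL expression. Beyond exact equality, standard CEL string operators should apply, though which expressions the provider actually evaluates is covered in `references/cel-storage-selectors.md`; the alternatives below are illustrative sketches, not verified patterns:

```yaml
storage_selector: name == "vm_ssd"
# Illustrative alternatives (standard CEL; confirm support in
# references/cel-storage-selectors.md before relying on them):
#   name in ["vm_ssd", "vm_nvme"]    matches any listed pool
#   name.startsWith("vm_")           matches by prefix
```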

## Key Constraints

| Constraint | Details |
|------------|---------|
| L2 adjacency | Provider MUST be on same L2 as Talos VMs (Foxtrot LXC) |
| CEL `type` reserved | Use `name` only for storage selectors |
| Hostname bug | Use `:local-fix` tag, not `:latest` |
| No CP pinning | Omni allows only 1 ControlPlane section per template |
| No VM migration | Destroys node state — destroy/recreate instead |
| Split-horizon DNS | `omni.spaceships.work` → 192.168.10.20 (LAN) |
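
The single-ControlPlane constraint is easiest to see in template form. A minimal sketch of a cluster template; names and versions are placeholders, and the validated structure lives in `references/cluster-templates.md` and `examples/cluster-template.yaml`:

```yaml
kind: Cluster
name: matrix-test           # placeholder name
kubernetes:
  version: v1.30.0          # placeholder version
talos:
  version: v1.7.0           # placeholder version
---
# Exactly one ControlPlane section is allowed per template,
# so control-plane nodes cannot be pinned to specific hosts.
kind: ControlPlane
machineClass:
  name: matrix-controlplane
  size: 3
---
kind: Workers
machineClass:
  name: matrix-worker
  size: 2
```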

## Reference Files

| File | Content |
|------|---------|
| `references/architecture.md` | Network topology, design decisions |
| `references/machine-classes.md` | Full provider data fields (compute, storage, network, PCI) |
| `references/provider-setup.md` | Provider config, compose setup, credentials |
| `references/cluster-templates.md` | Cluster template structure, patches |
| `references/cel-storage-selectors.md` | CEL syntax and patterns |
| `references/debugging.md` | Common issues |
| `references/recovery-procedures.md` | Recovery from stuck states |
| `references/proxmox-permissions.md` | API token setup |
| `references/omnictl-auth.md` | Authentication methods |

## Examples

| File | Description |
|------|-------------|
| `examples/machineclass-ceph.yaml` | CEPH storage |
| `examples/machineclass-local.yaml` | Local LVM |
| `examples/cluster-template.yaml` | Complete cluster |
| `examples/proxmox-gpu-worker.yaml` | GPU passthrough |
| `examples/proxmox-storage-node.yaml` | Storage node |
| `examples/proxmox-worker-multi-net.yaml` | Multi-network |

## Overview

This skill provides operational tooling and guidance for running Talos Linux Kubernetes clusters via Sidero Omni with a Proxmox infrastructure provider. It bundles scripts and concise reference material to inspect provider health, manage machine classes, and coordinate the Omni and Proxmox integration. Use it to perform common provider operations, debug registration and networking constraints, and apply validated cluster templates and storage selectors.

## How this skill works

The skill surfaces a small set of scripts and omnictl command patterns for interacting with the Omni provider process running in the Foxtrot LXC. It can view logs (plain or raw JSON), restart the provider, and automate common omnictl workflows such as syncing cluster templates, listing machines, and applying MachineClass YAML. Reference files explain Proxmox API credentials, network topology constraints, and CEL-based storage selectors so you can author correct provider data.
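
In practice, a health check and restart cycle uses only the flags listed in the Provider Operations table:

```bash
# Inspect recent provider activity before deciding to restart.
provider-ctl.py --logs 50

# Restart the provider, then confirm it came back up cleanly.
provider-ctl.py --restart
provider-ctl.py --logs 20
```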

## When to use it

- Check provider status or view recent provider logs to diagnose failures.
- Restart the Proxmox provider process when it becomes unresponsive.
- Create or apply MachineClass resources for Proxmox-backed Talos machines.
- Configure Proxmox provider credentials, storage selectors, or networking.
- Debug provider registration or cluster template sync problems.
- Prepare cluster templates and storage selectors before provisioning Talos clusters.

## Best practices

- Run provider commands from the management host; the provider script handles SSH to the Foxtrot LXC.
- Use omnictl label selectors (`-l`) rather than the nonexistent `--cluster` flag when querying machines.
- Keep the provider on the same L2 as the Talos VMs (no routing between provider and VMs).
- Match storage selectors on the CEL `name` attribute; `type` is reserved (see the contrast after this list).
- Use the `:local-fix` tag, not `:latest`, to avoid a known hostname bug.
- Treat VM migration as destructive; destroy and recreate nodes rather than migrating them.
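
The CEL selector practice in concrete terms; the `rbd` value is illustrative:

```yaml
# Wrong: `type` is reserved in the provider's CEL context.
storage_selector: type == "rbd"

# Right: match on the pool name instead.
storage_selector: name == "vm_ssd"
```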

## Example use cases

- Run `provider-ctl.py --logs 50` to capture the last 50 log lines when a cluster fails to join.
- Restart the Proxmox provider with `provider-ctl.py --restart` after updating API credentials.
- Apply a minimal MachineClass YAML to provision a new worker on the `vm_ssd` pool (as shown below).
- Sync a complete cluster template with `omnictl cluster template sync -f clusters/mycluster.yaml`.
- Compose CEL storage selectors for CEPH RBD pools when creating machine classes that require `vm_ssd` storage.
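
The MachineClass use case end to end, with a hypothetical file name:

```bash
# Apply the minimal MachineClass from the Quick Reference,
# then confirm it registered with Omni.
omnictl apply -f machine-classes/matrix-worker.yaml
omnictl get machineclasses
```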

## FAQ

**Where does the provider run and how do I reach it?**

The provider runs in the Foxtrot LXC (CT 200) on the same L2 as the Talos VMs. The management script SSHs into that container automatically; use the IPs in the Current Deployment table and the split-horizon DNS name `omni.spaceships.work` for LAN access.

**How do I list machines for a specific cluster?**

Use `omnictl get machines -l omni.sidero.dev/cluster=<cluster-name>`. There is no `--cluster` flag.
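
For example, with a hypothetical cluster named `matrix-test`:

```bash
omnictl get machines -l omni.sidero.dev/cluster=matrix-test
```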