system-health-check skill

/plugins/tts-telegram-sync/skills/system-health-check

This skill runs a comprehensive ten-subsystem health check for the TTS engine and Telegram bot, producing a clear pass/fail report with actionable fixes.

npx playbooks add skill terrylica/cc-skills --skill system-health-check

SKILL.md

---
name: system-health-check
description: Health check for TTS and Telegram bot subsystems. TRIGGERS - health check, bot health, kokoro health, tts health, tts lock, system status, diagnostics.
allowed-tools: Read, Bash, Glob, AskUserQuestion
---

# System Health Check

Run a comprehensive 10-subsystem health check across the TTS engine, Telegram bot, and supporting infrastructure. Produces a pass/fail report table with actionable fix recommendations.

> **Platform**: macOS (Apple Silicon)

## When to Use This Skill

- Diagnose why TTS or Telegram bot is not working
- Verify system readiness after bootstrap or configuration changes
- Routine health check before a demo or presentation
- Investigate intermittent failures in the TTS pipeline
- Check for stale locks, zombie processes, or orphaned temp files

## Requirements

- Bun runtime (for bot process)
- Python 3.13 with Kokoro venv at `~/.local/share/kokoro/.venv`
- Telegram bot token in `~/.claude/.secrets/ccterrybot-telegram`
- mise.toml configured in `~/.claude/automation/claude-telegram-sync/`

## Workflow Phases

### Phase 1: Setup

Load environment variables from mise to ensure `BOT_TOKEN` and other secrets are available:

```bash
cd ~/.claude/automation/claude-telegram-sync && eval "$(mise env)"
```
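After loading the environment, it can help to confirm that the expected variables arrived without echoing their values. A minimal sketch, assuming a hypothetical helper named `env_status` that is not part of the skill itself:

```bash
# Hypothetical helper: report whether a variable is set, never its value.
env_status() {
  local name=$1
  # ${!name} is bash indirect expansion: the value of the variable named by $name.
  if [[ -n "${!name:-}" ]]; then
    echo "$name: set"
  else
    echo "$name: missing"
  fi
}

env_status BOT_TOKEN
```

Printing only `set` or `missing` keeps the token out of any captured output.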

### Phase 2: Run All 10 Health Checks

Execute each check and collect results. Each check returns `[OK]` or `[FAIL]` with a brief diagnostic message, except Check 7, which is informational and has no failure condition.
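One way to collect results without stopping at the first failure is a small wrapper around each check command. This is a sketch only; `record_check` and `RESULTS` are hypothetical names, not part of the skill:

```bash
# Hypothetical collector: run a check command and record [OK]/[FAIL] without stopping.
RESULTS=()

record_check() {
  local name=$1
  shift
  # Run the remaining arguments as the check and discard its output.
  if "$@" >/dev/null 2>&1; then
    RESULTS+=("[OK] $name")
  else
    RESULTS+=("[FAIL] $name")
  fi
}

record_check "Kokoro venv" test -d "$HOME/.local/share/kokoro/.venv"
record_check "Secrets File" test -s "$HOME/.claude/.secrets/ccterrybot-telegram"

printf '%s\n' "${RESULTS[@]}"
```

Because the check runs inside an `if`, a failing command is recorded rather than aborting the run, even if the caller has `set -e` enabled.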

#### Check 1: Bot Process

```bash
# -f matches against the full command line; without it pgrep compares only the process name
pgrep -fl 'bun.*src/main\.ts'
```

Pass if exactly one process is found. Fail if zero or more than one.
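The "exactly one" rule can be expressed by counting matching PIDs. A minimal sketch, assuming a hypothetical `bot_verdict` helper:

```bash
# Hypothetical helper: turn a match count into a Check 1 verdict.
bot_verdict() {
  local count=$1
  if (( count == 1 )); then
    echo "[OK] one bot process"
  else
    echo "[FAIL] expected 1 bot process, found $count"
  fi
}

# Count PIDs whose full command line matches; tr strips the padding BSD wc adds.
count=$(pgrep -f 'bun.*src/main\.ts' | wc -l | tr -d ' ')
bot_verdict "$count"
```

A count above one usually points at a duplicate bot instance, which is why it is treated as a failure rather than a pass.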

#### Check 2: Telegram API

```bash
BOT_TOKEN=$(cat ~/.claude/.secrets/ccterrybot-telegram)
curl -s "https://api.telegram.org/bot${BOT_TOKEN}/getMe" | jq .ok
```

Pass if response is `true`. Fail if `false`, null, or connection error.
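The pass/fail rule above can be wrapped so that only a verdict, never the token, reaches the report. This is a sketch under assumptions: `telegram_verdict` and `check_telegram` are hypothetical names, and the 10-second `--max-time` limit is an illustrative choice:

```bash
# Hypothetical helper: map the value printed by `jq .ok` to a Check 2 verdict.
telegram_verdict() {
  local ok=$1
  if [[ "$ok" == "true" ]]; then
    echo "[OK] Telegram API responding"
  else
    # Covers false, null, and the empty string left by a connection error.
    echo "[FAIL] Telegram API check failed"
  fi
}

# Assumed wrapper: --max-time bounds the wait; the token is never echoed.
check_telegram() {
  local token ok
  token=$(cat "$HOME/.claude/.secrets/ccterrybot-telegram" 2>/dev/null)
  ok=$(curl -s --max-time 10 "https://api.telegram.org/bot${token}/getMe" | jq .ok 2>/dev/null)
  telegram_verdict "$ok"
}
```

Calling `check_telegram` with no arguments performs the live request; `telegram_verdict` can be exercised on its own with a literal value.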

#### Check 3: Kokoro venv

```bash
[[ -d ~/.local/share/kokoro/.venv ]]
```

Pass if the directory exists.

#### Check 4: Kokoro Python Import

```bash
~/.local/share/kokoro/.venv/bin/python -c "import kokoro"
```

Pass if import succeeds with exit code 0.

#### Check 5: MPS Available (Apple Silicon GPU)

```bash
~/.local/share/kokoro/.venv/bin/python -c "import torch; assert torch.backends.mps.is_available()"
```

Pass if assertion succeeds. Fail if torch is missing or MPS is not available.

#### Check 6: Lock State

```bash
LOCK_FILE="/tmp/kokoro-tts.lock"
if [[ -f "$LOCK_FILE" ]]; then
  LOCK_PID=$(cat "$LOCK_FILE")
  LOCK_AGE=$(( $(date +%s) - $(stat -f %m "$LOCK_FILE") ))
  if kill -0 "$LOCK_PID" 2>/dev/null; then
    if [[ $LOCK_AGE -gt 30 ]]; then
      echo "STALE (PID $LOCK_PID alive but lock age ${LOCK_AGE}s > 30s threshold)"
    else
      echo "ACTIVE (PID $LOCK_PID, age ${LOCK_AGE}s)"
    fi
  else
    echo "ORPHANED (PID $LOCK_PID not running, age ${LOCK_AGE}s)"
  fi
else
  echo "NO LOCK (idle)"
fi
```

Pass if there is no lock, or if the lock is active (owner PID alive and lock age of 30s or less). Fail if the lock is stale or orphaned.
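The state line printed by the script can be mapped to a report verdict by its prefix. A minimal sketch, assuming a hypothetical `lock_verdict` helper; the PID in the second call is an invented example input:

```bash
# Hypothetical helper: convert a lock state line into a Check 6 verdict.
lock_verdict() {
  local state=$1
  case "$state" in
    "NO LOCK"*|ACTIVE*) echo "[OK] $state" ;;
    STALE*|ORPHANED*)   echo "[FAIL] $state" ;;
    *)                  echo "[FAIL] unrecognized lock state: $state" ;;
  esac
}

lock_verdict "NO LOCK (idle)"
# Example input only; 4242 is not a real PID from any run.
lock_verdict "ORPHANED (PID 4242 not running, age 95s)"
```

Treating unrecognized input as a failure keeps an empty or malformed state line from being reported as healthy.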

#### Check 7: Audio Processes

```bash
pgrep -x afplay
pgrep -x say
```

Informational only. Report the number of running `afplay` and `say` processes; this check has no pass/fail criterion.
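To report counts rather than raw PIDs, the `pgrep` output can be piped through `wc -l`. A small sketch, with `count_procs` as a hypothetical helper name:

```bash
# Hypothetical helper: count processes whose name is exactly $1.
count_procs() {
  # pgrep prints one PID per line; tr strips the padding BSD wc adds.
  pgrep -x "$1" 2>/dev/null | wc -l | tr -d ' '
}

echo "$(count_procs afplay) afplay, $(count_procs say) say"
```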

#### Check 8: Secrets File

```bash
# -s is true only if the file exists and has a size greater than zero
[[ -s ~/.claude/.secrets/ccterrybot-telegram ]]
```

Pass if the file exists and is non-empty.

#### Check 9: Stale WAV Files

```bash
find /tmp -maxdepth 1 -name "kokoro-tts-*.wav" -mmin +5 2>/dev/null
```

Pass if no stale WAV files found (older than 5 minutes). Fail if orphaned WAVs exist.
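The `find` output can be reduced to a count and then to a verdict. A minimal sketch, assuming a hypothetical `count_stale_wavs` helper that takes the directory and age threshold as parameters:

```bash
# Hypothetical helper: count kokoro-tts-*.wav files in a directory older than N minutes.
count_stale_wavs() {
  local dir=$1 minutes=$2
  find "$dir" -maxdepth 1 -name 'kokoro-tts-*.wav' -mmin +"$minutes" 2>/dev/null | wc -l | tr -d ' '
}

stale=$(count_stale_wavs /tmp 5)
if (( stale == 0 )); then
  echo "[OK] no stale WAV files"
else
  echo "[FAIL] $stale stale WAV file(s) in /tmp"
fi
```

Passing the directory as an argument also makes the helper easy to try against a scratch directory before pointing it at `/tmp`.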

#### Check 10: Shell Symlinks

```bash
# -L confirms a symlink; -e follows it, so a dangling link fails
[[ -L ~/.local/bin/tts_kokoro.sh && -e ~/.local/bin/tts_kokoro.sh ]] && readlink ~/.local/bin/tts_kokoro.sh
```

Pass if symlink exists and points to a valid target within the plugin.
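Checking that the target sits "within the plugin" needs the plugin's install path, which is not stated here. The sketch below therefore takes the root as a parameter; `symlink_verdict` is a hypothetical helper, and it assumes the link stores an absolute target path:

```bash
# Hypothetical helper: verify a symlink resolves and its target sits under a given root.
# Assumes the link stores an absolute path; relative targets would need resolving first.
symlink_verdict() {
  local link=$1 root=$2 target
  if [[ ! -L "$link" ]]; then
    echo "[FAIL] $link is not a symlink"
    return
  fi
  target=$(readlink "$link")
  if [[ ! -e "$link" ]]; then
    echo "[FAIL] $link -> $target (target missing)"
  elif [[ "$target" != "$root"/* ]]; then
    echo "[FAIL] $link -> $target (outside $root)"
  else
    echo "[OK] $link -> $target"
  fi
}
```

A call might look like `symlink_verdict ~/.local/bin/tts_kokoro.sh "$PLUGIN_ROOT"`, where `PLUGIN_ROOT` is a placeholder for wherever the plugin is installed.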

### Phase 3: Report

Display results as a table:

```
| # | Subsystem        | Status | Detail                          |
|---|------------------|--------|---------------------------------|
| 1 | Bot Process      | [OK]   | PID 12345                       |
| 2 | Telegram API     | [OK]   | Bot @ccterrybot responding      |
| 3 | Kokoro venv      | [OK]   | ~/.local/share/kokoro/.venv     |
| 4 | Kokoro Import    | [OK]   | kokoro module loaded            |
| 5 | MPS Available    | [OK]   | Apple Silicon GPU active        |
| 6 | Lock State       | [OK]   | No lock (idle)                  |
| 7 | Audio Processes  | [OK]   | 0 afplay, 0 say                 |
| 8 | Secrets File     | [OK]   | ccterrybot-telegram present     |
| 9 | Stale WAVs       | [OK]   | No orphaned files               |
|10 | Shell Symlinks   | [OK]   | tts_kokoro.sh -> plugin script  |
```

### Phase 4: Summary and Recommendations

- Report total pass/fail counts (e.g., "9/10 checks passed")
- For each failure, recommend the appropriate fix or skill to invoke
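The pass/fail count can be derived directly from the collected result lines. A minimal sketch, assuming each line starts with `[OK]` or `[FAIL]` and using a hypothetical `summarize` helper:

```bash
# Hypothetical helper: summarize a list of [OK]/[FAIL] result lines.
summarize() {
  local total=0 passed=0 line
  for line in "$@"; do
    total=$(( total + 1 ))
    if [[ "$line" == "[OK]"* ]]; then
      passed=$(( passed + 1 ))
    fi
  done
  echo "$passed/$total checks passed"
}

summarize "[OK] Bot Process" "[FAIL] Lock State" "[OK] Secrets File"
# prints: 2/3 checks passed
```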

## TodoWrite Task Templates

```
1. [Setup] Load environment variables from mise in bot source directory
2. [Run] Execute all 10 health checks and collect results
3. [Report] Display results table with [OK]/[FAIL] status for each subsystem
4. [Summary] Show pass/fail counts (e.g., 9/10 passed)
5. [Recommend] Suggest fixes for any failures, referencing relevant skills
```

## Post-Change Checklist

- [ ] All 10 checks executed (none skipped due to early exit)
- [ ] Results table displayed with consistent formatting
- [ ] Each failure has an actionable recommendation
- [ ] No sensitive values (tokens, secrets) exposed in output

## Troubleshooting

| Issue                             | Cause                               | Solution                                                                |
| --------------------------------- | ----------------------------------- | ----------------------------------------------------------------------- |
| All checks fail                   | Environment not set up              | Run `full-stack-bootstrap` skill first                                  |
| Only Kokoro checks fail (3-5)     | Kokoro venv missing or broken       | Run `kokoro-install.sh --health` for detailed report                    |
| Lock stuck (check 6)              | Stale lock from crashed TTS process | Check lock age and PID; see `diagnostic-issue-resolver` skill           |
| Bot process missing (check 1)     | Bot crashed or was never started    | See `bot-process-control` skill                                         |
| Telegram API fails (check 2)      | Token expired or network issue      | Verify token in `~/.claude/.secrets/ccterrybot-telegram`; check network |
| MPS not available (check 5)       | Running on Intel Mac or torch issue | Verify Apple Silicon; reinstall torch with MPS support                  |
| Stale WAVs found (check 9)        | TTS process crashed mid-generation  | Clean with `rm /tmp/kokoro-tts-*.wav`; investigate crash cause          |
| Shell symlinks missing (check 10) | Bootstrap incomplete                | Re-run symlink setup from `full-stack-bootstrap` skill                  |

## Reference Documentation

- [Health Checks](./references/health-checks.md) - Detailed description of each check, failure meaning, and remediation
- [Evolution Log](./references/evolution-log.md) - Change history for this skill

## Overview

This skill runs a focused health check across the TTS engine, Telegram bot, and supporting infrastructure, producing a pass/fail table and actionable recommendations. It targets macOS (Apple Silicon) setups and validates runtime, secrets, GPU availability, locks, and temporary artifacts. Use it to quickly verify system readiness or to diagnose intermittent failures in the TTS pipeline.

## How this skill works

The skill executes ten discrete checks: bot process presence, Telegram API responsiveness, Kokoro virtualenv existence, Kokoro Python import, Apple Silicon MPS availability via PyTorch, lock file state, audio process counts, secrets file presence, stale WAV artifacts, and shell symlink validity. Each check returns `[OK]` or `[FAIL]` with a short diagnostic. Results are collected into a table and summarized with pass/fail counts and recommended fixes for failures.

## When to use it

- Diagnose why TTS or Telegram bot is not working
- Verify system readiness after bootstrap or configuration changes
- Run a pre-demo or presentation system sanity check
- Investigate intermittent TTS pipeline failures or crashes
- Detect stale locks, zombie processes, or orphaned temp files

## Best practices

- Load environment variables from your mise configuration before running checks to ensure `BOT_TOKEN` and other secrets are available
- Run the full 10-check sequence; don't skip Kokoro or MPS checks when using Apple Silicon
- Treat lock failures (stale/orphaned) as high priority; inspect PID and lock age before removing files
- Never print secrets or tokens in report output; report only presence/absence and safe diagnostics
- Automate this skill as a pre-deploy or CI gating step to catch regressions early

## Example use cases

- Pre-demo health report: run all checks and confirm 10/10 before presenting TTS features
- Post-crash triage: run checks to find stale locks, orphaned WAVs, or missing bot processes
- CI readiness gate: verify Kokoro venv, Python import, and MPS availability on macOS runners
- On-call diagnostic: quickly determine whether Telegram API failures are token or network related
- Maintenance script: scheduled run to detect and clean stale WAVs and reset orphaned locks

## FAQ

### What platforms are supported?

This skill is designed for macOS on Apple Silicon; MPS checks require PyTorch with MPS support and Kokoro in the expected venv path.

### Will it expose my bot token or secrets?

No. The checks validate presence and API responses but do not print secrets or raw token values in the report.

### What indicates a critical failure?

Failures of core checks (bot process missing, Telegram API false, Kokoro venv/import failing, or stale/orphaned locks) are critical and should be addressed before running TTS workloads.