
local-llm-ops skill

/toolchains/ai/ops/local-llm-ops

This skill helps you manage local LLM operations on Apple Silicon with Ollama, from setup to benchmarks and diagnostics.

npx playbooks add skill bobmatnyc/claude-mpm-skills --skill local-llm-ops


SKILL.md
---
name: local-llm-ops
description: Local LLM operations with Ollama on Apple Silicon, including setup, model pulls, chat launchers, benchmarks, and diagnostics.
version: 1.0.0
category: toolchain
author: Claude MPM Team
license: MIT
progressive_disclosure:
  entry_point:
    summary: "Run local LLMs with Ollama: setup venv, start service, pull models, launch chat, benchmark, and diagnose."
    when_to_use: "Operating local LLMs on macOS, running Ollama-based chat sessions, or benchmarking models for speed/latency."
    quick_start: "1. ./setup_chatbot.sh 2. ./chatllm 3. ollama pull mistral (if no models)"
tags:
  - llm
  - ollama
  - local
  - benchmark
  - chat
  - ops
---

# Local LLM Ops (Ollama)

## Overview

Your `localLLM` repo provides a full local LLM toolchain on Apple Silicon: setup scripts, a rich CLI chat launcher, benchmarks, and diagnostics. The operational path is: install Ollama, ensure the service is running, initialize the venv, pull models, then launch chat or benchmarks.

## Quick Start

```bash
./setup_chatbot.sh
./chatllm
```

If no models are present:

```bash
ollama pull mistral
```

## Setup Checklist

1. Install Ollama: `brew install ollama`
2. Start the service: `brew services start ollama`
3. Run setup: `./setup_chatbot.sh`
4. Verify service: `curl http://localhost:11434/api/version`
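
The checklist above can be scripted as an idempotent sketch (assumes Homebrew on macOS and that you run it from the repo root; each step is guarded so reruns are safe):

```shell
#!/bin/sh
# Idempotent setup sketch for the checklist above.

# 1. Install Ollama if it is not already on PATH.
if ! command -v ollama >/dev/null 2>&1; then
  brew install ollama || echo "brew install failed; install Ollama manually"
fi

# 2. Start the service (a no-op if it is already running).
brew services start ollama 2>/dev/null || echo "could not start the ollama service"

# 3. Wait until the HTTP API answers, up to N attempts one second apart.
wait_for_ollama() {
  attempts="${1:-10}"; i=0
  while [ "$i" -lt "$attempts" ]; do
    curl -sf http://localhost:11434/api/version >/dev/null 2>&1 && return 0
    i=$((i + 1)); sleep 1
  done
  return 1
}
wait_for_ollama 5 || echo "Ollama API not reachable yet"

# 4. Run the repo's setup script if present.
if [ -x ./setup_chatbot.sh ]; then ./setup_chatbot.sh; fi
```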

## Chat Launchers

- `./chatllm` (primary launcher)
- `./chat` or `./chat.py` (alternate launchers)
- Aliases: `./install_aliases.sh` then `llm`, `llm-code`, `llm-fast`

Task modes:

```bash
./chat -t coding -m codellama:70b
./chat -t creative -m llama3.1:70b
./chat -t analytical
```

## Benchmark Workflow

Benchmarks are scripted in `scripts/run_benchmarks.sh`:

```bash
./scripts/run_benchmarks.sh
```

This runs `bench_ollama.py` with:

- `benchmarks/prompts.yaml`
- `benchmarks/models.yaml`
- Multiple runs and max token limits
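
The exact schema of these YAML files is defined by `bench_ollama.py`; as a rough sketch only (the field names below are illustrative, not the actual schema), the two files might look like:

```yaml
# Illustrative only -- the real schema is whatever bench_ollama.py parses.
# benchmarks/models.yaml
models:
  - mistral
  - llama3.1:70b

# benchmarks/prompts.yaml (separate file)
prompts:
  - id: summarize
    text: "Summarize the following paragraph in one sentence: ..."
```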

## Diagnostics

Run the built-in diagnostic script when setup fails:

```bash
./diagnose.sh
```

Common fixes:

- Re-run `./setup_chatbot.sh`
- Ensure `ollama` is in PATH
- Pull at least one model: `ollama pull mistral`
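
When `./diagnose.sh` is unavailable, or you want to triage by hand, the three common failure modes above can be checked directly:

```shell
# Manual triage sketch covering the common failure modes above.

# 1. Is the binary on PATH?
if command -v ollama >/dev/null 2>&1; then
  echo "ollama found at $(command -v ollama)"
else
  echo "ollama missing -- try: brew install ollama"
fi

# 2. Is the service answering?
if curl -sf http://localhost:11434/api/version >/dev/null 2>&1; then
  echo "service is up"
else
  echo "service down -- try: brew services start ollama"
fi

# 3. Is at least one model pulled?
if ollama list 2>/dev/null | grep -q .; then
  echo "models present"
else
  echo "no models -- try: ollama pull mistral"
fi
```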

## Operational Notes

- Virtualenv lives in `.venv`
- Chat configs and sessions live under `~/.localllm/`
- Ollama API runs at `http://localhost:11434`
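
The API endpoint above is also useful for scripting; for example, listing the locally pulled models without going through a launcher:

```shell
# List models known to the local Ollama instance (requires the service to be running).
curl -s http://localhost:11434/api/tags || echo "service not reachable on :11434"
```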

## Related Skills

- `toolchains/universal/infrastructure/docker`

## Overview

This skill provides a complete local LLM operations toolkit for Apple Silicon using Ollama, including setup scripts, model management, chat launchers, benchmarks, and diagnostics. It streamlines getting a local LLM service running, pulling models, and running chat or benchmarking workflows. The tooling focuses on reproducible local development with a simple CLI surface.

## How this skill works

The skill guides you through installing Ollama, starting the Ollama service, creating a Python virtualenv, and pulling models onto the local host. It exposes lightweight CLI launchers (primary: `./chatllm`) for task-specific sessions, plus scripts for automated benchmarks and diagnostics. The diagnostic scripts inspect service availability, PATH, and model presence, while the benchmark scripts run repeatable tests using YAML-configured prompts and models.
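
Under the hood, the launchers talk to the Ollama HTTP API; a direct one-shot (non-streaming) request can be sketched as follows (assumes the service is up and `mistral` has been pulled):

```shell
# One-shot completion via the Ollama API; falls through with a message if the
# service is not running.
curl -s http://localhost:11434/api/generate \
  -d '{"model": "mistral", "prompt": "Say hello in one sentence.", "stream": false}' \
  || echo "Ollama service not reachable"
```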

## When to use it

- Setting up a local LLM environment on Apple Silicon for development or testing.
- Launching quick chat sessions with different task modes or model choices.
- Pulling and managing models locally to avoid remote API costs and latency.
- Running repeatable benchmarks across models and prompts to compare performance.
- Diagnosing service, PATH, or model issues when a chat launcher fails.

## Best practices

- Install Ollama via Homebrew and keep the service running with `brew services start ollama`.
- Use the included setup script to initialize the virtualenv (`.venv`) and install Python dependencies.
- Pull at least one model (e.g., `ollama pull mistral`) before launching a chat session.
- Store chat sessions and configs under the default `~/.localllm/` for portability and backups.
- Run `./diagnose.sh` when you hit failures, and re-run setup if PATH or dependencies are missing.

## Example use cases

- Developer wanting an offline coding assistant: run `./setup_chatbot.sh`, pull a code model, then `./chat -t coding -m codellama:70b`.
- Researcher comparing models: run `./scripts/run_benchmarks.sh` to execute `bench_ollama.py` with `prompts.yaml` and `models.yaml`.
- QA troubleshooting a failed chat launcher: run `curl http://localhost:11434/api/version` and `./diagnose.sh` to locate the issue.
- Power user creating shortcuts: install aliases with `./install_aliases.sh` to enable the `llm`, `llm-code`, and `llm-fast` commands.
- Ops running repeated load tests: configure runs and max tokens in the benchmarks YAML and automate via CI.
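
The last use case, automating benchmarks, might be wrapped for CI roughly like this (the results directory layout and log name are illustrative, not part of the repo):

```shell
#!/bin/sh
# CI-style benchmark wrapper sketch: skip gracefully if the service is down,
# then archive results under a timestamped directory (illustrative layout).
run_bench() {
  curl -sf http://localhost:11434/api/version >/dev/null 2>&1 || {
    echo "Ollama service not running; aborting benchmark run" >&2
    return 1
  }
  run_dir="bench-results/$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$run_dir"
  ./scripts/run_benchmarks.sh | tee "$run_dir/run.log"
}

run_bench || echo "benchmark run skipped"
```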

## FAQ

**What if the chat launcher reports no models?**

Pull a model with `ollama pull <model>` (for example, `ollama pull mistral`), verify the service is running, then retry the launcher.

**How do I verify Ollama is accessible?**

Confirm the service with `curl http://localhost:11434/api/version` and ensure `ollama` is on your PATH.