
This skill helps you configure large language model generation and interpret foundation model behavior with structured outputs and sampling guidance.

npx playbooks add skill doanchienthangdev/omgkit --skill foundation-models

---
name: foundation-models
description: Understanding Foundation Models - architecture, sampling parameters, structured outputs, post-training. Use when configuring LLM generation, selecting models, or understanding model behavior.
---

# Foundation Models

Deep understanding of how Foundation Models work.

## Sampling Parameters

```python
from openai import OpenAI

client = OpenAI()

# Temperature guide: rough starting points by task type
TEMPERATURE = {
    "factual_qa": 0.0,        # Deterministic
    "code_generation": 0.2,   # Slightly creative
    "translation": 0.3,       # Mostly deterministic
    "creative_writing": 0.9,  # Creative
    "brainstorming": 1.2,     # Very creative
}

# Key parameters
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    temperature=0.7,    # 0.0-2.0, controls randomness
    top_p=0.9,          # Nucleus sampling (0.0-1.0)
    max_tokens=1000,    # Maximum output length
)
```

## Structured Outputs

```python
# JSON Mode (assumes the `client` configured above)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    response_format={"type": "json_object"}
)

# Function Calling
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[...],
    tools=tools,
)
```

## Post-Training Stages

| Stage | Purpose | Result |
|-------|---------|--------|
| Pre-training | Learn language patterns | Base model |
| SFT | Instruction following | Chat model |
| RLHF/DPO | Human preference alignment | Aligned model |

## Model Selection Factors

| Factor | Consideration |
|--------|---------------|
| Context length | 4K-128K+ tokens |
| Multilingual | Tokenization costs (up to 10x for non-Latin) |
| Domain | General vs specialized (code, medical, legal) |
| Latency | TTFT, tokens/second |
| Cost | Input/output token pricing |
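Because input and output tokens are usually priced separately, a quick estimate helps compare models before committing. A minimal sketch follows; the per-million-token prices are placeholders, not real rates, so substitute current pricing for the model you are evaluating.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the estimated request cost in dollars.

    Prices are expressed per million tokens, as most providers quote them.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# Example: 2,000 input tokens and 500 output tokens at hypothetical
# $5 / $15 per million tokens.
cost = estimate_cost(2_000, 500, 5.0, 15.0)
print(f"${cost:.4f}")  # $0.0175
```

For multilingual workloads, remember that the same text can tokenize into several times more tokens in non-Latin scripts, which inflates both the input and output terms above.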

## Best Practices

1. Match temperature to task type
2. Use structured outputs when parsing needed
3. Consider context length limits
4. Test sampling parameters systematically
5. Account for knowledge cutoff dates

## Common Pitfalls

- Using a high temperature for factual tasks
- Ignoring tokenization costs for multilingual text
- Not accounting for context length limits
- Expecting deterministic output without temperature=0 (and even then, minor run-to-run variation can remain)
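The last pitfall can be illustrated with a toy next-token distribution. This is not how a real model decodes internally; it is a sketch of why greedy selection (temperature near 0) is repeatable while sampling (temperature above 0) is not.

```python
import random

# Toy next-token distribution (token -> probability). Values are made up.
probs = {"Paris": 0.7, "London": 0.2, "Berlin": 0.1}

def greedy_pick(p: dict) -> str:
    # Temperature -> 0 behaves like argmax: always the same token.
    return max(p, key=p.get)

def sample_pick(p: dict, rng: random.Random) -> str:
    # Temperature > 0 samples from the distribution: runs can differ.
    return rng.choices(list(p), weights=list(p.values()), k=1)[0]

rng = random.Random()
print([greedy_pick(probs) for _ in range(3)])       # always the same token
print([sample_pick(probs, rng) for _ in range(3)])  # may vary between runs
```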

Overview

This skill explains core concepts of foundation models, including architecture, sampling parameters, structured outputs, and post-training stages. It helps engineers and product teams choose models, tune generation, and design reliable structured responses. Practical guidance focuses on outcomes: accurate generation, predictable behavior, and efficient cost/latency trade-offs.

How this skill works

The skill breaks down how sampling parameters (temperature, top_p, max_tokens) shape randomness, creativity, and determinism in outputs. It describes structured output mechanisms such as JSON modes and function-calling to produce machine-parseable results. It also outlines post-training stages (pre-training, supervised fine-tuning, RLHF/DPO) and how they influence instruction following and alignment.
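One practical consequence of the structured-output mechanisms described above: JSON mode guarantees syntactically valid JSON, not that the object matches your schema, so downstream code should still validate the parsed result. A minimal sketch, where the required keys are illustrative:

```python
import json

def parse_structured(raw: str, required: set) -> dict:
    """Parse a JSON-mode reply and check that required keys are present."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    missing = required - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

reply = '{"location": "Tokyo", "unit": "celsius"}'
print(parse_structured(reply, {"location"}))
```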

When to use it

  • Configuring LLM generation settings for a product or experiment
  • Selecting a model based on context length, latency, and domain
  • Designing parsable outputs for downstream systems (JSON, function calls)
  • Comparing model behavior after different fine-tuning or alignment stages
  • Estimating cost and tokenization impact for multilingual use cases

Best practices

  • Match temperature and top_p to the task: low values for factual tasks, higher for creative tasks
  • Use structured outputs (JSON or function calling) when deterministic parsing is required
  • Systematically test sampling parameters and seed repeatability to measure stability
  • Account for model context length limits and break inputs into chunks if needed
  • Factor tokenization costs for non-Latin scripts and choose models accordingly
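The chunking advice above can be sketched with a crude word-count splitter. Word count is only a rough proxy for token count; a real tokenizer for the target model would give precise limits.

```python
def chunk_words(text: str, max_words: int) -> list:
    """Split text into chunks of at most max_words whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

doc = "one two three four five six seven"
print(chunk_words(doc, 3))  # ['one two three', 'four five six', 'seven']
```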

Example use cases

  • Set temperature=0.0 and top_p low for deterministic factual QA or checks
  • Use temperature around 0.2–0.4 for code generation to allow slight variation while keeping output stable
  • Enable JSON response format or function calling for API integrations and automation
  • Select a model with long context (e.g., 64K tokens) for large document summarization
  • Run A/B tests on sampling params to measure hallucination rate versus diversity

FAQ

When should I set temperature to 0?

Use temperature=0 for deterministic outputs such as factual QA, exact templates, or verification steps where repeatability is required.

How do I choose between temperature and top_p?

Temperature scales randomness globally by flattening or sharpening the token distribution; top_p (nucleus sampling) restricts sampling to the smallest set of tokens whose cumulative probability reaches p. Adjust one at a time rather than both: lower temperature for more determinism, or lower top_p to constrain the candidate set.
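The interaction between the two parameters can be made concrete with a toy implementation. This is a sketch of the standard definitions, not any provider's actual decoding code; the logits are made-up values.

```python
import math

def softmax_with_temperature(logits: list, temperature: float) -> list:
    """Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs: list, p: float) -> list:
    """Return indices of the smallest set of tokens whose cumulative
    probability reaches p (the nucleus)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    return sorted(kept)

logits = [2.0, 1.0, 0.5, 0.1]
sharp = softmax_with_temperature(logits, 0.5)
flat = softmax_with_temperature(logits, 2.0)
print(sharp[0] > flat[0])        # True: low temperature concentrates mass on the top token
print(top_p_filter(sharp, 0.9))  # [0, 1]: the nucleus after sharpening
```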