
ai-engineering skill

/plugin/skills/ai-engineering

This skill helps you collect and analyze user feedback to drive rapid AI improvements, looping insights back into prompt updates, training data, and experiments.

npx playbooks add skill doanchienthangdev/omgkit --skill ai-engineering

Review the files below or copy the command above to add this skill to your agents.

Files (13)
SKILL.md
3.6 KB
---
name: user-feedback
description: Collecting and using user feedback - explicit/implicit signals, feedback analysis, improvement loops, A/B testing. Use when improving AI systems, understanding user satisfaction, or iterating on quality.
---

# User Feedback Skill

Leveraging feedback to improve AI systems.

## Feedback Collection

### Explicit Feedback
```python
from datetime import datetime

class FeedbackCollector:
    def collect_explicit(self, response_id, feedback):
        self.db.save({
            "type": "explicit",
            "response_id": response_id,
            "rating": feedback.get("rating"),      # 1-5 star rating
            "thumbs": feedback.get("thumbs"),      # "up" / "down"
            "comment": feedback.get("comment"),    # optional free text
            "timestamp": datetime.now()
        })
```

### Implicit Feedback
```python
def extract_implicit(conversation):
    signals = []

    for i, turn in enumerate(conversation[1:], 1):
        prev = conversation[i-1]

        # Negative signals
        if is_correction(turn, prev):
            signals.append(("correction", i))
        if is_repetition(turn, prev):
            signals.append(("repetition", i))
        if is_abandonment(turn):
            signals.append(("abandonment", i))

        # Positive signals
        if is_acceptance(turn, prev):
            signals.append(("acceptance", i))
        if is_follow_up(turn, prev):
            signals.append(("engagement", i))

    return signals
```
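The predicates above (`is_correction`, `is_repetition`, and so on) are left abstract. As one hedged sketch, a keyword heuristic for `is_correction` might look like the following, assuming each turn is a dict with `role` and `text` keys (an assumption about the conversation format; the cue phrases are illustrative, not part of the skill):

```python
# Hypothetical cue phrases; tune these for your domain.
CORRECTION_CUES = ("no,", "that's wrong", "actually", "i meant", "not what i asked")

def is_correction(turn, prev):
    """Flag a user turn that appears to correct the previous assistant turn."""
    if prev.get("role") != "assistant" or turn.get("role") != "user":
        return False
    text = turn.get("text", "").lower()
    # A cue phrase near the start of the message suggests a correction.
    return any(cue in text[:80] for cue in CORRECTION_CUES)
```

In production you would likely replace the keyword list with a small classifier, but a heuristic like this is a cheap starting point.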

### Natural Language Feedback
```python
def extract_from_text(turn, model):
    prompt = f"""Extract feedback signal from user message.

Message: {turn}

Sentiment (positive/negative/neutral):
Specific issue (if any):
Suggestion (if any):"""

    return model.generate(prompt)
```

## Feedback Analysis

```python
import json

class FeedbackAnalyzer:
    def categorize(self, feedbacks):
        prompt = f"""Categorize these feedback items:

{json.dumps(feedbacks)}

Categories:
1. Accuracy issues
2. Format issues
3. Relevance issues
4. Safety issues
5. Missing features

Summary:"""
        return self.llm.generate(prompt)

    def find_patterns(self, feedbacks):
        # Cluster similar complaints
        embeddings = [self.embed(f["text"]) for f in feedbacks]
        clusters = self.cluster(embeddings)

        patterns = {}
        for cluster_id, indices in clusters.items():
            cluster_feedback = [feedbacks[i] for i in indices]
            patterns[cluster_id] = {
                "count": len(cluster_feedback),
                "summary": self.summarize(cluster_feedback),
                "examples": cluster_feedback[:3]
            }

        return patterns
```
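`self.cluster` is not defined above. One minimal, hedged sketch is greedy clustering by cosine similarity to each cluster's seed vector (the 0.8 threshold is an assumption, not part of the skill):

```python
import numpy as np

def cluster(embeddings, threshold=0.8):
    """Greedy clustering: assign each vector to the first cluster whose
    seed vector is within the cosine-similarity threshold."""
    clusters = {}   # cluster_id -> list of member indices
    seeds = []      # one normalized representative vector per cluster
    for i, vec in enumerate(embeddings):
        v = np.asarray(vec, dtype=float)
        v = v / (np.linalg.norm(v) or 1.0)
        for cid, seed in enumerate(seeds):
            if float(v @ seed) >= threshold:
                clusters[cid].append(i)
                break
        else:
            seeds.append(v)
            clusters[len(seeds) - 1] = [i]
    return clusters
```

Greedy single-pass clustering is order-dependent; for larger feedback volumes a library method such as agglomerative clustering would be more robust.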

## Improvement Loop

```python
class FeedbackLoop:
    def run_cycle(self):
        # 1. Collect
        recent = self.db.get_recent(days=7)
        analysis = self.analyze(recent)

        # 2. Identify improvements
        if analysis["accuracy_issues"] > self.threshold:
            training_data = self.create_training_data(
                analysis["corrections"]
            )

            # 3. Improve
            if len(training_data) > 1000:
                self.finetune(training_data)
            else:
                self.update_prompts(analysis)

        # 4. Evaluate
        metrics = self.evaluate(self.test_set)

        # 5. Deploy if improved
        if metrics["quality"] > self.baseline:
            self.deploy()

        return metrics
```

## A/B Testing

```python
import hashlib

class ABTest:
    def __init__(self, variants):
        self.variants = variants
        self.results = {v: {"count": 0, "positive": 0} for v in variants}

    def assign(self, user_id):
        # Stable assignment: Python's built-in hash() is randomized per
        # process for strings, so derive the bucket from a deterministic digest.
        digest = hashlib.md5(str(user_id).encode()).hexdigest()
        return self.variants[int(digest, 16) % len(self.variants)]

    def record(self, user_id, positive):
        variant = self.assign(user_id)
        self.results[variant]["count"] += 1
        if positive:
            self.results[variant]["positive"] += 1

    def analyze(self):
        for variant, data in self.results.items():
            rate = data["positive"] / max(data["count"], 1)
            print(f"{variant}: {rate:.2%} ({data['count']} samples)")
```
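The raw rates printed by `analyze` don't tell you whether a difference is more than noise. A standard two-proportion z-test can gate the winner decision; this is a stdlib-only sketch, not part of the skill's code, and the sample counts below are made up:

```python
import math

def two_proportion_z(pos_a, n_a, pos_b, n_b):
    """Return the z statistic for the difference in positive rates
    between two variants, using the pooled standard error."""
    p_a, p_b = pos_a / n_a, pos_b / n_b
    pooled = (pos_a + pos_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# |z| >= 1.96 corresponds to p < 0.05 (two-sided).
z = two_proportion_z(pos_a=120, n_a=1000, pos_b=90, n_b=1000)
```

Only declare a winning variant once |z| clears your chosen significance threshold and each arm has enough samples.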

## Best Practices

1. Collect both explicit and implicit feedback
2. Analyze patterns, not individual feedback
3. Close the loop (feedback → improvement)
4. A/B test changes
5. Monitor long-term trends
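For the last point, long-term monitoring can be as simple as a rolling window over periodic positive-feedback rates. A minimal sketch (window size and tolerance are illustrative assumptions):

```python
from collections import deque

class TrendMonitor:
    """Rolling window of positive-feedback rates; flags a regression when
    the latest rate falls below the window average by more than a tolerance.
    Window size and tolerance here are illustrative, not prescribed."""
    def __init__(self, window=8, tolerance=0.05):
        self.rates = deque(maxlen=window)
        self.tolerance = tolerance

    def record(self, rate):
        self.rates.append(rate)

    def regressed(self):
        if len(self.rates) < 2:
            return False
        baseline = sum(self.rates) / len(self.rates)
        return self.rates[-1] < baseline - self.tolerance
```

Feeding this one aggregate rate per week (or per deploy) gives a cheap regression alarm to complement per-release evaluation.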

Overview

This skill collects and operationalizes user feedback to improve conversational AI. It captures explicit signals (ratings, thumbs, comments), derives implicit signals from conversation behavior, and extracts structured insights from natural-language feedback. The workflow closes the loop by analyzing patterns, testing fixes, and deploying verified improvements.

How this skill works

The skill ingests explicit feedback entries and mines implicit signals like corrections, repetitions, abandonments, acceptances, and follow-ups from conversation turns. It uses LLMs and embeddings to categorize feedback, cluster similar complaints, and summarize common patterns. Improvement cycles generate training data or prompt updates, run A/B tests, and gate deployments by evaluation metrics.

When to use it

  • When you need to measure user satisfaction and identify failure modes
  • When iterating on model responses or prompt templates
  • When planning fine-tuning or data augmentation efforts
  • When validating changes through A/B testing before rollout
  • When monitoring long-term quality and regressions

Best practices

  • Collect both explicit and implicit signals to reduce bias and increase coverage
  • Aggregate and cluster feedback to surface patterns rather than chasing individual comments
  • Close the loop: prioritize issues, apply fixes, evaluate on a test set, and deploy only on measurable improvement
  • Use A/B testing and consistent assignment to validate UX or model changes
  • Track trends over time to detect regressions and seasonal shifts

Example use cases

  • Improve answer accuracy by converting correction signals into curated training examples for fine-tuning
  • Detect format or relevance issues by clustering user comments and updating response templates
  • Run A/B tests to compare prompt variants or system behaviors and select the winning variant by positive-feedback rate
  • Extract structured issues and suggested fixes from free-text comments using an LLM to speed triage
  • Implement a weekly feedback loop that identifies top categories and triggers either prompt edits or retraining

FAQ

How do implicit signals differ from explicit feedback?

Implicit signals are behavior-derived cues (corrections, abandonments, repetitions, follow-ups) that suggest satisfaction or problems, while explicit feedback is direct user input like ratings, thumbs, or comments.

When should I fine-tune versus update prompts?

If you can assemble a large, high-quality labeled dataset (e.g., 1,000+ examples) for a recurring accuracy problem, fine-tune. For lower-volume or format-related issues, prioritize prompt or system-message updates and validate them with A/B testing.

How do I avoid bias from a small set of vocal users?

Cluster feedback and weight signals by sample size; prioritize patterns that appear across diverse users and validate changes with A/B tests and holdout evaluation sets.