This skill provides expert debugging for FYND's Kubernetes, Kafka, Redis, and Node.js backend to reduce outages and latency.
npx playbooks add skill darshil321/fynd-backend-skills --skill fynd-backend-microservices
---
name: fynd-backend-microservices
description: Expert debugging for FYND's Kubernetes/GCP/Kafka/Node.js backend. Use for pods crashing, Kafka lag, Redis memory high, API latency, DB migration failures, LLM cost spikes, memory leaks, and service failures.
---
# FYND Backend Microservices 🛠️
**Expert debugging for FYND's Kubernetes/GCP/Kafka/Node.js backend.**
## 🎯 Use When
```
"Pods crashing" | "Kafka lag" | "Redis memory high" | "API latency"
"Database migration failed" | "LLM costs spiking" | "Memory leak" | "Service failures"
```
## 🛠️ 8 Core Skills
1. **K8s/GCP Deployment** (40% of issues) - pods, scaling, graceful shutdown
2. **Kafka Resilience** (25%) - consumer lag, DLQ, rebalancing
3. **Redis Optimization** (15%) - memory, TTL, pub/sub
4. **Distributed Tracing** (10%) - correlation IDs, Langfuse
5. **Database Patterns** (8%) - Sequelize, pgvector, MongoDB
6. **LangGraph Orchestration** (5%) - multi-LLM, token counting
7. **Performance Analysis** (4%) - heap profiling, slow queries
8. **Resilience Patterns** (3%) - circuit breaker, backoff
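As an illustration of the backoff item in the list above, here is a minimal sketch of capped exponential backoff with full jitter. The helper names (`backoffDelayMs`, `retryWithBackoff`) and the default values are assumptions made for this example, not part of the skill's bundled scripts.

```javascript
// Hypothetical helper: delay before retry number `attempt` (0-based),
// using capped exponential backoff with "full jitter".
// `random` is injectable so the delay can be made deterministic in tests.
function backoffDelayMs(attempt, { baseMs = 100, maxMs = 5000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: run an async operation, sleeping between failed attempts.
// Rethrows the last error once `retries` additional attempts are exhausted.
async function retryWithBackoff(operation, { retries = 3, sleep = defaultSleep, ...delayOptions } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        await sleep(backoffDelayMs(attempt, delayOptions));
      }
    }
  }
  throw lastError;
}
```

Jitter spreads retries out so that many clients recovering from the same outage do not hit the downstream service at the same instant. A circuit breaker would typically wrap a call like this and stop retrying altogether once failures pass a threshold.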
## 🔍 Diagnostic Flow
1. Gather metrics (kubectl, kafka-consumer-groups, redis-cli)
2. Form hypotheses (OOM? poison pill? slow query?)
3. Test systematically
4. Provide fix + monitoring
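The first two steps of this flow can be sketched in code. The example below assumes the pod list has already been obtained as parsed JSON (for instance from `kubectl get pods -o json`) and maps a few well-known container status fields to a first hypothesis. The function name, the `minRestarts` threshold, and the hypothesis strings are illustrative assumptions, not the contents of `scripts/diagnose.js`.

```javascript
// Sketch only: turns the parsed output of `kubectl get pods -o json`
// into a list of containers worth investigating first.
// `minRestarts` and the hypothesis strings are illustrative assumptions.
function summarizeCrashingPods(podList, { minRestarts = 3 } = {}) {
  const findings = [];
  for (const pod of podList.items ?? []) {
    for (const status of pod.status?.containerStatuses ?? []) {
      const waitingReason = status.state?.waiting?.reason;
      const lastExitReason = status.lastState?.terminated?.reason;
      const restarts = status.restartCount ?? 0;

      let hypothesis = null;
      if (lastExitReason === "OOMKilled") {
        hypothesis = "memory limit exceeded (check limits and heap growth)";
      } else if (waitingReason === "CrashLoopBackOff") {
        hypothesis = "container exits on startup (check previous logs)";
      } else if (restarts >= minRestarts) {
        hypothesis = "frequent restarts (check probes and exit codes)";
      }

      if (hypothesis) {
        findings.push({
          pod: pod.metadata?.name,
          container: status.name,
          restarts,
          hypothesis,
        });
      }
    }
  }
  // Most-restarted containers first.
  return findings.sort((a, b) => b.restarts - a.restarts);
}
```

The output is only a starting point for step 3: each hypothesis still needs to be confirmed against logs and metrics before a fix is proposed.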
## 📦 Bundled Resources (load as needed)
- `scripts/diagnose.js` - quick pod snapshot (`kubectl get pods`)
- `references/patterns.md` - fast CLI patterns for pod crash, Kafka lag, Redis memory
- `references/fynd-backend-skills.md` - full architecture + skills matrix + flowcharts + checklists
- `references/fynd-agent-integration.md` - LangGraph/LangChain tool integration guide
- `references/fynd-backend-skill-template.md` - template for creating new skills or extensions
**Search tips:** use `rg -n "Use Cases|Agent Actions|Checklist|Flowchart"` in the reference files to jump to relevant sections quickly.
## 📊 Success
94% accuracy | <5s response | 30% MTTR reduction | $740k/year savings
This skill provides expert debugging and operational guidance for FYND’s Node.js backend, which runs on Kubernetes and GCP and relies on Kafka, Redis, and related services. I focus on fast triage and practical fixes for pods crashing, Kafka lag, Redis memory issues, API latency, DB migration failures, memory leaks, LLM cost spikes, and general service failures. The goal is to reduce mean time to recovery and prevent recurrence with concrete remediation and monitoring guidance.
I start by collecting key signals: Kubernetes pod states and logs, consumer group offsets, Redis memory usage, application traces, and database errors. I form hypotheses (OOM, poison message, slow query, leaking allocation), test them with targeted commands or small experiments, and deliver step-by-step fixes plus monitoring and resilience recommendations. Bundled scripts and checklists speed common diagnostics and ensure repeatable remediation.
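To make the consumer-offset signal concrete, the sketch below computes lag per partition and flags partitions whose committed offset has not moved between two snapshots while new messages kept arriving, which is one pattern consistent with the poison-message hypothesis. The snapshot shape (`partition`, `committedOffset`, `logEndOffset`) and the function names are assumptions for this example, not the output format of any particular tool.

```javascript
// Sketch only: the snapshot shape ({ partition, committedOffset, logEndOffset })
// is a hypothetical structure chosen for this example.
function partitionLag(snapshot) {
  return snapshot.map(({ partition, committedOffset, logEndOffset }) => ({
    partition,
    lag: Math.max(0, logEndOffset - committedOffset),
  }));
}

function totalLag(snapshot) {
  return partitionLag(snapshot).reduce((sum, { lag }) => sum + lag, 0);
}

// A partition looks "stalled" when new messages keep arriving (log end moved)
// but the consumer's committed offset did not move between two snapshots.
function stalledPartitions(previous, current) {
  const previousByPartition = new Map(previous.map((p) => [p.partition, p]));
  return current
    .filter((now) => {
      const before = previousByPartition.get(now.partition);
      return (
        before !== undefined &&
        now.committedOffset === before.committedOffset &&
        now.logEndOffset > before.logEndOffset
      );
    })
    .map((p) => p.partition);
}
```

A stalled partition alongside healthy ones usually points at a specific message or consumer instance rather than a general throughput problem. Uniformly growing lag across all partitions is more likely a capacity or downstream-latency issue.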
**How quickly can I get a triage plan?**
I provide an initial diagnostic checklist and prioritized hypotheses within minutes, and a tested remediation plan within the same incident session.
**Do you require production access for diagnostics?**
I can work from logs and metrics snapshots, but live access accelerates root-cause identification and fixes; minimal, read-only access is sufficient for most diagnostics.