
application-performance-performance-optimization skill

/application-performance-performance-optimization

This skill coordinates end-to-end performance optimization across the backend, frontend, and observability layers to deliver measurable, data-driven improvements.

npx playbooks add skill xfstudio/skills --skill application-performance-performance-optimization


SKILL.md
---
name: application-performance-performance-optimization
description: "Optimize end-to-end application performance with profiling, observability, and backend/frontend tuning. Use when coordinating performance optimization across the stack."
---

Optimize application performance end-to-end using specialized performance and optimization agents:

[Extended thinking: This workflow orchestrates a comprehensive performance optimization process across the entire application stack. Starting with deep profiling and baseline establishment, the workflow progresses through targeted optimizations in each system layer, validates improvements through load testing, and establishes continuous monitoring for sustained performance. Each phase builds on insights from previous phases, creating a data-driven optimization strategy that addresses real bottlenecks rather than theoretical improvements. The workflow emphasizes modern observability practices, user-centric performance metrics, and cost-effective optimization strategies.]

## Use this skill when

- Coordinating performance optimization across backend, frontend, and infrastructure
- Establishing baselines and profiling to identify bottlenecks
- Designing load tests, performance budgets, or capacity plans
- Building observability for performance and reliability targets

## Do not use this skill when

- The task is a small localized fix with no broader performance goals
- There is no access to metrics, tracing, or profiling data
- The request is unrelated to performance or scalability

## Instructions

1. Confirm performance goals, constraints, and target metrics.
2. Establish baselines with profiling, tracing, and real-user data.
3. Execute phased optimizations across the stack with measurable impact.
4. Validate improvements and set guardrails to prevent regressions.

## Safety

- Avoid load testing production without approvals and safeguards.
- Roll out performance changes gradually with rollback plans.

## Phase 1: Performance Profiling & Baseline

### 1. Comprehensive Performance Profiling

- Use Task tool with subagent_type="performance-engineer"
- Prompt: "Profile application performance comprehensively for: $ARGUMENTS. Generate flame graphs for CPU usage, heap dumps for memory analysis, trace I/O operations, and identify hot paths. Use APM tools like DataDog or New Relic if available. Include database query profiling, API response times, and frontend rendering metrics. Establish performance baselines for all critical user journeys."
- Context: Initial performance investigation
- Output: Detailed performance profile with flame graphs, memory analysis, bottleneck identification, baseline metrics
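
If the stack includes a Node.js/Express backend (an assumption; the skill itself is stack-agnostic), a lightweight per-route latency recorder like the sketch below can seed the baseline document before a full APM agent is wired in:

```ts
// Minimal latency-baseline middleware for an Express app; an APM agent would
// normally capture this automatically, so treat it as a stopgap sketch.
import express from "express";

const app = express();
const samples: Record<string, number[]> = {};

app.use((req, res, next) => {
  const start = process.hrtime.bigint();
  res.on("finish", () => {
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    (samples[req.path] ??= []).push(ms); // accumulate per-route latency samples
  });
  next();
});

// Rough P95 per route, called at shutdown or on demand to seed the baseline.
function p95(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length * 0.95)];
}
```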

### 2. Observability Stack Assessment

- Use Task tool with subagent_type="observability-engineer"
- Prompt: "Assess current observability setup for: $ARGUMENTS. Review existing monitoring, distributed tracing with OpenTelemetry, log aggregation, and metrics collection. Identify gaps in visibility, missing metrics, and areas needing better instrumentation. Recommend APM tool integration and custom metrics for business-critical operations."
- Context: Performance profile from step 1
- Output: Observability assessment report, instrumentation gaps, monitoring recommendations
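
A minimal OpenTelemetry bootstrap for a Node.js service might look like the following sketch; the service name and collector endpoint are placeholders:

```ts
// Hedged OpenTelemetry setup: auto-instrumentation covers HTTP, database, and
// other common libraries, exporting spans over OTLP/HTTP to a collector.
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const sdk = new NodeSDK({
  serviceName: "checkout-service", // placeholder
  traceExporter: new OTLPTraceExporter({ url: "http://otel-collector:4318/v1/traces" }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start(); // begin exporting spans before the app starts handling traffic
```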

### 3. User Experience Analysis

- Use Task tool with subagent_type="performance-engineer"
- Prompt: "Analyze user experience metrics for: $ARGUMENTS. Measure Core Web Vitals (LCP, FID, CLS), page load times, time to interactive, and perceived performance. Use Real User Monitoring (RUM) data if available. Identify user journeys with poor performance and their business impact."
- Context: Performance baselines from step 1
- Output: UX performance report, Core Web Vitals analysis, user impact assessment
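
For field data, the `web-vitals` library is one common way to collect these metrics from real users; the `/rum` endpoint below is a placeholder for whatever collector is available:

```ts
// Field measurement of Core Web Vitals; web-vitals v4+ reports INP rather
// than the deprecated FID.
import { onLCP, onCLS, onINP } from "web-vitals";

function report(metric: { name: string; value: number; id: string }) {
  // sendBeacon survives page unload, so late metrics like CLS still arrive.
  navigator.sendBeacon("/rum", JSON.stringify(metric));
}

onLCP(report);
onCLS(report);
onINP(report);
```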

## Phase 2: Database & Backend Optimization

### 4. Database Performance Optimization

- Use Task tool with subagent_type="database-cloud-optimization::database-optimizer"
- Prompt: "Optimize database performance for: $ARGUMENTS based on profiling data: {context_from_phase_1}. Analyze slow query logs, create missing indexes, optimize execution plans, implement query result caching with Redis/Memcached. Review connection pooling, prepared statements, and batch processing opportunities. Consider read replicas and database sharding if needed."
- Context: Performance bottlenecks from phase 1
- Output: Optimized queries, new indexes, caching strategy, connection pool configuration
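
One representative pattern here is a read-through cache in front of expensive queries. A sketch with ioredis, assuming Redis is the chosen cache; the key scheme, TTL, and query callback are placeholders to adapt to the data layer:

```ts
// Read-through query cache: check Redis first, fall back to the database,
// then populate the cache with a TTL so entries expire on their own.
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

async function cachedQuery<T>(
  key: string,
  ttlSeconds: number,
  query: () => Promise<T>
): Promise<T> {
  const hit = await redis.get(key);
  if (hit !== null) return JSON.parse(hit) as T; // cache hit: skip the database
  const result = await query();                  // cache miss: run the real query
  await redis.set(key, JSON.stringify(result), "EX", ttlSeconds);
  return result;
}

// Usage (hypothetical data layer): cachedQuery(`user:${id}`, 60, () => db.getUser(id))
```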

### 5. Backend Code & API Optimization

- Use Task tool with subagent_type="backend-development::backend-architect"
- Prompt: "Optimize backend services for: $ARGUMENTS targeting bottlenecks: {context_from_phase_1}. Implement efficient algorithms, add application-level caching, optimize N+1 queries, use async/await patterns effectively. Implement pagination, response compression, GraphQL query optimization, and batch API operations. Add circuit breakers and bulkheads for resilience."
- Context: Database optimizations from step 4, profiling data from phase 1
- Output: Optimized backend code, caching implementation, API improvements, resilience patterns
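
For the N+1 case specifically, request-scoped batching is a common fix. A sketch using the `dataloader` package; `loadUsersByIds` is a hypothetical helper standing in for a single `IN`-query:

```ts
import DataLoader from "dataloader";

interface User { id: string; name: string }

// Hypothetical helper: one `SELECT ... WHERE id IN (...)` instead of N queries.
declare function loadUsersByIds(ids: readonly string[]): Promise<User[]>;

const userLoader = new DataLoader<string, User>(async (ids) => {
  const rows = await loadUsersByIds(ids);
  const byId = new Map(rows.map((u) => [u.id, u]));
  // DataLoader requires results in the same order as the requested keys.
  return ids.map((id) => byId.get(id) ?? new Error(`user ${id} not found`));
});

// Resolvers call userLoader.load(id); calls in the same tick are coalesced.
```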

### 6. Microservices & Distributed System Optimization

- Use Task tool with subagent_type="performance-engineer"
- Prompt: "Optimize distributed system performance for: $ARGUMENTS. Analyze service-to-service communication, implement service mesh optimizations, optimize message queue performance (Kafka/RabbitMQ), reduce network hops. Implement distributed caching strategies and optimize serialization/deserialization."
- Context: Backend optimizations from step 5
- Output: Service communication improvements, message queue optimization, distributed caching setup
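
As one example of reducing per-message overhead, a kafkajs producer can batch and compress messages; broker addresses and topic names below are placeholders:

```ts
import { Kafka, CompressionTypes } from "kafkajs";

const kafka = new Kafka({ clientId: "orders-service", brokers: ["kafka:9092"] }); // placeholders
const producer = kafka.producer();

export async function publishBatch(events: { key: string; value: string }[]) {
  await producer.connect(); // resolves immediately once already connected
  await producer.send({
    topic: "order-events",              // placeholder topic
    compression: CompressionTypes.GZIP, // smaller payloads over the wire
    messages: events,                   // one batched send instead of per-event calls
  });
}
```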

## Phase 3: Frontend & CDN Optimization

### 7. Frontend Bundle & Loading Optimization

- Use Task tool with subagent_type="frontend-developer"
- Prompt: "Optimize frontend performance for: $ARGUMENTS targeting Core Web Vitals: {context_from_phase_1}. Implement code splitting, tree shaking, lazy loading, and dynamic imports. Optimize bundle sizes with webpack/rollup analysis. Implement resource hints (prefetch, preconnect, preload). Optimize critical rendering path and eliminate render-blocking resources."
- Context: UX analysis from phase 1, backend optimizations from phase 2
- Output: Optimized bundles, lazy loading implementation, improved Core Web Vitals
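
Route-level code splitting is often the highest-leverage change here. A minimal sketch, assuming a React SPA (component names and paths are placeholders):

```tsx
import { lazy, Suspense } from "react";

// The analytics page ships as its own chunk and loads only when rendered.
const AnalyticsPage = lazy(() => import("./AnalyticsPage")); // placeholder path

export function App() {
  return (
    <Suspense fallback={<p>Loading…</p>}>
      <AnalyticsPage />
    </Suspense>
  );
}
```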

### 8. CDN & Edge Optimization

- Use Task tool with subagent_type="cloud-infrastructure::cloud-architect"
- Prompt: "Optimize CDN and edge performance for: $ARGUMENTS. Configure CloudFlare/CloudFront for optimal caching, implement edge functions for dynamic content, set up image optimization with responsive images and WebP/AVIF formats. Configure HTTP/2 and HTTP/3, implement Brotli compression. Set up geographic distribution for global users."
- Context: Frontend optimizations from step 7
- Output: CDN configuration, edge caching rules, compression setup, geographic optimization
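
A hedged sketch of edge caching as a Cloudflare Worker; the long-lived, immutable header policy suits fingerprinted static assets and is not a universal rule:

```ts
// Types come from @cloudflare/workers-types; caches.default is Cloudflare-specific.
export default {
  async fetch(request: Request, env: unknown, ctx: ExecutionContext): Promise<Response> {
    const cache = caches.default;
    const hit = await cache.match(request);
    if (hit) return hit;                 // served from the edge, no origin hop

    const origin = await fetch(request); // miss: go to origin once
    const response = new Response(origin.body, origin);
    response.headers.set("Cache-Control", "public, max-age=31536000, immutable");
    ctx.waitUntil(cache.put(request, response.clone())); // populate cache asynchronously
    return response;
  },
};
```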

### 9. Mobile & Progressive Web App Optimization

- Use Task tool with subagent_type="frontend-mobile-development::mobile-developer"
- Prompt: "Optimize mobile experience for: $ARGUMENTS. Implement service workers for offline functionality, optimize for slow networks with adaptive loading. Reduce JavaScript execution time for mobile CPUs. Implement virtual scrolling for long lists. Optimize touch responsiveness and smooth animations. Consider React Native/Flutter specific optimizations if applicable."
- Context: Frontend optimizations from steps 7-8
- Output: Mobile-optimized code, PWA implementation, offline functionality
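
A minimal cache-first service worker might look like the following; the cache name and asset manifest are placeholders:

```ts
/// <reference lib="webworker" />
declare const self: ServiceWorkerGlobalScope;

const CACHE = "static-v1";
const ASSETS = ["/", "/app.js", "/app.css"]; // placeholder asset manifest

self.addEventListener("install", (event) => {
  // Precache the shell so repeat visits work offline and load instantly.
  event.waitUntil(caches.open(CACHE).then((c) => c.addAll(ASSETS)));
});

self.addEventListener("fetch", (event) => {
  // Cache-first: serve from cache when possible, fall back to the network.
  event.respondWith(
    caches.match(event.request).then((hit) => hit ?? fetch(event.request))
  );
});
```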

## Phase 4: Load Testing & Validation

### 10. Comprehensive Load Testing

- Use Task tool with subagent_type="performance-engineer"
- Prompt: "Conduct comprehensive load testing for: $ARGUMENTS using k6/Gatling/Artillery. Design realistic load scenarios based on production traffic patterns. Test normal load, peak load, and stress scenarios. Include API testing, browser-based testing, and WebSocket testing if applicable. Measure response times, throughput, error rates, and resource utilization at various load levels."
- Context: All optimizations from phases 1-3
- Output: Load test results, performance under load, breaking points, scalability analysis
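
A representative k6 scenario (k6 scripts are JavaScript; recent versions also accept TypeScript). The staging URL, stages, and thresholds are placeholders aligned with the success criteria later in this document:

```ts
import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  stages: [
    { duration: "2m", target: 100 }, // ramp to normal load
    { duration: "5m", target: 100 }, // steady state
    { duration: "2m", target: 300 }, // push toward peak/stress
  ],
  thresholds: {
    http_req_duration: ["p(95)<1000", "p(99)<2000"], // ms
    http_req_failed: ["rate<0.01"],                  // <1% errors
  },
};

export default function () {
  const res = http.get("https://staging.example.com/api/orders"); // placeholder
  check(res, { "status is 200": (r) => r.status === 200 });
  sleep(1);
}
```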

### 11. Performance Regression Testing

- Use Task tool with subagent_type="performance-testing-review::test-automator"
- Prompt: "Create automated performance regression tests for: $ARGUMENTS. Set up performance budgets for key metrics, integrate with CI/CD pipeline using GitHub Actions or similar. Create Lighthouse CI tests for frontend, API performance tests with Artillery, and database performance benchmarks. Implement automatic rollback triggers for performance regressions."
- Context: Load test results from step 10, baseline metrics from phase 1
- Output: Performance test suite, CI/CD integration, regression prevention system
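
A minimal Lighthouse CI budget sketch; in a real project this lives in a plain-JavaScript lighthouserc.js, and the URLs and budget values below are placeholders:

```ts
// Fails the CI run when the performance score or JS weight regresses.
const config = {
  ci: {
    collect: { url: ["https://staging.example.com/"], numberOfRuns: 3 },
    assert: {
      assertions: {
        "categories:performance": ["error", { minScore: 0.9 }],
        "resource-summary:script:size": ["error", { maxNumericValue: 300_000 }], // ~300 KB JS budget
      },
    },
  },
};

module.exports = config;
```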

## Phase 5: Monitoring & Continuous Optimization

### 12. Production Monitoring Setup

- Use Task tool with subagent_type="observability-engineer"
- Prompt: "Implement production performance monitoring for: $ARGUMENTS. Set up APM with DataDog/New Relic/Dynatrace, configure distributed tracing with OpenTelemetry, implement custom business metrics. Create Grafana dashboards for key metrics, set up PagerDuty alerts for performance degradation. Define SLIs/SLOs for critical services with error budgets."
- Context: Performance improvements from all previous phases
- Output: Monitoring dashboards, alert rules, SLI/SLO definitions, runbooks
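
Custom SLI instrumentation might start with a latency histogram via prom-client; the metric name, labels, and buckets below are illustrative:

```ts
import client from "prom-client";

// Request-latency SLI; a Prometheus alert on the p95 of this histogram can
// page when the latency SLO's error budget starts burning.
export const httpLatency = new client.Histogram({
  name: "http_request_duration_seconds",
  help: "HTTP request latency by route and status",
  labelNames: ["route", "status"],
  buckets: [0.05, 0.1, 0.2, 0.5, 1, 2], // aligned with the success criteria below
});

// In a request handler:
//   const end = httpLatency.startTimer({ route: "/api/orders" });
//   ...handle the request...
//   end({ status: "200" });
```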

### 13. Continuous Performance Optimization

- Use Task tool with subagent_type="performance-engineer"
- Prompt: "Establish continuous optimization process for: $ARGUMENTS. Create performance budget tracking, implement A/B testing for performance changes, set up continuous profiling in production. Document optimization opportunities backlog, create capacity planning models, and establish regular performance review cycles."
- Context: Monitoring setup from step 12, all previous optimization work
- Output: Performance budget tracking, optimization backlog, capacity planning, review process

## Configuration Options

- **performance_focus**: "latency" | "throughput" | "cost" | "balanced" (default: "balanced")
- **optimization_depth**: "quick-wins" | "comprehensive" | "enterprise" (default: "comprehensive")
- **tools_available**: ["datadog", "newrelic", "prometheus", "grafana", "k6", "gatling"]
- **budget_constraints**: Set maximum acceptable costs for infrastructure changes
- **user_impact_tolerance**: "zero-downtime" | "maintenance-window" | "gradual-rollout"

## Success Criteria

- **Response Time**: P50 < 200ms, P95 < 1s, P99 < 2s for critical endpoints
- **Core Web Vitals**: LCP < 2.5s, INP < 200ms, CLS < 0.1
- **Throughput**: Support 2x current peak load with <1% error rate
- **Database Performance**: Query P95 < 100ms, no queries > 1s
- **Resource Utilization**: CPU < 70%, Memory < 80% under normal load
- **Cost Efficiency**: Performance per dollar improved by at least 30%
- **Monitoring Coverage**: 100% of critical paths instrumented with alerting

Performance optimization target: $ARGUMENTS

Overview

This skill optimizes end-to-end application performance by orchestrating profiling, targeted tuning, load validation, and continuous monitoring across frontend, backend, and infrastructure. It guides a phased, data-driven workflow that finds real bottlenecks, applies measurable fixes, and prevents regressions. The approach balances user-centric metrics, cost-effectiveness, and operational safety.

How this skill works

The skill runs a phased process: establish baselines via profiling and RUM, assess observability gaps, then apply database, backend, microservices, frontend, CDN, and mobile optimizations. Each phase produces actionable artifacts (flame graphs, optimized queries, CDN rules, load-test reports) which feed the next phase. Final steps validate improvements with load tests and integrate continuous monitoring, performance budgets, and automated regression checks.

When to use it

  • Coordinating cross-stack performance efforts involving frontend, backend, and infra
  • Establishing profiling baselines and identifying true bottlenecks before changes
  • Designing load tests, capacity plans, or performance budgets
  • Implementing observability and SLI/SLOs for critical user journeys
  • Planning an iterative optimization program with measurable outcomes

Best practices

  • Confirm goals, constraints, and acceptable user-impact tolerance before any test or rollout
  • Start with real profiling and RUM data — avoid premature micro-optimizations
  • Prioritize fixes by user-impact and cost-effectiveness (quick wins first, deep fixes later)
  • Run load tests in controlled environments and never stress production without approvals and safeguards
  • Automate performance regression tests and integrate them into CI/CD for early detection

Example use cases

  • Reduce backend API P95 latency from 1.2s to <1s by optimizing slow queries and adding caching
  • Improve Core Web Vitals for a high-traffic SPA using code-splitting, lazy loading, and image optimization
  • Scale a microservices platform to handle 2x peak traffic with service mesh tuning and message-queue optimizations
  • Design a performance budget and CI checks to prevent bundle size regressions and ensure SLIs remain within targets

FAQ

Can this skill run load tests against production?

Not recommended. Load testing production requires approvals, traffic shaping, and safeguards. Use staging with production-like data when possible.

What tools does the workflow use?

It integrates common tools like OpenTelemetry, Datadog/New Relic, Prometheus/Grafana, k6/Gatling, and CDN features (CloudFront/Cloudflare), but it adapts to available tooling.

How do we measure success?

Success is defined by concrete targets: response-time percentiles, Core Web Vitals thresholds, throughput goals, resource utilization, and a defined performance-per-dollar improvement.