This skill helps you monitor ecommerce site speed, errors, and uptime, and respond with proactive alerts and incident communications.
To add this skill to your agents, run `npx playbooks add skill gtmagents/gtm-agents --skill site-performance-watch`.
---
name: site-performance-watch
description: Monitoring and alerting framework for ecommerce site speed, errors, and uptime.
---
# Site Performance Watch Skill
## When to Use
- Setting up proactive monitoring during campaigns or launches.
- Investigating conversion drops tied to latency or availability issues.
- Communicating performance incidents to merchandising and engineering teams.
## Framework
1. **KPI Stack** – FCP, LCP, CLS, TTFB, checkout API latency, error rates, uptime.
2. **Segmentation** – device, geography, browser, promotion, traffic source.
3. **Alerting Rules** – thresholds, aggregation windows, escalation paths, war-room triggers.
4. **Diagnostics** – logging, tracing, screenshot/session replay hooks.
5. **Comms Kit** – stakeholder updates, status pages, rollback plans.
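
As a rough illustration of how a KPI threshold and an aggregation window might combine into an alerting rule, here is a minimal Python sketch. `AlertRule`, `percentile`, and `evaluate` are hypothetical names, not part of the skill. The 2500 ms and 4000 ms LCP values follow the commonly cited Core Web Vitals boundaries and are only a starting point; tune them to your own baseline.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical rule: one KPI, two thresholds, one aggregation window."""
    metric: str       # KPI name, e.g. "lcp_ms"
    warn: float       # aggregated value at or above this raises a warning
    critical: float   # aggregated value at or above this is critical
    window: int       # number of most recent samples to aggregate


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def evaluate(rule, samples):
    """Return "no_data", "ok", "warning", or "critical" for the latest window."""
    recent = samples[-rule.window:]
    if not recent:
        return "no_data"
    p75 = percentile(recent, 75)
    if p75 >= rule.critical:
        return "critical"
    if p75 >= rule.warn:
        return "warning"
    return "ok"


# Assumed example values: LCP samples in milliseconds, oldest first.
lcp_rule = AlertRule(metric="lcp_ms", warn=2500, critical=4000, window=4)
print(evaluate(lcp_rule, [1200, 1800, 2100, 2600, 4300]))  # warning
```

Aggregating over a window before comparing against the threshold keeps a single slow sample from paging anyone; the 75th percentile is used here because it is the percentile at which Core Web Vitals are conventionally assessed.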
## Templates
- Performance scorecard with spark lines + thresholds.
- Incident log template with root cause, mitigation, and follow-up tasks.
- Escalation matrix covering engineering, DevOps, merchandising, and CX leads.
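
One way the incident log template could be represented in code is sketched below. The `IncidentLog` class, its field names, and the markdown layout are assumptions for illustration; the source only specifies that the template covers root cause, mitigation, and follow-up tasks. The sample incident is invented.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentLog:
    """Hypothetical incident record with the three sections the template names."""
    title: str
    root_cause: str
    mitigation: str
    follow_ups: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the record as a short markdown block for sharing."""
        lines = [
            f"### {self.title}",
            f"- Root cause: {self.root_cause}",
            f"- Mitigation: {self.mitigation}",
            "- Follow-up tasks:",
        ]
        # Each follow-up becomes an unchecked task-list item.
        lines += [f"  - [ ] {task}" for task in self.follow_ups]
        return "\n".join(lines)


# Invented example data, for illustration only.
log = IncidentLog(
    title="Checkout latency spike",
    root_cause="Unindexed query on the cart table",
    mitigation="Rolled back the latest release",
    follow_ups=["Add index", "Add latency alert on checkout API"],
)
print(log.to_markdown())
```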
## Tips
- Pair synthetic monitoring with RUM data to catch both systemic and localized issues.
- Freeze experiment/merch changes when performance breaches critical thresholds.
- Use alongside `diagnose-conversion-drop` to correlate experience and performance data.
---
This skill provides a production-ready monitoring and alerting framework for ecommerce site speed, errors, and uptime. It combines KPI tracking, segmentation, alerting rules, diagnostics hooks, and stakeholder communications to detect and manage performance issues that impact conversion and revenue. The implementation is Python-based and designed to integrate synthetic checks, RUM data, and incident workflows.
The skill continuously collects a KPI stack (FCP, LCP, CLS, TTFB, checkout API latency, error rates, uptime) from synthetic probes and real user monitoring. It segments metrics by device, geography, browser, promotion, and traffic source, applies threshold and aggregation rules, and triggers alerts with escalation paths. Diagnostic hooks capture logs, traces, and session screenshots to speed root cause analysis and populate incident templates and status updates.
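
The segmentation step described above can be sketched as a group-by over raw events. This is a minimal example under assumed inputs: the event dictionaries, the `device`, `geo`, and `error` keys, and the `error_rate_by_segment` function are all hypothetical, not the skill's actual data model.

```python
from collections import defaultdict


def error_rate_by_segment(events, keys):
    """Group events by the given segment keys and return the error rate per segment."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for event in events:
        segment = tuple(event[k] for k in keys)
        totals[segment] += 1
        if event["error"]:
            errors[segment] += 1
    # Every segment in totals has at least one event, so division is safe.
    return {segment: errors[segment] / totals[segment] for segment in totals}


# Invented sample events, for illustration only.
events = [
    {"device": "mobile", "geo": "DE", "error": True},
    {"device": "mobile", "geo": "DE", "error": False},
    {"device": "desktop", "geo": "US", "error": False},
]
rates = error_rate_by_segment(events, keys=("device", "geo"))
print(rates)  # {('mobile', 'DE'): 0.5, ('desktop', 'US'): 0.0}
```

Passing the segment keys as a parameter lets the same function slice by browser, promotion, or traffic source without changes.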
**Can this skill correlate RUM and synthetic data?**
Yes. The framework ingests both data types and aligns them by segment to identify systemic issues versus localized user impact.
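
A simple heuristic for telling systemic issues from localized ones is sketched below. It is one possible approach, not the skill's documented logic: `classify_impact` and its inputs are hypothetical, and it assumes threshold breaches have already been computed for the synthetic probes and for each RUM segment.

```python
def classify_impact(synthetic_breached, rum_breach_by_segment):
    """Classify an issue as "healthy", "systemic", or "localized".

    synthetic_breached: True if synthetic probes breach their threshold.
    rum_breach_by_segment: maps a segment name to True if RUM data for that
        segment breaches its threshold.
    """
    breached = [seg for seg, hit in rum_breach_by_segment.items() if hit]
    all_rum_breached = bool(rum_breach_by_segment) and len(breached) == len(rum_breach_by_segment)
    if synthetic_breached or all_rum_breached:
        # Probes fail, or every real-user segment is affected.
        return "systemic"
    if breached:
        # Probes are fine but some real-user segments are affected.
        return "localized"
    return "healthy"


# Invented segment names, for illustration only.
print(classify_impact(False, {"mobile/DE": True, "desktop/US": False}))  # localized
print(classify_impact(True, {"mobile/DE": True, "desktop/US": True}))    # systemic
```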
**How are alerts routed?**
Alerts follow configurable escalation matrices with aggregation windows, notification channels, and war-room triggers for critical breaches.
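
To make the routing idea concrete, here is a minimal sketch of an escalation matrix keyed by severity. The `ESCALATION_MATRIX` contents, the severity labels, and the `route` function are assumptions for illustration; the team names come from the escalation matrix template listed under Templates, and the war-room rule shown (critical only) is just one plausible configuration.

```python
# Hypothetical matrix: which teams are notified at each severity level.
ESCALATION_MATRIX = {
    "warning": ["devops"],
    "critical": ["devops", "engineering", "merchandising", "cx"],
}


def route(severity, matrix=ESCALATION_MATRIX):
    """Return who to notify and whether to open a war room for a severity."""
    recipients = list(matrix.get(severity, []))  # unknown severities notify no one
    return {"notify": recipients, "war_room": severity == "critical"}


print(route("warning"))   # {'notify': ['devops'], 'war_room': False}
print(route("critical"))
```

Keeping the matrix as plain data means merchandising or CX leads can be added to a severity level without touching the routing code.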