site-performance-watch skill

/plugins/e-commerce/skills/site-performance-watch

This skill helps you monitor ecommerce site speed, errors, and uptime, and respond with proactive alerts and incident communications.

npx playbooks add skill gtmagents/gtm-agents --skill site-performance-watch

Review the files below or copy the command above to add this skill to your agents.

SKILL.md

---
name: site-performance-watch
description: Monitoring and alerting framework for ecommerce site speed, errors, and
  uptime.
---

# Site Performance Watch Skill

## When to Use
- Setting up proactive monitoring during campaigns or launches.
- Investigating conversion drops tied to latency or availability issues.
- Communicating performance incidents to merchandising and engineering teams.

## Framework
1. **KPI Stack** – FCP, LCP, CLS, TTFB, checkout API latency, error rates, uptime.
2. **Segmentation** – device, geography, browser, promotion, traffic source.
3. **Alerting Rules** – thresholds, aggregation windows, escalation paths, war-room triggers.
4. **Diagnostics** – logging, tracing, screenshot/session replay hooks.
5. **Comms Kit** – stakeholder updates, status pages, rollback plans.
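
As a minimal sketch of how the KPI stack and alerting thresholds might be encoded, the Python below maps a KPI reading to a severity. The Web Vitals boundaries follow Google's published good/poor cut-offs. The checkout latency and error-rate numbers, and the names `Threshold`, `THRESHOLDS`, and `classify`, are hypothetical and not part of this skill. Uptime is left out because it is a lower-is-worse metric and would need an inverted comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warn: float      # above this value the KPI needs attention
    critical: float  # above this value the KPI is in breach


# Web Vitals boundaries use Google's published good/poor cut-offs.
# Checkout latency and error-rate values are placeholders (assumptions).
THRESHOLDS = {
    "lcp_ms": Threshold(2500, 4000),
    "fcp_ms": Threshold(1800, 3000),
    "ttfb_ms": Threshold(800, 1800),
    "cls": Threshold(0.1, 0.25),
    "checkout_api_p95_ms": Threshold(800, 1500),  # hypothetical
    "error_rate": Threshold(0.01, 0.05),          # hypothetical
}


def classify(kpi: str, value: float) -> str:
    """Return 'ok', 'warn', or 'critical' for a higher-is-worse KPI."""
    t = THRESHOLDS[kpi]
    if value > t.critical:
        return "critical"
    if value > t.warn:
        return "warn"
    return "ok"
```

For example, `classify("lcp_ms", 3000)` falls between the two boundaries and yields `"warn"`.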

## Templates
- Performance scorecard with sparklines and thresholds.
- Incident log template with root cause, mitigation, and follow-up tasks.
- Escalation matrix covering engineering, DevOps, merchandising, and CX leads.
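
One possible shape for the incident log template, sketched in Python. The fields mirror the root cause, mitigation, and follow-up tasks named above. The class name `IncidentLog`, the `title` and `started_at` fields, and the markdown layout are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class IncidentLog:
    title: str
    started_at: datetime
    root_cause: str = "under investigation"
    mitigation: str = ""
    follow_ups: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the log as a short markdown section with a task checklist."""
        lines = [
            f"## {self.title}",
            f"- Started: {self.started_at.isoformat()}",
            f"- Root cause: {self.root_cause}",
            f"- Mitigation: {self.mitigation or 'none yet'}",
            "- Follow-up tasks:",
        ]
        lines += [f"  - [ ] {task}" for task in self.follow_ups]
        return "\n".join(lines)
```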

## Tips
- Pair synthetic monitoring with RUM data to catch both systemic and localized issues.
- Freeze experiment/merch changes when performance breaches critical thresholds.
- Use alongside `diagnose-conversion-drop` to correlate experience and performance data.

---

Overview

This skill provides a production-ready monitoring and alerting framework for ecommerce site speed, errors, and uptime. It combines KPI tracking, segmentation, alerting rules, diagnostics hooks, and stakeholder communications to detect and manage performance issues that impact conversion and revenue. The implementation is Python-based and designed to integrate synthetic checks, RUM data, and incident workflows.

How this skill works

The skill continuously collects a KPI stack (FCP, LCP, CLS, TTFB, checkout API latency, error rates, uptime) from synthetic probes and real user monitoring. It segments metrics by device, geography, browser, promotion, and traffic source, applies threshold and aggregation rules, and triggers alerts with escalation paths. Diagnostic hooks capture logs, traces, and session screenshots to speed root cause analysis and populate incident templates and status updates.
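
The segmentation and aggregation step described above can be sketched as follows. This is an illustration under stated assumptions, not the skill's actual code: the `Sample` record, the five-minute window, the 4000 ms LCP limit, the minimum sample count, and the choice of a nearest-rank 75th percentile are all hypothetical.

```python
import math
from collections import defaultdict
from typing import NamedTuple


class Sample(NamedTuple):
    ts: float      # epoch seconds when the measurement was taken
    device: str
    geo: str
    lcp_ms: float


def p75(values):
    """Nearest-rank 75th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def breaching_segments(samples, now, window_s=300, limit_ms=4000, min_samples=3):
    """Return {(device, geo): p75 LCP} for segments over the limit in the window."""
    buckets = defaultdict(list)
    for s in samples:
        if now - window_s <= s.ts <= now:  # keep only samples inside the window
            buckets[(s.device, s.geo)].append(s.lcp_ms)
    return {
        seg: p75(vals)
        for seg, vals in buckets.items()
        # skip thin segments so a single slow page view cannot fire an alert
        if len(vals) >= min_samples and p75(vals) > limit_ms
    }
```

Aggregating per segment before comparing against the limit is what lets a slow mobile segment in one region surface even when the site-wide percentile still looks healthy.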

When to use it

- Before and during product launches, marketing campaigns, or peak traffic events, to detect regressions early.
- When investigating unexplained conversion drops or cart abandonment that may be tied to latency or errors.
- To formalize incident response and communicate performance incidents to merchandising, CX, and engineering.
- During experiments or merchandising changes, to ensure experience stability under varied conditions.

Best practices

- Pair synthetic monitoring with RUM to capture both systemic outages and localized user impact.
- Define clear thresholds, aggregation windows, and escalation paths to reduce alert noise.
- Freeze UX experiments or merchandising changes when critical performance thresholds are breached.
- Instrument checkout APIs and key flows with tracing and screenshot hooks for faster diagnostics.
- Use templated scorecards and incident logs to standardize postmortems and follow-up tasks.

Example use cases

- Detecting a regional CDN degradation that increases TTFB and correlates with a drop in conversions.
- Alerting on rising checkout API latency during a flash sale and triggering a war-room escalation.
- Correlating increased CLS on mobile with a recent A/B test and rolling back the change.
- Populating a stakeholder status page and incident log after a partial outage affecting select browsers.

FAQ

Can this skill correlate RUM and synthetic data?

Yes. The framework ingests both data types and aligns them by segment to identify systemic issues versus localized user impact.
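
A rough sketch of what aligning the two sources by segment could look like. The labeling rule is an assumption made for illustration: a breach seen by both synthetic probes and RUM is treated as systemic, and a breach seen only in RUM as localized. The function name `label_segments` and the 1800 ms limit are hypothetical.

```python
def label_segments(synthetic, rum, limit_ms=1800.0):
    """Label each segment by which data source shows a breach.

    `synthetic` and `rum` map a segment name to an aggregated latency in ms.
    A segment missing from one source counts as not breaching in that source.
    """
    labels = {}
    for seg in sorted(set(synthetic) | set(rum)):
        syn_bad = synthetic.get(seg, 0.0) > limit_ms
        rum_bad = rum.get(seg, 0.0) > limit_ms
        if syn_bad and rum_bad:
            labels[seg] = "systemic"        # probes and real users both affected
        elif rum_bad:
            labels[seg] = "localized"       # only real users affected
        elif syn_bad:
            labels[seg] = "synthetic-only"  # probe issue or no user traffic
        else:
            labels[seg] = "healthy"
    return labels
```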

How are alerts routed?

Alerts follow configurable escalation matrices with aggregation windows, notification channels, and war-room triggers for critical breaches.
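
A minimal sketch of such routing, assuming a two-level severity scale and a war-room trigger based on how long a critical breach has lasted. The channel names, the ten-minute default, and the `route` function are hypothetical. The teams listed are the ones named in the skill's escalation matrix template.

```python
# Hypothetical escalation matrix: severity -> teams to notify.
ESCALATION = {
    "warn": ["engineering"],
    "critical": ["engineering", "devops", "merchandising", "cx-leads"],
}


def route(severity, breach_minutes, war_room_after_min=10):
    """Return who to notify and whether to open a war room."""
    recipients = list(ESCALATION.get(severity, []))
    # Only sustained critical breaches trigger a war room.
    war_room = severity == "critical" and breach_minutes >= war_room_after_min
    return {"notify": recipients, "war_room": war_room}
```

Under these assumptions, a critical breach lasting 12 minutes notifies all four teams and opens a war room, while a warning of any duration only notifies engineering.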