home / skills / giuseppe-trisciuoglio / developer-kit / aws-cloudformation-cloudwatch

This skill helps you implement production-grade AWS CloudWatch monitoring with CloudFormation templates, covering metrics, alarms, dashboards, logs, and

npx playbooks add skill giuseppe-trisciuoglio/developer-kit --skill aws-cloudformation-cloudwatch

Review the files below or copy the command above to add this skill to your agents.

Files (3)
SKILL.md
48.6 KB
---
name: aws-cloudformation-cloudwatch
description: Provides AWS CloudFormation patterns for CloudWatch monitoring, metrics, alarms, dashboards, logs, and observability. Use when creating CloudWatch metrics, alarms, dashboards, log groups, log subscriptions, anomaly detection, synthesized canaries, Application Signals, and implementing template structure with Parameters, Outputs, Mappings, Conditions, cross-stack references, and CloudWatch best practices for monitoring production infrastructure.
category: aws
tags: [aws, cloudformation, cloudwatch, monitoring, observability, metrics, alarms, logs, dashboards, infrastructure, iaac]
version: 1.0.0
allowed-tools: Read, Write, Bash
---

# AWS CloudFormation CloudWatch Monitoring

## Overview

Create production-ready monitoring and observability infrastructure using AWS CloudFormation templates. This skill covers CloudWatch metrics, alarms, dashboards, log groups, log insights, anomaly detection, synthesized canaries, Application Signals, and best practices for parameters, outputs, and cross-stack references.

## When to Use

Use this skill when:
- Creating custom CloudWatch metrics
- Configuring CloudWatch alarms for thresholds and anomaly detection
- Creating CloudWatch dashboards for multi-region visualization
- Implementing log groups with retention and encryption
- Configuring log subscriptions and cross-account log aggregation
- Implementing synthesized canaries for synthetic monitoring
- Enabling Application Signals for application performance monitoring
- Organizing templates with Parameters, Outputs, Mappings, Conditions
- Implementing cross-stack references with export/import
- Using Transform for macros and reuse

## Instructions

Follow these steps to create CloudWatch monitoring infrastructure with CloudFormation:

1. **Define Alarm Parameters**: Specify metric namespaces, dimensions, and threshold values
2. **Create CloudWatch Alarms**: Set up alarms for CPU, memory, disk, and custom metrics
3. **Configure Alarm Actions**: Define SNS topics for notification delivery
4. **Create Dashboards**: Build visualization widgets for metrics across resources
5. **Set Up Log Groups**: Configure retention policies and encryption settings
6. **Implement Anomaly Detection**: Enable ML-based detection for unusual patterns
7. **Add Synthesized Canaries**: Create CI/CD checks for critical endpoints
8. **Configure Composite Alarms**: Build multi-condition alarm logic

For complete examples, see the [EXAMPLES.md](references/examples.md) file.

## Examples

The following examples demonstrate common CloudWatch patterns:

### Example 1: CPU Utilization Alarm

```yaml
HighCpuAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: !Sub "${AWS::StackName}-high-cpu"
    AlarmDescription: Trigger when CPU utilization exceeds threshold
    MetricName: CPUUtilization
    Namespace: AWS/EC2
    Dimensions:
      - Name: InstanceId
        Value: !Ref InstanceId
    Statistic: Average
    Period: 60
    EvaluationPeriods: 3
    Threshold: 80
    ComparisonOperator: GreaterThanThreshold
    AlarmActions:
      - !Ref SnsTopicArn
```

### Example 2: Dashboard with Multiple Widgets

```yaml
Dashboard:
  Type: AWS::CloudWatch::Dashboard
  Properties:
    DashboardName: !Sub "${AWS::StackName}-dashboard"
    DashboardBody: !Sub |
      {
        "widgets": [
          {
            "type": "metric",
            "x": 0, "y": 0,
            "width": 12, "height": 6,
            "properties": {
              "title": "CPU Utilization",
              "metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", "!Ref InstanceId"]],
              "period": 300,
              "stat": "Average",
              "region": "!Ref AWS::Region"
            }
          }
        ]
      }
```

### Example 3: Log Group with Retention

```yaml
ApplicationLogGroup:
  Type: AWS::Logs::LogGroup
  Properties:
    LogGroupName: !Sub "/ecs/${AWS::StackName}"
    RetentionInDays: 30
    KmsKeyId: !Ref KmsKeyArn
```

For complete production-ready examples, see [EXAMPLES.md](references/examples.md).

## CloudFormation Template Structure

### Base Template with Standard Format

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: CloudWatch monitoring and observability stack

Metadata:
  AWS::CloudFormation::Interface:
    ParameterGroups:
      - Label:
          default: Monitoring Configuration
        Parameters:
          - Environment
          - LogRetentionDays
          - EnableAnomalyDetection
      - Label:
          default: Alarm Thresholds
        Parameters:
          - ErrorRateThreshold
          - LatencyThreshold
          - CpuUtilizationThreshold

Parameters:
  Environment:
    Type: String
    Default: dev
    AllowedValues:
      - dev
      - staging
      - production
    Description: Deployment environment

  LogRetentionDays:
    Type: Number
    Default: 30
    AllowedValues:
      - 1
      - 3
      - 5
      - 7
      - 14
      - 30
      - 60
      - 90
      - 120
      - 150
      - 180
      - 365
      - 400
      - 545
      - 731
      - 1095
      - 1827
      - 2190
      - 2555
      - 2922
      - 3285
      - 3650
    Description: Number of days to retain log events

  EnableAnomalyDetection:
    Type: String
    Default: false
    AllowedValues:
      - true
      - false
    Description: Enable CloudWatch anomaly detection

  ErrorRateThreshold:
    Type: Number
    Default: 5
    Description: Error rate threshold for alarms (percentage)

  LatencyThreshold:
    Type: Number
    Default: 1000
    Description: Latency threshold in milliseconds

  CpuUtilizationThreshold:
    Type: Number
    Default: 80
    Description: CPU utilization threshold (percentage)

Mappings:
  EnvironmentConfig:
    dev:
      LogRetentionDays: 7
      ErrorRateThreshold: 10
      LatencyThreshold: 2000
      CpuUtilizationThreshold: 90
    staging:
      LogRetentionDays: 14
      ErrorRateThreshold: 5
      LatencyThreshold: 1500
      CpuUtilizationThreshold: 85
    production:
      LogRetentionDays: 30
      ErrorRateThreshold: 1
      LatencyThreshold: 500
      CpuUtilizationThreshold: 80

Conditions:
  IsProduction: !Equals [!Ref Environment, production]
  IsStaging: !Equals [!Ref Environment, staging]
  EnableAnomaly: !Equals [!Ref EnableAnomalyDetection, true]

Transform:
  - AWS::Serverless-2016-10-31

Resources:
  # Log Group per applicazione
  ApplicationLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub "/aws/applications/${Environment}/application"
      RetentionInDays: !Ref LogRetentionDays
      KmsKeyId: !Ref LogKmsKey
      Tags:
        - Key: Environment
          Value: !Ref Environment
        - Key: Application
          Value: !Ref ApplicationName

Outputs:
  LogGroupName:
    Description: Name of the application log group
    Value: !Ref ApplicationLogGroup
    Export:
      Name: !Sub "${AWS::StackName}-LogGroupName"
```

## Parameters Best Practices

### AWS-Specific Parameter Types

```yaml
Parameters:
  # AWS-specific types for validation
  CloudWatchNamespace:
    Type: AWS::CloudWatch::Namespace
    Description: CloudWatch metric namespace

  AlarmActionArn:
    Type: AWS::SNS::Topic::Arn
    Description: SNS topic ARN for alarm actions

  LogKmsKeyArn:
    Type: AWS::KMS::Key::Arn
    Description: KMS key ARN for log encryption

  DashboardArn:
    Type: AWS::CloudWatch::Dashboard::Arn
    Description: Existing dashboard ARN to import

  AnomalyDetectorArn:
    Type: AWS::CloudWatch::AnomalyDetector::Arn
    Description: Existing anomaly detector ARN
```

### Parameter Constraints

```yaml
Parameters:
  MetricName:
    Type: String
    Description: CloudWatch metric name
    ConstraintDescription: Must be 1-256 characters, alphanumeric, underscore, period, dash
    MinLength: 1
    MaxLength: 256
    AllowedPattern: "[a-zA-Z0-9._-]+"

  ThresholdValue:
    Type: Number
    Description: Alarm threshold value
    MinValue: 0
    MaxValue: 1000000000

  EvaluationPeriods:
    Type: Number
    Description: Number of evaluation periods
    Default: 5
    MinValue: 1
    MaxValue: 100
    ConstraintDescription: Must be between 1 and 100

  DatapointsToAlarm:
    Type: Number
    Description: Datapoints that must breach to trigger alarm
    Default: 5
    MinValue: 1
    MaxValue: 10

  Period:
    Type: Number
    Description: Metric period in seconds
    Default: 300
    AllowedValues:
      - 10
      - 30
      - 60
      - 300
      - 900
      - 3600
      - 21600
      - 86400

  ComparisonOperator:
    Type: String
    Description: Alarm comparison operator
    Default: GreaterThanThreshold
    AllowedValues:
      - GreaterThanThreshold
      - GreaterThanOrEqualToThreshold
      - LessThanThreshold
      - LessThanOrEqualToThreshold
      - GreaterThanUpperBound
      - LessThanLowerBound
```

### SSM Parameter References

```yaml
Parameters:
  AlarmTopicArn:
    Type: AWS::SSM::Parameter::Value<AWS::SNS::Topic::Arn>
    Default: /monitoring/alarms/topic-arn
    Description: SNS topic ARN from SSM Parameter Store

  DashboardConfig:
    Type: AWS::SSM::Parameter::Value<String>
    Default: /monitoring/dashboards/config
    Description: Dashboard configuration from SSM
```

## Outputs and Cross-Stack References

### Export/Import Patterns

```yaml
# Stack A - Monitoring Stack
AWSTemplateFormatVersion: 2010-09-09
Description: Central monitoring infrastructure stack

Resources:
  AlarmTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Sub "${AWS::StackName}-alarms"
      DisplayName: !Sub "${AWS::StackName} Alarm Notifications"

  LogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub "/aws/monitoring/${AWS::StackName}"
      RetentionInDays: 30

Outputs:
  AlarmTopicArn:
    Description: ARN of the alarm SNS topic
    Value: !Ref AlarmTopic
    Export:
      Name: !Sub "${AWS::StackName}-AlarmTopicArn"

  LogGroupName:
    Description: Name of the log group
    Value: !Ref LogGroup
    Export:
      Name: !Sub "${AWS::StackName}-LogGroupName"

  LogGroupArn:
    Description: ARN of the log group
    Value: !GetAtt LogGroup.Arn
    Export:
      Name: !Sub "${AWS::StackName}-LogGroupArn"
```

```yaml
# Stack B - Application Stack (imports from Monitoring Stack)
AWSTemplateFormatVersion: 2010-09-09
Description: Application stack with monitoring integration

Parameters:
  MonitoringStackName:
    Type: String
    Description: Name of the monitoring stack

Resources:
  LambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: !Sub "${AWS::StackName}-processor"
      Runtime: python3.11
      Handler: app.handler
      Code:
        S3Bucket: !Ref CodeBucket
        S3Key: lambda/function.zip
      Role: !GetAtt LambdaExecutionRole.Arn

  ErrorAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-errors"
      AlarmDescription: Alert on Lambda errors
      MetricName: Errors
      Namespace: AWS/Lambda
      Dimensions:
        - Name: FunctionName
          Value: !Ref LambdaFunction
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 5
      Threshold: 1
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !ImportValue
          !Sub "${MonitoringStackName}-AlarmTopicArn"

  HighLatencyAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-latency"
      AlarmDescription: Alert on high latency
      MetricName: Duration
      Namespace: AWS/Lambda
      Dimensions:
        - Name: FunctionName
          Value: !Ref LambdaFunction
      Statistic: P99
      Period: 60
      EvaluationPeriods: 3
      Threshold: 5000
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !ImportValue
          !Sub "${MonitoringStackName}-AlarmTopicArn"
```

### Nested Stacks for Modularity

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: Main stack with nested monitoring stacks

Resources:
  # Nested stack for alarms
  AlarmsStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/bucket/monitoring/alarms.yaml
      TimeoutInMinutes: 15
      Parameters:
        Environment: !Ref Environment
        AlarmTopicArn: !Ref AlarmTopicArn

  # Nested stack for dashboards
  DashboardsStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/bucket/monitoring/dashboards.yaml
      TimeoutInMinutes: 15
      Parameters:
        Environment: !Ref Environment
        LogGroupNames: !Join [",", [!GetAtt AlarmsStack.Outputs.LogGroupName]]

  # Nested stack for log insights
  LogInsightsStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://s3.amazonaws.com/bucket/monitoring/log-insights.yaml
      TimeoutInMinutes: 15
      Parameters:
        Environment: !Ref Environment
```

## CloudWatch Metrics and Alarms

### Base Metric Alarm

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: CloudWatch metric alarms

Resources:
  # Error rate alarm
  ErrorRateAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-error-rate"
      AlarmDescription: Alert when error rate exceeds threshold
      MetricName: ErrorRate
      Namespace: !Ref CustomNamespace
      Dimensions:
        - Name: Service
          Value: !Ref ServiceName
        - Name: Environment
          Value: !Ref Environment
      Statistic: Average
      Period: 60
      EvaluationPeriods: 5
      DatapointsToAlarm: 3
      Threshold: !Ref ErrorRateThreshold
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlarmTopic
      InsufficientDataActions:
        - !Ref AlarmTopic
      OKActions:
        - !Ref AlarmTopic
      Tags:
        - Key: Environment
          Value: !Ref Environment
        - Key: Severity
          Value: high

  # P99 latency alarm
  LatencyAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-p99-latency"
      AlarmDescription: Alert when P99 latency exceeds threshold
      MetricName: Latency
      Namespace: !Ref CustomNamespace
      Dimensions:
        - Name: Service
          Value: !Ref ServiceName
      Statistic: p99
      ExtendedStatistic: "p99"
      Period: 60
      EvaluationPeriods: 3
      Threshold: !Ref LatencyThreshold
      ComparisonOperator: GreaterThanThreshold
      TreatMissingData: notBreaching
      AlarmActions:
        - !Ref AlarmTopic

  # 4xx errors alarm
  ClientErrorAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-4xx-errors"
      AlarmDescription: Alert on high 4xx error rate
      MetricName: 4XXError
      Namespace: AWS/ApiGateway
      Dimensions:
        - Name: ApiName
          Value: !Ref ApiName
        - Name: Stage
          Value: !Ref StageName
      Statistic: Sum
      Period: 300
      EvaluationPeriods: 2
      Threshold: 100
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlarmTopic

  # 5xx errors alarm
  ServerErrorAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-5xx-errors"
      AlarmDescription: Alert on high 5xx error rate
      MetricName: 5XXError
      Namespace: AWS/ApiGateway
      Dimensions:
        - Name: ApiName
          Value: !Ref ApiName
        - Name: Stage
          Value: !Ref StageName
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 2
      Threshold: 10
      ComparisonOperator: GreaterThanThreshold
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - !Ref AlarmTopic
```

### Composite Alarm

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: CloudWatch composite alarms

Resources:
  # Base alarm for Lambda errors
  LambdaErrorAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-lambda-errors"
      MetricName: Errors
      Namespace: AWS/Lambda
      Dimensions:
        - Name: FunctionName
          Value: !Ref LambdaFunction
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 5
      Threshold: 5
      ComparisonOperator: GreaterThanThreshold

  # Base alarm for Lambda throttles
  LambdaThrottleAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-lambda-throttles"
      MetricName: Throttles
      Namespace: AWS/Lambda
      Dimensions:
        - Name: FunctionName
          Value: !Ref LambdaFunction
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 5
      Threshold: 3
      ComparisonOperator: GreaterThanThreshold

  # Composite alarm combining both
  LambdaHealthCompositeAlarm:
    Type: AWS::CloudWatch::CompositeAlarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-lambda-health"
      AlarmDescription: Composite alarm for Lambda function health
      AlarmRule: !Or
        - !Ref LambdaErrorAlarm
        - !Ref LambdaThrottleAlarm
      ActionsEnabled: true
      AlarmActions:
        - !Ref AlarmTopic
      Tags:
        - Key: Service
          Value: lambda
        - Key: Tier
          Value: application
```

### Anomaly Detection Alarm

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: CloudWatch anomaly detection

Resources:
  # Anomaly detector for metric
  RequestRateAnomalyDetector:
    Type: AWS::CloudWatch::AnomalyDetector
    Properties:
      MetricName: RequestCount
      Namespace: !Ref CustomNamespace
      Dimensions:
        - Name: Service
          Value: !Ref ServiceName
        - Name: Environment
          Value: !Ref Environment
      Statistic: Sum
      Configuration:
        ExcludedTimeRanges:
          - StartTime: "2023-12-25T00:00:00"
            EndTime: "2023-12-26T00:00:00"
        MetricTimeZone: UTC

  # Alarm based on anomaly detection
  AnomalyAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-anomaly-detection"
      AlarmDescription: Alert on anomalous metric behavior
      MetricName: RequestCount
      Namespace: !Ref CustomNamespace
      Dimensions:
        - Name: Service
          Value: !Ref ServiceName
      AnomalyDetectorConfiguration:
        ExcludeTimeRange:
          StartTime: "2023-12-25T00:00:00"
          EndTime: "2023-12-26T00:00:00"
      Statistic: Sum
      Period: 300
      EvaluationPeriods: 2
      Threshold: 2
      ComparisonOperator: GreaterThanUpperThreshold
      AlarmActions:
        - !Ref AlarmTopic

  # Alarm for low anomalous value
  LowTrafficAnomalyAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-low-traffic"
      AlarmDescription: Alert on unusually low traffic
      MetricName: RequestCount
      Namespace: !Ref CustomNamespace
      Dimensions:
        - Name: Service
          Value: !Ref ServiceName
      AnomalyDetectorConfiguration:
        Bound: Lower
      Statistic: Sum
      Period: 300
      EvaluationPeriods: 3
      Threshold: 0.5
      ComparisonOperator: LessThanLowerThreshold
      AlarmActions:
        - !Ref AlarmTopic
```

## CloudWatch Dashboards

### Dashboard Base

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: CloudWatch dashboard

Resources:
  # Main dashboard
  MainDashboard:
    Type: AWS::CloudWatch::Dashboard
    Properties:
      DashboardName: !Sub "${AWS::StackName}-main"
      DashboardBody: !Sub |
        {
          "widgets": [
            {
              "type": "metric",
              "x": 0,
              "y": 0,
              "width": 12,
              "height": 6,
              "properties": {
                "title": "API Gateway Requests",
                "view": "timeSeries",
                "stacked": false,
                "region": "${AWS::Region}",
                "metrics": [
                  ["AWS/ApiGateway", "Count", "ApiName", "${ApiName}", "Stage", "${StageName}"],
                  [".", "4XXError", ".", ".", ".", "."],
                  [".", "5XXError", ".", ".", ".", "."]
                ],
                "period": 300,
                "stat": "Sum"
              }
            },
            {
              "type": "metric",
              "x": 12,
              "y": 0,
              "width": 12,
              "height": 6,
              "properties": {
                "title": "API Gateway Latency",
                "view": "timeSeries",
                "region": "${AWS::Region}",
                "metrics": [
                  ["AWS/ApiGateway", "Latency", "ApiName", "${ApiName}", "Stage", "${StageName}", {"stat": "p99"}],
                  [".", ".", ".", ".", ".", ".", {"stat": "Average"}]
                ],
                "period": 300
              }
            },
            {
              "type": "metric",
              "x": 0,
              "y": 6,
              "width": 12,
              "height": 6,
              "properties": {
                "title": "Lambda Invocations",
                "view": "timeSeries",
                "region": "${AWS::Region}",
                "metrics": [
                  ["AWS/Lambda", "Invocations", "FunctionName", "${LambdaFunction}"],
                  [".", "Errors", ".", "."],
                  [".", "Throttles", ".", "."]
                ],
                "period": 60,
                "stat": "Sum"
              }
            },
            {
              "type": "metric",
              "x": 12,
              "y": 6,
              "width": 12,
              "height": 6,
              "properties": {
                "title": "Lambda Duration",
                "view": "timeSeries",
                "region": "${AWS::Region}",
                "metrics": [
                  ["AWS/Lambda", "Duration", "FunctionName", "${LambdaFunction}", {"stat": "p99"}],
                  [".", ".", ".", ".", {"stat": "Average"}],
                  [".", ".", ".", ".", {"stat": "Maximum"}]
                ],
                "period": 60
              }
            },
            {
              "type": "log",
              "x": 0,
              "y": 12,
              "width": 24,
              "height": 6,
              "properties": {
                "title": "Application Logs",
                "view": "table",
                "region": "${AWS::Region}",
                "logGroupName": "${ApplicationLogGroup}",
                "timeRange": {
                  "type": "relative",
                  "from": 3600
                },
                "filterPattern": "ERROR | WARN"
              }
            }
          ]
        }

  # Dashboard for specific service
  ServiceDashboard:
    Type: AWS::CloudWatch::Dashboard
    Properties:
      DashboardName: !Sub "${AWS::StackName}-${ServiceName}"
      DashboardBody: !Sub |
        {
          "start": "-PT6H",
          "widgets": [
            {
              "type": "text",
              "x": 0,
              "y": 0,
              "width": 24,
              "height": 1,
              "properties": {
                "markdown": "# ${ServiceName} - ${Environment} Dashboard"
              }
            },
            {
              "type": "metric",
              "x": 0,
              "y": 1,
              "width": 8,
              "height": 6,
              "properties": {
                "title": "Request Rate",
                "view": "timeSeries",
                "stacked": false,
                "region": "${AWS::Region}",
                "metrics": [
                  ["${CustomNamespace}", "RequestCount", "Service", "${ServiceName}", "Environment", "${Environment}"]
                ],
                "period": 60,
                "stat": "Sum"
              }
            },
            {
              "type": "metric",
              "x": 8,
              "y": 1,
              "width": 8,
              "height": 6,
              "properties": {
                "title": "Error Rate %",
                "view": "timeSeries",
                "region": "${AWS::Region}",
                "metrics": [
                  ["${CustomNamespace}", "ErrorCount", "Service", "${ServiceName}"],
                  [".", "RequestCount", ".", "."],
                  [".", "SuccessCount", ".", "."]
                ],
                "period": 60,
                "stat": "Average"
              }
            },
            {
              "type": "metric",
              "x": 16,
              "y": 1,
              "width": 8,
              "height": 6,
              "properties": {
                "title": "P99 Latency",
                "view": "timeSeries",
                "region": "${AWS::Region}",
                "metrics": [
                  ["${CustomNamespace}", "Latency", "Service", "${ServiceName}"]
                ],
                "period": 60,
                "stat": "p99"
              }
            }
          ]
        }
```

## CloudWatch Logs

### Log Group Configurations

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: CloudWatch log groups configuration

Parameters:
  LogRetentionDays:
    Type: Number
    Default: 30
    AllowedValues:
      - 1
      - 3
      - 5
      - 7
      - 14
      - 30
      - 60
      - 90
      - 120
      - 150
      - 180
      - 365
      - 400
      - 545
      - 731
      - 1095
      - 1827
      - 2190
      - 2555
      - 2922
      - 3285
      - 3650

Resources:
  # Application log group
  ApplicationLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub "/aws/applications/${Environment}/${ApplicationName}"
      RetentionInDays: !Ref LogRetentionDays
      KmsKeyId: !Ref LogKmsKeyArn
      Tags:
        - Key: Environment
          Value: !Ref Environment
        - Key: Application
          Value: !Ref ApplicationName
        - Key: Service
          Value: !Ref ServiceName

  # Lambda log group
  LambdaLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub "/aws/lambda/${LambdaFunctionName}"
      RetentionInDays: !Ref LogRetentionDays
      KmsKeyId: !Ref LogKmsKeyArn

  # Subscription filter for Log Insights
  LogSubscriptionFilter:
    Type: AWS::Logs::SubscriptionFilter
    Properties:
      DestinationArn: !GetAtt LogDestination.Arn
      FilterPattern: '[timestamp=*Z, request_id, level, message]'
      LogGroupName: !Ref ApplicationLogGroup
      RoleArn: !GetAtt LogSubscriptionRole.Arn

  # Metric filter for errors
  ErrorMetricFilter:
    Type: AWS::Logs::MetricFilter
    Properties:
      FilterPattern: '[level="ERROR", msg]'
      LogGroupName: !Ref ApplicationLogGroup
      MetricTransformations:
        - MetricValue: "1"
          MetricNamespace: !Sub "${AWS::StackName}/Application"
          MetricName: ErrorCount
        - MetricValue: "$level"
          MetricNamespace: !Sub "${AWS::StackName}/Application"
          MetricName: LogLevel

  # Metric filter for warnings
  WarningMetricFilter:
    Type: AWS::Logs::MetricFilter
    Properties:
      FilterPattern: '[level="WARN", msg]'
      LogGroupName: !Ref ApplicationLogGroup
      MetricTransformations:
        - MetricValue: "1"
          MetricNamespace: !Sub "${AWS::StackName}/Application"
          MetricName: WarningCount

  # Log group with custom retention
  AuditLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub "/aws/audit/${Environment}/${ApplicationName}"
      RetentionInDays: 365
      KmsKeyId: !Ref LogKmsKeyArn
```

### Log Insights Query

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: CloudWatch Logs Insights queries

Resources:
  # Query definition for recent errors
  RecentErrorsQuery:
    Type: AWS::Logs::QueryDefinition
    Properties:
      Name: !Sub "${AWS::StackName}-recent-errors"
      QueryString: |
        fields @timestamp, @message
        | sort @timestamp desc
        | limit 100
        | filter @message like /ERROR/
        | display @timestamp, @message, @logStream

  # Query for performance analysis
  PerformanceQuery:
    Type: AWS::Logs::QueryDefinition
    Properties:
      Name: !Sub "${AWS::StackName}-performance"
      QueryString: |
        fields @timestamp, @message, @duration
        | filter @duration > 1000
        | sort @duration desc
        | limit 50
        | display @timestamp, @duration, @message
```

## Synthesized Canaries

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: CloudWatch Synthesized Canaries

Parameters:
  CanarySchedule:
    Type: String
    Default: rate(5 minutes)
    Description: Schedule expression for canary

Resources:
  # Canary for API endpoint
  ApiCanary:
    Type: AWS::Synthetics::Canary
    Properties:
      Name: !Sub "${AWS::StackName}-api-check"
      ArtifactS3Location: !Sub "s3://${ArtifactBucket}/canary/${AWS::StackName}"
      Code:
        S3Bucket: !Ref CanariesCodeBucket
        S3Key: canary/api-check.zip
        Handler: apiCheck.handler
      ExecutionRoleArn: !GetAtt CanaryRole.Arn
      RuntimeVersion: syn-python-selenium-1.1
      Schedule:
        Expression: !Ref CanarySchedule
        DurationInSeconds: 120
      SuccessRetentionPeriodInDays: 31
      FailureRetentionPeriodInDays: 31
      Tags:
        - Key: Environment
          Value: !Ref Environment
        - Key: Service
          Value: api

  # Alarm for canary failure
  CanaryFailureAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-canary-failed"
      AlarmDescription: Alert when synthesized canary fails
      MetricName: Failed
      Namespace: AWS/Synthetics
      Dimensions:
        - Name: CanaryName
          Value: !Ref ApiCanary
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 2
      Threshold: 1
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - !Ref AlarmTopic

  # Alarm for canary latency
  CanaryLatencyAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-canary-slow"
      AlarmDescription: Alert when canary latency is high
      MetricName: Duration
      Namespace: AWS/Synthetics
      Dimensions:
        - Name: CanaryName
          Value: !Ref ApiCanary
      Statistic: p99
      Period: 300
      EvaluationPeriods: 3
      Threshold: 5000
      ComparisonOperator: GreaterThanThreshold

  CanaryRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub "${AWS::StackName}-canary-role"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: synthetics.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: SyntheticsLeastPrivilege
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - synthetics:DescribeCanaries
                  - synthetics:DescribeCanaryRuns
                  - synthetics:GetCanary
                  - synthetics:ListTagsForResource
                Resource: "*"
              - Effect: Allow
                Action:
                  - synthetics:StartCanary
                  - synthetics:StopCanary
                Resource: !Ref ApiCanary
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                  - logs:DescribeLogStreams
                Resource: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/cw-syn-canary-*"
              - Effect: Allow
                Action:
                  - s3:PutObject
                  - s3:GetObject
                Resource: !Sub "s3://${ArtifactBucket}/canary/${AWS::StackName}/*"
              - Effect: Allow
                Action:
                  - kms:Decrypt
                Resource: !Ref KmsKeyArn
                Condition:
                  StringEquals:
                    kms:ViaService: !Sub "s3.${AWS::Region}.amazonaws.com"
```

## CloudWatch Application Signals

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: CloudWatch Application Signals for APM

Resources:
  # Service level indicator for availability
  AvailabilitySLI:
    Type: AWS::CloudWatch::ServiceLevelObjective
    Properties:
      Name: !Sub "${AWS::StackName}-availability"
      Description: Service level objective for availability
      Monitor:
        MonitorName: !Sub "${AWS::StackName}-monitor"
        MonitorType: AWS_SERVICE_LEVEL_INDICATOR
        ResourceGroup: !Ref ResourceGroup
      SliMetric:
        MetricName: Availability
        Namespace: !Sub "${AWS::StackName}/Application"
        Dimensions:
          - Name: Service
            Value: !Ref ServiceName
      Target:
        ComparisonOperator: GREATER_THAN_OR_EQUAL
        Threshold: 99.9
        Period:
          RollingInterval:
            Count: 1
            TimeUnit: HOUR
      Goal:
        TargetLevel: 99.9

  # Service level indicator for latency
  LatencySLI:
    Type: AWS::CloudWatch::ServiceLevelIndicator
    Properties:
      Name: !Sub "${AWS::StackName}-latency-sli"
      Monitor:
        MonitorName: !Sub "${AWS::StackName}-monitor"
      Metric:
        MetricName: Latency
        Namespace: !Sub "${AWS::StackName}/Application"
        Dimensions:
          - Name: Service
            Value: !Ref ServiceName
      OperationName: GetItem
      AccountId: !Ref AWS::AccountId

  # Monitor for application performance
  ApplicationMonitor:
    Type: AWS::CloudWatch::ApplicationMonitor
    Properties:
      MonitorName: !Sub "${AWS::StackName}-app-monitor"
      MonitorType: CW_MONITOR
      Telemetry:
        - Type: APM
          Config:
            Eps: 100
```

## Conditions and Transform

### Conditions for Environment-Specific Resources

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: CloudWatch with conditional resources

Parameters:
  Environment:
    Type: String
    Default: dev
    AllowedValues:
      - dev
      - staging
      - production
    Description: Deployment environment

Conditions:
  IsProduction: !Equals [!Ref Environment, production]
  IsStaging: !Equals [!Ref Environment, staging]
  CreateAnomalyDetection: !Or [!Equals [!Ref Environment, staging], !Equals [!Ref Environment, production]]
  CreateSLI: !Equals [!Ref Environment, production]

Resources:
  # Base alarm for all environments
  BaseAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-errors"
      MetricName: Errors
      Namespace: !Ref CustomNamespace
      Dimensions:
        - Name: Service
          Value: !Ref ServiceName
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 5
      Threshold: 10
      ComparisonOperator: GreaterThanThreshold

  # Alarm with different thresholds for production
  ProductionAlarm:
    Type: AWS::CloudWatch::Alarm
    Condition: IsProduction
    Properties:
      AlarmName: !Sub "${AWS::StackName}-errors-production"
      MetricName: Errors
      Namespace: !Ref CustomNamespace
      Dimensions:
        - Name: Service
          Value: !Ref ServiceName
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 3
      Threshold: 1
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref ProductionAlarmTopic

  # Anomaly detector only for staging and production
  AnomalyDetector:
    Type: AWS::CloudWatch::AnomalyDetector
    Condition: CreateAnomalyDetection
    Properties:
      MetricName: RequestCount
      Namespace: !Ref CustomNamespace
      Dimensions:
        - Name: Service
          Value: !Ref ServiceName
      Statistic: Sum

  # SLI only for production
  ServiceLevelIndicator:
    Type: AWS::CloudWatch::ServiceLevelIndicator
    Condition: CreateSLI
    Properties:
      Name: !Sub "${AWS::StackName}-sli"
      Monitor:
        MonitorName: !Sub "${AWS::StackName}-monitor"
      Metric:
        MetricName: Availability
        Namespace: !Sub "${AWS::StackName}/Application"
```

### Transform for Code Reuse

```yaml
AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31

Description: Using SAM Transform for CloudWatch resources

Globals:
  Function:
    Timeout: 30
    Runtime: python3.11
    Environment:
      Variables:
        LOG_LEVEL: INFO
    LoggingConfiguration:
      LogGroup:
        Name: !Sub "/aws/lambda/${FunctionName}"
        RetentionInDays: 30

Resources:
  # Lambda function with automatic logging
  MonitoredFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: !Sub "${AWS::StackName}-monitored"
      Handler: app.handler
      CodeUri: functions/monitored/
      Policies:
        - PolicyName: LogsLeastPrivilege
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                  - logs:DescribeLogStreams
                  - logs:GetLogEvents
                  - logs:FilterLogEvents
                Resource: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/${AWS::StackName}-*"
              - Effect: Allow
                Action:
                  - logs:DescribeLogGroups
                Resource: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:*"
      Events:
        Api:
          Type: Api
          Properties:
            Path: /health
            Method: get
```

## Best Practices

### Security

- Encrypt log groups with KMS keys
- Use resource-based policies for log access
- Implement cross-account log aggregation with proper IAM
- Configure log retention appropriate for compliance
- Use VPC endpoints for CloudWatch to isolate traffic
- Implement least privilege for IAM roles

### Performance

- Use appropriate metric periods (60s for alarms, 300s for dashboards)
- Implement composite alarms to reduce alarm fatigue
- Use anomaly detection for non-linear patterns
- Configure dashboards with efficient widgets
- Limit retention period for log groups

### Monitoring

- Implement SLI/SLO for service health
- Use multi-region dashboards for global applications
- Configure alarms with proper evaluation periods
- Implement canaries for synthetic monitoring
- Use Application Signals for APM

### Deployment

- Use change sets before deployment
- Test templates with cfn-lint
- Organize stacks by ownership (network, app, data)
- Use nested stacks for modularity
- Implement stack policies for protection

## CloudFormation Best Practices

### Stack Policies

Stack policies protect critical resources from unintentional updates. Use them to prevent modifications to production resources.

```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: CloudWatch stack with protection policies

Resources:
  CriticalAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${AWS::StackName}-critical"
      MetricName: Errors
      Namespace: AWS/Lambda
      Statistic: Sum
      Period: 60
      EvaluationPeriods: 5
      Threshold: 1
      ComparisonOperator: GreaterThanThreshold

Metadata:
  AWS::CloudFormation::StackPolicy:
    Statement:
      - Effect: Deny
        Principal: "*"
        Action:
          - Update:Delete
          - Update:Modify
        Resource: "*"
      - Effect: Allow
        Principal: "*"
        Action:
          - Update:Modify
        Resource: "*"
        Condition:
          StringEquals:
            aws:RequestedOperation:
              - Describe*
              - List*
```

### Termination Protection

Enable termination protection to prevent accidental stack deletion, especially for production monitoring stacks.

**Via Console:**
1. Select the stack
2. Go to Stack actions > Change termination protection
3. Enable termination protection

**Via CLI:**
```bash
aws cloudformation update-termination-protection \
  --stack-name my-monitoring-stack \
  --enable-termination-protection
```

**Via CloudFormation (Stack Set):**
```yaml
Resources:
  MonitoringStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: !Sub "https://${BucketName}.s3.amazonaws.com/monitoring.yaml"
      TerminationProtection: true
```

### Drift Detection

Detect when actual infrastructure differs from the CloudFormation template.

**Detect drift on a single stack:**
```bash
aws cloudformation detect-drift \
  --stack-name my-monitoring-stack
```

**Get drift detection status:**
```bash
aws cloudFormation describe-stack-drift-detection-process-status \
  --stack-drift-detection-id <detection-id>
```

**Get resources that have drifted:**
```bash
aws cloudformation list-stack-resources \
  --stack-name my-monitoring-stack \
  --query "StackResourceSummaries[?StackResourceDriftStatus!='IN_SYNC']"
```

**Automation with Lambda:**
```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: Automated drift detection scheduler

Resources:
  DriftDetectionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: CloudWatchDrift
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - cloudformation:DetectStackDrift
                  - cloudformation:DescribeStacks
                  - cloudformation:ListStackResources
                Resource: "*"
              - Effect: Allow
                Action:
                  - sns:Publish
                Resource: !Ref AlertTopic

  DriftDetectionFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.11
      Handler: drift_detector.handler
      Code:
        S3Bucket: !Ref CodeBucket
        S3Key: functions/drift-detector.zip
      Role: !GetAtt DriftDetectionRole.Arn
      Environment:
        Variables:
          SNS_TOPIC_ARN: !Ref AlertTopic

  DriftDetectionRule:
    Type: AWS::Events::Rule
    Properties:
      ScheduleExpression: "rate(1 day)"
      Targets:
        - Id: DriftDetection
          Arn: !GetAtt DriftDetectionFunction.Arn

  AlertTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Sub "${AWS::StackName}-drift-alerts"
```

### Change Sets

Use change sets to preview and review changes before applying them.

**Create change set:**
```bash
aws cloudformation create-change-set \
  --stack-name my-monitoring-stack \
  --template-body file://updated-template.yaml \
  --change-set-name my-changeset \
  --capabilities CAPABILITY_IAM
```

**List change sets:**
```bash
aws cloudformation list-change-sets \
  --stack-name my-monitoring-stack
```

**Describe change set:**
```bash
aws cloudformation describe-change-set \
  --stack-name my-monitoring-stack \
  --change-set-name my-changeset
```

**Execute change set:**
```bash
aws cloudformation execute-change-set \
  --stack-name my-monitoring-stack \
  --change-set-name my-changeset
```

**Pipeline integration:**
```yaml
AWSTemplateFormatVersion: 2010-09-09
Description: CI/CD pipeline for CloudWatch stacks

Resources:
  Pipeline:
    Type: AWS::CodePipeline::Pipeline
    Properties:
      Name: !Sub "${AWS::StackName}-pipeline"
      RoleArn: !GetAtt PipelineRole.Arn
      Stages:
        - Name: Source
          Actions:
            - Name: SourceAction
              ActionTypeId:
                Category: Source
                Owner: AWS
                Provider: CodeCommit
                Version: "1"
              Configuration:
                RepositoryName: !Ref RepositoryName
                BranchName: main
              OutputArtifacts:
                - Name: SourceOutput
        - Name: Validate
          Actions:
            - Name: ValidateTemplate
              ActionTypeId:
                Category: Test
                Owner: AWS
                Provider: CloudFormation
                Version: "1"
              Configuration:
                ActionMode: VALIDATE_ONLY
                TemplatePath: SourceOutput::template.yaml
              InputArtifacts:
                - Name: SourceOutput
        - Name: Review
          Actions:
            - Name: CreateChangeSet
              ActionTypeId:
                Category: Deploy
                Owner: AWS
                Provider: CloudFormation
                Version: "1"
              Configuration:
                ActionMode: CHANGE_SET_REPLACE
                StackName: !Ref StackName
                ChangeSetName: !Sub "${StackName}-changeset"
                TemplatePath: SourceOutput::template.yaml
                Capabilities: CAPABILITY_IAM,CAPABILITY_NAMED_IAM
              InputArtifacts:
                - Name: SourceOutput
            - Name: Approval
              ActionTypeId:
                Category: Approval
                Owner: AWS
                Provider: Manual
                Version: "1"
              Configuration:
                CustomData: Review changes before deployment
        - Name: Deploy
          Actions:
            - Name: ExecuteChangeSet
              ActionTypeId:
                Category: Deploy
                Owner: AWS
                Provider: CloudFormation
                Version: "1"
              Configuration:
                ActionMode: CHANGE_SET_EXECUTE
                StackName: !Ref StackName
                ChangeSetName: !Sub "${StackName}-changeset"

  PipelineRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: codepipeline.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: PipelinePolicy
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - codecommit:Get*
                  - codecommit:List*
                  - codecommit:BatchGet*
                Resource: "*"
              - Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:PutObject
                Resource: !Sub "arn:aws:s3:::${ArtifactBucket}/*"
              - Effect: Allow
                Action:
                  - cloudformation:*
                  - iam:PassRole
                Resource: "*"
              - Effect: Allow
                Action:
                  - sns:Publish
                Resource: !Ref ApprovalTopic

  ApprovalTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Sub "${AWS::StackName}-approval"
```

## Related Resources

- [CloudWatch Documentation](https://docs.aws.amazon.com/cloudwatch/)
- [AWS CloudFormation User Guide](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/)
- [CloudWatch Best Practices](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_best_practices.html)
- [Service Level Indicators](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ServiceLevelIndicators.html)
- [CloudFormation Drift Detection](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-stack-drift.html)
- [CloudFormation Change Sets](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-changesets.html)

## Constraints and Warnings

### Resource Limits

- **Alarms Limits**: Maximum 5000 CloudWatch alarms per AWS account per region
- **Dashboards Limits**: Maximum 500 dashboards per account per region
- **Metrics Limits**: Each dashboard can contain up to 100 metrics
- **Log Groups Limits**: Maximum number of log groups per account is virtually unlimited but retention has cost implications

### Operational Constraints

- **Metric Resolution**: Some metrics have 1-minute minimum resolution; higher resolution costs more
- **Alarm Evaluation Periods**: Alarms with too few evaluation periods may trigger false positives
- **Dashboard Widgets**: Each dashboard is limited to 500 widgets
- **Cross-Account Metrics**: Cross-account sharing requires explicit resource policies

### Security Constraints

- **Log Data Access**: CloudWatch Logs may contain sensitive information
- **Encryption**: KMS keys required for encrypted log groups incur additional costs
- **Metric Filters**: Metric filters count as separate billable CloudWatch Logs operations
- **Alarm Actions**: SNS topics used for alarm actions must have appropriate permissions

### Cost Considerations

- **Detailed Monitoring**: Enabling detailed monitoring doubles the number of metrics for EC2
- **Metric Storage**: Custom metrics stored in CloudWatch incur costs based on resolution and retention
- **Log Retention**: Longer retention periods significantly increase storage costs
- **Dashboards**: Dashboard widgets don't directly cost but each metric queried does

### Data Constraints

- **Metric Age**: Metrics are retained for 15 months by default; high-resolution metrics have shorter retention
- **Log Ingestion**: Large log volumes can impact ingestion latency and query performance
- **Metric Filters**: Metric filters have limits on number of filters and matching patterns

## Additional Files

For complete details on resources and their properties, consult:
- [REFERENCE.md](references/reference.md) - Detailed reference guide for all CloudFormation resources
- [EXAMPLES.md](references/examples.md) - Complete production-ready examples

Overview

This skill provides production-ready AWS CloudFormation patterns for CloudWatch monitoring and observability. It packages reusable templates and conventions for metrics, alarms, dashboards, log groups, anomaly detection, synthesized canaries, and Application Signals. Use it to enforce consistent monitoring across environments and to simplify cross-stack integrations and exports.

How this skill works

The skill supplies CloudFormation constructs and example templates that create CloudWatch metrics, alarms, dashboards, log groups, anomaly detectors, and canaries. It defines parameterized building blocks (Parameters, Mappings, Conditions, Outputs) so templates can be reused across dev, staging, and production. The patterns include SNS actions, SSM-backed parameters, encryption and retention settings, composite alarms, and cross-stack export/import wiring for centralized monitoring.

When to use it

  • When you need standard alarms for CPU, memory, disk, latency, error rates, or custom metrics
  • When creating dashboards that combine metrics across accounts or regions
  • When implementing log groups with retention, encryption, and subscription filters
  • When enabling anomaly detection, composite alarms, or synthesized canaries for SLOs
  • When organizing templates for reuse, cross-stack references, or nested stacks

Best practices

  • Parameterize thresholds, periods, and evaluation periods so environments differ by config, not code
  • Use AWS-specific parameter types (SNS::Topic::Arn, KMS::Key::Arn) for validation and clarity
  • Export central monitoring resources (SNS topic, log group) and import them in application stacks
  • Set sensible log retention, enable KMS encryption for log streams, and treat missing data explicitly
  • Use nested stacks and Transform macros to keep templates modular and maintainable

Example use cases

  • Central monitoring stack that exports an Alarm SNS topic and a central LogGroup for import by application stacks
  • Application stack that imports monitoring exports and creates resource-specific alarms (Lambda errors, P99 latency)
  • Multi-widget dashboard template that references instance IDs or function names via parameters for per-environment views
  • Template that provisions synthesized canaries for critical endpoints and wires canary results into dashboards and alarms
  • Anomaly detection configuration to automatically detect unusual traffic or latency and trigger composite alarms

FAQ

Can I reuse the templates across multiple AWS accounts?

Yes — use cross-account log subscriptions and export/import patterns. Centralize exports in a monitoring stack and grant required permissions for imports.

How do I limit noise from alarms in non-production?

Parameterize thresholds and enable conditions to disable or relax alarms for dev and staging, and use mapping defaults per environment.