home / skills / aaronontheweb / dotnet-skills / opentelementry-dotnet-instrumentation

opentelementry-dotnet-instrumentation skill

/skills/opentelementry-dotnet-instrumentation

This skill guides implementing OpenTelemetry instrumentation in .NET apps, covering tracing, metrics, naming, error handling, performance, and API design.

npx playbooks add skill aaronontheweb/dotnet-skills --skill opentelementry-dotnet-instrumentation

Review the files below or copy the command above to add this skill to your agents.

Files (1)

skill.md

16.8 KB

---
name: OpenTelemetry-NET-Instrumentation
description: Provides guidance for implementing OpenTelemetry instrumentation in .NET codebases, covering tracing (Activities/Spans), metrics, naming conventions, error handling, performance, and API design best practices.
version: 1.0.0
tags:
  - opentelemetry
  - dotnet
  - observability
  - tracing
  - metrics
  - performance
---

# OpenTelemetry .NET Instrumentation Skill

## Description
Provides guidance for implementing OpenTelemetry instrumentation in .NET codebases, covering tracing (Activities/Spans), metrics, naming conventions, error handling, performance, and API design best practices.

## When to Use
- Adding OpenTelemetry instrumentation to .NET code
- Creating or modifying ActivitySources and metrics
- Reviewing telemetry implementations for compliance
- Optimizing instrumentation performance
- Designing telemetry APIs that become part of the public surface

## Prerequisites
- .NET application with OpenTelemetry SDK
- Understanding of System.Diagnostics.Metrics and ActivitySource APIs
- Access to observability backend (e.g., Jaeger, Prometheus, Grafana)

## Core Principles

### Resiliency First
**CRITICAL**: Exceptions in diagnostic/tracing/metrics logic MUST NEVER impact application processing.
- Always protect against null Activity references except in Activity extension methods (use `activity?.ExtensionMethod()`)
- Assume Activity instances can be null (only created when listeners subscribe)
- Guard all instrumentation code with appropriate null checks

### API Surface Awareness
- Any telemetry emitted becomes part of the public API surface
- Changes are subject to breaking changes guidelines
- Telemetry should be emitted by default (users opt-in to collection via OpenTelemetry extensions)
- Exception: High-cardinality metric dimensions may require explicit opt-in

### Standards Compliance
- Follow Microsoft best practices for [distributed tracing instrumentation](https://docs.microsoft.com/en-us/dotnet/core/diagnostics/distributed-tracing-instrumentation-walkthroughs)
- Follow [OpenTelemetry semantic conventions](https://opentelemetry.io/docs/concepts/semantic-conventions/)
- All attributes must be non-null, non-empty strings

## Traces / Spans (Activities)

### ActivitySource Setup

```csharp
// ✅ CORRECT: Use ActivitySource, not DiagnosticSource
public class MyFeature
{
    // Primary ActivitySource - name typically matches the component or NuGet package name
    private static readonly ActivitySource ActivitySource = new("MyApp.MyComponent", "1.0.0");

    // Specialized ActivitySource for opt-in scenarios
    private static readonly ActivitySource DetailedActivitySource = new("MyApp.MyComponent.Detailed", "1.0.0");
}
```

**Rules**:
- Every component defines a primary `ActivitySource` for mainstream activities
- Name typically matches the component or NuGet package (e.g., `"MyCompany.MyLibrary"`)
- Version the ActivitySource using SemVer
- Create separate ActivitySources for specialized/opt-in scenarios

### Creating Activities

```csharp
// ✅ CORRECT: Check HasListeners before creating
if (ActivitySource.HasListeners())
{
    using var activity = ActivitySource.StartActivity("ProcessItem", ActivityKind.Internal);

    if (activity != null)
    {
        activity.DisplayName = "Processing order #12345";

        // Only compute expensive tags if requested
        if (activity.IsAllDataRequested)
        {
            activity.SetTag("app.item_id", itemId);
            activity.SetTag("app.item_type", itemType);
        }
    }
}

// ❌ WRONG: Don't start activities in async helper methods (breaks AsyncLocal)
async Task HelperAsync()
{
    using var activity = ActivitySource.StartActivity("Helper"); // ❌ BAD
    await DoWorkAsync();
}
```

**Rules**:
- Check `ActivitySource.HasListeners()` before creating (zero-allocation fast path)
- Always check if activity is null after creation
- Never start activities in asynchronous helper methods (`Activity.Current` uses `AsyncLocal`)
- Use `activity.IsAllDataRequested` before expensive computations
- Always use W3C ID format (enforce format change if parent uses hierarchical)

### Activity Naming

```csharp
// ✅ CORRECT: Unique operation name, friendly display name
using var activity = ActivitySource.StartActivity(
    name: "ProcessItem",              // Unique, identifies class of spans
    kind: ActivityKind.Internal
);
activity.DisplayName = "Processing order #12345"; // User-friendly, can be specific

// ❌ WRONG: Don't include runtime data in operation name
using var activity = ActivitySource.StartActivity($"Process_{itemId}"); // ❌ BAD
```

**Rules**:
- Each span type has unique `OperationName` (identifies statistically interesting class of spans)
- Operation name should NOT contain runtime data (only compile/config-time info)
- Use human-readable `DisplayName` for specifics
- Follow [OpenTelemetry span naming conventions](https://opentelemetry.io/docs/specs/otel/trace/api/#span)

### Span Attributes (Tags)

```csharp
// ✅ CORRECT: Namespace, lowercase, underscore-delimited
activity?.SetTag("myapp.order_id", orderId);
activity?.SetTag("myapp.order_type", orderType);
activity?.SetTag("myapp.db.table_name", tableName);

// Standard semantic conventions where applicable
activity?.SetTag("db.system", "postgresql");
activity?.SetTag("http.method", "GET");

// ❌ WRONG: Various naming violations
activity?.SetTag("MyApp.OrderId", orderId);         // ❌ Wrong case
activity?.SetTag("myapp.order-id", orderId);        // ❌ Wrong delimiter
activity?.SetTag("myapp.orders", count);            // ❌ Plural
activity?.SetTag("unrelated.ip_address", ip);       // ❌ Not characteristic
```

**Naming Conventions**:
- Use a namespace prefix matching your component: `myapp.*`, `myapp.db.*`
- All lowercase letters
- Underscore (`_`) delimiters for multi-word attributes
- Singular form
- Only set tags directly relevant to this activity
- Prefer standard [OpenTelemetry semantic conventions](https://opentelemetry.io/docs/specs/semconv/) over custom attributes where they exist
- Only use standard semantic conventions if certain no downstream library will set them

### Activity Status and Errors

```csharp
// ✅ CORRECT: Set status and record exceptions
try
{
    await ProcessItemAsync();
    activity?.SetStatus(ActivityStatusCode.Ok);
}
catch (Exception ex)
{
    if (activity != null)
    {
        activity.SetStatus(ActivityStatusCode.Error);
        activity.SetTag("otel.status_code", "error");
        activity.SetTag("otel.status_description", ex.Message);

        // Record exception event per OTel spec
        activity.AddEvent(new ActivityEvent(
            "exception",
            tags: new ActivityTagsCollection
            {
                ["exception.type"] = ex.GetType().FullName,
                ["exception.message"] = ex.Message,
                ["exception.stacktrace"] = ex.ToString()
            }
        ));
    }
    throw;
}
```

**Rules**:
- Set `ActivityStatusCode.Ok` on success
- Set `ActivityStatusCode.Error` on exception
- Always add `otel.status_code` and `otel.status_description` tags
- Record exception events following [OTel exception conventions](https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/exceptions/)

### Activity Events

```csharp
// ✅ CORRECT: Use events for additional context (sparingly)
activity?.AddEvent(new ActivityEvent("ItemRetried", tags: new ActivityTagsCollection
{
    ["retry_attempt"] = retryCount,
    ["next_retry_delay"] = delayMs
}));

// ❌ WRONG: Don't use events for verbose logging
activity?.AddEvent(new ActivityEvent($"Step {i} completed")); // ❌ Use logging instead
```

**Rules**:
- Events stored in-memory until transmission (use sparingly)
- Only for additional context; consider nested spans for multiple events
- Use logging for verbose information

### Accessing Activities

```csharp
// ❌ WRONG: Don't rely on Activity.Current when you need a specific span
public async Task HandleAsync(Context context)
{
    var activity = Activity.Current; // ❌ Might be a user-created span, not yours
    activity?.SetTag("custom", "value");
}

// ✅ CORRECT: Pass Activity explicitly or store it in a dedicated context object
public async Task HandleAsync(Context context)
{
    if (context.TryGetActivity(out var activity))
    {
        activity?.SetTag("custom", "value");
    }
}
```

## Metrics

### Meter and Metrics Class Setup

```csharp
// ✅ CORRECT: Group metrics by feature/component
public sealed class OrderProcessingMetrics : IDisposable
{
    private readonly Meter meter;
    private readonly Histogram<double> processingDuration;
    private readonly Counter<long> itemsProcessed;

    public OrderProcessingMetrics()
    {
        meter = new Meter("MyApp.OrderProcessing", "1.0.0");

        // Singular names, appropriate units, nested hierarchy
        processingDuration = meter.CreateHistogram<double>(
            "myapp.order.processing.duration",
            unit: "s",
            description: "Duration of order processing"
        );

        itemsProcessed = meter.CreateCounter<long>(
            "myapp.order.processing.count",
            unit: "{order}",
            description: "Number of orders processed"
        );
    }

    public void Dispose() => meter.Dispose();
}
```

**Naming Conventions** (follow [OTel semantic conventions](https://opentelemetry.io/docs/specs/semconv/general/metrics/)):
- Singular names (use `_count` suffix instead of pluralization)
- Nested hierarchy: `myapp.order.processing.duration`
- Define units (s, ms, {item}, {connection})
- Avoid technical suffixes (`_counter`, `_histogram`)
- Start with pre-1.0.0 version until adoption proven

### Metric Recording Method Naming

```csharp
// ✅ CORRECT: Action/outcome-based naming, separate methods per outcome
public sealed class OrderProcessingMetrics
{
    // Event happened: describe what occurred
    public void OrderProcessingSucceeded(string orderType, TimeSpan duration)
    {
        processingDuration.Record(duration.TotalSeconds,
            new KeyValuePair<string, object?>("myapp.order_type", orderType),
            new KeyValuePair<string, object?>("outcome", "success")
        );
    }

    public void OrderProcessingFailed(string orderType, Exception exception, TimeSpan duration)
    {
        processingDuration.Record(duration.TotalSeconds,
            new KeyValuePair<string, object?>("myapp.order_type", orderType),
            new KeyValuePair<string, object?>("outcome", "failure"),
            new KeyValuePair<string, object?>("exception.type", exception.GetType().Name)
        );
    }

    public void ConnectionOpened() => connectionsOpen.Add(1);
    public void ConnectionClosed() => connectionsOpen.Add(-1);
}

// ❌ WRONG: Various naming anti-patterns
public void RecordOrderProcessingDuration(...) { } // ❌ Don't name after metric
public void RecordError(bool succeeded, Exception? ex) { } // ❌ Confusing signature
```

**Rules** (inspired by ASP.NET Core patterns):
- Name after action/outcome: `OrderProcessingSucceeded`, `RetryAttempted`, `ConnectionFailed`
- NOT after metric name: avoid `RecordXxx`, `IncrementXxx`
- Separate methods for different outcomes (avoid boolean flags + optional exceptions)
- Event-based naming for state changes: `ConnectionOpened()`, `ItemQueued()`

### Metric Dimensions

```csharp
// ✅ CORRECT: Low-cardinality, predefined dimensions
public void OrderProcessingSucceeded(string orderType, TimeSpan duration)
{
    processingDuration.Record(duration.TotalSeconds,
        new KeyValuePair<string, object?>("myapp.order_type", orderType),
        new KeyValuePair<string, object?>("myapp.region", region),
        new KeyValuePair<string, object?>("outcome", "success")
    );
}

// ❌ WRONG: High-cardinality dimensions (unbounded values cause cardinality explosion)
public void OrderFailed(string orderId, string exceptionMessage)
{
    failureCount.Add(1,
        new KeyValuePair<string, object?>("order_id", orderId),               // ❌ Unbounded
        new KeyValuePair<string, object?>("exception_message", exceptionMessage) // ❌ Unbounded
    );
}
```

**Rules**:
- Dimensions MUST be predefined at instrument creation
- Avoid dynamic/unbounded values (causes cardinality explosion: each unique value creates a new time series row)
- High-cardinality dimensions MUST be opt-in configuration
- Use low-cardinality identifiers: item type, queue name, outcome
- Consistent dimension names across components: `myapp.region` means same thing everywhere
- Avoid sensitive data
- Consider [metric enrichment alternatives](https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/metrics#metrics-enrichment)
- Users can enable [metric exemplars](https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/metrics#metrics-correlation) for correlation (not through dimensions)

## Performance Requirements

Instrumentation MUST be cheap by default. Follow these rules to minimize overhead:

### Zero-Allocation Fast Path

```csharp
// ✅ CORRECT: Guard with cheap checks
if (ActivitySource.HasListeners())
{
    using var activity = ActivitySource.StartActivity("Operation");
    // ... expensive work
}

// ✅ CORRECT: Use TagList (struct) for metrics
var tags = new TagList
{
    { "myapp.order_type", orderType },
    { "outcome", "success" }
};
counter.Add(1, tags);
```

### Timing

```csharp
// ✅ CORRECT: Timestamp math (no allocation)
var startTime = Stopwatch.GetTimestamp();
try
{
    await ProcessAsync();
}
finally
{
    var duration = Stopwatch.GetElapsedTime(startTime);
    metrics.OrderProcessingSucceeded(orderType, duration);
}

// ❌ WRONG: Allocates Stopwatch object
var stopwatch = Stopwatch.StartNew(); // ❌ Allocates

// ❌ WRONG: IDisposable timing class (allocates per use)
using (new MetricScope(metrics, "ProcessOrder")) // ❌ BAD
{
    ProcessOrder();
}
```

### Avoid Hidden Allocations

```csharp
// ❌ WRONG: String interpolation allocates
activity?.SetTag("item", $"Processing {itemId}"); // ❌ Allocates

// ✅ CORRECT: Check IsAllDataRequested first
if (activity?.IsAllDataRequested == true)
{
    activity.SetTag("item", $"Processing {itemId}");
}

// ❌ WRONG: LINQ allocates enumerators
activity?.SetTag("handlers", handlers.Select(h => h.Name).ToArray()); // ❌ Bad

// ✅ CORRECT: Manual construction or check first
if (activity?.IsAllDataRequested == true)
{
    activity.SetTag("handlers", string.Join(",", handlers.Select(h => h.Name)));
}
```

**Rules**:
- No `Stopwatch.StartNew()` (use timestamp math)
- No timing `IDisposable` wrappers as classes
- Prefer `TagList` (struct) over arrays/dictionaries
- No hidden work: avoid LINQ, string interpolation, async state machines in hot paths

## Testing Requirements

### Span Tests

```csharp
[Test]
public async Task Should_create_processing_span_with_correct_parent()
{
    // Arrange
    using var parent = new Activity("Parent").Start();

    // Act
    await handler.Handle(item);

    // Assert
    var processingSpan = recordedActivities.Single(a => a.OperationName == "ProcessItem");
    Assert.AreEqual(parent.Id, processingSpan.ParentId);
    Assert.AreEqual("myapp.item_type", processingSpan.Tags.First().Key);
}

[Test]
public void Should_not_introduce_breaking_changes_to_span_names()
{
    // Ensures string values in span names are under test
    Assert.AreEqual("ProcessItem", MyFeature.SpanName);
}
```

**Rules**:
- Test which spans activities connect to
- Test string values (span names, tag names) to prevent breaking changes
- Remember: telemetry is part of public API

## Versioning

- Telemetry versioning decoupled from package version
- Use SemVer semantics
- Traces and Metrics use separate versions (evolve independently)
- Start with pre-1.0.0 version until adoption/usefulness proven

```csharp
private static readonly ActivitySource ActivitySource = new("MyApp.MyComponent", "0.9.0");
private readonly Meter meter = new("MyApp.MyComponent", "0.8.0");
```

## References

- [OpenTelemetry .NET Trace Documentation](https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/trace)
- [OpenTelemetry .NET Metrics Documentation](https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/metrics)
- [OpenTelemetry Semantic Conventions](https://opentelemetry.io/docs/concepts/semantic-conventions/)
- [Microsoft Distributed Tracing Instrumentation](https://docs.microsoft.com/en-us/dotnet/core/diagnostics/distributed-tracing-instrumentation-walkthroughs)
- [ASP.NET Core Metrics Examples](https://github.com/search?q=repo%3Adotnet%2Faspnetcore+Metrics&type=code)
- [OpenTelemetry Trace API Span Definition](https://opentelemetry.io/docs/specs/otel/trace/api/#span)
- [OpenTelemetry Exception Conventions](https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/exceptions/)
- [OpenTelemetry Attribute Specification](https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/common#attribute)
- [OpenTelemetry Cardinality Limits](https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/docs/metrics/README.md#cardinality-limits)

Overview

This skill provides pragmatic guidance for implementing OpenTelemetry instrumentation in .NET projects. It covers trace (Activity/Span) and metric design, naming conventions, error handling, performance guardians, and telemetry API surface considerations. The guidance is focused on safe, low-overhead instrumentation that integrates with OpenTelemetry semantic conventions.

How this skill works

The skill inspects common .NET telemetry patterns and prescribes concrete rules: how to create and name ActivitySources, when to start Activities, how to set attributes and events, and how to design Meter-based metrics. It emphasizes cheap fast-path checks (e.g., ActivitySource.HasListeners, activity.IsAllDataRequested), null-safety, and low-cardinality metric dimensions. It also outlines error recording, status semantics, and API ergonomics for telemetry surfaces.

When to use it

Adding OpenTelemetry instrumentation to an existing .NET codebase
Creating or reviewing ActivitySource and Meter definitions
Designing public telemetry APIs for libraries or components
Hardening telemetry for production performance and resiliency
Defining metric dimensions and ensuring low cardinality

Best practices

Guard all diagnostic code so it cannot throw or affect application logic; assume Activity can be null
Check ActivitySource.HasListeners() and activity.IsAllDataRequested before doing allocations or expensive work
Name ActivitySources and Activities with stable, compile-time identifiers; use DisplayName for runtime specifics
Follow OpenTelemetry semantic conventions and prefer standard attributes over custom ones; attributes must be non-null, non-empty
Keep metric dimensions low-cardinality and predefined; make high-cardinality dimensions opt-in
Measure durations with timestamp math (Stopwatch.GetTimestamp) and avoid per-operation allocations

Example use cases

Instrument a library with a primary ActivitySource and a separate Detailed ActivitySource for opt-in traces
Record order processing duration with a Histogram and low-cardinality dimensions (order_type, region, outcome)
Implement safe exception recording: set ActivityStatusCode.Error, add otel.status_* tags, and add exception events
Add counters for connection lifecycle using explicit methods like ConnectionOpened/ConnectionClosed
Audit telemetry code for hidden allocations: string interpolation, LINQ, or disposable timing helpers

FAQ

Should Activities ever include runtime identifiers in the operation name?

No. Operation names must be stable and not contain runtime data; use DisplayName or tags for specifics.

How do I avoid metric cardinality explosion?

Only use low-cardinality, predefined dimensions. Make any high-cardinality label opt-in and avoid user or object IDs as metric labels.