akka-best-practices skill

This skill helps .NET developers implement robust Akka.NET patterns for event streams, supervision, error handling, DI, and testable cluster abstractions.

npx playbooks add skill aaronontheweb/dotnet-skills --skill akka-best-practices

---
name: akka-net-best-practices
description: Critical Akka.NET best practices including EventStream vs DistributedPubSub, supervision strategies, error handling, Props vs DependencyResolver, work distribution patterns, and cluster/local mode abstractions for testability.
invocable: false
---

# Akka.NET Best Practices

## When to Use This Skill

Use this skill when:
- Designing actor communication patterns
- Deciding between EventStream and DistributedPubSub
- Implementing error handling in actors
- Understanding supervision strategies
- Choosing between Props patterns and DependencyResolver
- Designing work distribution across nodes
- Creating testable actor systems that can run with or without cluster infrastructure
- Abstracting over Cluster Sharding for local testing scenarios

## Reference Files

- [work-distribution-patterns.md](work-distribution-patterns.md): Database queues, Akka.Streams throttling, outbox pattern
- [cluster-local-abstractions.md](cluster-local-abstractions.md): GenericChildPerEntityParent, IPubSubMediator, execution mode wiring
- [async-cancellation-patterns.md](async-cancellation-patterns.md): Actor-scoped CancellationToken, linked CTS, timeout handling

---

## 1. EventStream vs DistributedPubSub

### Critical: EventStream is LOCAL ONLY

`Context.System.EventStream` is **local to a single ActorSystem process**. It does NOT work across cluster nodes.

```csharp
// BAD: This only works on a single server
// When you add a second server, subscribers on server 2 won't receive events from server 1
Context.System.EventStream.Subscribe(Self, typeof(PostCreated));
Context.System.EventStream.Publish(new PostCreated(postId, authorId));
```

**When EventStream is appropriate:**
- Logging and diagnostics within a single process
- Local event bus for truly single-process applications
- Development/testing scenarios

### Use DistributedPubSub for Multi-Node

For events that must reach actors across multiple cluster nodes, use `Akka.Cluster.Tools.PublishSubscribe`:

```csharp
using Akka.Cluster.Tools.PublishSubscribe;

public class TimelineUpdatePublisher : ReceiveActor
{
    private readonly IActorRef _mediator;

    public TimelineUpdatePublisher()
    {
        // Get the DistributedPubSub mediator
        _mediator = DistributedPubSub.Get(Context.System).Mediator;

        Receive<PublishTimelineUpdate>(msg =>
        {
            // Publish to a topic - reaches all subscribers across all nodes
            _mediator.Tell(new Publish($"timeline:{msg.UserId}", msg.Update));
        });
    }
}
```
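
The publisher above only covers the sending side. A minimal sketch of a matching subscriber follows, assuming a hypothetical `TimelineUpdate` message type (standing in for whatever `msg.Update` carries) and a hypothetical `TimelineSubscriber` actor; `Subscribe` and `SubscribeAck` come from `Akka.Cluster.Tools.PublishSubscribe`.

```csharp
using Akka.Actor;
using Akka.Cluster.Tools.PublishSubscribe;
using Akka.Event;

// Hypothetical message type, used only for this illustration.
public sealed record TimelineUpdate(string UserId, string Text);

// Hypothetical subscriber actor for a single user's timeline topic.
public sealed class TimelineSubscriber : ReceiveActor
{
    private readonly ILoggingAdapter _log = Context.GetLogger();

    public TimelineSubscriber(string userId)
    {
        var topic = $"timeline:{userId}";
        var mediator = DistributedPubSub.Get(Context.System).Mediator;

        // Register this actor for the topic; the mediator replies with SubscribeAck.
        mediator.Tell(new Subscribe(topic, Self));

        Receive<SubscribeAck>(ack =>
            _log.Info("Subscribed to {Topic}", ack.Subscribe.Topic));

        // Messages published to the topic arrive as ordinary actor messages.
        Receive<TimelineUpdate>(update =>
            _log.Info("Timeline update for {UserId}: {Text}", update.UserId, update.Text));
    }
}
```

Waiting for `SubscribeAck` before treating the subscription as active avoids assuming that updates published immediately after startup will be delivered.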

### Akka.Hosting Configuration for DistributedPubSub

```csharp
builder.WithDistributedPubSub(role: null); // Available on all roles, or specify a role
```

### Topic Design Patterns

| Pattern | Topic Format | Use Case |
|---------|--------------|----------|
| Per-user | `timeline:{userId}` | Timeline updates, notifications |
| Per-entity | `post:{postId}` | Post engagement updates |
| Broadcast | `system:announcements` | System-wide notifications |
| Role-based | `workers:rss-poller` | Work distribution |

---

## 2. Supervision Strategies

### Key Clarification: Supervision is for CHILDREN

A supervision strategy defined on an actor dictates **how that actor supervises its children**, NOT how the actor itself is supervised.

```csharp
public class ParentActor : ReceiveActor
{
    // This strategy applies to children of ParentActor, NOT to ParentActor itself
    protected override SupervisorStrategy SupervisorStrategy()
    {
        return new OneForOneStrategy(
            maxNrOfRetries: 10,
            withinTimeRange: TimeSpan.FromSeconds(30),
            // The Func<Exception, Directive> overload names this parameter localOnlyDecider
            localOnlyDecider: ex => ex switch
            {
                ArithmeticException => Directive.Resume,
                NullReferenceException => Directive.Restart,
                ArgumentException => Directive.Stop,
                _ => Directive.Escalate
            });
    }
}
```

### Default Supervision Strategy

The default `OneForOneStrategy` already includes rate limiting:
- **10 restarts within 1 second** = actor is permanently stopped
- This prevents infinite restart loops

**You rarely need a custom strategy** unless you have specific requirements.

### When to Define Custom Supervision

**Good reasons:**
- Actor throws exceptions indicating irrecoverable state corruption -> Restart
- Actor throws exceptions that should NOT cause restart (expected failures) -> Resume
- Child failures should affect siblings -> Use `AllForOneStrategy`
- Need different retry limits than the default

**Bad reasons:**
- "Just to be safe" - the default is already safe
- Don't understand what the actor does - understand it first
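
For the "child failures should affect siblings" case listed above, a minimal sketch of an `AllForOneStrategy` might look like the following. `SharedConnectionLostException` and `ConnectionPoolParent` are hypothetical names invented for this example, and the retry limits are arbitrary.

```csharp
using System;
using Akka.Actor;

// Hypothetical exception type, used only for this illustration.
public sealed class SharedConnectionLostException : Exception
{
    public SharedConnectionLostException(string message) : base(message) { }
}

// Hypothetical parent whose children all depend on one shared resource.
public sealed class ConnectionPoolParent : ReceiveActor
{
    // Kept as a static method so the decision logic can be unit tested in isolation.
    public static Directive Decide(Exception ex) => ex switch
    {
        SharedConnectionLostException => Directive.Restart,
        _ => Directive.Escalate
    };

    // Applies to the children of ConnectionPoolParent: when the decider returns
    // Restart for one child, every sibling is restarted along with it.
    protected override SupervisorStrategy SupervisorStrategy()
    {
        return new AllForOneStrategy(
            maxNrOfRetries: 3,
            withinTimeRange: TimeSpan.FromMinutes(1),
            localOnlyDecider: Decide);
    }
}
```

Escalating everything the decider does not recognize keeps unknown failures visible to the next level of the hierarchy instead of silently restarting the whole group.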

---

## 3. Error Handling: Supervision vs Try-Catch

### When to Use Try-Catch (Most Cases)

**Use try-catch when:**
- The failure is **expected** (network timeout, invalid input, external service down)
- You know **exactly why** the exception occurred
- You can handle it **gracefully** (retry, return error response, log and continue)
- Restarting would **not help** (same error would occur again)

```csharp
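using System.Net.Http;
using System.Xml;
using Akka.Actor;
using Akka.Event;

public class RssFeedPollerActor : ReceiveActor
{
    private readonly HttpClient _httpClient;
    private readonly ILoggingAdapter _log = Context.GetLogger();

    // HttpClient is supplied by the caller (via Props or DI) rather than created per actor
    public RssFeedPollerActor(HttpClient httpClient)
    {
        _httpClient = httpClient;
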
        ReceiveAsync<PollFeed>(async msg =>
        {
            try
            {
                var feed = await _httpClient.GetStringAsync(msg.FeedUrl);
                var items = ParseFeed(feed);
                // Process items...
            }
            catch (HttpRequestException ex)
            {
                // Expected failure - log and schedule retry
                _log.Warning("Feed {Url} unavailable: {Error}", msg.FeedUrl, ex.Message);
                Context.System.Scheduler.ScheduleTellOnce(
                    TimeSpan.FromMinutes(5), Self, msg, Self);
            }
            catch (XmlException ex)
            {
                // Invalid feed format - log and mark as bad
                _log.Error("Feed {Url} has invalid format: {Error}", msg.FeedUrl, ex.Message);
                Sender.Tell(new FeedPollResult.InvalidFormat(msg.FeedUrl));
            }
        });
    }
}
```

### When to Let Supervision Handle It

**Let exceptions propagate (trigger supervision) when:**
- You have **no idea** why the exception occurred
- The actor's **state might be corrupt**
- A **restart would help** (fresh state, reconnect resources)
- It's a **programming error** (NullReferenceException, InvalidOperationException from bad logic)

### Anti-Pattern: Swallowing Unknown Exceptions

```csharp
// BAD: Swallowing exceptions hides problems
try
{
    // ... process the message ...
}
catch (Exception ex)
{
    _log.Error(ex, "Error processing work");
    // Actor continues with potentially corrupt state
}

// GOOD: Handle known exceptions, let unknown ones propagate
try
{
    // ... process the message ...
}
catch (HttpRequestException ex)
{
    // Known, expected failure - handle gracefully
    _log.Warning("HTTP request failed: {Error}", ex.Message);
    Sender.Tell(new WorkResult.TransientFailure());
}
// Unknown exceptions propagate to supervision
```
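
The `WorkResult.TransientFailure` reply above implies a small family of result messages. One possible shape is sketched below; the cases, their fields, and the `Describe` helper are assumptions made for illustration, not types defined by this skill.

```csharp
using System;

// Hypothetical closed hierarchy of result messages. The private constructor
// means only the nested records can derive from WorkResult.
public abstract record WorkResult
{
    private WorkResult() { }

    public sealed record Success(int ItemsProcessed) : WorkResult;
    public sealed record TransientFailure(string Reason = "") : WorkResult;
    public sealed record PermanentFailure(string Reason) : WorkResult;
}

public static class WorkResultExtensions
{
    // Callers pattern-match on the result instead of catching exceptions.
    public static string Describe(this WorkResult result) => result switch
    {
        WorkResult.Success s => $"processed {s.ItemsProcessed} items",
        WorkResult.TransientFailure => "transient failure, retry later",
        WorkResult.PermanentFailure p => $"permanent failure: {p.Reason}",
        _ => throw new ArgumentOutOfRangeException(nameof(result))
    };
}
```

Replying with a result message keeps expected failures out of the supervision path, so a restart is reserved for failures the actor cannot explain.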

---

## 4. Props vs DependencyResolver

### When to Use Plain Props

**Use `Props.Create()` when:**
- Actor doesn't need `IServiceProvider` or `IRequiredActor<T>`
- All dependencies can be passed via constructor
- Actor is simple and self-contained

```csharp
// Simple actor with no DI needs
public static Props Props(PostId postId, IPostWriteStore store)
    => Akka.Actor.Props.Create(() => new PostEngagementActor(postId, store));
```

### When to Use DependencyResolver

**Use `resolver.Props<T>()` when:**
- Actor needs `IServiceProvider` to create scoped services
- Actor uses `IRequiredActor<T>` to get references to other actors
- Actor has many dependencies that are already in DI container

```csharp
// Registration with DI
builder.WithActors((system, registry, resolver) =>
{
    var actor = system.ActorOf(resolver.Props<OrderProcessorActor>(), "order-processor");
    registry.Register<OrderProcessorActor>(actor);
});
```
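
The registration above says nothing about what the actor itself looks like. A minimal sketch of an actor that takes both `IServiceProvider` and `IRequiredActor<T>` follows; the message types, `IOrderRepository`, and `NotificationActor` are hypothetical, and the sketch assumes `NotificationActor` has already been registered in the `ActorRegistry`.

```csharp
using System;
using System.Threading.Tasks;
using Akka.Actor;
using Akka.Hosting;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical types, used only for this illustration.
public sealed record ProcessOrder(string OrderId);
public sealed record OrderProcessed(string OrderId);
public interface IOrderRepository { Task SaveAsync(string orderId); }
public sealed class NotificationActor : ReceiveActor { }

public sealed class OrderProcessorActor : ReceiveActor
{
    public OrderProcessorActor(
        IServiceProvider serviceProvider,
        IRequiredActor<NotificationActor> notifier)
    {
        // Looked up from the ActorRegistry by Akka.Hosting.
        var notifierRef = notifier.ActorRef;

        ReceiveAsync<ProcessOrder>(async msg =>
        {
            // One DI scope per message, so scoped services (for example a DbContext)
            // are disposed as soon as the message has been handled.
            using var scope = serviceProvider.CreateScope();
            var repository = scope.ServiceProvider.GetRequiredService<IOrderRepository>();

            await repository.SaveAsync(msg.OrderId);
            notifierRef.Tell(new OrderProcessed(msg.OrderId));
        });
    }
}
```

Actors usually live far longer than a DI scope, which is why the scoped service is resolved inside the handler instead of being injected into the constructor.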

### Remote Deployment Considerations

**You almost never need remote deployment.** If you're not doing remote deployment (and you probably aren't):
- `Props.Create(() => new Actor(...))` with closures is fine
- The "serialization issue" warning doesn't apply

For most applications, use **cluster sharding** instead of remote deployment - it handles distribution automatically.

---

## 5. Work Distribution Patterns

When you have many background jobs (RSS feeds, email sending, etc.), don't process them all at once - this causes thundering herd problems.

**Three patterns to solve this:**
1. **Database-Driven Work Queue** - Use `FOR UPDATE SKIP LOCKED` for natural cross-node distribution
2. **Akka.Streams Rate Limiting** - Throttle processing within a single node
3. **Durable Queue (Outbox Pattern)** - Database-backed outbox for reliable processing

See [work-distribution-patterns.md](work-distribution-patterns.md) for full code samples.
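
As a rough illustration of pattern 2, the sketch below throttles a list of feed URLs on a single node with Akka.Streams. `ThrottledPolling`, `PollAllAsync`, and the `pollFeed` callback are hypothetical names, and the rate (10 per second) and parallelism (4) are arbitrary values chosen for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Akka.Actor;
using Akka.Streams;
using Akka.Streams.Dsl;

public static class ThrottledPolling
{
    // pollFeed is a hypothetical callback that performs the work for one feed.
    public static Task PollAllAsync(
        ActorSystem system,
        IEnumerable<string> feedUrls,
        Func<string, Task<bool>> pollFeed)
    {
        return Source.From(feedUrls)
            // Emit at most 10 elements per second; Shaping delays elements
            // instead of failing the stream when the rate is exceeded.
            .Throttle(10, TimeSpan.FromSeconds(1), 10, ThrottleMode.Shaping)
            // Keep at most 4 polls in flight at once on this node.
            .SelectAsync(4, pollFeed)
            // The returned task completes when every feed has been processed.
            .RunWith(Sink.Ignore<bool>(), system.Materializer());
    }
}
```

This only limits work within one process; spreading work across nodes still needs something like the database-driven queue from pattern 1.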

---

## 6. Common Mistakes Summary

| Mistake | Why It's Wrong | Fix |
|---------|----------------|-----|
| Using EventStream for cross-node pub/sub | EventStream is local only | Use DistributedPubSub |
| Defining supervision to "protect" an actor | Supervision protects children | Understand the hierarchy |
| Catching all exceptions | Hides bugs, corrupts state | Only catch expected errors |
| Always using DependencyResolver | Adds unnecessary complexity | Use plain Props when possible |
| Processing all background jobs at once | Thundering herd, resource exhaustion | Use database queue + rate limiting |
| Throwing exceptions for expected failures | Triggers unnecessary restarts | Return result types, use messaging |

---

## 7. Quick Reference

### Communication Pattern Decision Tree

```
Need to communicate between actors?
├── Same process only? -> EventStream is fine
├── Across cluster nodes?
│   ├── Point-to-point? -> Use ActorSelection or known IActorRef
│   └── Pub/sub? -> Use DistributedPubSub
└── Fire-and-forget to external system? -> Consider outbox pattern
```

### Error Handling Decision Tree

```
Exception occurred in actor?
├── Expected failure (HTTP timeout, invalid input)?
│   └── Try-catch, handle gracefully, continue
├── State might be corrupt?
│   └── Let supervision restart
├── Unknown cause?
│   └── Let supervision restart
└── Programming error (null ref, bad logic)?
    └── Let supervision restart, fix the bug
```

### Props Decision Tree

```
Creating actor Props?
├── Actor needs IServiceProvider?
│   └── Use resolver.Props<T>()
├── Actor needs IRequiredActor<T>?
│   └── Use resolver.Props<T>()
├── Simple actor with constructor params?
│   └── Use Props.Create(() => new Actor(...))
└── Remote deployment needed?
    └── Probably not - use cluster sharding instead
```

---

## 8. Cluster/Local Mode Abstractions

For applications that need to run both in clustered production and local/test environments, use abstraction patterns to toggle between implementations:

- **`AkkaExecutionMode` enum** - Controls which implementations are used (LocalTest vs Clustered)
- **`GenericChildPerEntityParent`** - Mimics sharding behavior locally using the same `IMessageExtractor`
- **`IPubSubMediator`** - Abstracts DistributedPubSub for swappable local/cluster implementations

See [cluster-local-abstractions.md](cluster-local-abstractions.md) for complete implementation code.
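
To give a feel for the `IPubSubMediator` idea, here is a minimal sketch under stated assumptions. The method names and both implementations are hypothetical and may differ from the version in the reference file; only `DistributedPubSub`, `Subscribe`, and `Publish` are real Akka.NET types.

```csharp
using System;
using System.Collections.Generic;
using Akka.Actor;
using PubSub = Akka.Cluster.Tools.PublishSubscribe;

// Hypothetical interface shape, invented for this illustration.
public interface IPubSubMediator
{
    void Subscribe(string topic, IActorRef subscriber);
    void Publish(string topic, object message);
}

// Clustered mode: delegate to the DistributedPubSub mediator actor.
public sealed class ClusterPubSubMediator : IPubSubMediator
{
    private readonly IActorRef _mediator;

    public ClusterPubSubMediator(ActorSystem system)
        => _mediator = PubSub.DistributedPubSub.Get(system).Mediator;

    // The subscriber is passed as sender so it receives the SubscribeAck.
    public void Subscribe(string topic, IActorRef subscriber)
        => _mediator.Tell(new PubSub.Subscribe(topic, subscriber), subscriber);

    public void Publish(string topic, object message)
        => _mediator.Tell(new PubSub.Publish(topic, message), ActorRefs.NoSender);
}

// Local/test mode: an in-memory topic table, no cluster infrastructure required.
public sealed class LocalPubSubMediator : IPubSubMediator
{
    private readonly object _lock = new();
    private readonly Dictionary<string, HashSet<IActorRef>> _topics = new();

    public void Subscribe(string topic, IActorRef subscriber)
    {
        lock (_lock)
        {
            if (!_topics.TryGetValue(topic, out var subscribers))
                _topics[topic] = subscribers = new HashSet<IActorRef>();
            subscribers.Add(subscriber);
        }
    }

    public void Publish(string topic, object message)
    {
        IActorRef[] targets;
        lock (_lock)
        {
            // Copy under the lock so delivery happens outside it.
            targets = _topics.TryGetValue(topic, out var subscribers)
                ? new List<IActorRef>(subscribers).ToArray()
                : Array.Empty<IActorRef>();
        }

        foreach (var target in targets)
            target.Tell(message, ActorRefs.NoSender);
    }
}
```

Actors would then depend on `IPubSubMediator` only, and the execution mode decides which implementation gets registered. Note that the local sketch does not send a `SubscribeAck`, so code that waits for one would need extra handling in local mode.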

---

## 9. Actor Logging

### Use ILoggingAdapter, Not ILogger<T>

In actors, use `ILoggingAdapter` from `Context.GetLogger()` instead of DI-injected `ILogger<T>`:

```csharp
public class MyActor : ReceiveActor
{
    private readonly ILoggingAdapter _log = Context.GetLogger();

    public MyActor()
    {
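        Receive<MyMessage>(msg =>
        {
            _log.Info("Processing message for user {UserId}", msg.UserId);
            try
            {
                Handle(msg); // hypothetical helper that does the actual work
            }
            catch (TimeoutException ex)
            {
                // Expected failure: pass the exception first so the stack trace is logged
                _log.Error(ex, "Failed to process {MessageType}", msg.GetType().Name);
            }
        });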
    }
}
```

**Why ILoggingAdapter:**
- Integrates with Akka's logging pipeline and supervision
- Supports semantic/structured logging as of v1.5.57
- Method names: `Info()`, `Debug()`, `Warning()`, `Error()` (not `Log*` variants)
- No DI required - obtained directly from actor context

**Don't inject ILogger<T> into actors** - it bypasses Akka's logging infrastructure.

### Semantic Logging (v1.5.57+)

```csharp
// Named placeholders for better log aggregation and querying
_log.Info("Order {OrderId} processed for customer {CustomerId}", order.Id, order.CustomerId);

// Prefer named placeholders over positional
// Good: {OrderId}, {CustomerId}
// Avoid: {0}, {1}
```

---

## 10. Managing Async Operations with CancellationToken

When actors launch async operations via `PipeTo`, those operations can outlive the actor if not properly managed. Key practices:

- **Actor CTS in PostStop** - Always cancel and dispose in `PostStop()`
- **New CTS per operation** - Cancel previous before starting new work
- **Pass token everywhere** - EF Core queries, HTTP calls, etc.
- **Linked CTS for timeouts** - External calls get short timeouts to prevent hanging
- **Graceful handling** - Distinguish timeout vs shutdown in catch blocks

See [async-cancellation-patterns.md](async-cancellation-patterns.md) for complete implementation code.
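
The practices above can be sketched in a single small actor. This is an illustration under stated assumptions, not the implementation from the reference file: `FetchActor`, its message types, and the 10-second timeout are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Akka.Actor;

// Hypothetical messages, used only for this illustration.
public sealed record FetchUrl(string Url);
public sealed record FetchSucceeded(string Body);
public sealed record FetchFailed(string Reason);

public sealed class FetchActor : ReceiveActor
{
    private readonly HttpClient _httpClient;
    private CancellationTokenSource? _operationCts;

    public FetchActor(HttpClient httpClient)
    {
        _httpClient = httpClient;

        Receive<FetchUrl>(msg =>
        {
            // New CTS per operation: cancel any previous fetch before starting another.
            _operationCts?.Cancel();
            _operationCts?.Dispose();
            _operationCts = new CancellationTokenSource();

            FetchAsync(msg.Url, _operationCts.Token).PipeTo(
                Self,
                success: body => new FetchSucceeded(body),
                failure: ex => new FetchFailed(ex.Message));
        });

        Receive<FetchSucceeded>(_ => { /* use the result */ });
        Receive<FetchFailed>(_ => { /* expected failure: log, retry, or report */ });
    }

    private async Task<string> FetchAsync(string url, CancellationToken actorToken)
    {
        // Linked CTS: cancelled when the actor cancels OR after a 10-second timeout.
        using var timeoutCts = CancellationTokenSource.CreateLinkedTokenSource(actorToken);
        timeoutCts.CancelAfter(TimeSpan.FromSeconds(10));
        return await _httpClient.GetStringAsync(url, timeoutCts.Token);
    }

    protected override void PostStop()
    {
        // Cancel in-flight work so it cannot outlive the actor.
        _operationCts?.Cancel();
        _operationCts?.Dispose();
        base.PostStop();
    }
}
```

To distinguish a timeout from a shutdown, `FetchAsync` could catch `OperationCanceledException` and check `actorToken.IsCancellationRequested`; the sketch leaves that out to stay short.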

Overview

This skill captures critical Akka.NET best practices for building reliable, testable actor systems. It focuses on choosing the right pub/sub mechanism, supervision and error-handling patterns, Props vs DependencyResolver trade-offs, robust work-distribution approaches, and techniques to make cluster features testable locally. The guidance is pragmatic and oriented toward real production scenarios.

How this skill works

The content explains when to use Context.System.EventStream versus DistributedPubSub, how supervision strategies apply to children, and when to handle exceptions inside actors versus letting supervision restart them. It outlines Props usage and when to inject IServiceProvider via DependencyResolver. It also presents scalable work distribution patterns (database-driven queues, Akka.Streams throttling) and approaches to abstract cluster sharding for local tests.

When to use it

  • Use EventStream only for single-process logging, diagnostics, or dev/testing scenarios.
  • Use DistributedPubSub when events must be delivered across multiple cluster nodes.
  • Handle expected, recoverable errors with try-catch inside actors; let unknown or potentially state-corrupting errors propagate to supervision.
  • Prefer Props.Create for simple actors; use DependencyResolver when you need scoped DI or IRequiredActor<T>.
  • Use database-driven queues or Akka.Streams to avoid thundering-herd problems when polling many jobs.
  • Abstract cluster features so the same logic runs in both cluster and local test mode.

Best practices

  • Remember supervision directives apply to children of the actor declaring the strategy, not to the actor itself.
  • Don’t swallow unknown exceptions: handle known, expected failures and let unexpected ones trigger supervision.
  • Prefer cluster sharding over remote-deployment; remote deployment is rarely needed.
  • Use FOR UPDATE SKIP LOCKED in the DB to claim batches and naturally distribute work across nodes.
  • Throttle and limit concurrency with Akka.Streams to control local resource usage and avoid spikes.

Example use cases

  • Publish per-user timeline updates with DistributedPubSub topic format like `timeline:{userId}` across all nodes.
  • Make an RSS-polling fleet safe by having workers claim batches from a DB and mark progress, avoiding simultaneous requests.
  • Write an OrderProcessor actor that uses DependencyResolver to create a scoped DbContext for each message.
  • Let an actor restart on unknown or potentially state-corrupting exceptions while using try-catch for transient HttpRequestException retries.
  • Run the same sharding logic locally by replacing Cluster Sharding with a lightweight in-process abstraction for unit tests.

FAQ

Can I use EventStream across cluster nodes?

No. EventStream is local to an ActorSystem process. Use DistributedPubSub for cross-node publish/subscribe.

When should I define a custom supervision strategy?

Only when you have a concrete reason, such as needing different restart/stop/resume behavior for specific child failure modes or wanting one child's failure to affect its siblings. The default strategy is safe for most cases.

Is Props.Create unsafe if I later add clustering?

Props.Create with closures is fine unless you rely on remote deployment; for clustering, prefer cluster sharding rather than remote-deploying actors.