7 Commandments for Event-Driven Architecture

Event-driven architecture is practically a requirement of the modern web stack. Coordinating microservices (never mind serverless functions or an IoT mesh) via a stream of events minimizes interdependence and puts a clear vocabulary around service-to-service interactions. Systems get the resilience and scalability of minimally dependent parts. Developers get a way to break big, complex problems down into reasonably isolated parts.

But with more moving parts and looser contracts between them, those nice, simple services tend to mean more complexity for the system as a whole. As you’re laying the foundation for an event-driven architecture, following a few rules can help keep things manageable.

1. Assume nothing

Event-driven architectures promise to reduce local complexity, but distributed applications are still hard. Sure, each service might be simple. The system will still surprise you. Services will drop, well, out of service. Events will be delivered out of order. And off schedule. Plan accordingly.
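One way to plan for late or reordered delivery is to have each consumer check whether it has already seen something newer before applying an event. The sketch below is only an illustration: the `sequence` field, the `ACCOUNT_OPENED` and `ACCOUNT_CLOSED` event types, and the `LatestStatusView` class are all hypothetical and not part of the events described in this post.

```python
from typing import Dict, Optional


class LatestStatusView:
    """Keeps the newest event per user, tolerating late or reordered delivery."""

    def __init__(self) -> None:
        self._latest: Dict[str, dict] = {}

    def apply(self, event: dict) -> bool:
        """Apply the event unless an equal or newer one was already seen."""
        current = self._latest.get(event["userId"])
        # `sequence` is an assumed, producer-assigned counter per user.
        if current is not None and current["sequence"] >= event["sequence"]:
            return False  # stale or repeated event: ignore it
        self._latest[event["userId"]] = event
        return True

    def status(self, user_id: str) -> Optional[str]:
        """Return the type of the newest event seen for this user, if any."""
        event = self._latest.get(user_id)
        return event["type"] if event else None


view = LatestStatusView()
# The "closed" event arrives first even though it happened second.
view.apply({"type": "ACCOUNT_CLOSED", "userId": "abc-123", "sequence": 2})
view.apply({"type": "ACCOUNT_OPENED", "userId": "abc-123", "sequence": 1})
```

After both calls the view still reports the account as closed, because the late `ACCOUNT_OPENED` event is recognized as older and skipped.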

2. Practice ignorance

A service that can’t see its downstream brokers and consumers can’t tailor its event definitions or payloads to specific use cases. Similarly, a consumer with no knowledge of when or where an event was produced will necessarily be better isolated, more resilient, and easier to change than one tightly coupled to a producer upstream.

If a transaction-processing service knows that a downstream service will need to send receipts, it’s tempting to tailor an event to suit:

{
  "type": "DEPOSIT",
  "user": "abc@example.org",
  "message": "$100 left with MegaBank. Thanks for your payment!"
}

This is convenient for sending receipts, but much less useful to a service responsible for flagging potentially fraudulent transactions. Better to relate what happened in relatively generic terms:

{
  "type": "DEPOSIT",
  "user": "abc@example.org",
  "amount": "100 USD",
  "institution": "MegaBank"
}
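With the generic event, each consumer can derive what it needs on its own. Here is a rough sketch of two independent consumers reading the same payload. The function names and the 10,000 review threshold are made up for illustration, and the amount is assumed to always be a "value currency" string as in the example above.

```python
def receipt_message(event: dict) -> str:
    """Receipt service: builds its own wording from the generic fields."""
    value, currency = event["amount"].split()
    return (
        f"{value} {currency} deposited with {event['institution']}. "
        "Thanks for your payment!"
    )


def needs_fraud_review(event: dict, threshold: float = 10_000.0) -> bool:
    """Fraud service: flags large deposits (the threshold is an arbitrary example)."""
    value, _currency = event["amount"].split()
    return float(value) >= threshold


deposit = {
    "type": "DEPOSIT",
    "user": "abc@example.org",
    "amount": "100 USD",
    "institution": "MegaBank",
}

print(receipt_message(deposit))
print(needs_fraud_review(deposit))
```

Neither consumer needs the producer to know it exists, and a third consumer could be added later without touching the event.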

3. Once defined, always defined

Maintaining an append-only registry ensures that event types are clearly defined in the present–but also that any legacy events lingering in forgotten logs or data stores will still have meaning.

The original DEPOSIT event references users by email, but there are advantages to referencing a system-specific userId instead. It’s tempting to switch DEPOSIT to use the new field:

{
  "type": "DEPOSIT",
  "userId": "abc-123",
  ...
}

Unfortunately, any downstream processors now must be updated immediately to expect the new format. A safer option is to preserve the old definition and simply provide a new one corresponding to the changed event:

{
  "type": "DEPOSIT.1",
  "userId": "abc-123",
  ...
}
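On the consuming side, keeping both definitions alive can be as simple as one small handler per event type. The following is a sketch under assumptions: `EMAIL_TO_USER_ID` stands in for whatever lookup a real system would use, and the function names are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical lookup from the legacy email reference to the newer userId.
EMAIL_TO_USER_ID = {"abc@example.org": "abc-123"}


def user_id_from_deposit_v0(event: dict) -> str:
    """Original DEPOSIT events reference the user by email."""
    return EMAIL_TO_USER_ID[event["user"]]


def user_id_from_deposit_v1(event: dict) -> str:
    """DEPOSIT.1 events carry the userId directly."""
    return event["userId"]


# One extractor per defined event type; old entries are never removed.
EXTRACTORS: Dict[str, Callable[[dict], str]] = {
    "DEPOSIT": user_id_from_deposit_v0,
    "DEPOSIT.1": user_id_from_deposit_v1,
}


def resolve_user_id(event: dict) -> str:
    """Pick the extractor matching the event's type, or fail loudly."""
    extractor = EXTRACTORS.get(event["type"])
    if extractor is None:
        raise ValueError(f"unknown event type: {event['type']!r}")
    return extractor(event)
```

A consumer written this way can be upgraded on its own schedule: legacy DEPOSIT events replayed from an old log still resolve to the same user.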

4. Define events globally

Sharing an event registry across the entire service ecosystem (either through common code or via namespaces and a clear discovery mechanism) prevents naming collisions and simplifies discovery for interested downstream services.
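Combined with the append-only rule from the previous commandment, a shared registry might look something like the sketch below. Everything here is an assumption made for illustration: the `EventRegistry` class, the `payments` namespace, the `/` separator, and the choice to describe a definition as a set of field names.

```python
from typing import Dict, FrozenSet, Set


class EventRegistry:
    """Append-only registry: definitions can be added, never changed or removed."""

    def __init__(self) -> None:
        self._definitions: Dict[str, FrozenSet[str]] = {}

    def register(self, namespace: str, event_type: str, fields: Set[str]) -> str:
        """Add a definition under a namespaced name and return that name."""
        name = f"{namespace}/{event_type}"
        if name in self._definitions:
            # Redefining would change the meaning of events already in the wild.
            raise ValueError(f"{name} is already defined; register a new version")
        self._definitions[name] = frozenset(fields)
        return name

    def fields(self, name: str) -> FrozenSet[str]:
        """Look up the fields of a registered definition."""
        return self._definitions[name]


registry = EventRegistry()
registry.register("payments", "DEPOSIT", {"user"})
registry.register("payments", "DEPOSIT.1", {"userId"})
```

Attempting to register `payments/DEPOSIT` a second time raises an error, which pushes the change into a new, separately named definition.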

5. Keep things simple

Event definitions should be clear, unambiguous, and as unimaginative as possible. A "DEPOSIT" event might describe a cash deposit, a credit card payment, or an ACH transfer, which could of course be explained via a metadata field:

{
  "type": "DEPOSIT.2",
  "userId": "abc-123",
  "source": "cash",
  ...
}

Are payments from each source separate events in the service lifecycle, though? If so, more nuanced event types can make it easier for brokers and consumers downstream to decide whether (and how) each should be processed.

{
  "type": "DEPOSIT_CASH.0",
  "userId": "abc-123",
  ...
}
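With specific event types, routing can rely on the type name alone rather than inspecting metadata. This is a minimal in-process sketch, not a real broker. The `Router` class and the `DEPOSIT_ACH` type are hypothetical, and it assumes the numeric suffix after the dot is a version, with an unsuffixed type treated as version 0.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple


def parse_event_type(event_type: str) -> Tuple[str, int]:
    """Split 'DEPOSIT_CASH.0' into ('DEPOSIT_CASH', 0); no suffix means version 0."""
    base, dot, version = event_type.partition(".")
    return base, (int(version) if dot else 0)


class Router:
    """Delivers each event to the handlers subscribed to its base type."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, base_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[base_type].append(handler)

    def publish(self, event: dict) -> int:
        """Call every matching handler and return how many were called."""
        base, _version = parse_event_type(event["type"])
        handlers = self._handlers.get(base, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


cash_deposits: List[dict] = []
router = Router()
router.subscribe("DEPOSIT_CASH", cash_deposits.append)
router.publish({"type": "DEPOSIT_CASH.0", "userId": "abc-123"})
router.publish({"type": "DEPOSIT_ACH.0", "userId": "abc-123"})  # no subscriber
```

Only the cash deposit reaches the subscriber, and the handler never has to read a `source` field to decide whether the event is relevant.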

6. Uniquely identify everything

Systems fail, and a reasonably robust delivery system will almost guarantee repeats. Some consumers may not mind. But providing a unique identifier at creation time is an absolute requirement for consumers that should discard already-processed events. Besides, it’s good for auditing!
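As a rough illustration, a producer can stamp each event with a UUID and a consumer can skip any identifier it has already processed. The `id` field name, the `new_event` helper, and the `DeduplicatingConsumer` class are assumptions for this sketch, and the seen-set lives in memory only, where a real consumer would need durable storage.

```python
import uuid
from typing import Set


def new_event(event_type: str, **payload) -> dict:
    """Producer side: attach a unique id when the event is created."""
    return {"id": str(uuid.uuid4()), "type": event_type, **payload}


class DeduplicatingConsumer:
    """Totals deposits while ignoring events it has already processed."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()  # in-memory only, for illustration
        self.total_deposited = 0

    def handle(self, event: dict) -> bool:
        """Process the event once; return False for a repeat delivery."""
        if event["id"] in self._seen:
            return False
        self._seen.add(event["id"])
        value, _currency = event["amount"].split()
        self.total_deposited += int(value)
        return True


consumer = DeduplicatingConsumer()
event = new_event("DEPOSIT.1", userId="abc-123", amount="100 USD")
consumer.handle(event)
consumer.handle(event)  # redelivery of the same event is ignored
```

Without the identifier, the second delivery would be indistinguishable from a second, genuine 100 USD deposit.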

7. Minimize state in-flight

Without knowing how or when an event will be consumed, assume any state inside it is stale. Events should contain only a minimal description of what happened–references as relevant, and perhaps the magnitude of the change itself. For instance, the bank balance at the moment that cash is withdrawn may be ambiguous (or worse) when the withdrawal is processed downstream.

{
  "type": "WITHDRAW_CASH.0",
  "userId": "abc-123",
  "source": "cash",
  "amount": "100 USD",
  "balance": "1000 USD"
}

Better to only indicate the amount–or even just the userId–and let downstream services that need the balance request it.
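A consumer following this advice might look up the balance at processing time instead of trusting a value carried in the event. In this sketch, `fetch_balance` is a stand-in for a call to some account service, and the `BALANCES` dictionary, its contents, and the USD-only message are invented purely so the example runs.

```python
from typing import Callable

# Stand-in for a hypothetical account service holding the current balances.
BALANCES = {"abc-123": 900}


def handle_withdrawal(event: dict, fetch_balance: Callable[[str], int]) -> str:
    """Build a notification using the balance as of processing time."""
    balance = fetch_balance(event["userId"])
    return f"Withdrew {event['amount']}. Current balance: {balance} USD"


withdrawal = {
    "type": "WITHDRAW_CASH.0",
    "userId": "abc-123",
    "amount": "100 USD",
}

print(handle_withdrawal(withdrawal, BALANCES.__getitem__))
```

However late the event is processed, the message reflects the balance the account service reports at that moment, and there is no second copy of the balance to reconcile.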

There are situations where it may be appropriate (or in the case of event-sourcing, crucial) for events to incorporate non-trivial state. But where a reference to an ID or URL can suffice, it usually means fewer sources of “truth” for a consumer to sift through later.

These are just the ground rules! There’s still loads of nuance to event-driven systems, but striving for good event design will save on heartache down the line.


Many thanks to Charles Ruhland for early feedback on this post, and for many insightful conversations over the years!
