Events as facts - trade-offs between resilience and consistency
When architecting a system, a question you often have to ask is how much information should be included in an event. The more information you include, the more the services that receive the event will be able to process it without looking up information from other services, which makes your system more resilient and scalable. For example, if a user submits an order, the system might publish an order submitted event. But should that event include the list of items that were ordered? If it does, then services consuming that event will not need to go back to the order service to find out what items the order has. They’ll be able to process the event in isolation from the order service, and in doing so be more resilient.
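To make the two options concrete, here is a minimal Python sketch of both shapes of the order submitted event. The class names, field names and the fetch_items callback are all invented for illustration and do not come from any particular framework; fetch_items simply stands in for a call back to the order service.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class OrderSubmittedThin:
    """Carries only identifiers and the time; consumers look up the rest."""
    order_id: str
    user_id: str
    submitted_at: datetime


@dataclass(frozen=True)
class OrderSubmittedFull:
    """Also carries the items as they were when the order was submitted."""
    order_id: str
    user_id: str
    submitted_at: datetime
    items: tuple[str, ...]


def items_for_thin(
    event: OrderSubmittedThin,
    fetch_items: Callable[[str], tuple[str, ...]],
) -> tuple[str, ...]:
    # Hypothetical remote lookup: this consumer cannot make progress
    # while the order service is unavailable.
    return fetch_items(event.order_id)


def items_for_full(event: OrderSubmittedFull) -> tuple[str, ...]:
    # No remote call: the consumer works in isolation from the order service.
    return event.items
```

The consumer of the thin event depends on the order service being reachable at processing time, while the consumer of the full event depends only on the event itself.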
On the other hand, if you include too much information in your event, the event becomes very large, which will impact the throughput of your messaging infrastructure and the cost of processing events. This is true, and it’s tempting to treat the problem as primarily one of resilience vs throughput, which is how I used to think of it. However, I no longer think of that as the major trade-off.
One way to view events is to see them as facts. The words event and fact have very similar meanings, and can often be used interchangeably. An event is something that happened (or will happen, but in the context of systems architecture we always use it to refer to something that has happened). A fact is something that happened. However, they have a different focus, and consequently when we think of events rather than facts, we apply slightly different thought processes and constraints. When referring to something as an event, the emphasis is placed on what happened. When referring to something as a fact, the emphasis is placed on the truth of the thing that happened.
This subtle difference can change the way we think about the messages that convey this information. A fact is indisputable, so if an event is a fact, then it should only contain information that is indisputable. The user submitted the order at this time. That is indisputable: it is a fact that the user did that. The order contains these items. That is disputable: the order may have changed since it was submitted, because the availability of items in it changed.
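The difference between the two kinds of information can be shown with a small Python sketch. Everything here is hypothetical: the OrderSubmitted class, the in-memory orders dictionary standing in for the order service, and the sample items are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderSubmitted:
    order_id: str
    submitted_at: str        # indisputable: the user did submit at this time
    items: tuple[str, ...]   # disputable: a snapshot that can go stale


# Hypothetical order service state, keyed by order id.
orders = {"order-1": ["kettle", "toaster"]}

# The event is published with a copy of the items at submission time.
event = OrderSubmitted(
    "order-1", "2024-01-01T10:00:00Z", tuple(orders["order-1"])
)

# Later the toaster turns out to be unavailable and is removed from the order.
orders["order-1"].remove("toaster")

# The submission itself is still true, but the item list carried by the
# event no longer matches the order.
items_are_stale = list(event.items) != orders["order-1"]
```

The order id and submission time in the event remain true whatever happens next; the item list was only true at the moment the event was published.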
The property of our system that is in question here is consistency. If we treat all events as facts, containing only indisputable information, then we improve the consistency of our system. A service can’t make the mistake of trusting information in an event that might later become false, since the event won’t contain that information. And this consistency issue is, in general, a bigger issue than the throughput of your message broker. There are many things that can be tuned to increase throughput, but addressing inconsistency requires a lot more than just tuning.
And so I’ve realised that the biggest concerns when deciding how much information to include in events are resilience and consistency. To include more information beyond the fact that happened is to increase resilience at the cost of consistency.
Of course, it is a trade-off, and only the business requirements can decide where the appropriate balance lies. If I have an email notification service that subscribes to order events, and it needs the list of order items to render an email notification, it is fine for it to simply read the items from the event to generate the email.
On the other hand, depending on the items from the event will cause a problem for the inventory service, which needs to adjust its inventory levels according to what the order contains when it’s eventually dispatched. There are two ways this could be addressed while still maintaining resilience. The inventory service could receive all events regarding changes to items in the order, from the start, through to when the order is submitted, and right through to dispatch. Or it could treat the order submitted event, with its list of items, as the starting state of the order, and subscribe to subsequent change events for the order.
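The second approach might look something like the following Python sketch. The event classes and the InventoryService are hypothetical names invented for illustration, and the sketch assumes that events for an order arrive in order and exactly once, which a real message broker may not guarantee.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderSubmitted:
    order_id: str
    items: tuple[str, ...]


@dataclass(frozen=True)
class OrderItemAdded:
    order_id: str
    item: str


@dataclass(frozen=True)
class OrderItemRemoved:
    order_id: str
    item: str


@dataclass(frozen=True)
class OrderDispatched:
    order_id: str


class InventoryService:
    def __init__(self, stock: dict[str, int]) -> None:
        self.stock = dict(stock)
        # Local view of each open order, built only from events.
        self.open_orders: dict[str, list[str]] = {}

    def handle(self, event: object) -> None:
        if isinstance(event, OrderSubmitted):
            # The item list is taken as the starting state, not the final one.
            self.open_orders[event.order_id] = list(event.items)
        elif isinstance(event, OrderItemAdded):
            self.open_orders[event.order_id].append(event.item)
        elif isinstance(event, OrderItemRemoved):
            self.open_orders[event.order_id].remove(event.item)
        elif isinstance(event, OrderDispatched):
            # Stock is adjusted for what the order contains at dispatch.
            for item in self.open_orders.pop(event.order_id):
                self.stock[item] -= 1
```

For example, if an order is submitted with a kettle and a toaster, the toaster is later removed, and the order is then dispatched, only the kettle stock is decremented. The service never calls the order service, so it keeps its resilience, but it no longer trusts the submitted item list as final.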