The Saga pattern attempts to manage transactional consistency across multiple services when a single transaction consists of several distributed steps, each completed by one of the services in question.
For the purposes of the Saga pattern, a transaction is an operation that may be composed of smaller operations. The entire transaction, including every operation within it, must adhere to the ACID properties, specifically:
- (A)tomic – the transaction, including any subordinate transactions, must complete in its entirety or not at all.
- (C)onsistent – data that is written is consistent with rules and constraints. This does NOT mean that replicas of the data need to be consistent – just that we haven’t violated any rules or constraints in writing the data within our transaction.
- (I)solated – concurrent transactions must leave the data in the same state as if all of them (including this one) had been executed sequentially.
- (D)urable – once committed, a transaction remains committed even in the event of failures.
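For contrast, a single local database gives us these properties almost for free. The minimal sketch below uses Python's built-in `sqlite3` module and a hypothetical `accounts` table to show atomicity within one database: the debit and the credit either both persist or, if the transfer is rejected, both are rolled back. `make_db`, `transfer`, and `balances` are illustrative names, not part of any saga framework.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    """Debit src and credit dst as one atomic unit of work."""
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            # Raising here undoes the debit above: all or nothing.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))

def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

if __name__ == "__main__":
    db = make_db()
    transfer(db, 1, 2, 30)
    print(balances(db))  # {1: 70, 2: 30}
    try:
        transfer(db, 1, 2, 500)
    except ValueError:
        pass
    print(balances(db))  # {1: 70, 2: 30}
```

The guarantee holds only because both rows live in the same database and the same transaction. Once the debit and the credit are owned by different services, that shared transaction no longer exists.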
With distributed systems, it is very difficult (verging on impossible) for a parent transaction composed of multiple child transactions to ensure atomicity AND consistency across the complete parent transaction. There are two primary reasons for this:
- When employing a distributed architecture (e.g. microservices), services don’t share databases or process space, so they cannot coordinate through local inter-process communication; coordinating over the network instead comes at a significant cost in latency.
- Two-phase commit (something AKF suggests you never use) isn’t natively available between the transactions, and may not be available even when processes write to the same persistence tier.
By slightly relaxing the notion of consistency at the parent transaction level, and possibly employing an additional service to either compose or orchestrate the transaction, we can approximate ACID compliance. Below we show a high-level example of a Saga pattern, both as an illustration of what we are trying to achieve and as an anti-pattern for how to implement a Saga.
Saga In Series Example
- The receiving service (SA) receives transactions, performs its portion of any calculation, prepares data to be written to DA, and sends the remainder to the next service, SB.
- SB receives transactions from SA, performs its portion of the calculation, prepares data to be written to DB, and sends a request to service SC.
- SN receives transactions from S(N-1), performs its portion of the calculation, and prepares data to be written to DN.
- Upon completion, SN “commits” its transaction and responds to the prior services, each of which then commits its own portion of the transaction.
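The call chain just described can be sketched as nested synchronous calls. This is a minimal, single-process sketch of the anti-pattern, not something to copy. It assumes hypothetical `Service` objects whose `prepare` and `commit` methods stand in for each service and its database. Because SN commits first and the other services commit only as the calls unwind, a commit failure in any antecedent service leaves SN's data committed while the earlier services never commit.

```python
from dataclasses import dataclass, field

@dataclass
class Service:
    """A hypothetical service that owns its own datastore (DA, DB, ... DN)."""
    name: str
    fail_on_commit: bool = False
    pending: list = field(default_factory=list)
    committed: list = field(default_factory=list)

    def prepare(self, txn_id: str) -> None:
        self.pending.append(txn_id)  # stage data destined for this service's DB

    def commit(self, txn_id: str) -> None:
        if self.fail_on_commit:
            raise RuntimeError(f"{self.name} failed to commit {txn_id}")
        self.pending.remove(txn_id)
        self.committed.append(txn_id)

def handle_in_series(chain: list, txn_id: str, i: int = 0) -> None:
    """Each service prepares, then makes a blocking call to the next one.
    The last service commits first; earlier services commit as the calls unwind."""
    svc = chain[i]
    svc.prepare(txn_id)
    if i + 1 < len(chain):
        handle_in_series(chain, txn_id, i + 1)  # blocking call in series
    svc.commit(txn_id)  # never reached if a later commit raised
```

If a middle service fails to commit, the last service in the chain has already committed, and every earlier service is left holding prepared but uncommitted data.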
The above approach should never be implemented. It suffers from multiple problems including:
- High probability of failure, because the chain’s availability is the product of every service’s availability (the calls in series anti-pattern)
- High latency as a result of multiple calls in series
- Probability of inconsistency if, once SN completes, any antecedent services fail to commit
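The first two problems follow from simple arithmetic. Assuming independent failures, a chain of synchronous calls succeeds only if every call succeeds, so availabilities multiply. Because each hop waits on the next, latencies add. The figures in this sketch are hypothetical.

```python
from math import prod

def serial_availability(availabilities: list) -> float:
    """Every call in the chain must succeed, so availabilities multiply."""
    return prod(availabilities)

def serial_latency_ms(latencies_ms: list) -> float:
    """Synchronous calls in series wait on each other, so latencies add."""
    return sum(latencies_ms)

if __name__ == "__main__":
    # Hypothetical figures: five services, each 99.9% available and 40 ms per hop.
    print(round(serial_availability([0.999] * 5), 4))  # 0.995
    print(serial_latency_ms([40] * 5))                  # 200
```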
One could eliminate some, but not all, of these challenges by making the interactions asynchronous, but inconsistency due to failure remains a large problem.
Some of these concerns can be addressed with either the Choreographer or the Orchestrator sub-pattern.
Saga Choreographer Pattern
The Choreographer pattern for Saga attempts to leverage concepts from the event sourcing pattern to allow distributed coordination between services for transactions.
Choreographer leverages an asynchronous messaging solution and a shared library, used by each service, to help coordinate (or choreograph) commits.
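To make the moving parts concrete, here is a single-process sketch. It assumes an in-memory queue in place of the asynchronous messaging solution and a simple unanimous-vote rule in place of the shared library's coordination logic. `Bus`, `CommitLibrary`, and the message fields are hypothetical names, not a real library's API. Each service applies a request locally and publishes a vote. It treats the transaction as committed only after its copy of the library has seen votes from every participant.

```python
from collections import defaultdict, deque

class Bus:
    """In-memory stand-in for an asynchronous message bus (hypothetical)."""
    def __init__(self) -> None:
        self.queue: deque = deque()
        self.subscribers: list = []

    def publish(self, msg: dict) -> None:
        self.queue.append(msg)  # delivered later by drain(), not inline

    def drain(self) -> None:
        while self.queue:
            msg = self.queue.popleft()
            for sub in self.subscribers:
                sub.on_message(msg)

class CommitLibrary:
    """Hypothetical shared library embedded in every service: a transaction is
    final only once every participant has voted that it applied its part."""
    def __init__(self, participants: set) -> None:
        self.participants = participants
        self.votes: defaultdict = defaultdict(set)

    def record_vote(self, txn_id: str, service: str) -> bool:
        self.votes[txn_id].add(service)
        return self.votes[txn_id] == self.participants

class Service:
    def __init__(self, name: str, bus: Bus, participants: set) -> None:
        self.name, self.bus = name, bus
        self.lib = CommitLibrary(participants)
        self.prepared: set = set()
        self.committed: set = set()
        bus.subscribers.append(self)

    def on_message(self, msg: dict) -> None:
        txn = msg["txn"]
        if msg["type"] == "request":
            self.prepared.add(txn)  # apply locally, not yet final
            self.bus.publish({"type": "vote", "txn": txn, "from": self.name})
        elif msg["type"] == "vote" and self.lib.record_vote(txn, msg["from"]):
            if txn in self.prepared:
                self.committed.add(txn)  # every participant has voted
```

If any participant never votes (for example, because it is down), no service finalizes the transaction. This is why the replay and coordination concerns in the drawbacks matter.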
Benefits of Saga Choreographer
- Enables (but does not in itself create) the capability for isolation (microservices bulkheads) between services and domains.
- Eliminates issues associated with conflicting updates, but at the cost of handling certain transactional characteristics (e.g. idempotency and consistency) within the application/service.
- Leverages asynchronous interactions, improving a solution’s ability to respond to highly elastic demand without complete failure.
Drawbacks to Saga Choreographer
- As drawn above, the solution lacks a persistence solution (unless one exists in the messaging system) for message replay to ensure atomicity should one or more services and associated databases fail.
- A library (or potentially an additional service running in the background) needs to be in place to coordinate final commits through either a distributed voting algorithm or a coordinator process. Development cost increases, and troubleshooting becomes harder.
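One way to address the persistence drawback is to keep messages in an append-only log and let each service replay from its last acknowledged position after a failure. This sketch assumes a hypothetical `DurableLog`, represented here as a plain list. It also assumes a `Consumer` whose offset and applied-transaction IDs would, in a real system, live in the service's own database. A crash can happen after a message is applied but before it is acknowledged, so the apply step is idempotent: replay skips transactions that were already applied.

```python
from typing import Optional

class DurableLog:
    """Stand-in for a persisted, append-only message log (hypothetical)."""
    def __init__(self) -> None:
        self.entries: list = []

    def append(self, msg: dict) -> None:
        self.entries.append(msg)

class Consumer:
    """A service that replays from its last acknowledged offset after a failure.
    Assumption: offset and applied ids are stored in the service's own database,
    so they survive a crash; plain attributes model that here."""
    def __init__(self, log: DurableLog) -> None:
        self.log = log
        self.offset = 0            # next log position to process
        self.applied: set = set()  # txn ids already applied (idempotency guard)
        self.apply_count = 0       # how many times work was actually applied

    def poll(self, crash_before_ack: Optional[str] = None) -> None:
        while self.offset < len(self.log.entries):
            txn = self.log.entries[self.offset]["txn"]
            if txn not in self.applied:
                self.applied.add(txn)
                self.apply_count += 1
            if txn == crash_before_ack:
                return  # simulated crash: applied, but offset not yet advanced
            self.offset += 1  # acknowledge

if __name__ == "__main__":
    log = DurableLog()
    for t in ("t1", "t2", "t3"):
        log.append({"txn": t})
    c = Consumer(log)
    c.poll(crash_before_ack="t2")   # crash mid-stream
    c.poll()                        # restart: re-reads t2, skips it, then applies t3
    print(c.offset, c.apply_count)  # 3 3
```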
Key Usage Advice for Saga Choreographer
Given the difficulty of coordination and of distributed commit/voting algorithms, we strongly recommend against Choreographer for most solutions. Solutions that may suit Choreographer include:
- Lightweight transactions where the degree of distribution across services is very small (two or three services) and
- Solutions that are resilient to “some” failure of transactions being committed, where atomicity is a nice-to-have rather than a must-have.
- High-frequency update systems where “updates” happen frequently relative to “reads”, thereby decreasing the cost of incorrectness and relaxing the consistency requirement.
For all other transactions, the Saga Orchestrator is likely a better solution.
Saga Orchestrator Pattern
The Saga Orchestrator Pattern adds the notion of an orchestrator or coordinator to the transaction itself. Unfortunately, this addition also introduces the fan out anti-pattern. As such, its usage should be limited in favor of redefining domain/fault isolation boundaries so that transactions need not be coordinated across isolation zones.
- SA, the “orchestrator” service, receives requests and applies them quickly using the event-sourcing, append-only database approach.
- Simultaneously or in series, SA publishes the necessary message on the “Dumb Bus”.
- Subscribers (services SB through SN) consume the message from the bus and apply it locally.
- Upon completion, each subscriber publishes its local commit.
- The orchestrator publishes the final commit once it has received all local commits. It further “cleans up” global commits by intermittently checking for completion.
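The five orchestrator steps can be sketched in a single process. This is a minimal sketch under stated assumptions. An in-memory `DumbBus` stands in for the messaging system, and a Python list stands in for the append-only ledger. `Orchestrator`, `Subscriber`, and the message types are hypothetical names. The `sweep` method is one possible reading of the intermittent clean-up step: it re-publishes requests that still lack a final commit after a timeout.

```python
import time
from collections import deque

class DumbBus:
    """Hypothetical in-memory 'Dumb Bus': it only queues and fans out messages."""
    def __init__(self) -> None:
        self.queue: deque = deque()
        self.subscribers: list = []

    def publish(self, msg: dict) -> None:
        self.queue.append(msg)

    def drain(self) -> None:
        while self.queue:
            msg = self.queue.popleft()
            for sub in self.subscribers:
                sub.on_message(msg)

class Orchestrator:
    """SA: records each request in an append-only ledger and tracks local commits."""
    def __init__(self, bus: DumbBus, participants: set) -> None:
        self.bus, self.participants = bus, participants
        self.ledger: list = []         # append-only: entries are never updated
        self.local_commits: dict = {}  # txn id -> services that reported a local commit
        self.final: set = set()
        bus.subscribers.append(self)

    def submit(self, txn: str, payload: dict) -> None:
        self.ledger.append({"txn": txn, "payload": payload, "at": time.time()})
        self.local_commits[txn] = set()
        self.bus.publish({"type": "request", "txn": txn, "payload": payload})

    def on_message(self, msg: dict) -> None:
        if msg["type"] != "local_commit":
            return
        done = self.local_commits[msg["txn"]]
        done.add(msg["from"])
        if done == self.participants and msg["txn"] not in self.final:
            self.final.add(msg["txn"])
            self.bus.publish({"type": "final_commit", "txn": msg["txn"]})

    def sweep(self, max_age_s: float, now: float) -> list:
        """Intermittent clean-up: re-publish requests still lacking a final commit."""
        stale = [e for e in self.ledger
                 if e["txn"] not in self.final and now - e["at"] > max_age_s]
        for e in stale:
            self.bus.publish({"type": "request", "txn": e["txn"], "payload": e["payload"]})
        return [e["txn"] for e in stale]

class Subscriber:
    """Services SB..SN: apply each request locally, then publish a local commit."""
    def __init__(self, name: str, bus: DumbBus) -> None:
        self.name, self.bus = name, bus
        self.applied: set = set()
        self.finalized: set = set()
        bus.subscribers.append(self)

    def on_message(self, msg: dict) -> None:
        if msg["type"] == "request":
            self.applied.add(msg["txn"])  # set membership makes re-delivery harmless
            self.bus.publish({"type": "local_commit", "txn": msg["txn"], "from": self.name})
        elif msg["type"] == "final_commit":
            self.finalized.add(msg["txn"])
```

Subscribers record applied transactions in a set, so a re-published request is applied only once. Duplicate local commits likewise leave the orchestrator's tally unchanged.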
Benefits of Saga Orchestrator
- Retains the benefits of the Choreographer pattern while supporting more complex solutions.
- Avoids the greater development cost associated with distributed voting algorithms.
Drawbacks to Saga Orchestrator
- Introduces the fan out anti-pattern. Caution: use this approach sparingly. Consider realigning your domains and splits rather than implementing this pattern.
- Requires development of the orchestrator service and a localized notion of distributed commits. This adds some development cost, though the work is less intellectually difficult than building distributed voting algorithms.
- Requires the addition of a ledger-like database for distribution of requests.
- May introduce synchronous behavior between the orchestrator and the requesting client, so this pattern is most useful for client requests that do not require a response.
- We strongly encourage you to never introduce blocking synchronous fan out behavior to multiple service endpoints, given the impact on availability.
- When scale is necessary and the number of services in the fan out is high, this solution can help. Use it sparingly, and attempt never to use it synchronously with the client.
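The availability cost of blocking fan out can be estimated the same way as for calls in series. The estimate assumes independent failures and parallel calls. A synchronous client needs the orchestrator and every downstream service to succeed, so availabilities multiply, and it waits for the slowest downstream call. With fire-and-forget, the client depends only on the orchestrator accepting the request. The `client_view` helper and the figures are hypothetical.

```python
from math import prod

def client_view(avail: list, latency_ms: list, synchronous: bool) -> tuple:
    """Availability and latency seen by the client when the orchestrator
    (index 0) fans out in parallel to the remaining services.

    Synchronous: every call must succeed, so availabilities multiply, and the
    client waits for the orchestrator plus the slowest downstream call.
    Fire-and-forget: the client waits only for the orchestrator to accept."""
    if synchronous:
        return prod(avail), latency_ms[0] + max(latency_ms[1:])
    return avail[0], latency_ms[0]

if __name__ == "__main__":
    avail = [0.999] * 9                              # orchestrator + 8 services
    latency = [10, 20, 35, 25, 40, 30, 22, 28, 120]  # ms; index 0 is the orchestrator
    a, lat = client_view(avail, latency, synchronous=True)
    print(round(a, 4), lat)                                # 0.991 130
    print(client_view(avail, latency, synchronous=False))  # (0.999, 10)
```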
Key Usage Advice for Saga Orchestrator
Whenever possible, attempt to eliminate the notion of distributed commits through realignment of service boundaries. If, after significant effort, realignment isn’t possible:
- Consider Saga Choreographer for small-degree service splits (N<=3) where some data loss is acceptable.
- Consider Saga Orchestrator for larger service splits (N>3), but prefer it where interactions with the end client are “fire and forget” or responses aren’t necessary. In all other cases, understand the impact of fan out on availability.
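The advice above can be summarized as a rough checklist. The hypothetical helper below encodes only the thresholds stated in this article:

- Choreographer when N<=3 and some data loss is tolerable.
- Orchestrator otherwise.
- A warning when the client needs a response.

It is a reading aid, not a substitute for reviewing service boundaries.

```python
def choose_saga_variant(services: int, data_loss_acceptable: bool,
                        client_needs_response: bool) -> str:
    """Hypothetical checklist encoding the usage advice in this article.
    Assumes realignment of service boundaries has already been attempted."""
    if services <= 1:
        return "no saga: the transaction fits within one service"
    if services <= 3 and data_loss_acceptable:
        return "choreographer"
    if client_needs_response:
        # Fan out behind a synchronous client call hurts availability.
        return "orchestrator (warning: avoid synchronous fan out to the client)"
    return "orchestrator"
```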