In today’s world, scalability is a common challenge when developing applications. To scale out and build easily manageable services, we often break down a system’s responsibilities into multiple microservices. In a microservices architecture, each service manages its own database, and the type of database can differ between services. This diversity makes a two-phase commit hard to implement, and in many cases services don’t require strong consistency anyway.
Let’s explore this issue using the example of an e-commerce platform with an order service and an inventory service. When a user places an order, the order service creates an entry in its database and must update the product inventory once the payment is successfully processed.
Since these are two separate services, potentially managed by different teams, one might use a relational database like PostgreSQL, while the other could rely on a NoSQL database like MongoDB.
When placing an order, we know the operation must be handled as a transaction. The order cannot be placed without updating the inventory, and the inventory cannot be updated without placing the order.
Before going further, we need to understand what a transaction is. A transaction is a sequence of operations performed as a single logical unit of work: it either succeeds completely or is rolled back completely, with the guarantees of atomicity, consistency, isolation, and durability (ACID). Relational databases like MySQL and PostgreSQL provide these properties to maintain data consistency. When a single transaction must span multiple databases, the classic approach is two-phase commit (2PC), which both MySQL (XA transactions) and PostgreSQL (PREPARE TRANSACTION) support. However, in distributed systems built from independent services, 2PC is far more complex and harder to achieve.
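To make two-phase commit concrete, here is a toy, in-memory sketch in Python. The Participant class and two_phase_commit function are hypothetical illustrations, not a real database API: a real coordinator also writes durable logs and must cope with crashing between the two phases, which is a large part of why 2PC is hard to run across independent services.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """A hypothetical resource manager (e.g. one database) taking part in 2PC."""
    name: str
    can_commit: bool = True  # simulates whether the local work can be made durable
    state: str = "INIT"

    def prepare(self) -> bool:
        # Phase 1: make the pending work durable, then vote yes or no.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self) -> None:
        self.state = "COMMITTED"

    def abort(self) -> None:
        self.state = "ABORTED"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Commit everywhere only if every participant votes yes in phase 1."""
    votes = [p.prepare() for p in participants]  # phase 1: collect votes
    if all(votes):
        for p in participants:  # phase 2: unanimous yes -> commit everywhere
            p.commit()
        return True
    for p in participants:  # phase 2: any no -> abort everywhere
        p.abort()
    return False


orders_db = Participant("orders_db")
inventory_db = Participant("inventory_db", can_commit=False)
print(two_phase_commit([orders_db, inventory_db]))  # False
print(orders_db.state, inventory_db.state)  # ABORTED ABORTED
```

Even the toy shows the weak spot: between prepare and commit, a prepared participant can only wait for the coordinator's decision.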
To manage transactions in distributed systems, we can use the Saga pattern. In our previous article, we explored how distributed services interact through Choreography and Orchestration, as well as how to ensure data integrity and consistency using the Outbox pattern.
The Saga pattern can be implemented in two ways: one approach uses a central orchestrator to manage the transaction lifecycle, while the other relies on choreography. Let’s delve into both approaches using a real-life example.
Saga Orchestration
Saga orchestration is a pattern used to manage transactions that span multiple microservices. Instead of relying on traditional distributed transactions (which are difficult to implement in microservices due to their independence), a saga splits the transaction into smaller, local transactions. Each service performs its task and then informs a central Orchestrator, which coordinates the workflow.
If one service fails, the orchestrator triggers compensating actions to undo the work of the previous services, bringing the system back to a consistent state. This compensation mechanism is essential: it ensures that a failure midway through the workflow does not leave the platform in an inconsistent state.
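The orchestration idea can be sketched in a few lines of Python. Everything here is hypothetical: SagaOrchestrator and the step functions stand in for real service calls, and the context dictionary replaces the durable saga log a production orchestrator would keep so it can resume after a crash. Following the convention used later in this article (C3 runs when the inventory step fails), the failed step's own compensation also runs, so each compensation must tolerate being called for work that never finished.

```python
from typing import Callable

Step = Callable[[dict], None]  # a local transaction or its compensation


class SagaOrchestrator:
    """Minimal in-memory saga orchestrator (illustrative; no persistence)."""

    def __init__(self) -> None:
        self.steps: list[tuple[str, Step, Step]] = []

    def add_step(self, name: str, action: Step, compensation: Step) -> "SagaOrchestrator":
        self.steps.append((name, action, compensation))
        return self

    def run(self, ctx: dict) -> bool:
        """Run steps in order; ctx must contain a 'log' list."""
        started: list[tuple[str, Step]] = []
        for name, action, compensation in self.steps:
            started.append((name, compensation))
            try:
                action(ctx)
            except Exception as exc:
                ctx["log"].append(f"{name} failed: {exc}")
                # Compensate in reverse order. The failed step is included so it
                # can clean up any partial work (e.g. C3 -> C2 -> C1).
                for started_name, undo in reversed(started):
                    undo(ctx)
                    ctx["log"].append(f"compensated {started_name}")
                return False
        return True


# Hypothetical local transactions standing in for calls to each service.
def create_order(ctx: dict) -> None:
    ctx["order"] = "PENDING"


def cancel_order(ctx: dict) -> None:
    ctx["order"] = "CANCELED"


def charge_payment(ctx: dict) -> None:
    ctx["charged"] = True


def refund_payment(ctx: dict) -> None:
    ctx["charged"] = False


def reduce_stock(ctx: dict) -> None:
    raise RuntimeError("insufficient stock")  # simulate an inventory failure


def restore_stock(ctx: dict) -> None:
    pass  # nothing was reserved before the simulated failure


saga = (SagaOrchestrator()
        .add_step("order", create_order, cancel_order)
        .add_step("payment", charge_payment, refund_payment)
        .add_step("inventory", reduce_stock, restore_stock))

ctx = {"log": []}
print(saga.run(ctx), ctx["order"], ctx["charged"])  # False CANCELED False
```

A production orchestrator would also have to decide what to do when a compensation itself fails, typically by retrying it until it succeeds.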
In an e-commerce system, the process to place an order spans multiple services. The Order Service handles the order placement, followed by the Payment Service for processing payment. Once the payment is successful, the Inventory Service updates stock, and finally, the Notification Service sends an email to inform the user about the order status.
These services need to interact in sequence to complete an order, and if something fails (for example, the Inventory Service), the system needs to gracefully roll back the transaction. Here’s how saga orchestration ensures smooth operation.
Step-by-Step Workflow:
- Order Creation
When a customer places an order, the first step in the workflow is creating the order in the system. The Order Service receives the request, creates a new order record, and marks the order as “PENDING” until the payment is processed.
Once the order is created, the Orchestrator is notified and takes control of the workflow. It then instructs the next microservice, the Payment Service, to process the payment for the order.
- Payment Processing
The Payment Service is responsible for charging the customer’s payment method. This could involve processing a credit card, using a third-party payment gateway, or another form of transaction.
If the payment is successful, the Payment Service informs the Orchestrator, and the transaction continues to the next step. However, if the payment fails (perhaps due to insufficient funds or a payment gateway error), the Orchestrator is immediately notified, and the saga begins its compensation process.
- Update Inventory
Once the payment is successful, the Orchestrator instructs the Inventory Service to reduce the stock quantity of the ordered products.
- Sending Notifications
Assuming the previous steps succeeded, the next step is to notify the customer that their order has been successfully placed. The Orchestrator instructs the Notification Service to send an order confirmation email or SMS to the customer.
This step completes the transaction. Once the notification is sent, the Orchestrator updates the status of the order from “PENDING” to “COMPLETED,” and the saga ends successfully. Note: because the notification can be treated as optional for this transaction, we could instead mark the order “COMPLETED” as soon as the inventory update succeeds.
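A sketch of the orchestrator's happy path, assuming hypothetical synchronous stubs for the Payment, Inventory, and Notification services. It follows the variant from the note (mark the order COMPLETED after the inventory update and treat the notification as best-effort). Error handling for payment and inventory is left out because those failures trigger the compensations covered in the next section.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable

log = logging.getLogger("order-saga")


@dataclass
class Order:
    order_id: str
    status: str = "PENDING"  # set by the Order Service when the order is created
    history: list[str] = field(default_factory=list)


ServiceCall = Callable[[Order], None]  # hypothetical stub for a remote service call


def run_order_saga(order: Order, charge_payment: ServiceCall,
                   reduce_stock: ServiceCall, notify_customer: ServiceCall) -> Order:
    """Happy path of the orchestrated workflow with injected service stubs."""
    charge_payment(order)  # Payment Service
    order.history.append("PAID")
    reduce_stock(order)  # Inventory Service
    order.history.append("STOCK_REDUCED")
    # Variant from the note: the order is complete once inventory succeeds,
    # so the notification is best-effort and cannot undo the order.
    order.status = "COMPLETED"
    try:
        notify_customer(order)  # Notification Service
        order.history.append("NOTIFIED")
    except Exception:
        log.warning("order %s completed, but the customer was not notified", order.order_id)
    return order


def succeed(order: Order) -> None:
    pass


def email_down(order: Order) -> None:
    raise ConnectionError("email provider unavailable")


result = run_order_saga(Order("A-1"), succeed, succeed, email_down)
print(result.status, result.history)  # COMPLETED ['PAID', 'STOCK_REDUCED']
```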
Handling Failures and Rollbacks
Failures in any distributed system are inevitable. With the saga orchestration pattern, handling these failures becomes much more manageable. Let’s explore what happens when things don’t go as planned.
Payment Failure: If the Payment Service fails to process the payment (due to a technical issue or insufficient funds), the orchestrator initiates the compensation process by running the compensating transactions C2 and then C1. C2 reverses the payment step, and C1 asks the Order Service to cancel the order and update its status to “CANCELED.”
The customer is not charged, and no notification is sent since the order did not go through. The orchestrator logs the failure, ensuring that the platform is aware of the unsuccessful transaction.
Inventory Failure: In the event that the Inventory Service fails to reduce the stock, it triggers a compensation process (C3) that cascades through the transaction, leading to C2 (refund payment) and C1 (cancel order). However, one crucial point to keep in mind is that each service must implement a retry mechanism. This ensures that temporary issues, such as network glitches or momentary downtime, do not result in immediate failure. By retrying, services can attempt to complete their tasks before reporting a failure, minimizing unnecessary rollbacks and ensuring smoother transaction flow.
Notification Failure: If the Notification Service fails (e.g., due to an issue with the email provider), the orchestrator might not need to roll back the entire transaction. Instead, it can log the failure and notify the system administrator that the customer wasn’t informed of the order. This is a non-critical error that can be handled separately from the core transaction.
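The retry advice from the Inventory Failure case can be sketched as a small helper. TransientError, with_retry, and flaky_reduce_stock are hypothetical names. The sketch retries only errors marked as transient, backs off exponentially with random jitter, and re-raises once the attempts are exhausted, which is the point where the orchestrator would begin compensating.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (network glitch, brief downtime)."""


def with_retry(op: Callable[[], T], attempts: int = 3, base_delay: float = 0.1) -> T:
    """Call op, retrying TransientError with exponential backoff and jitter."""
    for attempt in range(attempts):
        try:
            return op()
        except TransientError:
            if attempt == attempts - 1:
                raise  # retries exhausted: report failure so compensation can start
            delay = base_delay * 2 ** attempt  # 0.1s, 0.2s, 0.4s, ...
            time.sleep(delay + random.uniform(0, delay))  # jitter spreads out retries
    raise ValueError("attempts must be at least 1")


calls = {"count": 0}


def flaky_reduce_stock() -> str:
    # Hypothetical inventory call that times out twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("inventory database timeout")
    return "STOCK_UPDATED"


print(with_retry(flaky_reduce_stock, attempts=3, base_delay=0.01))  # STOCK_UPDATED
```

Because a retried request may already have been applied (for example, when the response was lost rather than the request), the operation being retried should be idempotent, e.g. keyed by the order ID.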
Can we effectively manage these states and their transitions based on different inputs? Yes, by modeling the process as a Finite State Machine (FSM).
An FSM allows us to define each state (such as order placement, payment processing, inventory update, and notification) and map the transitions between them. These transitions are triggered by inputs like Success or Failure at each step. For example, if payment succeeds, the FSM moves to the inventory update step; if a failure occurs, it transitions to the appropriate compensation actions. This structured approach helps manage complex workflows efficiently.
| Current State | Input/Condition | Next State | Action |
|---|---|---|---|
| Order Created (T1) | Success | Payment Processing (T2) | Proceed to payment processing |
| Order Created (T1) | Failure | Compensation (C1) | Cancel the order (C1) |
| Payment Processing (T2) | Success | Inventory Update (T3) | Proceed to inventory update |
| Payment Processing (T2) | Failure | Compensation (C2, C1) | Refund payment (C2) and cancel the order (C1) |
| Inventory Update (T3) | Success | Completion | Complete the order |
| Inventory Update (T3) | Failure | Compensation (C3, C2, C1) | Restore inventory (C3), refund payment (C2), cancel the order (C1) |
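The transition table translates almost directly into a lookup dictionary. In this sketch (all names hypothetical), each row becomes a (state, input) key, every compensation row is collapsed into a single terminal COMPENSATED state, and the compensations are listed in the order the table gives them. As in the table, the notification step is omitted.

```python
from enum import Enum


class State(Enum):
    ORDER_CREATED = "T1"
    PAYMENT_PROCESSING = "T2"
    INVENTORY_UPDATE = "T3"
    COMPLETED = "COMPLETED"
    COMPENSATED = "COMPENSATED"


# (current state, input) -> (next state, actions), following the table above.
TRANSITIONS: dict[tuple[State, str], tuple[State, list[str]]] = {
    (State.ORDER_CREATED, "success"): (State.PAYMENT_PROCESSING, ["proceed to payment processing"]),
    (State.ORDER_CREATED, "failure"): (State.COMPENSATED, ["C1: cancel order"]),
    (State.PAYMENT_PROCESSING, "success"): (State.INVENTORY_UPDATE, ["proceed to inventory update"]),
    (State.PAYMENT_PROCESSING, "failure"): (State.COMPENSATED, ["C2: refund payment", "C1: cancel order"]),
    (State.INVENTORY_UPDATE, "success"): (State.COMPLETED, ["complete order"]),
    (State.INVENTORY_UPDATE, "failure"): (
        State.COMPENSATED, ["C3: restore inventory", "C2: refund payment", "C1: cancel order"]),
}

TERMINAL = {State.COMPLETED, State.COMPENSATED}


def run_fsm(outcomes: list[str]) -> tuple[State, list[str]]:
    """Feed step outcomes into the FSM and collect the actions it triggers."""
    state, actions = State.ORDER_CREATED, []
    for outcome in outcomes:
        if state in TERMINAL:
            break
        try:
            state, triggered = TRANSITIONS[(state, outcome)]
        except KeyError:
            raise ValueError(f"no transition from {state.name} on {outcome!r}") from None
        actions.extend(triggered)
    return state, actions


final, actions = run_fsm(["success", "success", "failure"])
print(final.name)  # COMPENSATED
print(actions[-3:])  # ['C3: restore inventory', 'C2: refund payment', 'C1: cancel order']
```

Keeping the transitions in data rather than in nested if/else branches makes it easy to check that every (state, input) pair has exactly one defined outcome.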
Saga Choreography
In the Saga choreography pattern, there is no central orchestrator or coordinator to control the flow of transactions. Instead, services communicate through a message queue or an event bus. Each service listens for specific events or topics and reacts accordingly. Once a service completes its task, it publishes an event or command to signal the next service to continue the process.
This decentralized approach allows each service to handle its own part of the transaction independently. For instance, after the Order Service creates an order, it publishes an event. The Payment Service listens for that event, processes the payment, and then publishes another event for the Inventory Service to update stock. The flow continues in this manner, with each service both reacting to and publishing events to move the transaction forward.
Let’s break down the step-by-step flow of choreography using the Order, Payment, Inventory, and Notification services. Each service communicates through events without a central orchestrator, making this an event-driven transaction management system.
Step-by-Step Workflow:
- Order Creation
A customer places an order, and the Order Service processes it and creates an entry for it. Once the order is successfully created, the service publishes an ORDER_CREATED event to notify other services.
- Payment Processing
The Payment Service listens for the ORDER_CREATED event. Upon receiving it, the Payment Service initiates the payment process (e.g., charging the customer’s credit card). If the payment is successful, the Payment Service publishes a PAYMENT_COMPLETED event; otherwise, it publishes a PAYMENT_FAILED event, which can trigger a rollback (e.g., canceling the order).
- Inventory Update
The Inventory Service listens for the PAYMENT_COMPLETED event. When it receives this event, it reduces the stock of the items in the order.
If the stock is successfully updated, the Inventory Service publishes a STOCK_UPDATED event to continue the transaction flow. If the stock update fails (e.g., insufficient stock), it publishes a STOCK_UPDATE_FAILED event. This event can trigger compensating actions like issuing a refund and canceling the order.
- Sending Notifications
The Notification Service listens for the STOCK_UPDATED event. When it receives the event, it sends a confirmation email to the customer, notifying them that their order is complete and ready for shipment.
If there were earlier failures (e.g., payment or inventory update failures), the Notification Service can also listen to failure events like PAYMENT_FAILED and STOCK_UPDATE_FAILED, or to a single consolidated ORDER_FAILED event, notifying the customer about the failure and the status of their order.
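Here is a minimal, in-process sketch of this event flow. The event names come from the steps above; EventBus, wire_services, and the event fields (order_id, sku, qty, card_ok) are hypothetical, with card_ok simply simulating the payment outcome. Dispatch is synchronous for readability, whereas a real broker delivers events asynchronously and may deliver them more than once.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """In-memory stand-in for a message broker (hypothetical, synchronous)."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Handler]] = defaultdict(list)
        self.history: list[str] = []  # topics in publication order, for inspection

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        self.history.append(topic)
        for handler in list(self.subscribers[topic]):
            handler(event)


def wire_services(bus: EventBus, stock: dict[str, int], outbox: list[str]) -> None:
    """Each service knows only the topics it consumes and the topics it publishes."""

    def payment_service(event: dict) -> None:  # consumes ORDER_CREATED
        topic = "PAYMENT_COMPLETED" if event["card_ok"] else "PAYMENT_FAILED"
        bus.publish(topic, event)

    def inventory_service(event: dict) -> None:  # consumes PAYMENT_COMPLETED
        sku, qty = event["sku"], event["qty"]
        if stock.get(sku, 0) >= qty:
            stock[sku] -= qty
            bus.publish("STOCK_UPDATED", event)
        else:
            bus.publish("STOCK_UPDATE_FAILED", event)

    def notification_service(message: str) -> Handler:  # consumes outcome events
        return lambda event: outbox.append(f"{event['order_id']}: {message}")

    bus.subscribe("ORDER_CREATED", payment_service)
    bus.subscribe("PAYMENT_COMPLETED", inventory_service)
    bus.subscribe("STOCK_UPDATED", notification_service("order confirmed"))
    bus.subscribe("PAYMENT_FAILED", notification_service("payment failed"))
    bus.subscribe("STOCK_UPDATE_FAILED", notification_service("out of stock"))


bus, stock, outbox = EventBus(), {"sku-1": 5}, []
wire_services(bus, stock, outbox)
# Order Service: after saving the order, announce it.
bus.publish("ORDER_CREATED", {"order_id": "A-1", "sku": "sku-1", "qty": 2, "card_ok": True})
print(bus.history)  # ['ORDER_CREATED', 'PAYMENT_COMPLETED', 'STOCK_UPDATED']
print(stock, outbox)  # {'sku-1': 3} ['A-1: order confirmed']
```

No component sees the whole workflow: the sequence only emerges from which topics each service subscribes to, which is exactly what makes choreography loosely coupled but harder to trace.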
Handling Failures and Rollbacks
If a failure occurs at any step, such as payment failure or inventory update failure, compensating transactions are triggered via published failure events:
If the Payment Service fails to process the payment, it publishes a PAYMENT_FAILED event. The Order Service listens to this event and cancels the order.
If the Inventory Service cannot update the stock, it publishes a STOCK_UPDATE_FAILED event, which triggers a refund in the Payment Service and order cancellation in the Order Service.
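The compensating reactions can be sketched the same way, with each service subscribing to the failure events it cares about. The bus and service classes are hypothetical, and the handlers are written to be idempotent because many brokers deliver events at least once. Since both services react to STOCK_UPDATE_FAILED independently, the refund and the cancellation are not ordered relative to each other.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """In-memory stand-in for a message broker (hypothetical, synchronous)."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in list(self.subscribers[topic]):
            handler(event)


class OrderService:
    def __init__(self, bus: EventBus) -> None:
        self.orders: dict[str, str] = {}
        # C1: cancel the order on either failure event.
        bus.subscribe("PAYMENT_FAILED", self.cancel_order)
        bus.subscribe("STOCK_UPDATE_FAILED", self.cancel_order)

    def cancel_order(self, event: dict) -> None:
        # Setting a status is idempotent, so a redelivered event is harmless.
        self.orders[event["order_id"]] = "CANCELED"


class PaymentService:
    def __init__(self, bus: EventBus) -> None:
        self.refunded: set[str] = set()
        # C2: refund only when payment succeeded but the stock update failed.
        bus.subscribe("STOCK_UPDATE_FAILED", self.refund)

    def refund(self, event: dict) -> None:
        if event["order_id"] in self.refunded:
            return  # already refunded; ignore the duplicate event
        self.refunded.add(event["order_id"])  # a real service would call its payment gateway here


bus = EventBus()
order_service, payment_service = OrderService(bus), PaymentService(bus)
order_service.orders["A-1"] = "PENDING"
bus.publish("STOCK_UPDATE_FAILED", {"order_id": "A-1"})
bus.publish("STOCK_UPDATE_FAILED", {"order_id": "A-1"})  # duplicate delivery
print(order_service.orders, payment_service.refunded)  # {'A-1': 'CANCELED'} {'A-1'}
```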
Trade-offs Between Saga Orchestration and Choreography
When designing distributed systems, choosing between saga orchestration and saga choreography depends on various factors such as complexity, performance, and the flexibility of your architecture.
Orchestration offers centralized control, making it easier to manage complex workflows, but this can lead to tighter coupling and potential bottlenecks.
Choreography promotes loose coupling and flexibility, which allows for better scalability and resilience but increases the complexity of managing distributed events and tracking the workflow. Your decision should be based on the specific requirements of your system, including how critical centralized control, flexibility, and scalability are to your application’s success.
For a deeper understanding of the interaction mechanisms in distributed systems, please read my previous article on choreography and orchestration.