🏠

How To Achieve Bulletproof Microservice Communication

April 24, 2025

Utilizing microservices in order to decouple a backend system into individually scalable and customizable applications is an effective method to build sustainable infrastructure. Pair this with Dockerized services and an orchestration platform such as Kubernetes, and the infrastructure is robust.

With all these benefits, why would anyone ever decide to go in another direction? One of the biggest challenges when developing microservices is data integrity and consistency. Microservices that utilize simple HTTP requests and isolated database transactions are bound to end up in an inconsistent state sooner or later, and I want to discuss how to avoid it.

The Setup

Consider an eCommerce business where you can place orders for various different items, and have them shipped to you around the world. This business might have services for keeping track of orders, managing inventory, and validating shipping requirements. These can be called the Orders, Inventory, and Shipping microservices, respectively. When a user places an order, an Order Entity is entered into the database as a record in a pending state. Afterwards, a message is sent to the Inventory service to check whether there is enough inventory of the particular product. The Inventory service then creates a Reservation Entity, and stores it in the database to keep track of the current status of the warehouse, after which it notifies the Orders service of its success. Then, the Orders service must reach out to the Shipping service to validate that the customer is within the allowed shipping range for the specific product. Only after which, can the Orders service notify the Inventory application to delete the Reservation Entity and corresponding inventory, and set the Order Entity status to approved. All of these steps together form a Saga, a long running, eventually consistent business process.

Queues

There are many different ways that computers can communicate with one another. One of the most popular is the HTTP protocol. With HTTP, failed requests due to networking or service outages are hard to overcome. Exponential Backoff can be used to improve upon standard usage, although it achieves little of what could be done by a message broker. A message broker typically utilizes queues in order to route requests between services. Some of the biggest improvements seen when switching over to using message queues for communication are decoupling of applications and increased durability. A message queue can be thought of like a buffer. It holds messages until the receiver is ready. In the event of a message consumption failure, it can also re-queue messages. Below is a comparison between inter-service communication between two services, the Orders service and Inventory service. Orders is attempting to send a message to Inventory, although Inventory is currently unavailable.

1. Direct HTTP Communication

Orders --❌--> Inventory

Request to reserve inventory fails, order cannot be placed

2. Queue-Based Communication

Orders --✓--> Message Queue --⏳--> Inventory

Service A continues working, queue holds message

Orders Message Queue --✓--> Inventory

Message delivered when Service B comes back online

As a side note, messages between services should be "thin", meaning they should not contain a lot of data. One of the huge benefits of microservices is that the Orders service should know a lot about order details, but the Inventory service might only need to know the product ID associated with the order. This results in a nice separation of concerns, leading to messages which only need to hold foreign references to entities owned by other services.

With queues in between services, the resilience of the entire process has been greatly improved, but it is not perfect. Consider the following scenario. The Orders service receives a message to place a pending order. It writes an Order to the database, but crashes before it can send an inventory reservation message to the queue. The order service will come back online, and have a pending Orders in the database, but no message sent out to kick off the Saga.

Outbox Pattern

An outbox is a holding station for any outgoing messages that a service needs to eventually send. This is best implemented as a database table, so that sending a message and performing necessary business logic can all occur atomically. To get an idea of how this is structured, we can use the Orders and Inventory services once again. To kick off the process, an HTTP request is routed to the Orders service to place an order. The service starts a database transaction, writes the Order Entity to the database, and then writes a message to the InventoryReservationOutbox table. The service then commits the transaction, so that both the message and the Entity are both written, or neither are. On startup, the Orders service polls the InventoryReservationOutbox table for outgoing messages every couple of seconds. Once this polling callback is triggered, all messages are queried from the outbox table, converted into JSON, and sent into the queue for the Inventory service to process at a later point. Once the message is in the queue, it is safe to be deleted from the InventoryReservationOutbox table. Most queues have configurable durability, meaning if they crash with messages inside them they are able to requeue them after coming back up again. Next, we have to take a look at something known as "at least once" delivery. This is typical of message brokers, meaning messages might be sent to consumers more than once.

Inbox Pattern

The inbox pattern, better known as the idempotent consumer pattern, utilizes a nearly identical approach to the outbox pattern, just in reverse. When using this pattern, incoming messages are written into a database table within a transaction which handles the work associated with the message itself. If the work (for example, creating an Entity and inserting into the database) succeeds, then the message is written to the inbox table. If the work fails, then we can reprocess the message at a later point. When a new message comes in, the inbox table is first checked to determine if an identical message has already been processed to avoid duplicate work. To detect duplicate messages, a unique key must be assigned to each new message. To give a concrete example, the Orders service includes the specific, auto-incrementing primary key for each specific Order Entity to the message it sends to the Inventory service. With this identifier, the Inventory service is able to detect duplicated messages. It receives the message, begins a transaction, reserves inventory, writes the incoming message to the outbox, and a response message to the InventoryReservation outbox table, to later be sent to the Order service to continue on with the Saga.

It is important to know that without this unique ID associated with each Order Entity attached to the message, it's impossible to differentiate two messages with the same contents. For example, if two customers place orders for the same product and quantity, the Inventory service would have no way to know that this message is not a duplicate.

Compensating Actions

What happens if one of the steps of our Saga fails? For example, after reserving inventory, the Shipping service says that we do not ship this particular product to the requested destination. We currently have InventoryReservation and pending Order entities in our other services, and they need to be removed. To do so, messages can be sent to each respective service to remove the corresponding resources. These messages are better known as compensating actions. For each step in a Saga that persists data or changes state, there must be a compensating action. If any step in the Saga fails, then all previously successful steps must be compensated.

It might at first seem advantageous to send out all Saga steps in parallel, to speed up the execution, although there is a problem with doing so. Let's say that the Orders service sends out a message to reserve inventory and validate shipping. At the same time. The shipping validation message comes back first, and is unsuccessful. What is there to do? The inventory reservation message cannot yet be compensated, because we don't know if it was successful or not!

Failed Implementations

Let's talk about the journey to get to the solution described above. For the code examples, I'll be using NodeJS, and NestJS, a backend framework which wraps Express. When sitting down to implement this saga, my first thought was to implement a representation of a saga in a class. It would implement a ISagaOrchestrator interface and have a list of steps, which implemented an ISagaStep interface. Below are the example classes

ISagaOrchestrator Interface

export interface ISagaOrchestrator {
  begin(): Promise;
}

ISagaStep Interface

export interface ISagaStep {
  invoke(): Promise;
  compensate(): Promise;
}

OrderSagaOrchestrator Class

export class OrderSagaOrchestrator implements ISagaOrchestrator {
  // List of steps in the saga
  constructor(private steps: ISagaStep[]) {}
  
  async begin(): Promise {
    const succeededSteps: ISagaStep[] = []
    try {
      for (const step of steps) {
        const response = await step.invoke()
        if (!response.successful) {
          await compensate()
          return
        }
        succeededSteps.push(step)
      }
    } catch (e) {
      console.error(e)
      await compensate()
    }
  }
  
  private async compensate() {
    await Promise.all(succeededSteps.map(async (step) => {
      return await step.compensate()
    }))
  }
}

ReserveInventoryStep Class

export class ReserveInventoryStep implements ISagaStep {
  // Example step, constructed with necessary data to send to queue 
  constructor(
    private rabbitMQService: RMQService,
    private productId: number,
    private quantity: number
  ) {}
  
  /* Send reserve inventory message, and wait for response. 
  Do not advance Saga until we get a response */
  invoke(): Promise {
    await this.rabbitMQService.sendReserveInventoryMessage(productId, quantity) 
    return await this.rabbitMQService.listenForReserveInventoryResponse()
  }
  
  compensate(): Promise {
    await this.rabbitMQService.compensateReserveInventoryMessage(productId, quantity)
  }
}
... other steps

At a first glance, I felt pretty good about this. Once a message came in, I would write the Order Entity and the InventoryReserve message to the database. Then, every 5 seconds I would poll the outbox for new messages. If any were found, I would create a Saga with the needed steps, add it to the activeSagas array in the service class, and simply call the begin() method to kick off the first message and delete the message from the outbox. The saga itself listened on the corresponding response queues for incoming messages. This seemed great, until I noticed a pretty big flaw.

To explain the issue, I'll give an example. Let's say that we receive two incoming HTTP requests to the Orders service, which kicks off two sagas: Saga1 and Saga2. Saga1 and Saga2 both send out messages to reserve inventory, and call the this.rabbitMQService.listenForReserveInventoryResponse() method, meaning that they both are waiting for a message to come back on the same queue. This is known as the competing consumer pattern, and can be great in some cases. A response will come to the queue, and either Saga1 or Saga2 can get the message, but only one of them. In some cases, this allows for essentially multithreading functionality. But in our scenario, it creates the possibility for a Saga to receive the responses meant for another Saga. If Saga1's inventory request was denied but Saga2's wasn't, Saga1 could receive a successful message while Saga2 received the rejected message.

How can we remedy this? We need one sort of "global" listener per response queue, and we need to be able to route it to the specific Saga it belongs to. I decided to create a map, from the unique ID of each Order Entity to saga and attach the ID to the message. When a message comes back, we will simply look up the saga in the map and call the next step.

ISagaOrchestrator Interface

export interface ISagaOrchestrator<T> {
  invokeStep(step: T): Promise<void>
  compensateStep(step: T): Promise<void>
 }

ISagaStep Interface

export interface ISagaStep {
  invoke(): Promise<void>;
  compensate(): Promise<void>;
}

OrderSagaStep Enum

export enum OrderSagaStep {
  INVENTORY_RESERVE,
  SHIPPING_VALIDATION,
  INVENTORY_REMOVE
}

OrderSagaOrchestrator Class

export class OrderSagaOrchestrator implements ISagaOrchestrator<OrderSagaStep> {
  // Map to keep track of steps in the Saga
  constructor(
    private steps: Map<OrderSagaStep, ISagaStep>, 
    private successfulSteps: ISagaStep[]
  ) {}
  
  // Invoke a singular step in the Saga, one at a time
  async invokeStep(step: OrderSagaStep): Promise<void> {
    const response = await steps.get(step).invoke()
    if (!response.successful) {
      await this.compensate()
    }
    successfulSteps.push(step)
  }
  
  // Compensate a singular step in the Saga, one at a time
  async compensateStep(step: OrderSagaStep): Promise<void> {
    const response = await steps.get(step).compensate()
    if (!response.successful) {
      throw new Error('Failed to compensate')
    }
  }
}

ReserveInventoryStep Class

export class ReserveInventoryStep implements ISagaStep {
  constructor(
    private rabbitMQService: RMQService,
    private productId: number,
    private quantity: number
  ) {}
  
  invoke(): Promise<void> {
    await this.rabbitMQService.sendReserveInventoryMessage(productId, quantity) 
    // do not await response here!
  }
  
  compensate(): Promise<void> {
    await this.rabbitMQService.compensateReserveInventoryMessage(productId, quantity)
  }
}

OrdersController Class

export class OrdersController implements ISagaStep {
  constructor(private rabbitmqService: RMQService) {}
  
  // All currently running Sagas
  private activeSagas: Map<number, ISagaOrchestrator>
  
  /* Await all InventoryReserve message responses in a "global" place, 
  and route them to appropriate saga based on orderId */
  public reserveShippingResponseHandler() {
    const response = this.rabbitmqService.getInventoryResponse()
    const relatedSaga = this.runningSagas.get(response.orderId);
    
    if (!relatedSaga) {
      throw new InternalServerErrorException(
        `No running saga is related to the most recent message received!`,
      );
    }
    
    if (response.successful) {
      // The reserve inventory step succeeded, start the next step in the saga
      relatedSaga.invokeStep(OrderSagaStep.SHIPPING_VALIDATION);
    } else {
      toRollback.forEach((step) => relatedSaga.compensateStep(OrderSagaStep.INVENTORY_RESERVE));
    }
  }
}

Finally, we reach the problem which made me completely redesign the code: An orchestrator server crash. If the orchestration server (in this case, the server hosting the Orders service) crashes with active sagas, then we have messages that are going to eventually come back to the Orders service, with no corresponding saga. This means that we have to reconstruct the Sagas each time the application restarts, so that we can handle crashes properly. This could be done by looking at all currently pending orders upon server restart and constructing a saga from them, but it was beginning to feel over-engineered. Rather, I ended up going a much simpler route. I created message handlers for each response queue, and I attached the ID of each Order Entity to every message to every service. This way, when the order service got a message, it could simply look up the Order Entity details from the database, craft the next message with the data it got from the database, and then send it to the corresponding channel.

Code

Below is a link to a GitHub repository with the code implementation for the final solution. It utilizes NestJS, and Dockerized Postgres and RabbitMQ instances. I did not configure a .env file for convenience, so please ensure the following ports are free: 3000, 3001, 3002, 1000, 1001, and 5672 to run it. I have also provided a .http file for ease of use. Repo Link