System operation

The way an order is made in a restaurant

Imagine you are a customer arriving at a 5-star luxury restaurant to enjoy a beautiful evening. The waiter greets you at the door and takes you to a free table. Once you are settled, the waiter brings you a menu. You look over the best dishes on the list and call the waiter when you are ready. The waiter notes down your order and passes it to the skilled staff in the kitchen, people whose names you may never know. After a few minutes of waiting, you receive what you ordered, enjoy everything, and pay for a pleasant experience.

At OMS

At OMS, we serve our customers as if they were VIP guests at a 5-star restaurant. Now, let's trace a user's request to create an order. Our users operate on the client system, and a single creation request is sent as follows:

{
    "items": [
        {
            "sku": "1805330",
            "name": "Bộ nhớ DDR4 G.Skill 8GB (2666) F4-2666C19S-8GIS",
            "price": 1190000,
            "quantity": 1
        }
    ],
    "service": {
        "install": false,
        "delivery": false,
        "technicalSupport": false
    },
    "customer": {
        "id": 178316,
        "name": "Le Xuan Vy"
    }
}

Our HTTP framework receives the request and calls the Command Service directly with the parameters it receives. The Command Service compiles the above information into a CreateOrder command and passes it to the CommandGateway.

{
    "commandType": "CreateOrder",
    "aggregateType": "order",
    "data": {
        "items": [
            {
                "sku": "1805330",
                "name": "Bộ nhớ DDR4 G.Skill 8GB (2666) F4-2666C19S-8GIS",
                "price": 1190000,
                "quantity": 1
            }
        ],
        "service": {
            "install": false,
            "delivery": false,
            "technicalSupport": false
        },
        "customer": {
            "id": 178316,
            "name": "Le Xuan Vy"
        }
    }
}
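
The wrapping step can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name `build_create_order_command` and the dictionary representation are hypothetical, since the article does not show the actual Command Service code.

```python
import copy
from typing import Any


def build_create_order_command(payload: dict[str, Any]) -> dict[str, Any]:
    """Wrap a raw creation request in a CreateOrder command envelope."""
    return {
        "commandType": "CreateOrder",
        "aggregateType": "order",
        # Copy the payload so later changes to the request cannot leak into the command.
        "data": copy.deepcopy(payload),
    }


request = {
    "items": [{"sku": "1805330", "price": 1190000, "quantity": 1}],
    "service": {"install": False, "delivery": False, "technicalSupport": False},
    "customer": {"id": 178316, "name": "Le Xuan Vy"},
}
command = build_create_order_command(request)
```

Keeping the envelope separate from the payload means the same function can serve any interface that produces the same request data.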

CommandGateway receives the command and sends it to OrderAggregate (the entry point for the business logic of the Order aggregate). Normally, OrderAggregate loads all events stored for the given aggregate id and replays them to rebuild the order's current state. Because this is a creation command, there is no history to replay: OrderAggregate only validates the content of the command with its own Command Validator and converts the command into the OrderCreated event.

{
    "eventType": "OrderCreated",
    "timestamp": 1575103613,
    "aggregateType": "order",
    "aggregateId": "1c7c9f61-542e-49ca-8d06-524b71e63866",
    "data": {
        "code": "19A31SA031",
        "items": [
            {
                "sku": "1805330",
                "name": "Bộ nhớ DDR4 G.Skill 8GB (2666) F4-2666C19S-8GIS",
                "price": 1190000,
                "quantity": 1
            }
        ],
        "service": {
            "install": false,
            "delivery": false,
            "technicalSupport": false
        },
        "customer": {
            "id": 178316,
            "name": "Le Xuan Vy"
        }
    }
}
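
A minimal Python sketch of the validate-then-convert step follows. The validation rules, the `ValidationError` class, and the function names are assumptions made for illustration; the real Command Validator is not described in this article, and generation of the order `code` is left out.

```python
import time
import uuid


class ValidationError(Exception):
    """Raised when a command fails validation."""


def validate_create_order(data: dict) -> None:
    # Hypothetical rules: an order needs at least one item with a positive quantity.
    items = data.get("items") or []
    if not items:
        raise ValidationError("order must contain at least one item")
    for item in items:
        if item.get("quantity", 0) <= 0:
            raise ValidationError(f"invalid quantity for sku {item.get('sku')}")


def handle_create_order(command: dict) -> dict:
    """Validate a CreateOrder command and turn it into an OrderCreated event."""
    validate_create_order(command["data"])
    return {
        "eventType": "OrderCreated",
        "timestamp": int(time.time()),
        "aggregateType": command["aggregateType"],
        "aggregateId": str(uuid.uuid4()),  # new aggregate, so a fresh identifier
        "data": dict(command["data"]),
    }
```

If validation fails, no event is produced, so nothing reaches the event store.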

The generated event is stored in the EventStore. The next time a command targets this order, the aggregate with the identifier 1c7c9f61-542e-49ca-8d06-524b71e63866 is loaded from its events and has all the information needed to validate that command. If that command succeeds, a new event is persisted and added to the aggregate.
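
The load, replay, and validate cycle can be sketched as follows. Everything here is an assumption for illustration: the in-memory store, the `OrderCancelled` event, and the `CancelOrder` command do not appear in the article and merely stand in for "another command".

```python
from collections import defaultdict


class InMemoryEventStore:
    """Toy event store: one append-only list of events per aggregate id."""

    def __init__(self) -> None:
        self._streams: dict[str, list[dict]] = defaultdict(list)

    def append(self, event: dict) -> None:
        self._streams[event["aggregateId"]].append(event)

    def load(self, aggregate_id: str) -> list[dict]:
        return list(self._streams[aggregate_id])


def replay(events: list[dict]) -> dict:
    """Fold events, oldest first, into the aggregate's current state."""
    state: dict = {"exists": False, "cancelled": False}
    for event in events:
        if event["eventType"] == "OrderCreated":
            state["exists"] = True
        elif event["eventType"] == "OrderCancelled":  # hypothetical event
            state["cancelled"] = True
    return state


def handle_cancel(store: InMemoryEventStore, aggregate_id: str) -> bool:
    """Validate a hypothetical CancelOrder command against the replayed state."""
    state = replay(store.load(aggregate_id))
    if not state["exists"] or state["cancelled"]:
        return False  # command rejected, nothing is persisted
    store.append({"eventType": "OrderCancelled", "aggregateId": aggregate_id})
    return True
```

The aggregate never stores its state directly; it is rebuilt from events each time a command arrives.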

The response to the user can be sent at any point, because the processing phases are largely independent of one another. We can improve responsiveness by replying immediately with a promise that the work will continue, for example in a different thread, or we can complete all the work first and then return the response.
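
The "respond first, finish later" option might look like this with Python's `asyncio`. It is a sketch under assumptions: `process_command` and `accept` are hypothetical names, and a short sleep stands in for the real processing.

```python
import asyncio

processed: list[str] = []


async def process_command(command_id: str) -> None:
    # Stand-in for validation, event persistence and event handling.
    await asyncio.sleep(0.01)
    processed.append(command_id)


async def accept(command_id: str, background: set) -> dict:
    """Schedule the work and answer right away with an 'accepted' response."""
    task = asyncio.create_task(process_command(command_id))
    background.add(task)  # keep a reference so the task is not garbage collected
    task.add_done_callback(background.discard)
    return {"commandId": command_id, "status": "accepted"}


async def main() -> dict:
    background: set = set()
    response = await accept("cmd-1", background)
    assert processed == []  # the response came back before the work finished
    await asyncio.gather(*background)  # a real server would keep running instead
    return response


response = asyncio.run(main())
```

The trade-off is that the caller only learns the command was accepted, not that it succeeded, so the final result has to be read from the query side later.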

Transform the system into a 5-star production line

Main components of Teko OMS

As described above, the primary data objects of the system are commands and events. A command is a user action on a domain object that has been compiled by the application layer and is ready for a command handler to execute. The Command Handler receives a command and decides how to deal with it, typically generating one or more corresponding events.

An event is the product of processing a command: an entity that stores a data change on one of the system's domain objects (in OMS, an order). When an event is persisted, a message is broadcast to the event handlers, each of which knows how to do its part.

Events and Commands encapsulate only the necessary data, while the business logic for working with them lives in the Event Handlers and Command Handlers. This keeps our logic clearly organized and easier to adjust.

As described earlier, our domain object is no longer stored as a single record, as it would be in a CRUD application; it is replaced by a unified set of events. Each order is now considered an Aggregate.

An Aggregate, according to DDD, is a cluster of entities that belong together from a business point of view. External objects that want to manipulate anything inside an Aggregate must go through the Aggregate root (in our case, via the command processing system). According to CQRS, an Aggregate does not hold data for reading purposes and cannot be used for queries.

Following these principles, an order now consists of a group of events, named OrderAggregate, with its own identifier. All operations on the order (that is, anything affecting the order's event set) must go through the order's aggregate root, and we cannot retrieve order data directly from the Aggregate. Because of this, we treat the Aggregate as a Command Handler: its task is to receive a command, decide whether to handle it, and decide whether to create a new event.

Form a pipeline

Besides the core that processes commands and events, we also need some components to make things work as smoothly as possible. When a user action (request) is received, it is first handled by the User Interface (HTTP Framework), which collects the data and pushes it to the Command Service. Using the Command Service as the entry point of our application allows us to add or switch between multiple interfaces such as gRPC, RESTful API, or a message queue. The rest of the system does not need to change much, because all communication passes through a single Command Service.

The Command Service then compiles the data from the request into a Command and passes it to the Command Gateway. The Command Gateway directs the Command to the right Command Handler. As described above, the Command Handler uses the Command to generate the corresponding Events, which are saved to the Event Store.
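
The routing performed by the Command Gateway can be sketched as a small registry keyed by aggregate type. This is an assumed design for illustration only: the class layout, the `register` and `dispatch` method names, and the plain list used as an event store are all hypothetical.

```python
from typing import Callable

Command = dict
Event = dict
Handler = Callable[[Command], list[Event]]


class CommandGateway:
    """Routes each command to the handler registered for its aggregate type."""

    def __init__(self, event_store: list[Event]) -> None:
        self._handlers: dict[str, Handler] = {}
        self._event_store = event_store  # a plain list stands in for the Event Store

    def register(self, aggregate_type: str, handler: Handler) -> None:
        self._handlers[aggregate_type] = handler

    def dispatch(self, command: Command) -> list[Event]:
        handler = self._handlers.get(command["aggregateType"])
        if handler is None:
            raise LookupError(f"no handler for {command['aggregateType']!r}")
        events = handler(command)
        self._event_store.extend(events)  # persist whatever the handler produced
        return events


def order_handler(command: Command) -> list[Event]:
    # Toy handler: turn CreateOrder into OrderCreated, ignore everything else.
    if command["commandType"] == "CreateOrder":
        return [{"eventType": "OrderCreated", "data": command["data"]}]
    return []
```

Because callers only ever talk to `dispatch`, new interfaces can be added in front of the gateway without touching the handlers.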

With the above design, storing event data is pretty simple. Only basic information is needed:

  • aggregateId: identifier of the aggregate
  • aggregateType: type of the aggregate, used to identify the right object
  • eventType: type of the event, which determines what happened
  • eventData: data of the event
  • version: position of the event in the aggregate's event sequence

Some additional information:

  • timestamp: time of the event, for tracing
  • eventId: id of the event, for tracing
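
As a rough illustration, the fields above map naturally onto a small immutable record. The snake_case names, the choice of a UUID for `event_id`, and versions starting at 1 are assumptions of this sketch, not details taken from the OMS implementation.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class EventRecord:
    aggregate_id: str
    aggregate_type: str
    event_type: str
    event_data: dict
    version: int  # position of the event within its aggregate, starting at 1
    timestamp: int = field(default_factory=lambda: int(time.time()))
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)


record = EventRecord(
    aggregate_id="1c7c9f61-542e-49ca-8d06-524b71e63866",
    aggregate_type="order",
    event_type="OrderCreated",
    event_data={"code": "19A31SA031"},
    version=1,
)
```

Marking the record as frozen reflects the idea that a persisted event is never modified afterwards.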

After the event is successfully persisted, it is considered committed and is distributed to the event handlers for post-processing. This is done via the Event Bus. Similar to the Command Bus (Command Gateway), the Event Bus delivers each event to the place where it can be handled. The event handlers consist of two main components: the Projector (View Handler) and the SAGA (Transaction Handler). Because the Projector has already built a complete order view, the Query side only needs to maintain a simple Query Service that transforms this data into whatever best suits the user's request.
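
An in-process Event Bus can be as small as the sketch below. The `subscribe` and `publish` names are assumptions, and the two lambdas merely stand in for a Projector and a SAGA.

```python
from collections import defaultdict
from typing import Callable

EventHandler = Callable[[dict], None]


class EventBus:
    """Delivers each committed event to every handler subscribed to its type."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[EventHandler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: EventHandler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: dict) -> None:
        for handler in self._subscribers.get(event["eventType"], []):
            handler(event)


seen_by_projector: list[str] = []
seen_by_saga: list[str] = []

bus = EventBus()
bus.subscribe("OrderCreated", lambda e: seen_by_projector.append(e["aggregateId"]))
bus.subscribe("OrderCreated", lambda e: seen_by_saga.append(e["aggregateId"]))
bus.publish({"eventType": "OrderCreated", "aggregateId": "order-1"})
```

Both handlers receive the same event independently, which is what lets the view and the side effects evolve separately.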

Event Handler - Silent warriors

When an event is accepted, the system must notify its other components so that they can post-process the event.

Projector

First and foremost, we must create a view on the Query Side of the system to provide the user with the order's latest status.

The Projector's business logic may seem to overlap with the Aggregate's event replay logic. However, the Projector is developed to suit the needs of users' queries, while the Aggregate's replay logic focuses on collecting just enough order data for the command validation phase.

Once generated, these views are persisted in storage systems optimized for reading, such as ElasticSearch or other databases.
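
A Projector can be sketched as a handler that turns events into a denormalized view. The view fields (`customerName`, `itemCount`, `total`, `status`) and the `OrderCancelled` event are assumptions chosen for illustration, and a dictionary stands in for the read store.

```python
class OrderProjector:
    """Keeps a denormalised, read-optimised view per order."""

    def __init__(self) -> None:
        self.views: dict[str, dict] = {}  # stands in for ElasticSearch or a database

    def handle(self, event: dict) -> None:
        order_id = event["aggregateId"]
        if event["eventType"] == "OrderCreated":
            items = event["data"]["items"]
            self.views[order_id] = {
                "id": order_id,
                "customerName": event["data"]["customer"]["name"],
                "itemCount": sum(i["quantity"] for i in items),
                # Precomputed so queries never have to sum line items.
                "total": sum(i["price"] * i["quantity"] for i in items),
                "status": "created",
            }
        elif event["eventType"] == "OrderCancelled":  # hypothetical event
            self.views[order_id]["status"] = "cancelled"
```

Unlike the Aggregate's replay, the view keeps whatever the readers need, even values that play no role in command validation.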

SAGA

SAGA was born to solve the problem of processing requests that span multiple order aggregates. It is also used to handle the side effects of events.

Two examples make this easier to understand:

  1. Event OrderCreated requires us to contact the sales promotion department to validate the order. Our Aggregate cannot use information outside of the system it manages to handle this request. So a SAGA is bound to this event and makes a call to PPM.
  2. Event PaymentRedistributed requires us to reallocate the payment amount between two orders. Again, our Aggregate cannot use information outside the system it manages to handle this request. So a SAGA is bound to this event and reallocates the money.

From these two examples, we conclude that a Saga is an event handler responsible for carrying out cross-aggregate requests and managing side effects. A Saga still interacts with Aggregates through Commands, like any other operation on an Aggregate. For the two examples above, this means:

  1. Generate Command ConfirmPrice & Promotion to update the promotion information in the order
  2. Generate Commands AddPayment and ReducePayment to update the payment information of the corresponding orders and control the transaction for this process.
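
The first example could be sketched as follows. The PPM call and the Command Gateway are injected as plain callables because their real interfaces are not described here, and the command name `ConfirmPricePromotion` is a hypothetical spelling of the command mentioned above.

```python
from typing import Callable


class PromotionSaga:
    """Reacts to OrderCreated and answers with a command, never by touching the aggregate."""

    def __init__(
        self,
        check_promotion: Callable[[dict], dict],
        send_command: Callable[[dict], None],
    ) -> None:
        self._check_promotion = check_promotion  # stand-in for the call to PPM
        self._send_command = send_command        # stand-in for the Command Gateway

    def handle(self, event: dict) -> None:
        if event["eventType"] != "OrderCreated":
            return
        result = self._check_promotion(event["data"])
        self._send_command({
            "commandType": "ConfirmPricePromotion",  # hypothetical spelling
            "aggregateType": "order",
            "aggregateId": event["aggregateId"],
            "data": result,
        })
```

The Saga holds the knowledge of the external system, while the Aggregate only ever sees an ordinary command.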

Recommended technical stacks - God is in the details

To build a complete system, besides design and construction planning, we should also consider selecting the right materials and tools. Because the system is designed around a layered architecture, it is compatible with different infrastructures. The choice of HTTP framework or message broker mostly depends on each team's own views and evaluation of the benefits. Here we only discuss the technologies used to store and transfer events in the system.

According to the design we have proposed, we can state some technical requirements of EventStore and OrderViewStore as follows:

  • EventStore
    1. Must be optimized for writes and support transactions, so that data is written consistently even when errors occur and the system runs stably
    2. Must ensure throughput and filter records by id and type as quickly as possible
  • OrderViewStore
    1. Reads must be very fast and support a variety of query formats
    2. Writes do not need to be very fast, and the structure must be easy to change

Based on the above requirements, we propose:

  • EventStore uses a SQL database (MySQL, PostgreSQL, MariaDB, ...) to ensure transactional guarantees
  • OrderViewStore uses a combination of NoSQL databases (MongoDB, Cassandra, ElasticSearch, ...) to get the best read performance
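
To make the EventStore requirements concrete, here is a sketch of a transactional event table. SQLite is used only so the example runs without a database server; it is not one of the engines proposed above, and the table layout and function names are assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        event_id       INTEGER PRIMARY KEY AUTOINCREMENT,
        aggregate_id   TEXT    NOT NULL,
        aggregate_type TEXT    NOT NULL,
        event_type     TEXT    NOT NULL,
        event_data     TEXT    NOT NULL,
        version        INTEGER NOT NULL,
        UNIQUE (aggregate_type, aggregate_id, version)
    )
    """
)


def append_event(aggregate_id, aggregate_type, event_type, data, version) -> bool:
    """Insert one event; return False if that version already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events (aggregate_id, aggregate_type, event_type,"
                " event_data, version) VALUES (?, ?, ?, ?, ?)",
                (aggregate_id, aggregate_type, event_type, json.dumps(data), version),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def load_events(aggregate_id, aggregate_type) -> list:
    rows = conn.execute(
        "SELECT event_type, event_data FROM events"
        " WHERE aggregate_type = ? AND aggregate_id = ? ORDER BY version",
        (aggregate_type, aggregate_id),
    )
    return [(event_type, json.loads(data)) for event_type, data in rows]
```

The unique constraint does double duty: it rejects two writers appending the same version, and its index supports fast lookup by aggregate type and id.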

Meanwhile, the CommandBus and EventBus connect the main application to the workers that perform the corresponding tasks. There are several ways to implement such a bus:

  • Start a coroutine directly and pass it the data to process: requires managing coroutines well and handling fallbacks
  • Use a message broker
    • Redis pub/sub: fast and convenient, but requires care about message loss and recovery
    • RabbitMQ, Kafka: stable, with less risk of data loss thanks to their durable ack mechanisms, but they consume more resources

Our innovations in improving the CQRS/ES pattern

Event & State Machine

OMS's orientation in the eCommerce ecosystem is comprehensive order management, including synchronizing and managing the order statuses scattered across all other client and back-end systems. With such a big mission, product design cannot be ignored. Because every order activity depends heavily on its state system, standardizing all documentation and the OMS State Machine is a must.

Setting up Order State Machine

Order states are spread across the order business of all systems. Hence the business team must unify the states and describe the state transitions in a State Machine Diagram. In it, we describe the operation associated with each transition, which corresponds to an event. The State Machine ensures that orders always follow a defined processing flow and gives the business team and the development team a shared understanding. Problems become easier to understand when discussed using this diagram. The State Machine also acts as documentation of the system for developers: features are modeled directly on it and translated directly into commands.

State Machine Operation

The State Machine is powerful, but its logic is complicated to apply. Basically, computing an order's state belongs to the business logic of the aggregate. After each event is applied to the order, we re-determine the order's status from its current attributes.
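
Re-determining the state after each event can be sketched like this. The attribute flags and the `OrderDelivered` and `OrderCancelled` events are hypothetical; the real OMS state machine and its rules are not listed in this article.

```python
def determine_state(order: dict) -> str:
    """Derive the order state from its current attributes (hypothetical rules)."""
    if order.get("cancelled"):
        return "cancelled"
    if order.get("delivered"):
        return "delivered"
    if order.get("confirmed"):
        return "confirmed"
    return "created"


def apply_event(order: dict, event_type: str) -> dict:
    """Apply one event to the order, then recompute its state."""
    flags = {
        "OrderConfirmed": "confirmed",
        "OrderDelivered": "delivered",
        "OrderCancelled": "cancelled",
    }
    updated = dict(order)
    if event_type in flags:
        updated[flags[event_type]] = True
    updated["state"] = determine_state(updated)
    return updated


order = {"state": "created"}
for event_type in ["OrderConfirmed", "OrderDelivered"]:
    order = apply_event(order, event_type)
```

Note that the state here depends entirely on the current rules in `determine_state`, so changing those rules changes the state computed for every replayed order.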

With every new version of the state machine, a new data field is added or, worse, a new event is added that moves the state machine a step further. Such a change may break compatibility with older versions, because the new logic is applied directly to orders that were processed under the old event system.

State Updated Event

To accommodate this, we generate an event that records every change in the state of the order. When handling a command, we only need to apply the new events to determine the latest state of the order and record it. With this improvement, existing orders stay compatible with the State Machine and adapt to the current version immediately.
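
One way to read this idea is sketched below: the state is computed once, when the command is handled, and stored as its own event, so replay only reads recorded states. The event name `OrderStateUpdated` and both function names are assumptions made for this illustration.

```python
from typing import Callable


def with_state_event(
    previous_state: str,
    new_events: list[dict],
    determine_state: Callable[[list[dict]], str],
) -> list[dict]:
    """Append an OrderStateUpdated event when the new events change the state."""
    new_state = determine_state(new_events)
    if new_state == previous_state:
        return new_events
    state_event = {
        "eventType": "OrderStateUpdated",  # hypothetical event name
        "data": {"from": previous_state, "to": new_state},
    }
    return [*new_events, state_event]


def replay_state(events: list[dict], initial: str = "created") -> str:
    """On replay, read the recorded state instead of recomputing it."""
    state = initial
    for event in events:
        if event["eventType"] == "OrderStateUpdated":
            state = event["data"]["to"]
    return state
```

Because `replay_state` never calls the state rules, a later version of those rules cannot silently change the state of orders that were processed earlier.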

Side Effect Trigger

Consider a practical case: after the OrderConfirmed event is successfully recorded, we must send a notification to other systems. Where should this business logic be located?

  • In the event handler:
    • Cannot recover if something goes wrong
    • Event data may not be enough to generate the above message
  • In the command handler of the ConfirmOrder command:
    • Cannot recover if something goes wrong

In this case, we need a Saga trigger that sends the notification to other systems via a new command. Such commands are intended only to trigger side effects from the application and do not affect the business flow. This makes recovery from an incident straightforward.

Conclusion

To bring the best user experience, OMS has spent a long time researching and developing its own version of CQRS/ES. There are certainly still many imperfections, but through the process of working and learning we have drawn many valuable lessons, which is the most valuable thing.

We summarized and presented all of this information at the TEKO Summit 2020 event. To follow news about the next event, please subscribe to the TEKO Vietnam fan page.

Slide attachment: https://docs.google.com/presentation/d/1YHfX8FanNH1Mo0K6X9q-vDb6Qmh9FKJzDgh2fS2HT-E/edit?usp=sharing