Workflow Engine is a tool for you

Key Points:

  1. A Workflow Engine is an essential tool for creating reliable algorithms
  2. Using the wrong tools to achieve reliability severely compromises code readability

Famously, “when all you have is a hammer, everything looks like a nail”. It is therefore the duty of a good software engineer to extend the repertoire of tools he or she can use to solve different kinds of problems. As the title suggests, in this article I’m going to argue that a Workflow Engine is a tool you should at least know about.

Instead of starting with a tool, let’s start with a problem. Take a look at the following algorithm:
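A minimal Python sketch of such an algorithm. The in-memory `AccountService` and the names `transfer_funds`, `take_from` and `give_to` are assumptions made for illustration (they mirror the `accountService.GiveTo` call mentioned later in the text), not a real API:

```python
class AccountService:
    """Hypothetical in-memory stand-in for a remote account service."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def take_from(self, account, amount):
        if self.balances[account] < amount:
            raise ValueError("insufficient funds")
        self.balances[account] -= amount

    def give_to(self, account, amount):
        self.balances[account] += amount


def transfer_funds(account_service, account_a, account_b, amount):
    # Two business steps that, by assumption, cannot share one atomic transaction.
    account_service.take_from(account_a, amount)
    account_service.give_to(account_b, amount)


service = AccountService({"A": 100, "B": 0})
transfer_funds(service, "A", "B", 30)
```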

Let’s assume that this simple algorithm cannot be executed using a single atomic transaction. Further, for our purposes, let’s say that accountA does in fact have the required amount of funds. What could go wrong?

If we have taken some funds from accountA and the algorithm then fails for any technical reason, the money is simply lost. Examples of such technical failures include:
- networking problems (e.g. accountService.GiveTo performs an HTTP request)
- a loss of power by the server executing our algorithm
- a sudden termination of the process (e.g. an out-of-memory exception)
- etc.
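To make the failure mode concrete, here is a small Python sketch in which the second step always fails. `FlakyAccountService` is a hypothetical stand-in, and the `ConnectionError` simulates a networking problem:

```python
class FlakyAccountService:
    """Hypothetical service whose give_to always fails, e.g. on a network error."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def take_from(self, account, amount):
        self.balances[account] -= amount

    def give_to(self, account, amount):
        raise ConnectionError("simulated network failure")


def transfer_funds(account_service, account_a, account_b, amount):
    account_service.take_from(account_a, amount)
    account_service.give_to(account_b, amount)


service = FlakyAccountService({"A": 100, "B": 0})
total_before = sum(service.balances.values())
try:
    transfer_funds(service, "A", "B", 30)
except ConnectionError:
    pass  # nothing remembers that 30 units are still owed to account B
total_after = sum(service.balances.values())
```

After the failed run the total across both accounts is lower than before: the funds left accountA and never arrived anywhere.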

The point is — some algorithms should be resilient in the face of temporary technical problems that could arise at any moment in an algorithm execution. This was always a problem, but the highly distributed world of microservices cannot help but exacerbate it. So how could this resiliency be achieved, if using an atomic transaction for some reason is not an option?

Let’s look in our toolbox. What’s this? Why, it is our trusty Database — certainly we can use it to solve the problem at hand:
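One way this could look, sketched in Python with the standard-library `sqlite3` module. The `transfers` table, the state names (`STARTED`, `TAKEN`, `DONE`) and the `AccountService` stub with its simulated failure are all assumptions made for illustration:

```python
import sqlite3


class AccountService:
    """Hypothetical in-memory service; give_to fails `give_failures` times."""

    def __init__(self, balances, give_failures=0):
        self.balances = dict(balances)
        self.give_failures = give_failures

    def take_from(self, account, amount):
        self.balances[account] -= amount

    def give_to(self, account, amount):
        if self.give_failures > 0:
            self.give_failures -= 1
            raise ConnectionError("simulated network failure")
        self.balances[account] += amount


def save_state(conn, transfer_id, state):
    conn.execute("UPDATE transfers SET state = ? WHERE id = ?", (state, transfer_id))
    conn.commit()


def load_state(conn, transfer_id):
    row = conn.execute(
        "SELECT state FROM transfers WHERE id = ?", (transfer_id,)
    ).fetchone()
    return row[0] if row else None


def transfer_funds(conn, service, transfer_id, account_a, account_b, amount):
    state = load_state(conn, transfer_id)
    if state is None:
        state = "STARTED"
        conn.execute("INSERT INTO transfers (id, state) VALUES (?, ?)", (transfer_id, state))
        conn.commit()
    if state == "STARTED":
        service.take_from(account_a, amount)  # business logic, step 1
        state = "TAKEN"
        save_state(conn, transfer_id, state)
    if state == "TAKEN":
        service.give_to(account_b, amount)  # business logic, step 2
        state = "DONE"
        save_state(conn, transfer_id, state)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (id TEXT PRIMARY KEY, state TEXT NOT NULL)")
service = AccountService({"A": 100, "B": 0}, give_failures=1)

try:
    transfer_funds(conn, service, "t-1", "A", "B", 30)
except ConnectionError:
    pass  # the first run fails after the funds were taken
state_after_failure = load_state(conn, "t-1")

transfer_funds(conn, service, "t-1", "A", "B", 30)  # the retry resumes at give_to
final_state = load_state(conn, "t-1")
```

The retry skips `take_from` because the persisted state says that step is already done. A crash between a step and the state update that follows it would still repeat that step, so this sketch additionally assumes the steps are safe to repeat.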

This code is of course a simplification. It leaves out the mechanism that has to watch for failures and restart the algorithm, perhaps using some backoff strategy. But it does illustrate the point I wanted to make: even for the simplest of algorithms this approach complicates things fast! Imagine what a pain in the butt it would be to implement if-else branches or loops this way.

The problem with this approach is that it mixes different levels of concerns. Only 2 lines of code perform the actual business logic; the rest is technical code for managing the state of execution. Thus, we could provide resiliency using this approach, but we have to pay for it with another very important aspect of any codebase: readability.

Maybe a Database is not the right tool for the job at hand after all. Let’s look in our toolbox again. Well, we’re in luck. A Message Broker is widely used to provide resiliency for such algorithms:
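A rough Python sketch of this approach, using the standard-library `queue.Queue` as a stand-in for a real Message Broker. The message shape, the `consume` loop and the `AccountService` stub are assumptions for illustration; only the name `FinishFundsTransfer` (here `finish_funds_transfer`) comes from the text:

```python
import queue


class AccountService:
    """Hypothetical in-memory service; give_to fails `give_failures` times."""

    def __init__(self, balances, give_failures=0):
        self.balances = dict(balances)
        self.give_failures = give_failures

    def take_from(self, account, amount):
        self.balances[account] -= amount

    def give_to(self, account, amount):
        if self.give_failures > 0:
            self.give_failures -= 1
            raise ConnectionError("simulated network failure")
        self.balances[account] += amount


def start_funds_transfer(service, broker, account_a, account_b, amount):
    # First half of the algorithm: take the funds, then publish a message.
    service.take_from(account_a, amount)
    broker.put({"account_b": account_b, "amount": amount})


def finish_funds_transfer(service, message):
    # Second half of the algorithm, triggered by the message.
    service.give_to(message["account_b"], message["amount"])


def consume(service, broker):
    """Deliver messages to the handler; a failed message is re-queued, not dropped."""
    while not broker.empty():
        message = broker.get()
        try:
            finish_funds_transfer(service, message)
        except ConnectionError:
            broker.put(message)


broker = queue.Queue()
service = AccountService({"A": 100, "B": 0}, give_failures=1)
start_funds_transfer(service, broker, "A", "B", 30)
consume(service, broker)
```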

This code is also a simplification. I assume that there are mechanisms that read messages from the Message Broker and forward them to the FinishFundsTransfer method. Further, I assume that a message that was not processed successfully will be processed again rather than simply deleted from the Message Broker.

Is this approach better in terms of readability than the previous approach using a Database? The answer is no. The initial algorithm is now divided into 2 independent methods. This code snippet may lead you to believe that the methods will be close to each other. In fact, nothing stops them from being defined in different source code files, which can even reside in different repositories.

Whereas the Database approach with explicit execution state management diminished the readability of business logic by mixing it with technical noise, the Message Broker approach with asynchronous message processing breaks the algorithm apart and completely destroys its readability in the process. Oftentimes developers do not even understand the whole algorithm based on asynchronous messages, because in their day-to-day work they only ever see bits and pieces of it, never the whole.

Thus, using standard tools we basically have to choose between bad and horrible readability for our reliable algorithms. So let’s take a look at a tool specialized for this particular problem: the Workflow Engine.

Workflow Engines do have their own well-established terminology. An algorithm that should be reliably executed is called a Workflow. A single step or statement of a workflow is called an Activity. A standalone application or an embedded framework responsible for reliably executing workflows is called a Workflow Engine.

A Workflow Engine ensures the reliability of workflows by persisting their state and automatically re-executing the last failed activity until the whole workflow is completed.
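The mechanism can be illustrated with a toy engine in Python. This is only a sketch under stated assumptions, not how any particular product works: `WorkflowEngine`, its `run` method and the `activity` callback are hypothetical names, the history lives in a plain dictionary where a real engine would use durable storage, and only `ConnectionError` is treated as a retryable failure:

```python
class AccountService:
    """Hypothetical in-memory service; give_to fails `give_failures` times."""

    def __init__(self, balances, give_failures=0):
        self.balances = dict(balances)
        self.give_failures = give_failures
        self.take_calls = 0

    def take_from(self, account, amount):
        self.take_calls += 1
        self.balances[account] -= amount

    def give_to(self, account, amount):
        if self.give_failures > 0:
            self.give_failures -= 1
            raise ConnectionError("simulated network failure")
        self.balances[account] += amount


class WorkflowEngine:
    """Toy engine: records completed activities and retries failed ones."""

    def __init__(self, max_attempts=5):
        # (workflow id, step index) -> result; a real engine persists this durably.
        self.history = {}
        self.max_attempts = max_attempts

    def run(self, workflow_id, workflow, *args):
        step = 0

        def activity(func, *func_args):
            nonlocal step
            key = (workflow_id, step)
            step += 1
            if key in self.history:  # completed in an earlier run: do not repeat
                return self.history[key]
            last_error = None
            for _ in range(self.max_attempts):
                try:
                    result = func(*func_args)
                except ConnectionError as error:
                    last_error = error  # retry the same activity
                else:
                    self.history[key] = result
                    return result
            raise last_error

        return workflow(activity, *args)


def transfer_funds(activity, service, account_a, account_b, amount):
    # The workflow: the same two business steps, each wrapped as an activity.
    activity(service.take_from, account_a, amount)
    activity(service.give_to, account_b, amount)


engine = WorkflowEngine()
service = AccountService({"A": 100, "B": 0}, give_failures=2)
engine.run("t-1", transfer_funds, service, "A", "B", 30)
balances_after_first_run = dict(service.balances)

# Running the same workflow id again replays the history and executes nothing.
engine.run("t-1", transfer_funds, service, "A", "B", 30)
```

Note that the workflow function contains only the two business steps; the retry and bookkeeping logic lives entirely in the engine.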

Additionally, some Workflow Engines provide a UI for enhanced observability of your workflows. You can find any workflow execution and look at its state, its activities, the inputs and outputs of those activities, how long each activity took to execute, etc.

For example, if we apply a Workflow Engine to our initial algorithm, it could look something like this in C#:

The algorithm is now exactly as readable as it was in the beginning, because the Workflow Engine hides and abstracts away everything that is needed to achieve reliability.

In conclusion, Workflow Engine is a tool with which you can create reliable algorithms without sacrificing code readability. I will not go into details of particular Workflow Engines here. This article is meant to be only a short introduction to the concept. For Java and Go you may take a look at Temporal (which is based on Uber’s Cadence); there are examples of workflows resembling the ones used here for Java and Go.

For .NET, you can read my article “.NET Workflow Engines”.

Software engineer at Kaspersky Lab, Moscow. Opinions are my own.
