Flow45
Flow45 is a workflow management tool for software engineers, including data
engineers. It is designed for batch-oriented workflows and human-in-the-loop
workflows. In either case, human operators can request an undo of tasks. As long
as the volume of task instances does not require a distributed data processing
framework, Flow45 can be a good fit.
You define and deploy your workflow templates, tasks, dependencies, and
optional schedules as configuration, not as code. You give each task a function
name, and you create a struct data type for its output. The upstream
dependencies of your task define the data types of your function's input
parameters, thus defining the signature of your function. One function name can
be used in many tasks, and in many workflows.
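
To illustrate how upstream dependencies could determine a function's
signature, here is a minimal sketch in Python. All names in it
(ExtractOrdersOutput, ExtractCustomersOutput, BuildReportOutput, build_report,
and the example locations) are hypothetical and not part of Flow45; the sketch
only assumes that each task has one output struct and that a downstream
function receives the output structs of its upstream tasks.

```python
from dataclasses import dataclass


# Hypothetical output struct of an upstream task.
@dataclass(frozen=True)
class ExtractOrdersOutput:
    file_location: str
    row_count: int


# Hypothetical output struct of a second upstream task.
@dataclass(frozen=True)
class ExtractCustomersOutput:
    file_location: str


# Hypothetical output struct of the downstream task itself.
@dataclass(frozen=True)
class BuildReportOutput:
    report_location: str


def build_report(
    orders: ExtractOrdersOutput,
    customers: ExtractCustomersOutput,
) -> BuildReportOutput:
    """The two upstream dependencies fix the two input parameter types."""
    # Real work would read the files at orders.file_location and
    # customers.file_location; only the resulting location is returned.
    return BuildReportOutput(
        report_location="/reports/orders_by_customer.parquet"
    )
```

Because the signature depends only on the struct types, the same function name
could be reused by any task whose upstream dependencies produce these two
types.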
Your job is to manage processes that look for available task instances for
particular function names, pick them up, complete the work, report on
progress, and eventually submit a struct with details on the outcome (e.g., the
physical location of a document your process produced).
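
The pick-up, report, and submit cycle described above can be sketched as
follows. The client class is an in-memory stand-in written for this example;
its method names (pick_up, report_progress, submit) are assumptions and do not
describe the actual Flow45 API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TaskInstance:
    instance_id: int
    function_name: str
    inputs: dict
    progress: list = field(default_factory=list)
    outcome: Optional[dict] = None


class InMemoryFlowClient:
    """Hypothetical stand-in for the Flow45 services; not the real interface."""

    def __init__(self, instances):
        self._available = list(instances)
        self.completed = []

    def pick_up(self, function_name: str) -> Optional[TaskInstance]:
        # Hand out the first available instance for this function name.
        for index, instance in enumerate(self._available):
            if instance.function_name == function_name:
                return self._available.pop(index)
        return None

    def report_progress(self, instance: TaskInstance, message: str) -> None:
        instance.progress.append(message)

    def submit(self, instance: TaskInstance, outcome: dict) -> None:
        instance.outcome = outcome
        self.completed.append(instance)


def run_worker(client, function_name: str, handler: Callable[[dict], dict]) -> int:
    """Process instances for one function name until none are available."""
    done = 0
    while (instance := client.pick_up(function_name)) is not None:
        client.report_progress(instance, "started")
        outcome = handler(instance.inputs)  # the actual work happens here
        client.report_progress(instance, "finished")
        client.submit(instance, outcome)
        done += 1
    return done


def render_invoice(inputs: dict) -> dict:
    # The outcome holds only metadata: where the document was written.
    return {"document_location": f"/var/invoices/{inputs['order_id']}.pdf"}
```

A worker for the hypothetical function name "render_invoice" would then be
started with run_worker(client, "render_invoice", render_invoice); instances
for other function names are left for other workers.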
Flow45 is a non-distributed, passive storage layer for metadata about the
outcomes of your task instances. Because of its non-distributed nature, it is
simple, and the distributed processes you use to complete task instances can
use it to establish a single version of the truth about which tasks have been
completed (or skipped) and which tasks can therefore be picked up next.
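
As a rough illustration of how a single record of task outcomes determines
what can be picked up next, consider the sketch below. The function
available_tasks, the status labels, and the task names are hypothetical; the
only rule taken from the description above is that a task becomes available
once all of its upstream tasks are completed or skipped.

```python
def available_tasks(dependencies: dict, status: dict) -> list:
    """Return pending tasks whose upstream tasks are all completed or skipped.

    dependencies maps each task name to the list of its upstream task names.
    status maps task names to "completed" or "skipped"; tasks that are
    missing from status are treated as pending (an assumption of this sketch).
    """
    finished = {"completed", "skipped"}
    return sorted(
        task
        for task, upstream in dependencies.items()
        if status.get(task, "pending") == "pending"
        and all(status.get(name) in finished for name in upstream)
    )


# Hypothetical workflow: load waits for both transform and notify.
dependencies = {
    "extract": [],
    "transform": ["extract"],
    "notify": ["extract"],
    "load": ["transform", "notify"],
}
status = {"extract": "completed", "notify": "skipped"}
next_up = available_tasks(dependencies, status)
```

Here next_up contains only "transform"; "load" stays unavailable until
"transform" is also recorded as completed or skipped.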
A single Flow45 installation can be used by different teams. This makes it
easier and cheaper to scale the use of Flow45 in an organisation. Installation
of Flow45 means installing a database, managing a few long-lived processes to
keep track of which task instances should be made available, and a few services
behind APIs that allow developers to deploy workflows and data types, pick up
tasks, and submit results. Since Flow45 is fairly lightweight in terms of what
it needs to process, allowing for multiple teams will allow you to scale out
more easily. It is important to stress again that the outcomes of task
instances should contain only operational, technical metadata, not actual data,
and certainly not secrets. As soon as you decide to produce a massive volume
of task instances, Flow45 starts to be less of a good fit. The number of task
instances should not constitute big data.
- Created
- Sat 24 Feb 2024 20:16:21 CET
- Updated
- Sat 24 Feb 2024 20:16:21 CET