Flow45

Flow45 is a workflow management tool for software engineers, including data engineers. It is designed for batch-oriented workflows and human-in-the-loop workflows. In either case, human operators can request an undo of tasks. As long as the volume of task instances does not require a distributed data processing framework, Flow45 can be a good fit.

You define and deploy your workflow templates, tasks, dependencies, and optional schedules as configuration, not as code. You give each task a function name and create a struct data type for its output. The upstream dependencies of a task determine the data types of its function's input parameters, and thereby the signature of the function. One function name can be used in many tasks and in many workflows.

Your job is to manage the processes that look for available task instances for particular function names, pick them up, complete the work, report on progress, and eventually submit a struct with details on the outcome (e.g., the physical location of a document your process produced).

Flow45 is a non-distributed, passive storage layer for metadata about the outcomes of your task instances. Because it is not distributed, it is simple, and the distributed processes you use to complete task instances can rely on it as a single version of the truth about which tasks have been completed (or skipped) and which tasks can therefore be picked up next.

A single Flow45 installation can be shared by different teams, which makes it easier and cheaper to scale the use of Flow45 in an organisation. Installing Flow45 means installing a database, managing a few long-lived processes that keep track of which task instances should be made available, and running a few services behind APIs that allow developers to deploy workflows and data types, pick up tasks, and submit results. Because Flow45 is fairly lightweight in terms of what it needs to process, one installation can comfortably serve multiple teams.

It is important to stress again that the outcomes of task instances should contain only operational, technical metadata, not actual data, and certainly not secrets. Flow45 also becomes less of a good fit as soon as you produce a massive volume of task instances: the number of task instances should not constitute big data.
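
The pick-up cycle described above can be illustrated with a small Python sketch. It is not the Flow45 API: the class InMemoryStore, the helper run_worker, the struct types DocumentLocation and PageCount, and the example URI are all hypothetical names invented for this illustration, and an in-memory dictionary stands in for the database. The sketch only shows the idea that upstream outcomes become the arguments of a task's function, and that a task instance becomes available once all of its upstream tasks have an outcome.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DocumentLocation:
    """Hypothetical output struct: metadata only, never the document itself."""
    uri: str


@dataclass(frozen=True)
class PageCount:
    """Hypothetical output struct of a downstream task."""
    pages: int


@dataclass
class TaskInstance:
    name: str
    function_name: str
    upstream: list[str]              # names of the tasks this one depends on
    outcome: Optional[object] = None


class InMemoryStore:
    """Stand-in for the passive metadata store (assumption, not the real API)."""

    def __init__(self, tasks):
        self.tasks = {t.name: t for t in tasks}

    def available(self, function_name):
        """Open instances whose upstream tasks all have an outcome."""
        return [
            t for t in self.tasks.values()
            if t.function_name == function_name
            and t.outcome is None
            and all(self.tasks[u].outcome is not None for u in t.upstream)
        ]

    def inputs_for(self, task):
        """Upstream outcomes, in dependency order, are the function's arguments."""
        return [self.tasks[u].outcome for u in task.upstream]

    def submit(self, task, outcome):
        task.outcome = outcome


def produce_document() -> DocumentLocation:
    # No upstream dependencies, so no input parameters.
    return DocumentLocation(uri="s3://example-bucket/report.pdf")


def count_pages(doc: DocumentLocation) -> PageCount:
    # One upstream dependency, so one input parameter of that task's output type.
    # A real worker would open doc.uri; the result is hard-coded here.
    return PageCount(pages=3)


def run_worker(store, function_name, fn):
    """Pick up every available instance for one function name and submit outcomes."""
    done = []
    for task in store.available(function_name):
        store.submit(task, fn(*store.inputs_for(task)))
        done.append(task.name)
    return done


store = InMemoryStore([
    TaskInstance("produce", "produce_document", upstream=[]),
    TaskInstance("count", "count_pages", upstream=["produce"]),
])
```

In this sketch, calling run_worker(store, "count_pages", count_pages) before the produce task has an outcome picks up nothing. Once run_worker(store, "produce_document", produce_document) has submitted a DocumentLocation, the count task becomes available and receives that struct as its argument. Note that the outcome holds only a location, in line with the rule that outcomes carry metadata rather than actual data.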

Created: Sat 24 Feb 2024 20:16:21 CET
Updated: Sat 24 Feb 2024 20:16:21 CET