stram: Snowflake data applications using just Python

Created
Fri Oct 28 15:58:40 CEST 2022

Yesterday we went live with a complex, stateful batch data processing flow. In our current team we primarily use Snowflake, and most use cases are for reporting. We would normally use data build tool (DBT) for building data marts. For this use case, however, we could not foresee whether DBT would limit us.

We wondered how much work it would be, really, to work with Python scripts directly. It turned out to be pretty straightforward. After going live, we publicly released a few simple Python helper functions that we rely on in our flow, under the name "stram": https://github.com/rwberendsen/stram. In the `example_use.py` file, you can see that the function `run_task` does the actual SQL work. In production we call this function from a simple Python function that serves as the `python_callable` of an Airflow PythonOperator, as sketched below. But our module itself has no dependency on Airflow whatsoever! It can be deployed in any way you want, giving us maximal flexibility and control.
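
Here is a minimal sketch of that wiring, for Airflow 2.x. Everything except `run_task` is our illustration: the DAG id, schedule, SQL path, and the import location are assumptions, and the real signature of `run_task` is in `example_use.py` in the repo.

```python
# Minimal sketch: calling run_task from an Airflow PythonOperator.
# All names below except run_task are hypothetical; check example_use.py
# in the stram repo for the actual signature of run_task.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Assumption: run_task lives in a module of your own, as in example_use.py.
from example_use import run_task


def build_order_mart() -> None:
    # Thin wrapper, so Airflow only ever sees a plain Python callable.
    # The SQL file path is made up for this example.
    run_task("models/marts/orders.sql")


with DAG(
    dag_id="build_order_mart",        # hypothetical DAG id
    start_date=datetime(2022, 10, 1),
    schedule_interval="@daily",
) as dag:
    PythonOperator(
        task_id="build_order_mart",
        python_callable=build_order_mart,
    )
```

Because the Airflow-specific code is confined to this thin wrapper, the same `run_task` call could just as well be triggered from cron, a container, or a plain script.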

Using the helper functions in "stram", we were able to build our complex flow. For the layout of the code base, we did borrow directory structure ideas from DBT, similar to this guide; see the sketch below. But for ideas about the layout of the code base, you don't need DBT itself :-)
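
As an illustration only, a DBT-inspired tree for a plain-Python project might look like this; the folder names are ours, not the guide's:

```
project/
├── dags/              # thin Airflow wrappers holding the python_callable functions
├── models/
│   ├── staging/       # one .sql file per staging table
│   └── marts/         # one .sql file per data mart
└── helpers/           # stram-style helper functions, no Airflow imports
```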

Go ahead and play with the Python code, improve it, and use it in your work :-)