Top 10 Python workflow-automation Projects

dagster

47 10,468 10.0 Python

An orchestration platform for the development, production, and observation of data assets.

Project mention: AI Strategy Guide: How to Scale AI Across Your Business | dev.to | 2024-05-11

Level 1 of MLOps is when you've put each lifecycle stage and their intefaces in an automated pipeline. The pipeline could be a python or bash script, or it could be a directed acyclic graph run by some orchestration framework like Airflow, dagster or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML and dvc also feature pipeline capabilities.

doit

20 1,795 0.0 Python

task management & automation tool

Project mention: How do you deal with CI, project config, etc. falling out of sync across repos? | /r/ExperiencedDevs | 2023-12-06

I like mage for Go and doit for Python.

Scout Monitoring

www.scoutapm.com featured

Free Django app performance insights with Scout Monitoring. Get Scout setup in minutes, and let us sweat the small stuff. A couple lines in settings.py is all you need to start monitoring your apps. Sign up for our free tier today.
obsei

9 1,148 5.0 Python

Obsei is a low code AI powered automation tool. It can be used in various business flows like social listening, AI based alerting, brand image analysis, comparative study and more .
couler

1 892 4.8 Python

Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.

Project mention: (Not) to Write a Pipeline | news.ycombinator.com | 2023-06-27

author seems to be describing the kind of patterns you might make with https://argoproj.github.io/argo-workflows/ . or see for example https://github.com/couler-proj/couler , which is an sdk for describing tasks that may be submitted to different workflow engines on the backend.
it's a little confusing to me that the author seems to object to "pipelines" and then equate them with messaging-queues. for me at least, "pipeline" vs "workflow-engine" vs "scheduler" are all basically synonyms in this context. those things may or may not be implemented with a message-queue for persistence, but the persistence layer itself is usually below the level of abstraction that $current_problem is really concerned with. like the author says, eventually you have to track state/timestamps/logs, but you get that from the beginning if you start with a workflow engine.
i agree with author that message-queues should not be a knee-jerk response to most problems because the LoE for edge-cases/observability/monitoring is huge. (maybe reach for a queue only if you may actually overwhelm whatever the "scheduler" can handle.) but don't build the scheduler from scratch either.. use argowf, kubeflow, or a more opinionated framework like airflow, mlflow, databricks, aws lamda or step-functions. all/any of these should have config or api that's robust enough to express rate-limit/retry stuff. almost any of these choices has better observability out-of-the-box than you can easily get from a queue. but most importantly.. they provide idioms for handling failure that data-science folks and junior devs can work with. the right way to structure code is just much more clear and things like structuring messages/events, subclassing workers, repeating/retrying tasks, is just harder to mess up.

gh-action-pypi-publish

5 857 8.2 Python

The blessed :octocat: GitHub Action, for publishing your :package: distribution files to PyPI: https://github.com/marketplace/actions/pypi-publish
covalent

4 711 8.4 Python

Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments. (by AgnostiqHQ)

Project mention: Remote execution of code | /r/Python | 2023-12-05

Pretty interesting request, if SSH is not used, i would try using something like dask which uses tcp to connect and execute assuming your workers are in another machine.I also think something like covalent can be used to extend your own custom plugin in their ecosystem to connect how you want. We have a very custom private plugin written on top of covalent's to have a custom protocol to connect our central on-prem GPU machines to our local laptops that is rpc based, mostly for high performance as well as some mandate security from where the GPU machines are. Once done it is pretty much something like

ck

9 582 9.9 Python

Collective Mind (CM) is a small, modular, cross-platform and decentralized workflow automation framework with a human-friendly interface and reusable automation recipes to make it easier to build, run, benchmark and optimize AI, ML and other applications and systems across diverse and continuously changing models, data, software and hardware (by mlcommons)
InfluxDB

www.influxdata.com featured

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
hera

1 503 9.3 Python

Hera is an Argo Python SDK. Hera aims to make construction and submission of various Argo Project resources easy and accessible to everyone! Hera abstracts away low-level setup details while still maintaining a consistent vocabulary with Argo. ⭐️ Remember to star!
YALCST

1 7 10.0 Python

YALCST is a GITHUB ACTION to sync LeetCode submissions into GITHUB REPO automatically, written in Python.
JDR

2 3 2.4 Python

Job Dependency Runner

NOTE: The open source projects on this list are ordered by number of github stars. The number of mentions indicates repo mentiontions in the last 12 Months or since we started tracking (Dec 2020).

Python workflow-automation related posts

Experience with Dagster.io?

1 project | news.ycombinator.com | 25 Jul 2023
Inngest raises $3M seed to build the reliable workflow platform for every dev

5 projects | news.ycombinator.com | 12 Jul 2023
Dagster tutorials

1 project | /r/dataengineering | 26 Jun 2023
The Dagster Master Plan

2 projects | /r/dataengineering | 16 Jun 2023
Prefect alternatives meant for Slurm (HPC)

1 project | /r/dataengineering | 11 Jun 2023
The Why and How of Dagster User Code Deployment Automation

1 project | dev.to | 1 May 2023
dbt Cloud Alternatives?

2 projects | /r/dataengineering | 23 Jan 2023
A note from our sponsor - Scout Monitoring
www.scoutapm.com | 6 Jun 2024

Get Scout setup in minutes, and let us sweat the small stuff. A couple lines in settings.py is all you need to start monitoring your apps. Sign up for our free tier today. Learn more →

Index

What are some of the best open-source workflow-automation projects in Python? This list will help you:

	Project	Stars
1	dagster	10,468
2	doit	1,795
3	obsei	1,148
4	couler	892
5	gh-action-pypi-publish	857
6	covalent	711
7	ck	582
8	hera	503
9	YALCST	7
10	JDR	3

Python workflow-automation

Top 10 Python workflow-automation Projects

Python workflow-automation related posts

Experience with Dagster.io?

Inngest raises $3M seed to build the reliable workflow platform for every dev

Dagster tutorials

The Dagster Master Plan

Prefect alternatives meant for Slurm (HPC)

The Why and How of Dagster User Code Deployment Automation

dbt Cloud Alternatives?

Index