Top 23 Python Pipeline Projects
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
-
Kedro
Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
-
Mage
🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai
-
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
-
mara-pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
-
toil
A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.
-
NeumAI
Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.
-
pypyr automation task runner
pypyr task-runner cli & api for automation pipelines. Automate anything by combining commands, different scripts in different languages & applications into one pipeline process.
-
aws-lambda-handler-cookbook
This repository provides a working, deployable, open source-based, serverless service template with an AWS Lambda function and AWS CDK Python code with all the best practices and a complete CI/CD pipeline.
Project mention: Prefect: A workflow orchestration tool for data pipelines | news.ycombinator.com | 2024-03-13
Project mention: How to Build a Chat App with Your Postgres Data using Agent Cloud | dev.to | 2024-05-13
AgentCloud uses Airbyte to build data pipelines, which allow us to split, chunk, and embed data from over 300 data sources, including Postgres.
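The split-and-chunk step mentioned above can be illustrated with a minimal sketch. This is not AgentCloud's or Airbyte's implementation: the function name `chunk_text` and its `chunk_size`/`overlap` parameters are hypothetical, and chunks are measured in characters rather than model tokens to keep the example dependency-free.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters.

    Hypothetical helper for illustration; real pipelines usually chunk by tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each new chunk's start advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
```

The overlap keeps a little shared context between neighbouring chunks; each chunk would then be passed to an embedding model, a step this sketch leaves out.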
Interesting, thanks for sharing. I'll definitely take a look, although at this point I am so comfortable with Snakemake that it is a bit hard to imagine what would convince me to move to another tool. But I like the idea of composable pipelines: I am building a tool (too early to share) that would allow layering Snakemake pipelines on top of each other using semi-automatic data annotations, similar to how it is done in Kedro (https://github.com/kedro-org/kedro).
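The idea of composing pipelines through data annotations, as Kedro does with nodes that declare named inputs and outputs, can be sketched in plain Python. The `Node` and `Pipeline` classes below are hypothetical stand-ins written for illustration, not Kedro's API: composition just merges node lists, and the runner repeatedly executes any node whose named inputs are already available.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Node:
    func: Callable[..., Any]
    inputs: tuple[str, ...]  # names of the datasets this node reads
    output: str              # name of the dataset this node produces


@dataclass(frozen=True)
class Pipeline:
    nodes: tuple[Node, ...]

    def __add__(self, other: "Pipeline") -> "Pipeline":
        # Composition is the union of nodes; wiring comes from dataset names.
        return Pipeline(self.nodes + other.nodes)

    def run(self, catalog: dict[str, Any]) -> dict[str, Any]:
        data = dict(catalog)  # copy so the caller's inputs are not mutated
        pending = list(self.nodes)
        while pending:
            ready = [n for n in pending if all(name in data for name in n.inputs)]
            if not ready:
                missing = {name for n in pending for name in n.inputs if name not in data}
                raise ValueError(f"unresolvable inputs: {sorted(missing)}")
            for n in ready:
                data[n.output] = n.func(*(data[name] for name in n.inputs))
                pending.remove(n)
        return data


# Two independently defined pipelines, layered on top of each other.
cleaning = Pipeline((Node(lambda raw: [x for x in raw if x is not None], ("raw",), "clean"),))
stats = Pipeline((Node(lambda clean: sum(clean) / len(clean), ("clean",), "mean"),))
result = (stats + cleaning).run({"raw": [1, None, 2, 3]})
print(result["mean"])
```

Because execution order is derived from the dataset names rather than from the order of composition, `stats + cleaning` and `cleaning + stats` run the same way.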
Project mention: Python Day 9: Building Interactive Web Apps without HTML/CSS and JavaScript | dev.to | 2024-04-26
Taipy is an open-source Python library that enables data scientists and developers to build robust end-to-end data pipelines.
Project mention: Spreadsheet errors can have disastrous consequences – yet we keep making them | news.ycombinator.com | 2024-01-25
Pandas docs > Comparison with spreadsheets: https://pandas.pydata.org/docs/getting_started/comparison/co...
Pandas docs > I/O > Excel files: https://pandas.pydata.org/docs/user_guide/io.html#excel-file...
nteract/papermill: https://github.com/nteract/papermill :
> papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. [...]
> This opens up new opportunities for how notebooks can be used. For example:
> - Perhaps you have a financial report that you wish to run with different values on the first or last day of a month or at the beginning or end of the year, using parameters makes this task easier.
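Papermill's documentation describes the mechanism behind this: a notebook cell tagged `parameters` holds default values, and at execution time papermill inserts a new cell tagged `injected-parameters` right after it (or at the top of the notebook if no such cell exists), so the injected assignments override the defaults. The sketch below mimics only that injection step on a notebook loaded as a plain dict. It is a simplified illustration, not papermill's code: `inject_parameters` is a hypothetical name, it does not execute anything, and the report cells and dates are made-up example values.

```python
import copy
from typing import Any


def inject_parameters(notebook: dict[str, Any], parameters: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `notebook` with an `injected-parameters` cell added.

    Hypothetical helper: the new cell goes right after the first cell tagged
    `parameters`, or at the top of the notebook if there is none.
    """
    nb = copy.deepcopy(notebook)  # leave the input notebook untouched
    source = "\n".join(f"{name} = {value!r}" for name, value in parameters.items())
    new_cell = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": source,
        "outputs": [],
        "execution_count": None,
    }
    cells = nb["cells"]
    tagged = [i for i, c in enumerate(cells) if "parameters" in c.get("metadata", {}).get("tags", [])]
    insert_at = tagged[0] + 1 if tagged else 0
    cells.insert(insert_at, new_cell)
    return nb


# A month-end report whose default date is overridden for one particular run.
report = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]}, "source": "as_of = '2024-01-01'"},
        {"cell_type": "code", "metadata": {}, "source": "print(as_of)"},
    ]
}
month_end = inject_parameters(report, {"as_of": "2024-01-31"})
print([c["source"] for c in month_end["cells"]])
```

With papermill itself, the equivalent run is a call such as `papermill.execute_notebook(input_path, output_path, parameters={"as_of": "2024-01-31"})`, which also executes the parameterized notebook and saves the result.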
"The World Excel Championship is being broadcast on ESPN" (2022) https://news.ycombinator.com/item?id=32420925 :
> Computational notebook speedrun ideas:
Project mention: Python: Uncovering the Overlooked Core Functionalities | news.ycombinator.com | 2023-07-24
If you actually think this code is better there's a real library that does this: https://github.com/EntilZha/PyFunctional.
Maybe it would help you to look at the Galaxy project (GitHub repository and main site).
Project mention: Show HN: Hatchet – Open-source distributed task queue | news.ycombinator.com | 2024-03-08
A little late now, but I wonder if https://github.com/DataBiosphere/toil might meet your requirements.
Project mention: Show HN: Neum AI – Open-source large-scale RAG framework | news.ycombinator.com | 2023-11-21
Interesting to see that the semantic chunking in the tools library is a wrapper around GPT-4. It asks GPT for the Python code and then executes it: https://github.com/NeumTry/NeumAI/blob/main/neumai-tools/neu...
Project mention: Serverless APIs: Auto-Generate OpenAPI Docs & CI/CD Protections | dev.to | 2024-03-04
In case you didn't know, the Cookbook is a template project that allows you to get started with serverless with three clicks, and it has all the best practices and utilities that a production-grade serverless service requires.
Python Pipeline related posts
-
How to Build a Chat App with Your Postgres Data using Agent Cloud
-
Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres
-
Simple task runner for automation pipelines
-
25 million Creative Commons image dataset released!
-
Nextflow: Data-Driven Computational Pipelines
-
Airbyte API and Terraform Provider – available in open source
-
Need help moving 16gb of mongodb data to tableau
Index
What are some of the best open-source Pipeline projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | jina | 20,177 |
2 | Prefect | 14,829 |
3 | airbyte | 14,296 |
4 | great_expectations | 9,526 |
5 | Kedro | 9,398 |
6 | Taipy | 8,847 |
7 | Mage | 7,171 |
8 | papermill | 5,656 |
9 | pipelines | 3,457 |
10 | towhee | 3,015 |
11 | PyFunctional | 2,342 |
12 | mara-pipelines | 2,056 |
13 | pytorch-toolbelt | 1,488 |
14 | MLBox | 1,477 |
15 | galaxy | 1,320 |
16 | sematic | 947 |
17 | toil | 874 |
18 | NeumAI | 788 |
19 | pypyr automation task runner | 571 |
20 | aws-lambda-handler-cookbook | 464 |
21 | versatile-data-kit | 412 |
22 | karton | 370 |
23 | fluids | 340 |