Top 15 Python data-pipeline Projects
-
airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
-
covalent
Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments. (by AgnostiqHQ)
-
premier-league
A Data Engineering project. Repository for backend infrastructure and Streamlit app files for a Premier League Dashboard.
-
alto
Alto is a versatile data integration tool that allows you to easily run Singer plugins, build and cache PEX files encapsulating those plugins, and create a data reservoir whereby you can extract once and replay to as many destinations as you want. (by z3z1ma)
Project mention: How to Build a Chat App with Your Postgres Data using Agent Cloud | dev.to | 2024-05-13
AgentCloud uses Airbyte to build data pipelines, which allow us to split, chunk, and embed data from over 300 data sources, including Postgres.
Project mention: How do you deal with CI, project config, etc. falling out of sync across repos? | /r/ExperiencedDevs | 2023-12-06
I like mage for Go and doit for Python.
Pretty interesting request. If SSH is not used, I would try something like dask, which uses TCP to connect and execute, assuming your workers are on another machine. I also think covalent could be extended with your own custom plugin in their ecosystem to connect however you want. We have a very custom private plugin written on top of covalent's that uses a custom RPC-based protocol to connect our central on-prem GPU machines to our local laptops, mostly for high performance and to satisfy some security mandates at the site where the GPU machines are. Once done it is pretty much something like
Project mention: Show HN: PipeRider – open-source Data Impact Analysis for dbt changes | news.ycombinator.com | 2023-09-06
I have a data engineering project that uses BigQuery, Cloud Run, Compute Engine, Cloud SQL, Artifact Registry, Firestore, and Datastream.
Project mention: Show HN: VQASynth – pipelines to synthesize VQA datasets | news.ycombinator.com | 2024-02-23
Python data-pipeline related posts
-
Ingestr: CLI tool to copy data between any databases with a single command
-
Show HN: PipeRider – open-source Data Impact Analysis for dbt changes
-
Open source data observability tools with UI?
-
Data profiling as part of a data reliability strategy?
-
Show HN: PipeRider, data reliability automated tool
-
A simple lazy Python Calculation Engine (with spreadsheet demo)
-
Build and deploy a serverless data pipeline on AWS with no effort.
-
Index
What are some of the best open-source data-pipeline projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | airbyte | 14,217 |
2 | ingestr | 2,341 |
3 | doit | 1,788 |
4 | DataEngineeringProject | 985 |
5 | covalent | 702 |
6 | piperider | 469 |
7 | premier-league | 154 |
8 | datajob | 108 |
9 | patterns-devkit | 106 |
10 | airflow-testing-ci-workflow | 84 |
11 | VQASynth | 76 |
12 | alto | 48 |
13 | datatap-python | 34 |
14 | data-engineer-challenge | 25 |
15 | pyDag | 24 |