Python data-pipeline

Open-source Python projects categorized as data-pipeline

Top 15 Python data-pipeline Projects

  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

  • Project mention: How to Build a Chat App with Your Postgres Data using Agent Cloud | dev.to | 2024-05-13

    AgentCloud uses Airbyte to build data pipelines, which allow us to split, chunk, and embed data from over 300 data sources, including Postgres.

  • ingestr

    ingestr is a CLI tool to copy data between any databases with a single command seamlessly.

  • Project mention: FLaNK 04 March 2024 | dev.to | 2024-03-04
  • doit

    task management & automation tool

  • Project mention: How do you deal with CI, project config, etc. falling out of sync across repos? | /r/ExperiencedDevs | 2023-12-06

    I like mage for Go and doit for Python.

  • DataEngineeringProject

    Example end to end data engineering project.

  • covalent

    Pythonic tool for orchestrating machine-learning/high performance/quantum-computing workflows in heterogeneous compute environments. (by AgnostiqHQ)

  • Project mention: Remote execution of code | /r/Python | 2023-12-05

    Pretty interesting request. If SSH is not used, I would try something like dask, which uses TCP to connect and execute, assuming your workers are on another machine. I also think something like covalent can be used: you can extend it with your own custom plugin in their ecosystem to connect however you want. We have a very custom private plugin written on top of covalent's, with a custom RPC-based protocol that connects our central on-prem GPU machines to our local laptops, mostly for high performance as well as some mandated security around where the GPU machines are. Once done it is pretty much something like

  • piperider

    Code review for data in dbt

  • Project mention: Show HN: PipeRider – open-source Data Impact Analysis for dbt changes | news.ycombinator.com | 2023-09-06
  • premier-league

    A Data Engineering project. Repository for backend infrastructure and Streamlit app files for a Premier League Dashboard.

  • Project mention: Google Cloud Portfolio Projects? | /r/googlecloud | 2023-12-09

    I have a data engineering project that uses BigQuery, Cloud Run, Compute Engine, Cloud SQL, Artifact Registry, Firestore, and Datastream.

  • datajob

    Build and deploy a serverless data pipeline on AWS with no effort.

  • patterns-devkit

    Data pipelines from re-usable components

  • airflow-testing-ci-workflow

    (project & tutorial) dag pipeline tests + ci/cd setup

  • VQASynth

    Compose multimodal datasets 🎹

  • Project mention: Show HN: VQASynth – pipelines to synthesize VQA datasets | news.ycombinator.com | 2024-02-23
  • alto

    Alto is a versatile data integration tool that allows you to easily run Singer plugins, build and cache PEX files encapsulating those plugins, and create a data reservoir whereby you can extract once and replay to as many destinations as you want. (by z3z1ma)

  • datatap-python

    Focus on Algorithm Design, Not on Data Wrangling

  • data-engineer-challenge

    Challenge Data Engineer

  • pyDag

    Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python data-pipeline related posts

  • Ingestr: CLI tool to copy data between any databases with a single command

    1 project | news.ycombinator.com | 27 Feb 2024
  • Show HN: PipeRider – open-source Data Impact Analysis for dbt changes

    3 projects | news.ycombinator.com | 6 Sep 2023
  • Open source data observability tools with UI?

    4 projects | /r/dataengineering | 18 Mar 2023
  • Data profiling as part of a data reliability strategy?

    2 projects | /r/dataengineering | 15 Sep 2022
  • Show HN: PipeRider, data reliability automated tool

    2 projects | news.ycombinator.com | 23 Jun 2022
  • A simple lazy Python Calculation Engine (with spreadsheet demo)

    5 projects | news.ycombinator.com | 7 Jul 2021
  • Build and deploy a serverless data pipeline on AWS with no effort.

    1 project | /r/serverless | 23 Jun 2021

Index

What are some of the best open-source data-pipeline projects in Python? This list will help you:

 #  Project                      Stars
 1  airbyte                      14,217
 2  ingestr                       2,341
 3  doit                          1,788
 4  DataEngineeringProject          985
 5  covalent                        702
 6  piperider                       469
 7  premier-league                  154
 8  datajob                         108
 9  patterns-devkit                 106
10  airflow-testing-ci-workflow      84
11  VQASynth                         76
12  alto                             48
13  datatap-python                   34
14  data-engineer-challenge          25
15  pyDag                            24
