Python data-integration

Open-source Python projects categorized as data-integration

Top 11 Python data-integration Projects

  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

  • Project mention: AI Strategy Guide: How to Scale AI Across Your Business | dev.to | 2024-05-11

    Level 1 of MLOps is when you've put each lifecycle stage and its interfaces into an automated pipeline. The pipeline could be a Python or bash script, or it could be a directed acyclic graph run by some orchestration framework like Airflow, dagster or one of the cloud-provider offerings. AI- or data-specific platforms like MLflow, ClearML and dvc also feature pipeline capabilities.

  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

  • Project mention: How to Build a Chat App with Your Postgres Data using Agent Cloud | dev.to | 2024-05-13

    AgentCloud uses Airbyte to build data pipelines, which allow us to split, chunk, and embed data from over 300 data sources, including Postgres.

  • dagster

    An orchestration platform for the development, production, and observation of data assets.

  • Project mention: AI Strategy Guide: How to Scale AI Across Your Business | dev.to | 2024-05-11

  • Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

  • Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22

  • ingestr

    ingestr is a CLI tool to seamlessly copy data between any databases with a single command.

  • Project mention: FLaNK 04 March 2024 | dev.to | 2024-03-04

  • mara-pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

  • recap

    Work with your web service, database, and streaming schemas in a single format.

  • Project mention: Recap: A python library for describing database tables and serialization formats with minimal type coercion. | /r/dataengineering | 2023-07-12

    The GitHub repo: https://github.com/recap-build/recap

  • prism

    Prism is the easiest way to develop, orchestrate, and execute data pipelines in Python. (by runprism)

  • Project mention: Prism: the easiest way to create robust data workflows. Accessible via CLI | /r/coolgithubprojects | 2023-09-21

  • nfcompose

    Build REST APIs/integrations in minutes instead of hours. NF Compose is a (data) integration platform that allows developers to define REST APIs in seconds instead of hours. Generated REST APIs are backed by Postgres and support automatic consumer webhook notifications on data changes out of the box.

  • Project mention: Implementing system-versioned tables in Postgres | news.ycombinator.com | 2024-02-07

    I have implemented this for our tool NF Compose that allows us to build REST APIs without writing a single line of code [0]. I didn't go the route of triggers because we generate database tables automatically, and we used to have a crazy versioning scheme inspired by Data Vault and anchor modelling where we stored every change on every attribute as a new record.

    Sounded cool, but in practice it was really slow. The techniques that are usually employed by Data Vault to fix this issue seemed too complex. Over time we moved to an implementation that handles the historization dynamically at runtime by generating SQL queries ourselves [1]. On a side note: generating SQL in Python sounds dangerous, but we spent a lot of time on making it secure. We even have a linter that checks that everything is escaped properly whenever we are in dev mode [2].

    [0] https://github.com/neuroforgede/nfcompose/

  • UniFuncNet

    A multi-reference network annotation tool to support omics analysis

  • JDR

    Job Dependency Runner

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
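Airflow, dagster and JDR (Job Dependency Runner) all revolve around running tasks in dependency order, the directed acyclic graph that the MLOps excerpt under Airflow describes. The following is a minimal plain-Python sketch of that idea, not the API of any listed project: it uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+), and the stage names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable


def make_stage(name: str, log: list[str]) -> Callable[[], None]:
    """Return a hypothetical stage that only records its name when run."""
    def run() -> None:
        log.append(name)
    return run


def run_pipeline(deps: dict[str, set[str]], stages: dict[str, Callable[[], None]]) -> None:
    """Run each stage only after all of its upstream stages have run."""
    # static_order() raises graphlib.CycleError if deps contain a cycle.
    for name in TopologicalSorter(deps).static_order():
        stages[name]()


log: list[str] = []
deps = {  # stage -> set of stages it depends on
    "prepare_data": {"ingest"},
    "train": {"prepare_data"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}
stages = {n: make_stage(n, log) for n in ["ingest", "prepare_data", "train", "evaluate", "deploy"]}
run_pipeline(deps, stages)
print(log)  # ['ingest', 'prepare_data', 'train', 'evaluate', 'deploy']
```

A real orchestrator layers scheduling, retries, parallel execution and monitoring on top of this ordering step; the sketch only shows the ordering itself.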
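The Agent Cloud excerpt under airbyte mentions splitting, chunking and embedding data pulled from sources such as Postgres. As a hedged sketch of the chunking step alone, assuming fixed-size character windows with overlap (a common choice, not necessarily what Agent Cloud or Airbyte do), with the embedding step left out:

```python
def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of at most `size` characters, each sharing
    `overlap` characters with the previous window."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a little context shared between neighbouring chunks, so a sentence cut at a window boundary still appears partly in both chunks before they are embedded.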
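ingestr's actual implementation and CLI flags are not reproduced here. The sketch below only illustrates the underlying idea of copying one table between two databases, using the standard-library `sqlite3` module with in-memory databases as stand-ins; the `copy_table` helper and the `users` table are hypothetical.

```python
import sqlite3


def copy_table(src: sqlite3.Connection, dst: sqlite3.Connection, table: str, batch_size: int = 1000) -> int:
    """Copy every row of `table` from src to dst, recreating the table in dst
    from src's stored DDL. Returns the number of rows copied."""
    row = src.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no such table: {table}")
    dst.execute(row[0])  # recreate the schema in the destination
    quoted = '"' + table.replace('"', '""') + '"'  # safely quoted identifier
    cur = src.execute(f"SELECT * FROM {quoted}")
    placeholders = ", ".join("?" for _ in cur.description)
    copied = 0
    with dst:  # commit only if the whole copy succeeds
        while batch := cur.fetchmany(batch_size):
            dst.executemany(f"INSERT INTO {quoted} VALUES ({placeholders})", batch)
            copied += len(batch)
    return copied


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
src.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])
src.commit()

dst = sqlite3.connect(":memory:")
n = copy_table(src, dst, "users")
print(n)  # 2
print(dst.execute("SELECT name FROM users ORDER BY id").fetchall())  # [('ada',), ('grace',)]
```

Reading in batches with `fetchmany` keeps memory bounded for large tables; a cross-database tool additionally has to translate column types between engines, which this SQLite-to-SQLite sketch sidesteps.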

Python data-integration related posts

  • Ingestr: CLI tool to copy data between any databases with a single command

    1 project | news.ycombinator.com | 27 Feb 2024
  • Show HN: Retake – Open-Source Hybrid Search for Postgres

    2 projects | news.ycombinator.com | 10 Aug 2023
  • We created an open-source semantic search Python package on top of Postgres

    1 project | /r/Python | 31 Jul 2023
  • Mage Battlegrounds: Craft insights from real-time customer behavior analysis

    2 projects | dev.to | 10 Apr 2023
  • JDR Tool Introduction (Job Dependency Runner)

    1 project | /r/madeinpython | 19 Mar 2023
  • Looking for an open-source project

    2 projects | /r/dataengineering | 13 Feb 2023
  • Daskqueue: Dask-based distributed task queue

    1 project | /r/dataengineering | 5 Feb 2023

Index

What are some of the best open-source data-integration projects in Python? This list will help you:

Rank  Project         GitHub stars
1     Airflow         34,705
2     airbyte         14,296
3     dagster         10,382
4     Mage            7,131
5     ingestr         2,341
6     mara-pipelines  2,053
7     recap           305
8     prism           79
9     nfcompose       32
10    UniFuncNet      10
11    JDR             3
