Python dbt

Open-source Python projects categorized as dbt

Top 23 Python dbt Projects

  • Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

  • Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22
  • data-diff

    Discontinued. Compare tables within or across databases

  • Project mention: How to Check 2 SQL Tables Are the Same | news.ycombinator.com | 2023-07-26

    If the issue happens a lot, there is also: https://github.com/datafold/data-diff

    That is a nice tool to do it cross-database as well.

    I think it's based on a checksum method.

  • soda-core

    ⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

  • sqlmesh

    Efficient data transformation and modeling framework that is backwards compatible with dbt.

  • Project mention: Launch HN: Serra (YC S23) – Open-source, Python-based dbt alternative | news.ycombinator.com | 2023-08-14

    There is also sqlmesh (https://sqlmesh.com/). Pretty new as well. It introduces some interesting concepts. For smaller dbt projects it could be a drop-in replacement as it allows importing dbt projects.

  • dbt-duckdb

    dbt (http://getdbt.com) adapter for DuckDB (http://duckdb.org)

  • streamify

    A data engineering project with Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP and much more!

  • piperider

    Code review for data in dbt

  • Project mention: Show HN: PipeRider – open-source Data Impact Analysis for dbt changes | news.ycombinator.com | 2023-09-06
  • astronomer-cosmos

    Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code

  • dbt-metabase

    dbt + Metabase integration

  • airflow-dbt

    Apache Airflow integration for dbt

  • dbt-data-reliability

    dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

  • grai-core

  • Project mention: Launch HN: Grai (YC S22) – Open-Source Data Observability Platform | news.ycombinator.com | 2023-07-17

    Elastic v2 if one is interested in such things: https://github.com/grai-io/grai-core/blob/v0.1.33/LICENSE

  • recs-at-resonable-scale

    Recommendations at "Reasonable Scale": joining dataOps with recSys through dbt, Merlin and Metaflow

  • Project mention: When writing ML software - how do you use TDD? | /r/mlops | 2023-06-25

    Good paper, and in response to that one a team from Coveo wrote this paper on behavioral tests for recommender systems... and also this repo.

  • dbt-clickhouse

    The Clickhouse plugin for dbt (data build tool)

  • dbt-coves

    CLI tool for dbt users to simplify creation of staging models (yml and sql) files

  • dbt-athena

    The athena adapter plugin for dbt (https://getdbt.com) (by dbt-athena)

  • dbt-databricks

    A dbt adapter for Databricks.

  • post-modern-stack

    Joining the modern data stack with the modern ML stack

  • Project mention: [Advice] MLOps Course recommendations | /r/datascience | 2023-06-24

    End-to-end stuff, full-fledge stacks: https://github.com/jacopotagliabue/post-modern-stack

  • dbt-ml-preprocessing

    A SQL port of python's scikit-learn preprocessing module, provided as cross-database dbt macros.

  • dbt-coverage

    One-stop-shop for docs and test coverage of dbt projects.

  • dbt2looker

    Generate lookml for views from dbt models

  • dbterd

    Generate the ERD as a code from dbt artifacts

  • valmi-activation

    ⚡ valmi.io reverse ETL (data activation) is the open source (OSS) data activation platform to load data from warehouses into Webhooks and SaaS tools like Klaviyo, Facebook Ads, Salesforce, Braze, etc. Valmi.io Customer Data Platform (CDP) helps track and ingest user activity events from websites, Shopify, and server-side events. https://cloud.valmi.io

  • Project mention: Show HN: Valmi.io Open Source Reverse-ETL Engine | news.ycombinator.com | 2023-06-21
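
The data-diff mention above suggests, tentatively, that comparing two tables can rest on a checksum method. The following is a minimal sketch of that general idea only; it is not data-diff's implementation or API. It uses Python's standard sqlite3 and hashlib modules, and the helper names (table_checksum, tables_match, make_db) and the users table are hypothetical, chosen purely for illustration.

```python
import hashlib
import sqlite3


def table_checksum(conn, table, key):
    """Return (row_count, sha256 hex digest) for a table, reading rows in key order."""
    # Assumption: table and key are trusted identifiers; they are interpolated
    # directly into the query, so never pass user input here.
    digest = hashlib.sha256()
    count = 0
    for row in conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key}"'):
        # repr() of the row tuple gives a deterministic byte representation.
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def tables_match(conn_a, table_a, conn_b, table_b, key="id"):
    """Two tables match when their row counts and checksums are both equal."""
    return table_checksum(conn_a, table_a, key) == table_checksum(conn_b, table_b, key)


def make_db(rows):
    """Build a throwaway in-memory database holding a small users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    return conn


source = make_db([(1, "ada"), (2, "grace")])
copy = make_db([(2, "grace"), (1, "ada")])      # same rows, different insert order
drifted = make_db([(1, "ada"), (2, "Grace")])   # one value differs

print(tables_match(source, "users", copy, "users"))
print(tables_match(source, "users", drifted, "users"))
```

Because rows are read in key order, insertion order does not affect the result. A sketch like this reads every row into the client; a tool built for large or cross-database comparisons would likely compute checksums inside each database instead.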
NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python dbt related posts

  • Launch HN: Grai (YC S22) – Open-Source Data Observability Platform

    3 projects | news.ycombinator.com | 17 Jul 2023
  • When writing ML software - how do you use TDD?

    2 projects | /r/mlops | 25 Jun 2023
  • [Advice] MLOps Course recommendations

    3 projects | /r/datascience | 24 Jun 2023
  • Run dbt projects as Apache Airflow DAGs and Task Groups with a few lines of code

    1 project | news.ycombinator.com | 1 May 2023
  • Curious if anyone has adopted a stack to do raw data ingestion in Databricks?

    2 projects | /r/dataengineering | 25 Apr 2023
  • Running dbt core on airflow

    1 project | /r/dataengineering | 19 Apr 2023
  • dolly-v2-12b

    3 projects | /r/LocalLLM | 13 Apr 2023

Index

What are some of the best open-source dbt projects in Python? This list will help you:

#   Project                   Stars
1   Mage                      7,202
2   data-diff                 2,899
3   soda-core                 1,786
4   sqlmesh                   1,383
5   dbt-duckdb                  754
6   streamify                   474
7   piperider                   471
8   astronomer-cosmos           476
9   dbt-metabase                432
10  airflow-dbt                 382
11  dbt-data-reliability        349
12  grai-core                   271
13  recs-at-resonable-scale     218
14  dbt-clickhouse              222
15  dbt-coves                   216
16  dbt-athena                  194
17  dbt-databricks              190
18  post-modern-stack           181
19  dbt-ml-preprocessing        176
20  dbt-coverage                174
21  dbt2looker                  171
22  dbterd                      171
23  valmi-activation            130
