Python Databricks

Open-source Python projects categorized as Databricks

Top 11 Python Databricks Projects

  • Redash

    Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

  • Project mention: Redash: Connect to data source, easily visualize, dashboard and share your data | news.ycombinator.com | 2024-03-20
  • dolly

    Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

  • Project mention: "[D]" Using data from Alpaca for a commercial version of a Open LLM | /r/MachineLearning | 2023-07-02
  • sqlglot

    Python SQL Parser and Transpiler

  • Project mention: The Future of MySQL is PostgreSQL: an extension for the MySQL wire protocol | news.ycombinator.com | 2024-04-26

    This is probably referring to "zero changes to your driver code" and not "zero changes to the SQL you send over this driver".

    Translating between SQL dialects is notoriously hard and attempts to translate [1] are working in 95% of cases. But the last 5% would require 5x amount of work. That's because "SQL dialect" also includes weird edge cases of type inference of things like COALESCE(5, FALSE) and emulation of system catalogs (pg_catalog, information_schema).

    [1] https://github.com/tobymao/sqlglot

  • dbrx

    Code examples and resources for DBRX, a large language model developed by Databricks

  • Project mention: Hello OLMo: A Open LLM | news.ycombinator.com | 2024-04-08

    One thing I wanted to add and call attention to is the importance of licensing in open models. This is often overlooked when we blindly accept the vague branding of models as “open”, but I am noticing that many open weight models are actually using encumbered proprietary licenses rather than standard open source licenses that are OSI approved (https://opensource.org/licenses). As an example, Databricks’s DBRX model has a proprietary license that forces adherence to their highly restrictive Acceptable Use Policy by referencing a live website hosting their AUP (https://github.com/databricks/dbrx/blob/main/LICENSE), which means as they change their AUP, you may be further restricted in the future. Meta’s Llama is similar (https://github.com/meta-llama/llama/blob/main/LICENSE). I’m not sure who can depend on these models given this flaw.

  • optscale

    FinOps and MLOps platform to run ML/AI and regular cloud workloads with optimal performance and cost.

  • Project mention: Profile and instrument ML experiments and optimize their performance expenses | news.ycombinator.com | 2023-09-27
  • dbx

    🧱 Databricks CLI eXtensions - aka dbx is a CLI tool for development and advanced Databricks workflows management.

  • databricks-sdk-py

    Databricks SDK for Python (Beta)

  • Project mention: CI/CD for Databricks | /r/dataengineering | 2023-07-11

    To build custom deployment scripts, that go beyond declarative definitions, you are welcome to use https://github.com/databricks/databricks-sdk-py, https://github.com/databricks/databricks-sdk-jvm, and https://github.com/databricks/databricks-sdk-go.

  • nutter

    Testing framework for Databricks notebooks

  • dbt-databricks

    A dbt adapter for Databricks.

  • xonai-dashboard

    A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver

  • Project mention: Show HN: Open sourcing a Big Data monitoring tool | news.ycombinator.com | 2024-03-29
  • fastdbfs

    fastdbfs - An interactive command line client for Databricks DBFS.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Databricks related posts

  • Hello OLMo: A Open LLM

    3 projects | news.ycombinator.com | 8 Apr 2024
  • DBRX: A New Open LLM

    6 projects | news.ycombinator.com | 27 Mar 2024
  • Databricks SDK for Python

    1 project | /r/datascience | 28 Jun 2023
  • Official Python SDK for Databricks

    1 project | news.ycombinator.com | 28 Jun 2023
  • How much object orienteered do you use in your projects? Bonus points for integration and unit tests

    1 project | /r/dataengineering | 19 Mar 2023
  • how/where do you define your databricks jobs, tasks and workflows?

    1 project | /r/dataengineering | 1 Nov 2022
  • Any suggestions for building DBT project on DataBricks?

    1 project | /r/dataengineering | 8 Oct 2022

Index

What are some of the best open-source Databricks projects in Python? This list will help you:

 #  Project            Stars
 1  Redash             25,106
 2  dolly              10,792
 3  sqlglot             5,679
 4  dbrx                2,428
 5  optscale            1,018
 6  dbx                   434
 7  databricks-sdk-py     304
 8  nutter                264
 9  dbt-databricks        186
10  xonai-dashboard        11
11  fastdbfs                4
