data-warehouse

Open-source projects categorized as data-warehouse

Top 23 data-warehouse Open-Source Projects

  • awesome-bigdata

    A curated list of awesome big data frameworks, resources and other awesomeness.

  • Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13
  • Greenplum

    Greenplum Database - Massively Parallel PostgreSQL for Analytics. An open-source massively parallel data platform for analytics, machine learning and AI.

  • materialize

    The data warehouse for operational workloads. (by MaterializeInc)

  • Project mention: The Notifier Pattern for Applications That Use Postgres | news.ycombinator.com | 2024-05-14

    Those updates are not retroactive. They apply on a go forward basis. Each day's changes become Apache 2.0 licensed on that day four years in the future.

    For example, v0.28 was released on October 18, 2022, and becomes Apache 2.0 licensed four years after that date (i.e., 2.5 years from today), on October 18, 2026.

    [0]: https://github.com/MaterializeInc/materialize/blob/76cb6647d...

  • Rudderstack

    Privacy and Security focused Segment-alternative, in Golang and React

  • Project mention: Rudderstack Switches to Elastic License | news.ycombinator.com | 2023-09-08
  • hydra

    Hydra: Column-oriented Postgres. Add scalable analytics to your project in minutes. (by hydradatabase)

  • Project mention: Pg_lakehouse: Query Any Data Lake from Postgres | news.ycombinator.com | 2024-05-13

    How does this compare to Hydra? https://www.hydra.so/

  • DXY-COVID-19-Data

    COVID-19/2019-nCoV Infection Time Series Data Warehouse

  • Project mention: DXY-COVID-19-Data: NEW Data - star count:2218.0 | /r/algoprojects | 2023-10-17
  • elementary

    The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

  • dlt

    data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

  • Project mention: Ask HN: Freelancer? Seeking freelancer? (December 2023) | news.ycombinator.com | 2023-12-03

    SEEKING FREELANCER | REMOTE | GERMANY

    dltHub is looking for freelance help in the following repos:

    - https://github.com/dlt-hub/dlt

  • Cubes

    [NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis

  • tensorbase

    TensorBase is a new big data warehousing with modern efforts.

  • Udacity-Data-Engineering-Projects

    A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing and Data Lake development.

  • Project mention: Pitanje za data engineering? | /r/programiranje | 2023-06-30
  • bigquery-utils

    Useful scripts, UDFs, views, and other utilities for migration and data warehouse operations in BigQuery.

  • Project mention: Swirl: An open-source search engine with LLMs and ChatGPT to provide all the answers you need 🌌 | dev.to | 2023-09-06

    Using the Galaxy UI, knowledge workers can systematically review the best results from all configured services including Apache Solr, ChatGPT, Elastic, OpenSearch, PostgreSQL, Google BigQuery, plus generic HTTP/GET/POST with configurations for premium services like Google's Programmable Search Engine, Miro and Northern Light Research.

  • scratchdata

    Scratch is a swiss army knife for big data.

  • Project mention: Debugging a Golang Bug with Non-Blocking Reads | news.ycombinator.com | 2024-03-12

    Go team does acknowledge [1] it as a bug, so there is some point here

    However, that said, I wonder if OP (duckdb) could have written their solution [2] differently. Shouldn't they be able to select from a Pipe as well as an Error channel simultaneously (similar to how they are doing it inside here [3])? If not, I would have created a goroutine that does a blocking read on the Pipe and then passes the result on to another channel to select on.

    [1] https://github.com/golang/go/issues/66239

    [2] https://github.com/scratchdata/scratchdata/blob/7c1a0fcd0e20...

    [3] https://github.com/scratchdata/scratchdata/blob/7c1a0fcd0e20...

  • optimus

    Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management. (by raystack)

  • Data-Engineering-Projects

    Personal Data Engineering Projects

  • Project mention: Pitanje za data engineering? | /r/programiranje | 2023-06-30
  • multiwoven

    🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack.

  • Project mention: Multiwoven Reverse ETL (0.2.0) – Open-Source Alternative to Hightouch and Census | news.ycombinator.com | 2024-04-19

    Multiwoven is now a leading Open Source Alternative to Hightouch, Census, and Rudderstack.

    It's been a great journey so far, and we are excited to announce a major update to Multiwoven - our new release, Multiwoven 0.2.0, is now available!

    Repo: https://github.com/Multiwoven/multiwoven

    This release brings a host of new features, enhancements, and bug fixes to streamline data syncs and user experience.

    From new connectors to advanced reporting dashboards, as a team, we have been working hard on these updates based on the feedback and requests from our customers and the community.

    - 10+ new connectors added to Multiwoven, including

  • vulcan-sql

    Data API Framework for AI Agents and Data Apps

  • Project mention: Shout out to Appsmith developers to check out this new tool! | /r/lowcode | 2023-07-09

    I am one of the members of an open-source project VulcanSQL, a Data API Framework for data applications that helps data folks create and share data APIs faster.

  • DomainMOD

    DomainMOD is an open source application written in PHP & MySQL used to manage your domains and other internet assets in a central location. DomainMOD also includes a Data Warehouse framework that allows you to import your web server data so that you can view, export, and report on your live data.

  • Project mention: Self-hosted nameserver for Domain management | /r/selfhosted | 2023-05-29

    DomainMOD - Application to manage your domains and other internet assets in a central location. DomainMOD includes a Data Warehouse framework that allows you to import your WHM/cPanel web server data so that you can view, export, and report on your data.

  • versatile-data-kit

    One framework to develop, deploy and operate data workflows with Python and SQL.

  • space

    Unified storage framework for the entire machine learning lifecycle (by google)

  • Project mention: Unified storage framework for the entire machine learning lifecycle | news.ycombinator.com | 2024-02-28
  • data-engineering-project-template

    This is a template you can use for your next data engineering portfolio project.

  • beneath

    Beneath is a serverless real-time data platform ⚡️

  • pgwarehouse

    Easily sync your Postgres database to a Snowflake, ClickHouse, or DuckDB warehouse.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source data-warehouse projects? This list will help you:

Rank Project Stars
1 awesome-bigdata 12,845
2 Greenplum 6,213
3 materialize 5,608
4 Rudderstack 3,947
5 hydra 2,651
6 DXY-COVID-19-Data 2,175
7 elementary 1,746
8 dlt 1,792
9 Cubes 1,490
10 tensorbase 1,429
11 Udacity-Data-Engineering-Projects 1,295
12 bigquery-utils 1,042
13 scratchdata 1,041
14 optimus 737
15 Data-Engineering-Projects 722
16 multiwoven 654
17 vulcan-sql 596
18 DomainMOD 454
19 versatile-data-kit 412
20 space 137
21 data-engineering-project-template 112
22 beneath 81
23 pgwarehouse 64
