Reference Data Stack for Data-Driven Startups

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • Rudderstack

    Privacy- and security-focused Segment alternative, written in Golang and React

  • We also have telemetry set up on our Monosi product, which is collected through Snowplow. As with Airbyte, we chose Snowplow because of its open-source offering and its scalable event ingestion framework. Other options to consider include open-source Jitsu and RudderStack, or closed-source options like Segment. Since we started building our product with just a CLI offering, we didn't need a full CDP solution, so we chose Snowplow.

  • PostgreSQL

    Mirror of the official PostgreSQL Git repository. Note that this is just a *mirror*; we don't work with pull requests on GitHub. To contribute, please see https://wiki.postgresql.org/wiki/Submitting_a_Patch

  • Technologies include Snowplow, Airbyte, PostgreSQL, dbt, Snowflake, Metabase, and Monosi itself. All of it is hosted on AWS running in a VPC.

  • grouparoo

    Discontinued 🦘 The Grouparoo Monorepo - open source customer data sync framework

  • There are other tools that we will have to adopt in the future but haven't yet because we haven't needed them. Specifically, one category that is popular in modern data stacks is Reverse ETL (Hightouch, Census, or Grouparoo). We currently don't have a use case for piping data back into third-party tools, but it will definitely come up in the future.

  • superset

    Apache Superset is a Data Visualization and Data Exploration Platform

  • To analyze the data stored in Snowflake and Postgres, we use Metabase. We chose Metabase because of its open-source offering and easy-to-use interface. Other open-source tools like Lightdash and Superset exist, which we may add to the stack as our data team grows.

  • Snowplow

    The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP

  • We also have telemetry set up on our Monosi product, which is collected through Snowplow. As with Airbyte, we chose Snowplow because of its open-source offering and its scalable event ingestion framework. Other options to consider include open-source Jitsu and RudderStack, or closed-source options like Segment. Since we started building our product with just a CLI offering, we didn't need a full CDP solution, so we chose Snowplow.

  • Metabase

    The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋

  • To analyze the data stored in Snowflake and Postgres, we use Metabase. We chose Metabase because of its open-source offering and easy-to-use interface. Other open-source tools like Lightdash and Superset exist, which we may add to the stack as our data team grows.

  • jitsu

    Jitsu is an open-source Segment alternative: a fully scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days.

  • We also have telemetry set up on our Monosi product, which is collected through Snowplow. As with Airbyte, we chose Snowplow because of its open-source offering and its scalable event ingestion framework. Other options to consider include open-source Jitsu and RudderStack, or closed-source options like Segment. Since we started building our product with just a CLI offering, we didn't need a full CDP solution, so we chose Snowplow.

  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

  • First, we have data coming in from various third-party tools such as Google Analytics, MailChimp, GitHub, Slack, etc. For extraction, we use Airbyte.
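For the extraction step, tools like Airbyte support incremental syncs that pull only the records changed since the last run, tracked with a cursor field. The following is a minimal sketch of that idea in plain Python, not Airbyte's API; the record shape and the `updated_at` cursor field are hypothetical, and the cursor values are assumed to be ISO-8601 strings so that string comparison matches chronological order.

```python
from typing import Any, Iterable

# Hypothetical record shape: a dict of column values that includes a cursor
# field (here an ISO-8601 "updated_at" string, which sorts chronologically).
Record = dict[str, Any]


def incremental_sync(
    records: Iterable[Record], cursor_field: str, state: dict[str, Any]
) -> tuple[list[Record], dict[str, Any]]:
    """Return records newer than the saved cursor, plus the updated state.

    `state` is whatever was persisted after the previous run; an empty dict
    means "first run", so every record is extracted (a full refresh).
    """
    last_cursor = state.get(cursor_field)
    new_records = [
        r for r in records
        if last_cursor is None or r[cursor_field] > last_cursor
    ]
    new_state = dict(state)
    if new_records:
        # Advance the cursor to the largest value seen in this batch.
        new_state[cursor_field] = max(r[cursor_field] for r in new_records)
    return new_records, new_state


if __name__ == "__main__":
    source = [
        {"id": 1, "updated_at": "2022-01-01T00:00:00Z"},
        {"id": 2, "updated_at": "2022-01-05T00:00:00Z"},
    ]
    batch, state = incremental_sync(source, "updated_at", {})
    print([r["id"] for r in batch], state)  # first run: ids 1 and 2

    source.append({"id": 3, "updated_at": "2022-01-09T00:00:00Z"})
    batch, state = incremental_sync(source, "updated_at", state)
    print([r["id"] for r in batch], state)  # second run: only id 3
```

The strict `>` comparison keeps the sketch simple but can skip a late-arriving row that shares the saved cursor value; a production connector would typically compare with `>=` and deduplicate on a primary key instead.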

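Custom Snowplow events, such as the CLI telemetry described above, can be sent as self-describing JSON: an envelope whose `schema` field holds an Iglu URI and whose `data` field holds the event properties. A minimal sketch of building such a payload follows, assuming a hypothetical `iglu:com.example/cli_command/jsonschema/1-0-0` schema and hypothetical field names; sending the event to a collector (for example through one of Snowplow's tracker libraries) is not shown.

```python
import json
from datetime import datetime, timezone
from typing import Any

# Hypothetical Iglu schema URI (iglu:vendor/name/format/version); the vendor,
# event name, and version are placeholders, not taken from the original post.
CLI_EVENT_SCHEMA = "iglu:com.example/cli_command/jsonschema/1-0-0"


def cli_command_event(command: str, cli_version: str, succeeded: bool) -> dict[str, Any]:
    """Wrap hypothetical CLI telemetry fields in a self-describing JSON envelope."""
    return {
        "schema": CLI_EVENT_SCHEMA,
        "data": {
            "command": command,
            "cli_version": cli_version,
            "succeeded": succeeded,
            # UTC timestamp recorded on the client side.
            "client_ts": datetime.now(timezone.utc).isoformat(),
        },
    }


if __name__ == "__main__":
    # "init" is a hypothetical command name used only for illustration.
    event = cli_command_event("init", "0.1.0", True)
    print(json.dumps(event, indent=2))
```

Keeping the schema reference alongside the payload lets the pipeline validate each event against a versioned schema; in a Snowplow setup that schema would typically be published to an Iglu registry.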