Five Apache projects you probably didn't know about

Apache Spark

101 38,530 10.0 Scala

Apache Spark - A unified analytics engine for large-scale data processing

Apache SeaTunnel is a data integration platform that offers the three pillars of data pipelines: sources, transforms, and sinks. It offers an abstract API over three possible engines: the Zeta engine from SeaTunnel or a wrapper around Apache Spark or Apache Flink. Be careful, as each engine comes with its own set of features.

skywalking

23 23,321 9.5 Java

APM, Application Performance Monitoring System

Apache SkyWalking is an APM tool, focusing on microservices, Cloud Native apps, and Kubernetes architectures. It builds its architecture on four kinds of components:

shardingsphere-elasticjob-ui

8 156 0.0 Java

Administrator console of ElasticJob

ShardingSphere claims to offer an ecosystem able to transform any database into a distributed database system. It acts as a proxy between your code and your database(s). It comes in two flavors:

seatunnel

31 7,431 9.8 Java

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.

Apache SeaTunnel is a data integration platform that offers the three pillars of data pipelines: sources, transforms, and sinks. It offers an abstract API over three possible engines: the Zeta engine from SeaTunnel or a wrapper around Apache Spark or Apache Flink. Be careful, as each engine comes with its own set of features.
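The source, transform, and sink split described above can be illustrated with a small, engine-agnostic sketch. This is not SeaTunnel's API: `source`, `uppercase_names`, `ListSink`, and `run_pipeline` are hypothetical names, and the in-memory records are made up for illustration only.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict


def source() -> Iterator[Record]:
    """Hypothetical source: yields records from an in-memory list."""
    yield from [
        {"name": "seatunnel", "lang": "Java"},
        {"name": "doris", "lang": "Java"},
    ]


def uppercase_names(records: Iterable[Record]) -> Iterator[Record]:
    """Hypothetical transform: upper-cases the 'name' field of each record."""
    for record in records:
        yield {**record, "name": record["name"].upper()}


class ListSink:
    """Hypothetical sink: collects records in a list instead of a real store."""

    def __init__(self) -> None:
        self.rows: List[Record] = []

    def write(self, records: Iterable[Record]) -> None:
        self.rows.extend(records)


def run_pipeline(
    src: Callable[[], Iterator[Record]],
    transforms: List[Callable[[Iterable[Record]], Iterator[Record]]],
    sink: ListSink,
) -> None:
    """Chain source -> transforms -> sink; a real engine would own this step."""
    stream: Iterable[Record] = src()
    for transform in transforms:
        stream = transform(stream)
    sink.write(stream)


sink = ListSink()
run_pipeline(source, [uppercase_names], sink)
print(sink.rows)
```

In this toy version, `run_pipeline` plays the role of the engine. Swapping the engine would mean replacing that one function while keeping the source, transform, and sink definitions unchanged, which is the idea behind an abstract API over several engines.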

Nginx

100 20,354 8.8 C

An official read-only mirror of http://hg.nginx.org/nginx/ which is updated hourly. Pull requests on GitHub cannot be accepted and will be automatically closed. The proper way to submit changes to nginx is via the nginx development mailing list, see http://nginx.org/en/docs/contributing_changes.html

APISIX is an API Gateway. It builds upon OpenResty, a Lua layer built on top of the famous nginx reverse-proxy. APISIX adds abstractions to the mix, e.g., Route, Service, Upstream, and offers a plugin-based architecture.
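As a rough mental model of how the Route, Service, and Upstream abstractions relate, here is a toy sketch. It is not APISIX code or configuration: the class fields, the prefix matching in `resolve`, and the plugin names `rate-limit` and `auth` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Upstream:
    """Hypothetical model: a group of backend nodes that receive traffic."""
    nodes: List[str]


@dataclass
class Service:
    """Hypothetical model: shared settings (upstream, plugins) reused by routes."""
    upstream: Upstream
    plugins: Dict[str, dict] = field(default_factory=dict)


@dataclass
class Route:
    """Hypothetical model: matches requests and points them at a service."""
    uri_prefix: str
    service: Service
    plugins: Dict[str, dict] = field(default_factory=dict)


def resolve(routes: List[Route], path: str) -> Optional[dict]:
    """Return the nodes and effective plugins for the first matching route."""
    for route in routes:
        if path.startswith(route.uri_prefix):
            # In this sketch, route-level plugins override service-level ones.
            plugins = {**route.service.plugins, **route.plugins}
            return {"nodes": route.service.upstream.nodes, "plugins": plugins}
    return None


backend = Upstream(nodes=["10.0.0.1:8080", "10.0.0.2:8080"])
api = Service(upstream=backend, plugins={"rate-limit": {"per_second": 10}})
routes = [Route(uri_prefix="/api", service=api, plugins={"auth": {}})]

result = resolve(routes, "/api/users")
print(result)
```

The point of the layering is reuse: several routes can share one service, and several services can share one upstream, so common settings are declared once.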

flink-kubernetes-operator

8 729 9.2 Java

Apache Flink Kubernetes Operator

Apache SeaTunnel is a data integration platform that offers the three pillars of data pipelines: sources, transforms, and sinks. It offers an abstract API over three possible engines: the Zeta engine from SeaTunnel or a wrapper around Apache Spark or Apache Flink. Be careful, as each engine comes with its own set of features.

doris

42 11,504 10.0 Java

Apache Doris is an easy-to-use, high performance and unified analytics database.

Apache Doris is a real-time data warehouse.

apisix-ingress-controller

33 954 8.7 Go

APISIX Ingress Controller for Kubernetes

In early 2021, I started to work on the Apache APISIX project. I have to admit that I had never heard about it before. In this post, I'd like to introduce some Apache projects that are less well-known than HTTPD or Kafka.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Apache Iceberg as storage for on-premise data store (cluster)
3 projects | /r/dataengineering | 16 Mar 2023

Uber Interview Experience/Asking Suggestions
4 projects | /r/dataengineering | 1 Feb 2023

What is the separation of storage and compute in data platforms and why does it matter?
3 projects | dev.to | 29 Nov 2022

What are your favourite GitHub repos that shows how data engineering should be done?
4 projects | /r/dataengineering | 18 Nov 2022

5 Reasons Your Data Lakehouse should Embrace Dremio Cloud
2 projects | dev.to | 9 Aug 2022

This page summarizes the projects mentioned and recommended in the original post on dev.to
Big Data Java Real-time SQL Spark
Post date: 21 Dec 2023