Java ETL

Open-source Java projects categorized as ETL

Top 9 Java ETL Projects

  • doris

    Apache Doris is an easy-to-use, high-performance, unified analytics database.

  • Project mention: SQL Convertor for Easy Migration from Presto, Trino, ClickHouse, and Hive to Apache Doris | dev.to | 2024-05-27

    Apache Doris is an all-in-one data platform capable of real-time reporting, ad-hoc queries, data lakehousing, log management and analysis, and batch data processing. As more companies replace their component-heavy data architectures with Apache Doris, there is a growing need for a more convenient data migration solution. That is why the Doris SQL Convertor was created.

  • kestra

    Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

  • Project mention: A High-Performance, Java-Based Orchestration Platform | /r/java | 2023-10-11

    Kestra's communication is asynchronous and based on a queuing mechanism. It leverages the Micronaut framework and offers two runners: one that uses a database (JDBC) for both the message queue and resource storage, and another that uses Kafka as the message queue and Elasticsearch as the resource storage. The platform is fully extensible and plugin-based, providing a rich set of plugins for various workflow tasks, triggers, and data storage options. For those interested, the GitHub repository is available here: https://github.com/kestra-io/kestra

  • zingg

    Scalable identity resolution, entity resolution, data mastering and deduplication using ML

  • Smooks

    Extensible data integration Java framework for building XML and non-XML fragment-based applications

  • ReplicaDB

    ReplicaDB is an open-source tool for database replication, designed for efficiently transferring bulk data between relational and non-relational databases.

  • kafka-connect-file-pulse

    🔗 A multipurpose Kafka Connect connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka

  • Project mention: Kafka Connect Filepulse 2.13.0 is now available! This version includes support for SFTP and Alibaba OSS. It also contains many bug fixes and improvements. 🚀 | /r/apachekafka | 2023-09-15
  • neo4j-jdbc

    Official Neo4j JDBC Driver

  • contube

    ConTube: A scalable data connector framework that facilitates efficient data transfer between diverse systems.

  • Project mention: Show HN: ConTube – A Scalable Data Connect Framework for Pulsar/Kafka Ecosystems | news.ycombinator.com | 2023-12-04
  • dcc-import

    Reference data importers

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Java ETL related posts

  • Kafka Connect Filepulse 2.13.0 is now available! This version includes support for SFTP and Alibaba OSS. It also contains many bug fixes and improvements. 🚀

    1 project | /r/apachekafka | 15 Sep 2023
  • Best ‘E’TL tools for extracting data from on-prem SQL databases

    2 projects | /r/snowflake | 28 Mar 2022
  • Maven unable to resolve a dependency given in pom.xml. I've instead tried manually downloading installing the jar, but now maven cannot find the package.

    1 project | /r/learnjava | 30 Aug 2021
  • Download json and csv file from github repository with apache kafka

    1 project | /r/apachekafka | 29 Jul 2021
  • Streaming data into Kafka S01/E04 — Loading Log files using Grok Expression

    5 projects | dev.to | 5 Jan 2021

Index

What are some of the best open-source ETL projects in Java? This list will help you:

#  Project                    Stars
1  doris                     11,547
2  kestra                     6,803
3  zingg                        895
4  Smooks                       386
5  ReplicaDB                    371
6  kafka-connect-file-pulse     308
7  neo4j-jdbc                   127
8  contube                       10
9  dcc-import                     1
