Java ETL

Open-source Java projects categorized as ETL

Top 9 Java ETL Projects

  • doris

    Apache Doris is an easy-to-use, high performance and unified analytics database.

  • Project mention: Variant in Apache Doris 2.1.0: a new data type 8 times faster than JSON for semi-structured data analysis | dev.to | 2024-03-27

    As an open-source real-time data warehouse, Apache Doris provides semi-structured data processing capabilities, and the newly released version 2.1.0 makes a stride in this direction. Before V2.1, Apache Doris stored semi-structured data as JSON. However, parsing that JSON on the fly during query execution leads to high CPU and I/O consumption and high query latency, especially when the dataset is huge and complex. Moreover, without a predefined schema, the query optimizer has nothing to work with.

  • kestra

    Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

  • Project mention: A High-Performance, Java-Based Orchestration Platform | /r/java | 2023-10-11

    Kestra's communication is asynchronous and based on a queuing mechanism. It leverages the Micronaut framework and offers two runners: one that uses a database (JDBC) for both the message queue and resource storage, and another that uses Kafka as the message queue and Elasticsearch as the resource storage. The platform is fully extensible and plugin-based, providing a rich set of plugins for various workflow tasks, triggers, and data storage options. For those interested, the GitHub repository is available here: https://github.com/kestra-io/kestra

  • zingg

    Scalable identity resolution, entity resolution, data mastering and deduplication using ML

  • Smooks

    Extensible data integration Java framework for building XML and non-XML fragment-based applications

  • ReplicaDB

    ReplicaDB is an open-source tool for database replication, designed to efficiently transfer bulk data between relational and non-relational databases

  • kafka-connect-file-pulse

    🔗 A multipurpose Kafka Connect connector that makes it easy to parse, transform and stream any file, in any format, into Apache Kafka

  • Project mention: Kafka Connect Filepulse 2.13.0 is now available! This version includes support for SFTP and Alibaba OSS. It also contains many bug fixes and improvements. 🚀 | /r/apachekafka | 2023-09-15
  • neo4j-jdbc

    Official Neo4j JDBC Driver

  • contube

    ConTube: A scalable data connector framework that facilitates efficient data transfer between diverse systems.

  • Project mention: Show HN: ConTube – A Scalable Data Connect Framework for Pulsar/Kafka Ecosystems | news.ycombinator.com | 2023-12-04
  • dcc-import

    Reference data importers
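The Kestra mention above describes asynchronous, queue-based communication with interchangeable backends (a JDBC runner, or Kafka plus Elasticsearch). The Java sketch below illustrates that general producer/consumer pattern under stated assumptions: `MessageQueue`, `InMemoryQueue`, `TaskRun`, and `QueueRunnerSketch` are hypothetical names invented for illustration, not Kestra's API, and an in-memory `LinkedBlockingQueue` stands in for a database- or Kafka-backed queue. It uses records, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical queue abstraction (NOT Kestra's API). A JDBC-backed or
// Kafka-backed implementation could be swapped in behind this interface.
interface MessageQueue<T> {
    void emit(T message);
    T take() throws InterruptedException;
}

// In-memory stand-in for a real queue backend (assumption for illustration).
final class InMemoryQueue<T> implements MessageQueue<T> {
    private final BlockingQueue<T> delegate = new LinkedBlockingQueue<>();

    @Override
    public void emit(T message) {
        delegate.add(message);
    }

    @Override
    public T take() throws InterruptedException {
        return delegate.take(); // blocks until a message is available
    }
}

public class QueueRunnerSketch {
    // Hypothetical unit of work: one task inside one flow.
    record TaskRun(String flowId, String taskId) {}

    // The producer emits task runs without waiting; a separate worker thread
    // consumes them asynchronously. Returns runs in the order the worker processed them.
    static List<String> execute(MessageQueue<TaskRun> queue, List<TaskRun> runs) {
        List<String> done = Collections.synchronizedList(new ArrayList<>());
        Thread worker = new Thread(() -> {
            try {
                for (int i = 0; i < runs.size(); i++) {
                    TaskRun run = queue.take();
                    done.add(run.flowId() + "/" + run.taskId());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        for (TaskRun run : runs) {
            queue.emit(run); // producer does not wait for the worker
        }
        try {
            worker.join(); // wait only so the sketch can return a result
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done;
    }

    public static void main(String[] args) {
        List<TaskRun> runs = List.of(new TaskRun("etl", "extract"), new TaskRun("etl", "load"));
        System.out.println(execute(new InMemoryQueue<>(), runs)); // [etl/extract, etl/load]
    }
}
```

Because `execute` depends only on the `MessageQueue` interface, replacing `InMemoryQueue` with another implementation leaves the execution logic unchanged. This loosely mirrors the idea of interchangeable runners described in the Kestra mention.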

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Java ETL related posts

  • Kafka Connect Filepulse 2.13.0 is now available! This version includes support for SFTP and Alibaba OSS. It also contains many bug fixes and improvements. 🚀

    1 project | /r/apachekafka | 15 Sep 2023
  • Best ‘E’TL tools for extracting data from on-prem SQL databases

    2 projects | /r/snowflake | 28 Mar 2022
  • Maven unable to resolve a dependency given in pom.xml. I've instead tried manually downloading installing the jar, but now maven cannot find the package.

    1 project | /r/learnjava | 30 Aug 2021
  • Download json and csv file from github repository with apache kafka

    1 project | /r/apachekafka | 29 Jul 2021
  • Streaming data into Kafka S01/E04 — Loading Log files using Grok Expression

    5 projects | dev.to | 5 Jan 2021

Index

What are some of the best open-source ETL projects in Java? This list will help you:

#  Project                   Stars
1  doris                     11,452
2  kestra                    6,605
3  zingg                     889
4  Smooks                    385
5  ReplicaDB                 367
6  kafka-connect-file-pulse  305
7  neo4j-jdbc                126
8  contube                   10
9  dcc-import                1
