Apache Spark, Hive, and Spring Boot — Testing Guide

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  • apache-spark-integration-testing-example

    An example of integration tests for an Apache Spark Spring Boot application that transfers data from an Apache Hive table to the Aerospike Database.

  • In this article, I'm showing you how to create a Spring Boot app that loads data from Apache Hive via Apache Spark to the Aerospike Database. More than that, I'm giving you a recipe for writing integration tests for such scenarios that can be run either locally or during the CI pipeline execution. The code examples are taken from this repository.

  • shadow

    Gradle plugin to create fat/uber JARs, apply file transforms, and relocate packages for applications and libraries. Gradle version of Maven's Shade plugin.

  • The resulting .jar is going to be submitted to an Apache Spark cluster (e.g. with the spark-submit command), so it should contain all runtime artifacts. Unfortunately, the standard Spring Boot packaging does not lay out the dependencies the way Apache Spark expects. So, we'll use the shadow-jar Gradle plugin. Take a look at the example below.

  • initializr

    A quickstart generator for Spring projects

  • The project is bootstrapped with Spring Initializr. Nothing special here, but the dependency list needs some clarification.

  • Apache Spark

    Apache Spark - A unified analytics engine for large-scale data processing

  • Apache Hive

    Apache Hive

  • Aerospike

    Aerospike Database Server – flash-optimized, in-memory, nosql database

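
The shadow entry above mentions a packaging example that is not reproduced on this page. As a rough sketch only, and not the original post's build file, a Gradle Kotlin DSL configuration along these lines is one common way to build a Spring-friendly fat JAR with the shadow plugin. The plugin version (7.1.2) and the list of Spring metadata files to merge are assumptions chosen for illustration; check them against the linked repository before relying on them.

```kotlin
// build.gradle.kts (illustrative sketch; the version and file list are assumptions)
import com.github.jengelman.gradle.plugins.shadow.transformers.PropertiesFileTransformer

plugins {
    java
    // Shadow plugin: builds a single fat/uber JAR that includes runtime dependencies
    id("com.github.johnrengelman.shadow") version "7.1.2"
}

repositories {
    mavenCentral()
}

tasks.shadowJar {
    // Fat JARs with many dependencies can exceed the classic ZIP entry limit
    isZip64 = true

    // Merge META-INF/services files so ServiceLoader-based lookups keep working
    mergeServiceFiles()

    // Several Spring JARs ship files with the same name; concatenate them
    // instead of letting one copy overwrite the others
    append("META-INF/spring.handlers")
    append("META-INF/spring.schemas")
    append("META-INF/spring.tooling")

    // spring.factories is a properties file, so merge it key by key
    transform(PropertiesFileTransformer::class.java) {
        paths = listOf("META-INF/spring.factories")
        mergeStrategy = "append"
    }
}
```

With a configuration like this, running `./gradlew shadowJar` should produce a JAR with the `-all` classifier under `build/libs`, which is the artifact that would then be passed to spark-submit.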
