Top 23 Scala Spark Projects
-
delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs (by delta-io)
-
deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
-
kyuubi
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
-
LearningSparkV2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
-
adam
ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed.
-
incubator-livy
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
-
sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
-
spark-solr
Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.
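Deequ's idea of "unit tests for data" is that a whole dataset is checked against declared constraints, such as expected size, completeness (no missing values), and uniqueness, much as code is checked by unit tests. The sketch below illustrates only that idea on plain Scala collections. `Record`, `DataChecks`, and its methods are hypothetical names chosen for this example, not Deequ's API; Deequ itself computes such metrics as Spark jobs over DataFrames.

```scala
// Plain-Scala sketch of "unit tests for data". NOT Deequ's API: Record,
// DataChecks and its methods are hypothetical names used only for illustration.

final case class Record(id: Option[Long], email: Option[String])

object DataChecks {
  // Fraction of records in which the selected column is present (not None).
  def completeness[A](rows: Seq[Record])(col: Record => Option[A]): Double =
    if (rows.isEmpty) 1.0
    else rows.count(r => col(r).isDefined).toDouble / rows.size

  // True when no present value of the selected column occurs more than once.
  def isUnique[A](rows: Seq[Record])(col: Record => Option[A]): Boolean = {
    val values = rows.flatMap(r => col(r))
    values.distinct.size == values.size
  }

  // Evaluates named constraints and returns the names of those that failed.
  def failedChecks(checks: Seq[(String, Boolean)]): Seq[String] =
    checks.collect { case (name, false) => name }

  // Hypothetical sample data: one missing email and a duplicated id.
  val sample: Seq[Record] = Seq(
    Record(Some(1L), Some("a@example.org")),
    Record(Some(2L), None),
    Record(Some(2L), Some("c@example.org"))
  )

  def main(args: Array[String]): Unit = {
    val failed = failedChecks(Seq(
      "size == 3"             -> (sample.size == 3),
      "id is complete"        -> (completeness(sample)(_.id) == 1.0),
      "id is unique"          -> isUnique(sample)(_.id),
      "email >= 60% complete" -> (completeness(sample)(_.email) >= 0.6)
    ))
    println(s"failed checks: ${failed.mkString(", ")}")
  }
}
```

Running it reports only `id is unique` as failing, because id 2 appears twice; the email column is 2/3 complete, which passes the 60% threshold.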
Delta is pretty great; it lets you do upserts into tables in Databricks much more easily than you could without it.
I think the website is here: https://delta.io
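An upsert updates the rows whose key already exists in the target table and inserts the rest; Delta Lake exposes this as a `MERGE INTO ... WHEN MATCHED ... WHEN NOT MATCHED ...` operation. The sketch below shows only those semantics on plain Scala collections, assuming `id` is the key and that each key appears at most once in the source. `Customer` and `Upsert.upsert` are hypothetical names, not Delta Lake's API.

```scala
// Plain-Scala sketch of upsert (MERGE) semantics. NOT Delta Lake's API:
// Customer and Upsert.upsert are hypothetical names for illustration only.

final case class Customer(id: Long, name: String, city: String)

object Upsert {
  // Applies `source` to `target`, keyed on `id`:
  //  - target rows whose id appears in source are replaced (the "matched -> update" half)
  //  - source rows whose id is absent from target are appended (the "not matched -> insert" half)
  // Assumes each id occurs at most once in `source`.
  def upsert(target: Seq[Customer], source: Seq[Customer]): Seq[Customer] = {
    val byId = source.map(c => c.id -> c).toMap
    val updated = target.map(t => byId.getOrElse(t.id, t))
    val existingIds = target.map(_.id).toSet
    val inserted = source.filterNot(c => existingIds.contains(c.id))
    updated ++ inserted
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample rows: id 2 is updated, id 3 is inserted, id 1 is untouched.
    val target = Seq(Customer(1L, "Ann", "Oslo"), Customer(2L, "Bo", "Lund"))
    val source = Seq(Customer(2L, "Bo", "Malmo"), Customer(3L, "Cy", "Bergen"))
    upsert(target, source).foreach(println)
  }
}
```

Keeping the update and insert halves as separate steps mirrors the two `WHEN` clauses of a merge, which makes it easy to see which source rows changed existing data and which added new rows.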
Project mention: Spark NLP 5.1.0: Introducing state-of-the-art OpenAI Whisper speech-to-text, OpenAI Embeddings and Completion transformers, MPNet text embeddings, ONNX support for E5 text embeddings, new multi-lingual BART Zero-Shot text classification, and much more! | /r/Python | 2023-09-06
https://github.com/zio/zio-quill
This library does exactly what you describe. I'm pretty sure that under the hood it uses macros with string templates.
Scala Spark related posts
-
Spark NLP 5.1.0: Introducing state-of-the-art OpenAI Whisper speech-to-text, OpenAI Embeddings and Completion transformers, MPNet text embeddings, ONNX support for E5 text embeddings, new multi-lingual BART Zero-Shot text classification, and much more!
-
Azure data lake - Data Share
-
Pandas was faster and less memory-intensive than crealytics pyspark. How is it possible?
-
The "Big Three's" Data Storage Offerings
-
Medallion/lakehouse architecture data modelling
-
How to build a data pipeline using Delta Lake
-
PySpark for NLP Workshop - Materials and Jupyter Notebooks
-
A note from our sponsor - SaaSHub
www.saashub.com | 6 May 2024
Index
What are some of the best open-source Spark projects in Scala? This list will help you:
# | Project | Stars
---|---|---
1 | Apache Spark | 38,414
2 | delta | 6,919
3 | SynapseML | 4,970
4 | spark-nlp | 3,695
5 | deequ | 3,134
6 | Quill | 2,136
7 | kyuubi | 1,941
8 | spark-cassandra-connector | 1,930
9 | Jupyter Scala | 1,564
10 | mleap | 1,494
11 | LearningSparkV2 | 1,095
12 | adam | 967
13 | H2O | 952
14 | tispark | 878
15 | frameless | 870
16 | incubator-livy | 856
17 | spark-daria | 742
18 | spark-rapids | 722
19 | delta-sharing | 676
20 | sparkMeasure | 642
21 | spline | 582
22 | metorikku | 576
23 | spark-solr | 445