Data science in Scala

This page summarizes the projects mentioned and recommended in the original post on /r/scala.

  • Breeze

    Breeze is a numerical processing library for Scala.

  • You can use Breeze (https://github.com/scalanlp/breeze), a Scala library that's roughly a NumPy-plus-plotting equivalent. Unlike Spark, which covers more use cases than just the classic data science workflow, Breeze is built specifically for "Data Science in Scala". The drawback is a familiar one in Scala land: major libraries sometimes get abandoned abruptly. Breeze's commits seem to have slowed down significantly, and the website linked from its GitHub page, www.scalanlp.org, is broken.

  • spark-nlp

    State of the Art Natural Language Processing

  • I am not aware of common open frameworks for Scala comparable to TensorFlow, PyTorch, or scikit-learn. For natural language processing specifically, though, there's Spark NLP from John Snow Labs.

  • SynapseML

    Simple and Distributed Machine Learning

  • There are libraries around, e.g. Microsoft SynapseML and LinkedIn Photon ML.

  • photon-ml

    A scalable machine learning library on Apache Spark

  • There are libraries around, e.g. Microsoft SynapseML and LinkedIn Photon ML.

  • saddle

    SADDLE: Scala Data Library (by pityka)

  • You might be interested in the saddle library, a data frame manipulation library similar to Python's pandas.

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
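
To give a feel for the NumPy-like workflow that the Breeze recommendation above describes, here is a minimal sketch. It assumes Breeze is already on the classpath (for example via an sbt dependency on `org.scalanlp` `breeze`, version of your choosing); the object name `BreezeSketch` and the sample data are made up for illustration and do not come from the original thread.

```scala
// Assumption: the Breeze library (org.scalanlp %% breeze) is on the classpath.
import breeze.linalg.{DenseMatrix, DenseVector, sum}
import breeze.stats.mean

// Hypothetical object name; the data below is illustrative only.
object BreezeSketch {
  // A dense vector and a 2x3 dense matrix, built from literals.
  val v: DenseVector[Double] = DenseVector(1.0, 2.0, 3.0)
  val m: DenseMatrix[Double] = DenseMatrix((1.0, 0.0, 0.0), (0.0, 2.0, 0.0))

  val doubled: DenseVector[Double] = v * 2.0 // scale every element by 2
  val total: Double = sum(v)                 // sum of all elements
  val average: Double = mean(v)              // arithmetic mean
  val product: DenseVector[Double] = m * v   // matrix-vector product (length 2)
  val dot: Double = v dot v                  // inner product of v with itself

  def main(args: Array[String]): Unit = {
    println(s"doubled = $doubled")
    println(s"total   = $total")
    println(s"average = $average")
    println(s"product = $product")
    println(s"dot     = $dot")
  }
}
```

The operators read much like their NumPy counterparts, which is the main appeal the commenter points to. Whether Breeze is a good fit still depends on the maintenance concerns raised in the same comment.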

Related posts

  • Spark NLP 5.1.0: Introducing state-of-the-art OpenAI Whisper speech-to-text, OpenAI Embeddings and Completion transformers, MPNet text embeddings, ONNX support for E5 text embeddings, new multi-lingual BART Zero-Shot text classification, and much more!

    1 project | /r/Python | 6 Sep 2023
  • PySpark for NLP Workshop - Materials and Jupyter Notebooks

    2 projects | /r/dataengineering | 14 May 2023
  • Spark-NLP 4.4.0: New BART for Text Translation & Summarization, new ConvNeXT Transformer for Image Classification, new Zero-Shot Text Classification by BERT, more than 4000+ state-of-the-art models, and many more! · JohnSnowLabs/spark-nlp

    1 project | /r/apachespark | 11 Apr 2023
  • Spark-NLP 4.4.0: New BART for Text Translation & Summarization, new ConvNeXT Transformer for Image Classification, new Zero-Shot Text Classification by BERT, more than 4000+ state-of-the-art models, and many more! · JohnSnowLabs/spark-nlp

    1 project | /r/java | 11 Apr 2023
  • Spark-NLP 4.4.0: New BART for Text Translation & Summarization, new ConvNeXT Transformer for Image Classification, new Zero-Shot Text Classification by BERT, more than 4000+ state-of-the-art models, and many more! · JohnSnowLabs/spark-nlp

    1 project | /r/Python | 11 Apr 2023