Python apache-spark

Open-source Python projects categorized as apache-spark

Top 12 Python apache-spark Projects

  • MLflow

    Open source platform for the machine learning lifecycle

    Project mention: Mlflow: Open-source platform for the machine learning lifecycle | news.ycombinator.com | 2024-05-16

  • flintrock

    A command-line tool for launching Apache Spark clusters.

  • quinn

    pyspark methods to enhance developer productivity 📣 👯 🎉 (by MrPowers)

  • PySpark-Boilerplate

    A boilerplate for writing PySpark Jobs

  • sparktorch

    Train and run PyTorch models on Apache Spark.

  • dataproc-templates

    Dataproc templates and pipelines for solving simple in-cloud data tasks

  • Apache-Spark-Guide

    Apache Spark Guide

  • covid-19-data-engineering-pipeline

    A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via Github Actions.

  • Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data

    Mobile robot data were analyzed with Apache Spark to produce five statistical results: travel time, waiting time, average speed, occupancy, and density.

  • xonai-dashboard

    A Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver

    Project mention: Show HN: Open sourcing a Big Data monitoring tool | news.ycombinator.com | 2024-03-29

  • livyc

    Apache Spark as a Service with Apache Livy Client

  • transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue

    Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming using Amazon MSK and MSK Connect (Debezium)

    Project mention: Writing simple Python scripts faster with Amazon Q | dev.to | 2024-01-11

    transactional-datalake-using-amazon-msk-serverless-and-apache-iceberg-on-aws-glue 2024-01-10T01:26:56Z https://github.com/aws-samples/transactional-datalake-using-amazon-msk-serverless-and-apache-iceberg-on-aws-glue
    aws-msk-serverless-cdc-data-pipeline-with-debezium 2024-01-09T01:03:38Z https://github.com/aws-samples/aws-msk-serverless-cdc-data-pipeline-with-debezium
    aws-healthlake-smart-on-fhir 2024-01-08T23:05:17Z https://github.com/aws-samples/aws-healthlake-smart-on-fhir
    aws-greengrass-custom-components 2024-01-08T11:34:12Z https://github.com/aws-samples/aws-greengrass-custom-components
    graviton-developer-workshop 2024-01-08T03:30:31Z https://github.com/aws-samples/graviton-developer-workshop
    msk-flink-streaming-cdk 2024-01-08T02:25:39Z https://github.com/aws-samples/msk-flink-streaming-cdk
    rag-with-amazon-postgresql-using-pgvector 2024-01-06T04:47:41Z https://github.com/aws-samples/rag-with-amazon-postgresql-using-pgvector
    queueTransfer_ContactTraceRecordSupport-for-Service-Cloud-Voice 2024-01-05T20:34:14Z https://github.com/aws-samples/queueTransfer_ContactTraceRecordSupport-for-Service-Cloud-Voice
    amazon-chime-sdk-voice-voice-translator 2024-01-05T17:25:54Z https://github.com/aws-samples/amazon-chime-sdk-voice-voice-translator
    private-s3-vpce 2024-01-05T06:38:52Z https://github.com/aws-samples/private-s3-vpce
    bedrock-contact-center-tasks-eval 2024-01-04T21:46:51Z https://github.com/aws-samples/bedrock-contact-center-tasks-eval
    clickstream-sdk-samples 2024-01-04T07:21:52Z https://github.com/aws-samples/clickstream-sdk-samples
    aws-msk-cdc-data-pipeline-with-debezium 2024-01-04T04:09:22Z https://github.com/aws-samples/aws-msk-cdc-data-pipeline-with-debezium
    transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue 2024-01-04T03:39:04Z https://github.com/aws-samples/transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue
    ..

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python apache-spark related posts

  • Mlflow: Open-source platform for the machine learning lifecycle

    1 project | news.ycombinator.com | 16 May 2024
  • Observations on MLOps–A Fragmented Mosaic of Mismatched Expectations

    1 project | dev.to | 26 Apr 2024
  • Explain me how websites like Dall-E, chatgpt, thispersondoesntexit process the user data so quickly

    1 project | /r/dataengineering | 17 Jun 2023
  • [D] What licensed software do you use for machine learning experimentation tracking?

    1 project | /r/MachineLearning | 11 Jun 2023
  • [Q] Is there a tool to keep track of my ML experiments?

    1 project | /r/datascience | 13 May 2023
  • Remote file access vulnerability in `mlflow server` and `mlflow ui` CLIs

    1 project | /r/LanguageTechnology | 24 Mar 2023
  • Critical CVE in `mlflow` 2.2.0 and under: Remote file access vulnerability in `mlflow server` and `mlflow ui` CLIs; possible lateral movement into aws creds

    1 project | /r/MachineLearning | 24 Mar 2023

Index

What are some of the best open-source apache-spark projects in Python? This list will help you:

# | Project | Stars
1 | MLflow | 17,475
2 | flintrock | 633
3 | quinn | 583
4 | PySpark-Boilerplate | 391
5 | sparktorch | 335
6 | dataproc-templates | 112
7 | Apache-Spark-Guide | 28
8 | covid-19-data-engineering-pipeline | 22
9 | Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data | 10
10 | xonai-dashboard | 11
11 | livyc | 3
12 | transactional-datalake-using-amazon-msk-and-apache-iceberg-on-aws-glue | 1
