Bigdata

Open-source projects categorized as Bigdata

Top 23 Bigdata Open-Source Projects

  • TDengine

    TDengine is an open source, high-performance, cloud native time-series database optimized for Internet of Things (IoT), Connected Cars, Industrial IoT and DevOps.

  • Project mention: TDengine: NEW Data - star count:22190.0 | /r/algoprojects | 2023-11-14
  • shardingsphere

    Distributed SQL transaction & query engine for data sharding, scaling, encryption, and more - on any database.

  • Project mention: Managing Data Residency - the demo | dev.to | 2023-05-25

    Contrary to what the documentation says, the full prefix is jdbc:shardingsphere:absolutepath. I've opened a PR to fix the documentation.

  • awesome-bigdata

    A curated list of awesome big data frameworks, resources and other awesomeness.

  • Project mention: Good coding groups for black women? | news.ycombinator.com | 2024-01-13
  • juicefs

    JuiceFS is a distributed POSIX file system built on top of Redis and S3.

  • Project mention: Data Sync in JuiceFS 1.2: Enhanced Selective Sync and Performance Optimizations | dev.to | 2024-05-17

    In JuiceFS 1.2, we introduced several new features for juicefs sync. We also optimized performance for multiple scenarios to improve users' data synchronization efficiency when dealing with large directories and complex migrations.

  • vaex

    Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀

  • hudi

    Upserts, Deletes And Incremental Processing on Big Data.

  • Project mention: Getting Started with Flink SQL, Apache Iceberg and DynamoDB Catalog | dev.to | 2023-12-18

    Apache Iceberg is one of the three types of lakehouse; the other two are Apache Hudi and Delta Lake.

  • OpenMetadata

    Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.

  • Project mention: How to Dynamically Adjust the Height of a Textarea in ReactJS | dev.to | 2023-10-25

    In this blog post, I have demonstrated how I addressed the challenge of dynamically adjusting the height of a textarea element based on its content, preventing the need for vertical scrolling in the title section of the OpenMetadata Knowledge article page.

  • volcano

    A Cloud Native Batch System (Project under CNCF)

  • Apache Avro

    Apache Avro is a data serialization system.

  • Project mention: Open Table Formats Such as Apache Iceberg Are Inevitable for Analytical Data | news.ycombinator.com | 2024-01-18

    Apache AVRO [1] is one but it has been largely replaced by Parquet [2] which is a hybrid row/columnar format

    [1] https://avro.apache.org/

  • dpark

    Python clone of Spark, a MapReduce-like framework in Python

  • griddb

    GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.

  • Project mention: griddb: NEW Data - star count:2133.0 | /r/algoprojects | 2023-07-31
  • spark

    .NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers. (by dotnet)

  • Optimus

    :truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark (by ironmussa)

  • tensorbase

    TensorBase is a new big data warehousing with modern efforts.

  • odd-platform

    First open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

  • Project mention: OpenDataDiscovery 0.15 with Data Deprecation and Metadata Stale | news.ycombinator.com | 2023-08-04
  • cds

    Data syncing in golang for ClickHouse. (by zeromicro)

  • Mobius: C# API for Spark

    C# and F# language binding and extensions to Apache Spark (by microsoft)

  • tispark

    TiSpark is built for running Apache Spark on top of TiDB/TiKV

  • incubator-livy

    Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.

  • visualpython

    GUI-based Python code generator for data science, extension to Jupyter Lab, Jupyter Notebook and Google Colab.

  • Gearpump

    Lightweight real-time big data streaming engine over Akka

  • WeDataSphere

    WeDataSphere is a financial grade, one-stop big data platform suite.

  • spline

    Data Lineage Tracking And Visualization Solution (by AbsaOSS)

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Bigdata related posts

Index

What are some of the best open-source Bigdata projects? This list will help you:

Rank Project Stars
1 TDengine 22,870
2 shardingsphere 19,475
3 awesome-bigdata 12,845
4 juicefs 9,881
5 vaex 8,180
6 hudi 5,114
7 OpenMetadata 4,271
8 volcano 3,805
9 Apache Avro 2,780
10 dpark 2,691
11 griddb 2,324
12 spark 2,000
13 Optimus 1,447
14 tensorbase 1,429
15 odd-platform 1,124
16 cds 956
17 Mobius: C# API for Spark 939
18 tispark 877
19 incubator-livy 857
20 visualpython 811
21 Gearpump 765
22 WeDataSphere 640
23 spline 583
