C++ Big Data

Open-source C++ projects categorized as Big Data

Top 11 C++ Big Data Projects

  • ClickHouse

    ClickHouse® is a real-time analytics DBMS

  • Project mention: Universal Data Migration: Using Slingdata to Transfer Data Between Databases | dev.to | 2024-05-24

    ClickHouse installed and running.

  • NebulaGraph Database

    A distributed, fast open-source graph database featuring horizontal scalability and high availability (by vesoft-inc)

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • GraphScope

    🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba | 一站式图计算系统

  • kudu

    Mirror of Apache Kudu (by apache)

  • Project mention: FLaNK Stack Weekly for 14 Aug 2023 | dev.to | 2023-08-14

    https://github.com/apache/kudu/blob/master/examples/quickstart/impala/README.adoc https://medium.com/@nifi.notes/building-an-effective-nifi-flow-replacetext-60a6016d378c https://community.cloudera.com/t5/Community-Articles/Running-DNS-and-Domain-Scanning-Tools-From-Apache-NiFi/ta-p/248484 https://community.cloudera.com/t5/Community-Articles/Using-Cloudera-Data-Science-Workbench-with-Apache-NiFi-and/ta-p/249469 https://community.cloudera.com/t5/Community-Articles/Scanning-Documents-into-Data-Lakes-via-Tesseract-MQTT-Python/ta-p/248492 https://community.cloudera.com/t5/Community-Articles/Adding-Stanford-CoreNLP-To-Big-Data-Pipelines-Apache-NiFi-1/ta-p/249378 https://community.cloudera.com/t5/Community-Articles/Using-Apache-NiFi-for-Speech-Processing-Speech-to-Text-with/ta-p/249242 https://community.cloudera.com/t5/Community-Articles/Ingesting-Flight-Data-ADS-B-USB-Receiver-with-Apache-NiFi-1/ta-p/247940 https://community.cloudera.com/t5/Community-Articles/Integrating-lucene-geo-gazetteer-For-Geo-Parsing-with-Apache/ta-p/247993 https://community.cloudera.com/t5/Community-Articles/Creating-WordClouds-From-DataFlows-with-Apache-NiFi-and/ta-p/246605 https://community.cloudera.com/t5/Community-Articles/NIFI-1-x-For-Automatic-Music-Playing-Pipelines/ta-p/247994 https://community.cloudera.com/t5/Community-Articles/Using-Apache-NiFi-with-Apache-MXNet-GluonCV-for-YOLO-3-Deep/ta-p/248979 https://community.cloudera.com/t5/Community-Articles/Tracking-Air-Quality-with-HDP-and-HDF-Part-1-Apache-NiFi/ta-p/248265 https://community.cloudera.com/t5/Community-Articles/Monitoring-Energy-Usage-Utilizing-Apache-NiFi-Python-Apache/ta-p/247525 https://community.cloudera.com/t5/Community-Articles/Using-Command-Line-Security-Tools-from-Apache-NiFi/ta-p/248158 https://community.cloudera.com/t5/Community-Articles/Apache-NiFi-Processor-for-Apache-MXNet-SSD-Single-Shot/ta-p/249240 https://community.cloudera.com/t5/Community-Articles/Ingesting-Apache-MXNet-Gluon-Deep-Learning-Results-Via-MQTT/ta-p/248544 https://community.cloudera.com/t5/Community-Articles/Updating-The-Apache-OpenNLP-Community-Apache-NiFi-Processor/ta-p/248398 https://community.cloudera.com/t5/Community-Articles/Integration-Apache-OpenNLP-1-8-4-into-Apache-NiFi-1-5-For/ta-p/248010 https://community.cloudera.com/t5/Community-Articles/Tracking-Phone-Location-for-Android-and-IoT-with-OwnTracks/ta-p/244875 https://community.cloudera.com/t5/Community-Articles/Ingesting-Drone-Data-From-Ryze-Tello-Part-1-Setup-and/ta-p/249422 https://community.cloudera.com/t5/Community-Articles/Ingesting-RDBMS-Data-As-New-Tables-Arrive-Automagically-into/ta-p/246214 https://community.cloudera.com/t5/Community-Articles/Incrementally-Streaming-RDBMS-Data-to-Your-Hadoop-DataLake/ta-p/247927 https://community.cloudera.com/t5/Community-Articles/Ingesting-and-Analyzing-Street-Camera-Data-from-Major-US/ta-p/249194 https://community.cloudera.com/t5/Community-Articles/Basic-Image-Processing-and-Linux-Utilities-As-Part-of-a-Big/ta-p/249121 https://community.cloudera.com/t5/Community-Articles/Hosting-and-Ingesting-Data-From-Web-Pages-Desktop-and-Mobile/ta-p/244575 https://community.cloudera.com/t5/Community-Articles/QADCDC-Our-how-to-ingest-some-database-tables-to-Hadoop-Very/ta-p/245229 https://community.cloudera.com/t5/Community-Articles/Tracking-Air-Quality-with-HDP-and-HDF-Part-2-Indoor-Air/ta-p/249471 https://community.cloudera.com/t5/Community-Articles/Streaming-Ingest-of-Google-Sheets-with-HDF-2-0/ta-p/247764 https://community.cloudera.com/t5/Community-Articles/Ingesting-Golden-Gate-Records-From-Apache-Kafka-and/ta-p/247557 https://community.cloudera.com/t5/Community-Articles/Data-Processing-Pipeline-Parsing-PDFs-and-Identifying-Names/ta-p/249105 https://community.cloudera.com/t5/Community-Articles/Using-A-TensorFlow-quot-Person-Blocker-quot-With-Apache-NiFi/ta-p/248141 https://community.cloudera.com/t5/Community-Articles/Su-Su-Sussudio-Sudoers-Log-Parsing-with-Apache-NiFi/ta-p/249461 https://community.cloudera.com/t5/Community-Articles/Integrating-IBM-Watson-Machine-Learning-APIs-with-Apache/ta-p/247545 https://community.cloudera.com/t5/Community-Articles/Simple-Change-Data-Capture-CDC-with-SQL-Selects-via-Apache/ta-p/308376 https://community.cloudera.com/t5/Community-Articles/Deep-Learning-IoT-Workflows-with-Raspberry-Pi-MQTT-MXNet/ta-p/249456 https://community.cloudera.com/t5/Community-Articles/Parsing-Web-Pages-for-Images-with-Apache-NiFi/ta-p/248415 https://community.cloudera.com/t5/Community-Articles/Trigger-SonicPi-Music-Via-Apache-NiFi/ta-p/248587 https://community.cloudera.com/t5/Community-Articles/Using-Parsey-McParseFace-Google-TensorFlow-Syntaxnet-From/ta-p/246337 https://community.cloudera.com/t5/Community-Articles/Ingesting-osquery-Into-Apache-Phoenix-using-Apache-NiFi/ta-p/249308 https://community.cloudera.com/t5/Community-Articles/Converting-PowerPoint-Presentations-into-French-from-English/ta-p/248974 https://community.cloudera.com/t5/Community-Articles/Posting-Images-with-Apache-NiFi-1-7-and-a-Custom-Processor/ta-p/249017 https://community.cloudera.com/t5/Community-Articles/Parsing-Any-Document-with-Apache-NiFi-1-5-with-Apache-Tika/ta-p/247672

  • ytsaurus

    YTsaurus is a scalable and fault-tolerant open-source big data platform.

  • ArcticDB

    ArcticDB is a high performance, serverless DataFrame database built for the Python Data Science ecosystem.

  • Project mention: Speed Test - ArcticDB, HDF, Feather, Parquet | /r/algotrading | 2023-11-21

    ArcticDB is a new data store for pandas DataFrames (https://arcticdb.io/). I have no affiliation with the project but wanted to see how it would compare on speed versus the other file format storage options available in Pandas: HDF, Feather, and Parquet. I could not find much on-line about how Arctic compares to the other options in terms of speed, so I ran some tests myself.

  • PGM-index

    🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

  • SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
  • oneDAL

    oneAPI Data Analytics Library (oneDAL)

  • ustore

    Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️

  • incubator-graphar

    An open source, standard data file format for graph data storage and retrieval.

  • nebula

    A distributed block-based data storage and compute engine (by varchar-io)

  • Project mention: Show HN: Interactive Graph by LLM (GPT-4o) | news.ycombinator.com | 2024-05-19

    So it's a pretty simple wrapper of LLM model in use (currently gpt-4o), it does not add much technical stuff in it.

    It does not use database for any "random search", but yes, columns.ai is a data analytics tool that allows you to connect supported live data sources like Google Spreadsheet, Airtable, Notion Database to create visual stories.

    The analytics engine is home built (https://github.com/varchar-io/nebula) but it is not a database. And I don't use LLM agents, just build logic how to purify data returned by LLM, and fit them into an optimized visualization.

    Hope I answered your question!

NOTE: The open source projects on this list are ordered by number of github stars. The number of mentions indicates repo mentiontions in the last 12 Months or since we started tracking (Dec 2020).

C++ Big Data related posts

  • Fair Benchmarking Considered Difficult (2018) [pdf]

    2 projects | news.ycombinator.com | 10 Mar 2024
  • Ask HN: Where to Store Logs?

    3 projects | news.ycombinator.com | 21 Oct 2023
  • Show HN: Hydra 1.0 – open-source column-oriented Postgres

    12 projects | news.ycombinator.com | 3 Aug 2023
  • GraphAr Release v0.5.0

    1 project | news.ycombinator.com | 16 May 2023
  • Show HN: GraphAr – An File Format for Graph Data Archiving/Exchanging

    1 project | news.ycombinator.com | 3 May 2023
  • Show HN: GraphAr – Open-source file format for archiving/exchanging graph data

    1 project | news.ycombinator.com | 27 Apr 2023
  • Show HN: GraphAr – Open-source file format for archiving/exchanging graph data

    1 project | news.ycombinator.com | 18 Apr 2023
  • A note from our sponsor - InfluxDB
    www.influxdata.com | 2 Jun 2024
    Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality. Learn more →

Index

What are some of the best open-source Big Data projects in C++? This list will help you:

Project Stars
1 ClickHouse 34,836
2 NebulaGraph Database 10,263
3 GraphScope 3,131
4 kudu 1,815
5 ytsaurus 1,776
6 ArcticDB 1,177
7 PGM-index 768
8 oneDAL 593
9 ustore 497
10 incubator-graphar 183
11 nebula 151

Sponsored
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com