Jupyter Notebook Datasets

Open-source Jupyter Notebook projects categorized as Datasets

Top 11 Jupyter Notebook Dataset Projects

  • indonlu

    The first-ever large-scale natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models, and starter code! (AACL-IJCNLP 2020)

  • cleora

    Cleora AI is a general-purpose model for efficient, scalable learning of stable and inductive entity embeddings for heterogeneous relational data.

  • SKAB

    SKAB - Skoltech Anomaly Benchmark. Time-series data for evaluating Anomaly Detection algorithms.

  • Project mention: SKAB: NEW Data - star count:238.0 | /r/algoprojects | 2023-09-25
  • Tegridy-MIDI-Dataset

    Tegridy MIDI Dataset for precise and effective creation of Music AI models.

  • ekya

    Source code and datasets for Ekya, a system for continuous learning on the edge.

  • artificial-self-AMLD-2020

    Workshop material for the AMLD 2020 workshop on "Meet your Artificial Self: Generate text that sounds like you"

  • openfema-samples

    Code, dataset, and analysis samples that utilize the OpenFEMA API.

  • intel-processors

    Datasets for All Processors Manufactured By Intel

  • Project mention: Get CPU max turbo freq? API for CPU specs? | /r/PowerShell | 2023-06-22
  • parsee-datasets

    Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee-ai

  • Project mention: FinRAG Datasets and Study | news.ycombinator.com | 2024-05-07

    To test this, we created 3 different datasets, all based on the same selection of 1,156 randomly selected annual reports for the year 2023 of publicly listed US companies.

    The resulting (fully labeled) datasets contain a combined total of 10,404 rows, 37,536,847 tokens and 1,156 images and can be found on GitHub and Hugging Face: https://github.com/parsee-ai/parsee-datasets/tree/main/datas...

    For our study, we are evaluating 8 state-of-the-art (M)LLMs on a subset of 100 reports with some interesting results.

  • Data-Science-Data-Analystics-Contribution---Hacktoberfest-2022

    Submit just 4 PRs to earn T-shirts 🔥 in Hacktoberfest 2022

  • ProTaska-GPT

    Unleash the Potential of Datasets with Intelligent Tasks, Tutorials, and Algorithm Recommendations.

  • Project mention: Learn Data Science with a GPT-powered Tutor: ProTaska-GPT | /r/learnpython | 2023-06-19

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Jupyter Notebook Datasets related posts

  • FinRAG Datasets and Study

    1 project | news.ycombinator.com | 7 May 2024
  • Mastering ROUGE Matrix: Your Guide to Large Language Model Evaluation for Summarization with Examples

    2 projects | dev.to | 8 Oct 2023
  • On Data Quality

    3 projects | dev.to | 24 May 2022

Index

What are some of the best open-source Dataset projects in Jupyter Notebook? This list will help you:

#   Project                                                         Stars
1   indonlu                                                         490
2   cleora                                                          477
3   SKAB                                                            296
4   Tegridy-MIDI-Dataset                                            127
5   ekya                                                            94
6   artificial-self-AMLD-2020                                       80
7   openfema-samples                                                21
8   intel-processors                                                16
9   parsee-datasets                                                 61
10  Data-Science-Data-Analystics-Contribution---Hacktoberfest-2022  5
11  ProTaska-GPT                                                    2
