Jupyter Notebook Data Analysis

Open-source Jupyter Notebook projects categorized as Data Analysis

Top 23 Jupyter Notebook Data Analysis Projects

  • Data-Science-For-Beginners

    10 Weeks, 20 Lessons, Data Science for All!

  • Project mention: Welcome to 14 days of Data Science! | dev.to | 2024-03-07

    Get started with Data Science in the Data Science for Beginners curriculum.

  • pandas_exercises

    Practice your pandas skills!

  • machine_learning_complete

    A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.

  • Data-science

    Collection of useful data science topics along with articles, videos, and code (by khuyentran1401)

  • ML-Workspace

    🛠 All-in-one web-based IDE specialized for machine learning and data science.

  • Linear-Algebra-With-Python

    Lecture Notes for Linear Algebra Featuring Python. This series of lecture notes walks you through the must-know concepts that form the foundation of data science and advanced quantitative skill sets. Suitable for statisticians, econometricians, quantitative analysts, data scientists, etc. who want to quickly refresh their linear algebra with the help of Python computation and visualization.

  • Project mention: Python for Econometrics for Practitioners [Free Online Courses] | /r/CompSocial | 2023-08-24

    Linear Algebra with Python: This training will walk you through all the must-know concepts that set the foundation of data science or advanced quantitative skill sets. Suitable for statisticians, econometricians, quantitative analysts, data scientists, etc. to quickly refresh linear algebra with the assistance of Python computation and visualization. Core concepts covered are: linear combination, vector space, linear transformation, eigenvalues and eigenvectors, diagonalization, singular value decomposition, etc.

  • 100-pandas-puzzles

    100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)

  • pymc-resources

    PyMC educational resources

  • Project mention: Bayesian Analysis with Python | news.ycombinator.com | 2024-02-10

    As it happens, there's a PyMC implementation of the 1st and 2nd editions of Statistical Rethinking here:

    https://github.com/pymc-devs/pymc-resources

    (I think the author of the book discussed above, Osvaldo Martin, is the primary or sole contributor for the Rethinking implementations, in fact -- he had a full implementation in his own repo [here](https://github.com/aloctavodia/Statistical-Rethinking-with-P...) before deprecating it in favor of the above-linked one.)

  • hyperlearn

    2-2000x faster ML algos, 50% less memory usage, works on all hardware - new and old.

  • Project mention: 80% faster, 50% less memory, 0% loss of accuracy Llama finetuning | news.ycombinator.com | 2023-12-01

    Good point - the main issue is we encountered this exact issue with our old package Hyperlearn (https://github.com/danielhanchen/hyperlearn).

    I OSSed all the code to the community - I'm actually an extremely open person and I love contributing to the OSS community.

    The issue was the package got gobbled up by other startups and big tech companies with no credit - I didn't want any cash from it, but it stung and hurt really bad hearing other startups and companies claim it was them who made it faster, whilst it was actually my work. It hurt really bad - as an OSS person, I don't want money, but just some recognition for the work.

    I also used to accept and help everyone with writing their startup's software, but I never got paid or even any thanks - sadly I didn't expect the world to be such a hostile place.

    So after a sad awakening, I decided with my brother instead of OSSing everything, we would first OSS something which is still very good - 5X faster training is already very reasonable.

    I'm all open to other suggestions on how we should approach this though! There are no evil intentions - in fact I insisted we OSS EVERYTHING even the 30x faster algos, but after a level headed discussion with my brother - we still have to pay life expenses no?

    If you have other ways we can go about this - I'm all ears!! We're literally making stuff up as we go along!

  • hamilton

    Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.

  • Project mention: Building an Email Assistant Application with Burr | dev.to | 2024-04-26

    Note that this uses simple OpenAI calls. You can replace this with LangChain, LlamaIndex, Hamilton (or something else) if you prefer more abstraction, and delegate to whatever LLM you like to use. You should also probably use something a little more concrete (e.g. instructor) to guarantee output shape.

  • kangas

    🦘 Explore multimedia datasets at scale

  • Project mention: Kangas: Pandas for Multimedia Datasets | news.ycombinator.com | 2023-05-03
  • qs_ledger

    Quantified Self Personal Data Aggregator and Data Analysis

  • machine-learning

    Practical Full-Stack Machine Learning

  • datacamp

    🍧 DataCamp data-science and machine learning courses

  • tempo

    API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation (by databrickslabs)

  • rust-data-analysis

    Rust for data analysis encyclopedia (WIP).

  • Project mention: Ask HN: Rust Viable for Data Analytics? | news.ycombinator.com | 2024-02-01

    Rust still has some key pieces missing, but looks promising, see: https://github.com/wiseaidev/rust-data-analysis

    F# has a very decent data community: https://datascienceinfsharp.com

    And obviously Julia is also something to consider.

  • RasgoQL

    Write python locally, execute SQL in your data warehouse

  • Econometrics-With-Python

    Econometrics tutorials featuring Python programming. This is a crash course for reviewing the most important concepts and techniques of basic econometrics. The theory is presented lightly, without the hassle of derivations, and the Python code is straightforward.

  • Project mention: Python for Econometrics for Practitioners [Free Online Courses] | /r/CompSocial | 2023-08-24

    Econometrics with Python: This is a crash course for reviewing the most important concepts and techniques of econometrics. The theory is presented lightly, without the hassle of mathematical derivation, and the Python code is mostly procedural and straightforward. Core concepts covered: multi-linear regression, logistic model, dummy variable, simultaneous equations model, panel data model and time series.

  • covid19-severity-prediction

    Extensive and accessible COVID-19 data + forecasting for counties and hospitals. 📈

  • DataScienceWithPython

    Learn Data Science with a focus on adding value with the most efficient tech stack.

  • PANDAS-TUTORIAL

    Jupyter Notebooks and Data Sets for Pandas Library (by TirendazAcademy)

  • Data-Visualization

    Collection of interactive Jupyter Notebook widgets and graphs. (by pierpaolo28)

  • Project mention: Plotly Dash for Financial Data Analysis | dev.to | 2024-01-23

    All the code used as part of this article (and more!) is available on my GitHub profile.

  • daru-view

    daru-view is for easy and interactive plotting in web applications and IRuby notebooks. daru-view is a plugin gem for the existing daru gem.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Data Analysis projects in Jupyter Notebook? This list will help you:

Rank Project Stars
1 Data-Science-For-Beginners 26,392
2 pandas_exercises 10,188
3 machine_learning_complete 4,501
4 Data-science 3,950
5 ML-Workspace 3,324
6 Linear-Algebra-With-Python 2,160
7 100-pandas-puzzles 2,154
8 pymc-resources 1,882
9 hyperlearn 1,510
10 hamilton 1,312
11 kangas 1,027
12 qs_ledger 948
13 machine-learning 658
14 datacamp 303
15 tempo 294
16 rust-data-analysis 281
17 RasgoQL 267
18 Econometrics-With-Python 250
19 covid19-severity-prediction 227
20 DataScienceWithPython 170
21 PANDAS-TUTORIAL 158
22 Data-Visualization 151
23 daru-view 90
