Datascience

Open-source projects categorized as Datascience

Top 23 Datascience Open-Source Projects

  • ds-cheatsheets

    List of Data Science Cheatsheets to rule the world

  • ludwig

    Low-code framework for building custom LLMs, neural networks, and other AI models

  • Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07

    This is a great project, a little bit similar to https://github.com/ludwig-ai/ludwig, but it includes testing capabilities and ablation.

    Questions regarding the LLM testing aspect: How extensive is the test coverage for LLM use cases, and what is the current state of this project area? Do you offer any guarantees, or is it considered an open-ended problem?

    Would love to see more progress toward this area!

  • modin

    Modin: Scale your Pandas workflows by changing a single line of code

  • Project mention: The Distributed Tensor Algebra Compiler (2022) | news.ycombinator.com | 2023-06-15
  • Taipy

    Turns Data and AI algorithms into production-ready web applications in no time.

  • Project mention: Python Day 9: Building Interactive Web Apps without HTML/CSS and JavaScript | dev.to | 2024-04-26

    Taipy is an open-source Python library that enables data scientists and developers to build robust end-to-end data pipelines.

  • metaflow

    🚀 Build and manage real-life ML, AI, and data science projects with ease!

  • Project mention: FLaNK Stack 05 Feb 2024 | dev.to | 2024-02-05
  • machine_learning_complete

    A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.

  • Mimesis

    Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.

  • panel

    Panel: The powerful data exploration & web app framework for Python (by holoviz)

  • Project mention: This Week In Python | dev.to | 2024-04-12

    panel – data exploration & web app framework for Python

  • OpenMetadata

    OpenMetadata is a unified platform for discovery, observability, and governance powered by a central metadata repository, in-depth lineage, and seamless team collaboration.

  • Project mention: How to Dynamically Adjust the Height of a Textarea in ReactJS | dev.to | 2023-10-25

    In this blog post, I have demonstrated how I addressed the challenge of dynamically adjusting the height of a textarea element based on its content, preventing the need for vertical scrolling in the title section of the OpenMetadata Knowledge article page.

  • datascience

    Curated list of Python resources for data science.

  • sql-translator

    SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.

  • awesome-conformal-prediction

    A professionally curated list of awesome Conformal Prediction videos, tutorials, books, papers, PhD and MSc theses, articles and open-source libraries.

  • Project mention: Dive Deep into Conformal Prediction with This Ultimate Resource Compilation | news.ycombinator.com | 2024-04-15
  • PyFunctional

    Python library for creating data pipelines with chain functional programming

  • Project mention: Python: Uncovering the Overlooked Core Functionalities | news.ycombinator.com | 2023-07-24

    If you actually think this code is better there's a real library that does this: https://github.com/EntilZha/PyFunctional.

  • An-Introduction-to-Statistical-Learning

    This repository contains the exercises from the book "An Introduction to Statistical Learning" and their solutions in Python.

  • Fast-F1

    FastF1 is a Python package for accessing and analyzing Formula 1 results, schedules, timing data, and telemetry

  • DataScienceR

    A curated list of R tutorials for Data Science, NLP and Machine Learning

  • ggstatsplot

    Enhancing {ggplot2} plots with statistical analysis 📊📣

  • openllmetry

    Open-source observability for your LLM application, based on OpenTelemetry

  • Project mention: FLaNK-AIM Weekly 13 May 2024 | dev.to | 2024-05-13
  • vscode-jupyter

    VS Code Jupyter extension

  • Project mention: Multiple Notepad++ Flaws Let Attackers Execute Arbitrary Code | news.ycombinator.com | 2023-09-04

    https://github.com/microsoft/vscode/issues/4490

    It looks like there are a number of vscode extensions for recording macros:

    - https://www.google.com/search?q=vscode+macro+recorder

    - https://marketplace.visualstudio.com/search?term=Macro&targe...

    - the macro-commander README explains its JSON-based macro language. YAML might be easier to maintain than JSON. https://github.com/jeff-hykin/macro-commander#what-are-some-...

    For teams with multiple editors, you can specify workflow automation scripts with shell scripts or ci container/cmd YAML, and/or pre-commit.yml instead of with an IDE-specific tool.

    Isn't there native real-time collaboration functionality in vscode/vscodium that would be useful for a native macro recording feature? (Edit) Live Share can't be installed in vscodium. https://github.com/VSCodium/vscodium/issues/128

    Support for jupyter-collaboration Y.js CRDT could be added to vscode-jupyter and/or a more generic extension: "Support for real-time collaboration in the extension?" https://github.com/microsoft/vscode-jupyter/discussions/1293...

    jupyterlab/jupyter-collaboration:

  • CleverCSV

    CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

  • easystats

    🌌 The R easystats-project

  • code

    Compilation of R and Python programming codes on the Data Professor YouTube channel. (by dataprofessor)

  • streamlit-geospatial

    A multi-page streamlit app for geospatial

  • Project mention: how i can create a timelapse of a specfic region | /r/remotesensing | 2023-07-05
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Datascience related posts

  • Python Day 9: Building Interactive Web Apps without HTML/CSS and JavaScript

    1 project | dev.to | 26 Apr 2024
  • Dive Deep into Conformal Prediction with This Ultimate Resource Compilation

    1 project | news.ycombinator.com | 15 Apr 2024
  • +10 Resources to Empower Women in Technology

    1 project | dev.to | 6 Mar 2024
  • Show HN: Building data and AI apps, an alternative to Streamlit

    1 project | news.ycombinator.com | 12 Feb 2024
  • Our open-source project for building AI / Data full-stack apps got funded! 🎉 🎉

    1 project | dev.to | 15 Jan 2024
  • Plotting 1,000,000 points on a webpage using only Python

    1 project | /r/bigdata | 11 Dec 2023
  • Forecasts need to have error bars

    1 project | news.ycombinator.com | 4 Dec 2023

Index

What are some of the best open-source Datascience projects? This list will help you:

Rank Project Stars
1 ds-cheatsheets 13,894
2 ludwig 10,893
3 modin 9,524
4 Taipy 9,282
5 metaflow 7,688
6 machine_learning_complete 4,529
7 Mimesis 4,315
8 panel 4,308
9 OpenMetadata 4,343
10 datascience 4,130
11 sql-translator 4,025
12 awesome-conformal-prediction 24
13 PyFunctional 2,347
14 An-Introduction-to-Statistical-Learning 2,285
15 Fast-F1 2,238
16 DataScienceR 1,959
17 ggstatsplot 1,939
18 openllmetry 1,391
19 vscode-jupyter 1,232
20 CleverCSV 1,226
21 easystats 1,040
22 code 881
23 streamlit-geospatial 814
