Top 23 Python Science and Data analysis Projects

Pandas

399 42,217 10.0 Python

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

Project mention: The ultimate guide to creating a secure Python package | dev.to | 2024-05-08

It's also possible for you to give a package an alias by using the as keyword. For instance, you could use the pandas package as pd like this:
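
A minimal sketch of that alias in use, assuming pandas is installed (the sample column names and values are hypothetical, chosen only for illustration):

```python
import pandas as pd  # bind the pandas package to the shorter name "pd"

# Hypothetical sample data, only here to show the alias in use.
df = pd.DataFrame({"city": ["Oslo", "Lima"], "temp_c": [4.0, 19.5]})

# Every pandas call now goes through the alias instead of the full name.
mean_temp = df["temp_c"].mean()
print(mean_temp)
```

The alias is purely a local name: `pd` refers to the same module object that a plain `import pandas` would bind to `pandas`.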

NumPy

273 26,567 10.0 Python

The fundamental package for scientific computing with Python.

Project mention: Taming Floating-Point Sums | news.ycombinator.com | 2024-05-25

NetworkX

61 14,278 9.6 Python

Network Analysis in Python

Project mention: Routes to LANL from 186 sites on the Internet | news.ycombinator.com | 2024-03-04

Built from this data... https://github.com/networkx/networkx/blob/main/examples/grap...

SciPy

50 12,532 9.9 Python

SciPy library main repository

Project mention: What Is a Schur Decomposition? | news.ycombinator.com | 2024-03-04

I guess it is a rite of passage to rewrite it. I'm doing it for SciPy too together with Propack in [1]. Somebody already mentioned your repo. Thank you for your efforts.
[1]: https://github.com/scipy/scipy/issues/18566

SymPy

34 12,474 10.0 Python

A computer algebra system written in pure Python

Project mention: AutoCodeRover resolves 22% of real-world GitHub in SWE-bench lite | news.ycombinator.com | 2024-04-09

Thank you for your interest. There are some interesting examples in the SWE-bench-lite benchmark which are resolved by AutoCodeRover:
- From sympy: https://github.com/sympy/sympy/issues/13643. AutoCodeRover's patch for it: https://github.com/nus-apr/auto-code-rover/blob/main/results...
- Another one from scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/13070. AutoCodeRover's patch (https://github.com/nus-apr/auto-code-rover/blob/main/results...) modified a few lines below (compared to the developer patch) and wrote a different comment.
There are more examples in the results directory (https://github.com/nus-apr/auto-code-rover/tree/main/results).

Dask

32 12,078 9.6 Python

Parallel computing with task scheduling

Project mention: The Distributed Tensor Algebra Compiler (2022) | news.ycombinator.com | 2023-06-15

pygwalker

22 10,213 9.6 Python

PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis

Project mention: Show HN: Use an "eraser" to clean data on flight without breaking your workflow | news.ycombinator.com | 2024-03-15

statsmodels

8 9,621 9.4 Python

Statsmodels: statistical modeling and econometrics in Python

Numba

124 9,512 9.9 Python

NumPy aware dynamic Python compiler using LLVM

Project mention: Mojo🔥: Head-to-Head with Python and Numba | dev.to | 2023-09-27

Around the same time, I discovered Numba and was fascinated by how easily it could bring huge performance improvements to Python code.

PyMC

3 8,219 9.5 Python

Bayesian Modeling and Probabilistic Programming in Python

BigDL

5 6,099 9.9 Python

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, etc.

Project mention: LLaMA Now Goes Faster on CPUs | news.ycombinator.com | 2024-03-31

Any performance benchmark against intel's 'IPEX-LLM'[0] or others?
[0] - https://github.com/intel-analytics/ipex-llm

orange

27 4,648 9.6 Python

🍊 📊 💡 Orange: Interactive data analysis

Project mention: Hierarchical Clustering | news.ycombinator.com | 2024-04-20

I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.
Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.
https://orangedatamining.com/
https://orange3.readthedocs.io/projects/orange-visual-progra...
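
One way to sidestep the null-character problem mentioned in the comment above is to strip `\x00` from the text before loading it into a tool. This is a minimal standard-library sketch; the helper name `strip_null_chars` and the sample strings are hypothetical, and nothing here is part of Orange's own API:

```python
def strip_null_chars(documents):
    # Return a new list where every NUL character has been removed
    # from each document; the input list is left unchanged.
    return [doc.replace("\x00", "") for doc in documents]


# Hypothetical corpus with embedded NUL characters.
corpus = ["clean text", "bro\x00ken text", "\x00"]
cleaned = strip_null_chars(corpus)
print(cleaned)
```

Cleaning at load time keeps the original files untouched, which helps if the null characters turn out to carry meaning for some other consumer.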

astropy

26 4,250 9.9 Python

Astronomy and astrophysics core library

Project mention: Julia 1.10 Released | news.ycombinator.com | 2023-12-27

Astropy [0] lives at the heart of most work. It has a Python interface, often backed by Fortran and C++ extension modules. If you use Astropy, you're indirectly using libraries like ERFA [6] and cfitsio [7] which are in C/Fortran.
I personally end up doing a lot of work that uses the HEALPix sky tessellation, so I use healpy [2] as well.
Openorb is perhaps a good example of a pure-Fortran package that I use quite frequently for orbit propagation [3].
In C, there's Rebound [4] (for N-body simulations) and ASSIST [5] (which extends Rebound to use JPL's pre-calculated positions of major perturbers, and expands the force model to account for general relativity).
There are many more, these are just ones that come to mind from frequent usage in the last few months.
[0] https://www.astropy.org/

Biopython

31 4,194 9.6 Python

Official git repository for Biopython (originally converted from CVS)

Project mention: Invitación a proyecto - Biopython en Español | /r/devsarg | 2023-07-23

blaze

1 3,180 0.0 Python

NumPy and Pandas interface to Big Data

Project mention: Blaze: Fast query execution engine for Apache Spark | news.ycombinator.com | 2023-10-19

Unfortunate name overlap with an under-loved PyData project: https://blaze.pydata.org

fugue

11 1,891 6.4 Python

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.

Project mention: FLaNK Stack Weekly 22 January 2024 | dev.to | 2024-01-22

Cubes

1 1,490 0.0 Python

[NOT MAINTAINED] Light-weight Python OLAP framework for multi-dimensional data analysis

bcbio-nextgen

2 978 6.2 Python

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis

Neupy

0 740 0.0 Python

NeuPy is a TensorFlow-based Python library for prototyping and building neural networks

NIPY

0 738 9.0 Python

Workflows and interfaces for neuroimaging packages

bccb

0 595 4.4 Python

Incubator for useful bioinformatics code, primarily in Python and R

Bubbles

0 450 0.0 Python

[NOT MAINTAINED] Bubbles – Python ETL framework (by Stiivi)

PyDy

1 346 3.9 Python

Multibody dynamics tool kit.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Science and Data analysis related posts

Taming Floating-Point Sums
4 projects | news.ycombinator.com | 25 May 2024

The ultimate guide to creating a secure Python package
4 projects | dev.to | 8 May 2024

The Birth of Parquet
3 projects | news.ycombinator.com | 8 May 2024

PDEP-13: The Pandas Logical Type System
1 project | news.ycombinator.com | 4 May 2024

AWS Serverless Diversity: Multi-Language Strategies for Optimal Solutions
4 projects | dev.to | 28 Apr 2024

Pandas reset_index(): How To Reset Indexes in Pandas
1 project | dev.to | 27 Apr 2024

Hierarchical Clustering
1 project | news.ycombinator.com | 20 Apr 2024

Index

What are some of the best open-source Science and Data analysis projects in Python? This list will help you:

#	Project	Stars
1	Pandas	42,217
2	NumPy	26,567
3	NetworkX	14,278
4	SciPy	12,532
5	SymPy	12,474
6	Dask	12,078
7	pygwalker	10,213
8	statsmodels	9,621
9	Numba	9,512
10	PyMC	8,219
11	BigDL	6,099
12	orange	4,648
13	astropy	4,250
14	Biopython	4,194
15	blaze	3,180
16	fugue	1,891
17	Cubes	1,490
18	bcbio-nextgen	978
19	Neupy	740
20	NIPY	738
21	bccb	595
22	Bubbles	450
23	PyDy	346
