SaaSHub helps you find the best software and product alternatives
Top 23 Python Science and Data analysis Projects
-
Pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
-
BigDL
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max). A PyTorch LLM library that seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, etc.
-
fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
-
bcbio-nextgen
Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
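The Pandas description above mentions labeled data structures similar to R data.frame objects. As a small illustration, the sketch below builds a DataFrame with named columns and a string index, then selects and aggregates by label. It assumes pandas is installed, and the column names and values are made up for the example.

```python
import pandas as pd

# Columns are labeled (like R data.frame column names) and rows carry an index
df = pd.DataFrame(
    {"species": ["setosa", "setosa", "virginica"], "petal_length": [1.4, 1.3, 5.5]},
    index=["a", "b", "c"],
)

# Select a single value by row label and column label, not by position
row_b = df.loc["b", "petal_length"]

# Grouped summary statistic, roughly what aggregate() does in R
mean_by_species = df.groupby("species")["petal_length"].mean()
print(mean_by_species)
```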
It's also possible to give a package an alias by using the `as` keyword. For instance, you could use the pandas package as `pd` by writing `import pandas as pd`.
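A minimal check, assuming pandas is installed, that an alias created with `as` is simply another name bound to the same module object:

```python
import pandas
import pandas as pd  # "as" binds the already-imported module to the shorter name pd

# Both names refer to one module object, so pd.DataFrame is pandas.DataFrame
same_module = pd is pandas
df = pd.DataFrame({"x": [1, 2, 3]})
```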
Built from this data... https://github.com/networkx/networkx/blob/main/examples/grap...
I guess it is a rite of passage to rewrite it. I'm doing it for SciPy too together with Propack in [1]. Somebody already mentioned your repo. Thank you for your efforts.
[1]: https://github.com/scipy/scipy/issues/18566
Project mention: AutoCodeRover resolves 22% of real-world GitHub in SWE-bench lite | news.ycombinator.com | 2024-04-09
Thank you for your interest. There are some interesting examples in the SWE-bench-lite benchmark which are resolved by AutoCodeRover:
- From sympy: https://github.com/sympy/sympy/issues/13643. AutoCodeRover's patch for it: https://github.com/nus-apr/auto-code-rover/blob/main/results...
- Another one from scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/13070. AutoCodeRover's patch (https://github.com/nus-apr/auto-code-rover/blob/main/results...) modified a few lines below (compared to the developer patch) and wrote a different comment.
There are more examples in the results directory (https://github.com/nus-apr/auto-code-rover/tree/main/results).
Project mention: Show HN: Use an "eraser" to clean data on flight without breaking your workflow | news.ycombinator.com | 2024-03-15
Around the same time, I discovered Numba and was fascinated by how easily it could bring huge performance improvements to Python code.
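As a rough sketch of what that looks like in practice, Numba's `njit` decorator compiles a plain Python loop over a NumPy array. The function name `sum_of_squares` is hypothetical, and the `try`/`except` fallback is an added assumption so the snippet still runs (uncompiled, and therefore slowly) where Numba is not installed; actual speedups depend on the workload.

```python
import numpy as np

try:
    from numba import njit  # JIT-compiles the decorated function to machine code
except ImportError:
    # Assumed fallback: without Numba, run the same function as ordinary Python
    def njit(func):
        return func


@njit
def sum_of_squares(values):
    # An explicit loop like this is slow in CPython but fast once compiled
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * values[i]
    return total


data = np.arange(100_000, dtype=np.float64)
result = sum_of_squares(data)  # with Numba, the first call triggers compilation
```

Numba compiles on the first call for the argument types it sees, so timing comparisons usually discard that first call.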
Any performance benchmarks against Intel's IPEX-LLM [0] or others?
[0] - https://github.com/intel-analytics/ipex-llm
I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.
Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.
https://orangedatamining.com/
https://orange3.readthedocs.io/projects/orange-visual-progra...
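One way to avoid the null-character problem described above is to strip `\x00` from every document before handing the corpus to the tool. A minimal plain-Python sketch; the helper name `strip_nulls` is hypothetical and not part of Orange3:

```python
def strip_nulls(texts):
    # Remove every NUL character ("\x00") from each document in the corpus
    return [text.replace("\x00", "") for text in texts]


corpus = ["first document", "broken\x00document", "\x00\x00leading nulls"]
clean = strip_nulls(corpus)
```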
Astropy [0] lives at the heart of most work. It has a Python interface, often backed by Fortran and C++ extension modules. If you use Astropy, you're indirectly using libraries like ERFA [6] and cfitsio [7] which are in C/Fortran.
I personally end up doing a lot of work that uses the HEALPix sky tessellation, so I use healpy [2] as well.
Openorb is perhaps a good example of a pure-Fortran package that I use quite frequently for orbit propagation [3].
In C, there's Rebound [4] (for N-body simulations) and ASSIST [5] (which extends Rebound to use JPL's pre-calculated positions of major perturbers, and expands the force model to account for general relativity).
There are many more, these are just ones that come to mind from frequent usage in the last few months.
[0] https://www.astropy.org/
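For readers unfamiliar with HEALPix: the scheme splits the sphere into 12 equal-area base pixels, each subdivided into N_side × N_side pixels, so N_pix = 12 × N_side² and every pixel covers 4π / N_pix steradians. The plain-Python helpers below are a hypothetical sketch of that arithmetic, not the healpy API (healpy provides its own functions, such as `healpy.nside2npix`).

```python
import math

# Total solid angle of the sphere (4*pi sr), converted to square degrees
FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2


def healpix_npix(nside: int) -> int:
    """Number of pixels for a given HEALPix resolution parameter nside."""
    if nside < 1:
        raise ValueError("nside must be a positive integer (usually a power of 2)")
    # 12 base pixels, each divided into nside * nside sub-pixels
    return 12 * nside * nside


def healpix_pixel_area_deg2(nside: int) -> float:
    """Area of one pixel in square degrees; all HEALPix pixels have equal area."""
    return FULL_SKY_DEG2 / healpix_npix(nside)


for nside in (1, 64, 1024):
    print(nside, healpix_npix(nside), round(healpix_pixel_area_deg2(nside), 6))
```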
Project mention: Blaze: Fast query execution engine for Apache Spark | news.ycombinator.com | 2023-10-19
Unfortunate name overlap with an under-loved PyData project: https://blaze.pydata.org
Python Science and Data analysis related posts
-
Taming Floating-Point Sums
-
The ultimate guide to creating a secure Python package
-
The Birth of Parquet
-
PDEP-13: The Pandas Logical Type System
-
AWS Serverless Diversity: Multi-Language Strategies for Optimal Solutions
-
Pandas reset_index(): How To Reset Indexes in Pandas
-
Hierarchical Clustering
-
A note from our sponsor - SaaSHub
www.saashub.com | 27 May 2024
Index
What are some of the best open-source Science and Data analysis projects in Python? This list will help you:
Rank | Project | Stars
---|---|---
1 | Pandas | 42,217
2 | NumPy | 26,567
3 | NetworkX | 14,278
4 | SciPy | 12,532
5 | SymPy | 12,474
6 | Dask | 12,078
7 | pygwalker | 10,213
8 | statsmodels | 9,621
9 | Numba | 9,512
10 | PyMC | 8,219
11 | BigDL | 6,099
12 | orange | 4,648
13 | astropy | 4,250
14 | Biopython | 4,194
15 | blaze | 3,180
16 | fugue | 1,891
17 | Cubes | 1,490
18 | bcbio-nextgen | 978
19 | Neupy | 740
20 | NIPY | 738
21 | bccb | 595
22 | Bubbles | 450
23 | PyDy | 346