Python data-profiling

Open-source Python projects categorized as data-profiling

Top 12 Python data-profiling Projects

  • ydata-profiling

    1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

  • Project mention: FLaNK 25 December 2023 | dev.to | 2023-12-26
  • great_expectations

    Always know what to expect from your data.

  • cleanlab

    The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

  • Project mention: [Research] Detecting Annotation Errors in Semantic Segmentation Data | /r/MachineLearning | 2023-11-05

    We have freely open-sourced our new method for improving segmentation data, published a paper on the research behind it, and released a 5-min code tutorial. You can also read more in the blog if you'd like.

  • sweetviz

    Visualize and compare datasets, target values and associations, with one line of code.

  • soda-core

    Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io

  • Optimus

    Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark (by ironmussa)

  • cleanvision

    Automatically find issues in image datasets and practice data-centric computer vision.

  • popmon

    Monitor the stability of a Pandas or Spark dataframe ⚙︎

  • piperider

    Code review for data in dbt

  • Project mention: Show HN: PipeRider – open-source Data Impact Analysis for dbt changes | news.ycombinator.com | 2023-09-06
  • swiple

    Swiple enables you to easily observe, understand, validate and improve the quality of your data

  • data-profiling

    A set of scripts to pull metadata and data-profiling metrics from relational database systems

  • metacrafter

    Metadata and data identification tool and Python library. Identifies PII, common identifiers, language specific identifiers. Fully customizable and flexible rules

  • Project mention: Metacrafter – semantic data types detection Python lib | news.ycombinator.com | 2024-03-13
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python data-profiling related posts

  • Show HN: PipeRider – open-source Data Impact Analysis for dbt changes

    3 projects | news.ycombinator.com | 6 Sep 2023
  • Looking for Unit Testing framework in Database Migration Process

    3 projects | /r/dataengineering | 23 Mar 2023
  • Data profiling as part of a data reliability strategy?

    2 projects | /r/dataengineering | 15 Sep 2022
  • Show HN: PipeRider, data reliability automated tool

    2 projects | news.ycombinator.com | 23 Jun 2022

Index

What are some of the best open-source data-profiling projects in Python? This list will help you:

Rank  Project             Stars
1     ydata-profiling     12,101
2     great_expectations  9,526
3     cleanlab            8,858
4     sweetviz            2,842
5     soda-core           1,780
6     Optimus             1,447
7     cleanvision         931
8     popmon              487
9     piperider           469
10    swiple              78
11    data-profiling      70
12    metacrafter         39
