Python Pipeline

Open-source Python projects categorized as Pipeline

Top 23 Python Pipeline Projects

  • jina

    ☁️ Build multimodal AI applications with cloud-native stack

    Project mention: Jina.ai: Self-host Multimodal models | news.ycombinator.com | 2024-01-26

  • Prefect

    The easiest way to build, run, and monitor data pipelines at scale.

    Project mention: Prefect: A workflow orchestration tool for data pipelines | news.ycombinator.com | 2024-03-13

  • airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    Project mention: How to Build a Chat App with Your Postgres Data using Agent Cloud | dev.to | 2024-05-13

    AgentCloud uses Airbyte to build data pipelines, which allow us to split, chunk, and embed data from over 300 data sources, including Postgres.

  • great_expectations

    Always know what to expect from your data.

  • Kedro

    Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

    Project mention: Nextflow: Data-Driven Computational Pipelines | news.ycombinator.com | 2023-08-10

    Interesting, thanks for sharing. I'll definitely take a look, although at this point I am so comfortable with Snakemake, it is a bit hard to imagine what would convince me to move to another tool. But I like the idea of composable pipelines: I am building a tool (too early to share) that would allow to lay Snakemake pipelines on top of each other using semi-automatic data annotations similar to how it is done in kedro (https://github.com/kedro-org/kedro).

  • Taipy

    Turns Data and AI algorithms into production-ready web applications in no time.

    Project mention: Python Day 9: Building Interactive Web Apps without HTML/CSS and JavaScript | dev.to | 2024-04-26

    Taipy is an open-source Python library that enables data scientists and developers to build robust end-to-end data pipelines.

  • Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

    Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22

  • papermill

    📚 Parameterize, execute, and analyze notebooks

    Project mention: Spreadsheet errors can have disastrous consequences – yet we keep making them | news.ycombinator.com | 2024-01-25

    Pandas docs > Comparison with spreadsheets: https://pandas.pydata.org/docs/getting_started/comparison/co...

    Pandas docs > I/O > Excel files: https://pandas.pydata.org/docs/user_guide/io.html#excel-file...

    nteract/papermill: https://github.com/nteract/papermill :

    > papermill is a tool for parameterizing, executing, and analyzing Jupyter Notebooks. [...]

    > This opens up new opportunities for how notebooks can be used. For example:

    > - Perhaps you have a financial report that you wish to run with different values on the first or last day of a month or at the beginning or end of the year, using parameters makes this task easier.

    "The World Excel Championship is being broadcast on ESPN" (2022) https://news.ycombinator.com/item?id=32420925 :

    > Computational notebook speedrun ideas:

  • pipelines

    Machine Learning Pipelines for Kubeflow

  • towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

    Project mention: FLaNK Stack Weekly for 14 Aug 2023 | dev.to | 2023-08-14

  • PyFunctional

    Python library for creating data pipelines with chain functional programming

    Project mention: Python: Uncovering the Overlooked Core Functionalities | news.ycombinator.com | 2023-07-24

    If you actually think this code is better there's a real library that does this: https://github.com/EntilZha/PyFunctional.

  • mara-pipelines

    A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

  • pytorch-toolbelt

    PyTorch extensions for fast R&D prototyping and Kaggle farming

  • MLBox

    MLBox is a powerful Automated Machine Learning python library.

  • galaxy

    Data intensive science for everyone.

    Project mention: Need for GUIs for bioinformatic tools? | /r/bioinformatics | 2023-06-17

    Maybe it would help you to look at the galaxy project: GitHub main site

  • sematic

    An open-source ML pipeline development platform

  • toil

    A scalable, efficient, cross-platform (Linux/macOS) and easy-to-use workflow engine in pure Python.

    Project mention: Show HN: Hatchet – Open-source distributed task queue | news.ycombinator.com | 2024-03-08

    a little late now, but I wonder if https://github.com/DataBiosphere/toil might meet your requirements

  • NeumAI

    Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale.

    Project mention: Show HN: Neum AI – Open-source large-scale RAG framework | news.ycombinator.com | 2023-11-21

    Interesting to see that the semantic chunking in the tools library is a wrapper around GPT-4. Asks GPT for the python code and executes it: https://github.com/NeumTry/NeumAI/blob/main/neumai-tools/neu...

  • pypyr automation task runner

    pypyr task-runner cli & api for automation pipelines. Automate anything by combining commands, different scripts in different languages & applications into one pipeline process.

    Project mention: Simple task runner for automation pipelines | news.ycombinator.com | 2023-11-03

  • aws-lambda-handler-cookbook

    This repository provides a working, deployable, open source-based, serverless service template with an AWS Lambda function and AWS CDK Python code with all the best practices and a complete CI/CD pipeline.

    Project mention: Serverless APIs: Auto-Generate OpenAPI Docs & CI/CD Protections | dev.to | 2024-03-04

    In case you didn’t know, the Cookbook is a template project that allows you to get started with serverless with three clicks, and it has all the best practices and utilities that a production-grade serverless service requires.

  • versatile-data-kit

    One framework to develop, deploy and operate data workflows with Python and SQL.

  • karton

    Distributed malware processing framework based on Python, Redis and S3.

    Project mention: Advices for an automated malware analysis lab project | /r/Malware | 2023-07-11

  • fluids

    Fluid dynamics component of Chemical Engineering Design Library (ChEDL)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
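
Several entries above describe building data pipelines by chaining functional steps (the PyFunctional entry, for example). As a rough illustration of that style only, here is a minimal sketch using the Python standard library. The `Pipeline` class and its method names are hypothetical and written for this example; they are not the API of PyFunctional or of any other project on this list.

```python
from functools import reduce
from typing import Any, Callable, Iterable


class Pipeline:
    """Hypothetical helper: wraps an iterable and chains lazy transformations."""

    def __init__(self, items: Iterable[Any]) -> None:
        self._items = items

    def map(self, fn: Callable[[Any], Any]) -> "Pipeline":
        # Lazily apply fn to every element and return a new pipeline stage.
        return Pipeline(map(fn, self._items))

    def filter(self, predicate: Callable[[Any], bool]) -> "Pipeline":
        # Lazily keep only the elements for which predicate is true.
        return Pipeline(filter(predicate, self._items))

    def reduce(self, fn: Callable[[Any, Any], Any], initial: Any) -> Any:
        # Consume the pipeline and fold it into a single value.
        return reduce(fn, self._items, initial)

    def to_list(self) -> list:
        # Consume the pipeline and collect the results.
        return list(self._items)


# Square 1..5, keep the odd squares, and sum them.
total = (
    Pipeline(range(1, 6))
    .map(lambda x: x * x)
    .filter(lambda x: x % 2 == 1)
    .reduce(lambda acc, x: acc + x, 0)
)
print(total)
```

Each stage returns a new `Pipeline`, so steps compose left to right, and nothing is evaluated until `reduce` or `to_list` consumes the chain.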

Python Pipeline related posts

  • How to Build a Chat App with Your Postgres Data using Agent Cloud

    3 projects | dev.to | 13 May 2024
  • Launch HN: Bracket (YC W22) – Two-Way Sync Between Salesforce and Postgres

    1 project | news.ycombinator.com | 12 Dec 2023
  • Simple task runner for automation pipelines

    1 project | news.ycombinator.com | 3 Nov 2023
  • 25 million Creative Commons image dataset released!

    1 project | /r/StableDiffusion | 1 Oct 2023
  • Nextflow: Data-Driven Computational Pipelines

    9 projects | news.ycombinator.com | 10 Aug 2023
  • Airbyte API and Terraform Provider – available in open source

    1 project | news.ycombinator.com | 3 Aug 2023
  • Need help moving 16gb of mongodb data to tableau

    1 project | /r/tableau | 28 Jul 2023

Index

What are some of the best open-source Pipeline projects in Python? This list will help you:

 #  Project                         Stars
 1  jina                            20,177
 2  Prefect                         14,829
 3  airbyte                         14,296
 4  great_expectations               9,526
 5  Kedro                            9,398
 6  Taipy                            8,847
 7  Mage                             7,171
 8  papermill                        5,656
 9  pipelines                        3,457
10  towhee                           3,015
11  PyFunctional                     2,342
12  mara-pipelines                   2,056
13  pytorch-toolbelt                 1,488
14  MLBox                            1,477
15  galaxy                           1,320
16  sematic                            947
17  toil                               874
18  NeumAI                             788
19  pypyr automation task runner       571
20  aws-lambda-handler-cookbook        464
21  versatile-data-kit                 412
22  karton                             370
23  fluids                             340
