Ask HN: What is the correct way to deal with pipelines?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • Huginn

    Create agents that monitor and act on your behalf. Your agents are standing by!

  • "Correct" is a value judgement that depends on lots of different things. Only you can decide which tool is correct. Here are some ideas:

    - https://camel.apache.org/

    - https://www.windmill.dev/

    - https://github.com/huginn/huginn

    Your idea about a queue (in Redis, Postgres, SQLite, etc.) is also totally valid. These off-the-shelf tools I listed probably wouldn't give you a huge advantage, IMO.
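
As a rough illustration of the queue idea in the comment above, here is a minimal sketch of a job queue backed by SQLite, using only Python's standard library. The table name `jobs`, its columns, and the helper names `open_queue`, `enqueue`, `claim_next`, and `mark_done` are all hypothetical choices made for this example and do not come from the original thread. It assumes a single host and makes no attempt at retries or timeouts.

```python
import sqlite3
import time


def open_queue(path=":memory:"):
    """Open (or create) the queue database. The schema is an assumption of this sketch."""
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are started and ended explicitly below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               payload TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               claimed_at REAL
           )"""
    )
    return conn


def enqueue(conn, payload):
    """Add a job and return its id."""
    cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def claim_next(conn):
    """Claim the oldest pending job, or return None if the queue is empty."""
    # BEGIN IMMEDIATE takes the write lock up front, so two workers
    # sharing the same database file cannot claim the same row.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE jobs SET status = 'running', claimed_at = ? WHERE id = ?",
                (time.time(), row[0]),
            )
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


def mark_done(conn, job_id):
    """Record that a claimed job finished."""
    conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
```

A pipeline stage would call `enqueue` when it produces work, and a worker loop would call `claim_next`, process the payload, then call `mark_done`. Whether this is enough depends on the workload; the off-the-shelf tools listed above add scheduling, retries, and monitoring that this sketch leaves out.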

  • Apache Camel

    Apache Camel is an open source integration framework that empowers you to quickly and easily integrate various systems consuming or producing data.

  • Airflow

    Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

  • I agree there are many options in this space. Two others to consider:

    - https://airflow.apache.org/

    - https://github.com/spotify/luigi

    There are also many Kubernetes-based options out there. For the specific use case you specified, you might even consider a plain old Makefile and incrond if you expect these all to run on a single host and be triggered by a new file showing up in a directory…
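
To make the "triggered by a new file showing up in a directory" idea concrete, here is a minimal sketch in Python. It is not incrond: incrond reacts to inotify events, whereas this stand-in simply polls the directory. The function names `find_new_files` and `watch`, and the `handle` callback, are hypothetical names chosen for this example.

```python
import os
import time


def find_new_files(directory, seen):
    """Return names of regular files in `directory` not yet in `seen`, and record them."""
    new = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and name not in seen:
            seen.add(name)
            new.append(name)
    return new


def watch(directory, handle, interval=2.0, max_polls=None):
    """Poll `directory` and call `handle(path)` once for each new file.

    `max_polls=None` means run until interrupted.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for name in find_new_files(directory, seen):
            handle(os.path.join(directory, name))
        polls += 1
        time.sleep(interval)
```

In the Makefile variant suggested in the comment, `handle` could shell out to `make` so that only targets depending on the new file are rebuilt. One caveat of this sketch: a file may be picked up while it is still being written, so producers would typically write to a temporary name and rename it into place.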

  • luigi

    Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.
