What's the best tool to build pipelines from REST APIs?

This page summarizes the projects mentioned and recommended in the original post on /r/dataengineering

InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  • memphis

    Memphis.dev is a highly scalable and effortless data streaming platform

  • Great recommendations by the rest of the members here. I would love to learn more about your use case if possible, as we are adding a native REST, websocket and gRPC support to our message broker (Memphis. Let’s chat if possible, would love to work on this together

  • xidel

    Command line tool to download and extract data from HTML/XML pages or JSON-APIs, using CSS, XPath 3.0, XQuery 3.0, JSONiq or pattern matching. It can also create new or transformed XML/HTML/JSON documents.

  • Xidel for extraction and pagination

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • gnu-parallel

    A clone of GNU Parallel (git://git.savannah.gnu.org/parallel.git)

  • GNU Parallel for parallelism, retry, and resumption

  • jq

    Discontinued Command-line JSON processor [Moved to: https://github.com/jqlang/jq] (by stedolan)

  • JQ for JSON processing

  • aws_lambda_reddit_api

    Batch data processing project using data from the reddit api.

  • I have a small project like this i done before. Which i am gonna shamelessly plug in lol. https://github.com/PanzerFlow/aws_lambda_reddit_api

  • Mage

    🧙 The modern replacement for Airflow. Mage is an open-source data pipeline tool for transforming and integrating data. https://github.com/mage-ai/mage-ai

  • AWS: deploy using maintained Terraform scripts.

  • CoinCap-firehose-s3-DynamicPartitioning

    AWS CDK project using typescript. Services: Lambda, Kinesis Firehose, Glue, Quicksight.

  • I agree with the Cron triggered Lambda approach. For inspiration I have a small project where a lambda pulls data from a public api and writes it to a firehose which buffers the data and writes it to s3. There is also a cron job on Glue which catalogues the data. https://github.com/TrygviZL/CoinCap-firehose-s3-DynamicPartitioning

  • SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
  • astro-sdk

    Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

  • I have an example here using COVID data. basically you just write a python function that reads the API and returns a dataframe (or any number of dataframes) and downstream tasks can then read the output as either a dataframe or a SQL table.

  • Rudderstack

    Privacy and Security focused Segment-alternative, in Golang and React

  • RudderStack is an open-source tool to build data pipelines with high-availability and high-precision event ordering. It is suitable for your use case as

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts