Qsv: Efficient CSV CLI Toolkit

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • qsv

    CSVs sliced, diced & analyzed.

  • Thanks for the detailed feedback @snidane!

    As maintainer of qsv, here's my reply:

    - Given qsv's rapid release cycle (173 releases over three years), the auto-update check is essential at the moment. Once we reach 1.0, I'll turn it off. For now, given your feedback, I've made it check only 10% of the time.

    - Pivot is in the backlog and I'll be sure to add unpivot when I implement it. (https://github.com/jqnatividad/qsv/issues/799)

    - I'll add a dedicated summing command with the group by (-by) and window by (-over) capability (https://github.com/jqnatividad/qsv/issues/1514). Do note that `stats` has basic sum as @ezequiel-garzon pointed out.

    - With the `enum` command, qsv can achieve what you proposed with `laminate`. E.g. qsv enum --new-column newcol --constant newconstant mydata.csv --output laminated-data.csv

    - With the `cat rowskey` command, qsv can already concatenate files with mismatched headers.

    - Other file formats: qsv supports parquet, csv, tsv, excel, ods, datapackage, sqlite and more (see https://github.com/jqnatividad/qsv/tree/master#file-formats). Fixed-format is not supported yet, though it is quite interesting, and I have added it to the backlog (https://github.com/jqnatividad/qsv/issues/1515).

    - As to "enable embedding outputs of commands": qsv is composable by design, so you can use standard stdin/stdout redirection and piping to have it work with other CLI tools like jq, awk, etc.

    Finally, I just released v0.120.0, which already incorporates the less aggressive self-update check. https://github.com/jqnatividad/qsv/releases/tag/0.120.0
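
As a rough illustration of what concatenating files with mismatched headers involves (the idea behind `cat rowskey` mentioned in the reply above), here is a minimal Python sketch. It is not qsv's implementation: the function name `concat_by_key` is hypothetical, and the sketch assumes well-formed CSV text with a header row in every input. Columns are matched by name, the output header is the union of all input headers in first-seen order, and missing cells are left empty.

```python
import csv
import io


def concat_by_key(csv_texts):
    """Concatenate CSV documents by column name (hypothetical helper).

    Assumes every input has a header row and no row is longer than its header.
    """
    header = []  # union of column names, in first-seen order
    rows = []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        for name in reader.fieldnames or []:
            if name not in header:
                header.append(name)
        rows.extend(reader)  # each row is a dict keyed by column name

    out = io.StringIO()
    # restval fills cells for columns a given row does not have
    writer = csv.DictWriter(out, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    first = "id,name\n1,ann\n"
    second = "id,city\n2,oslo\n"
    print(concat_by_key([first, second]), end="")
```

With the two sample inputs, the output has the header `id,name,city`, and each row leaves empty the column its source file lacked.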

  • vnlog

    Process labelled tabular ASCII data using normal UNIX tools

  • For simple analyses (i.e. what most people do most of the time) doing this on the command line gets you there faster. I use vnlog (https://github.com/dkogan/vnlog/). By the time you have fired up your editor to write your Python code, I already have analyses and plots ready.

  • miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON

  • xsv

    A fast CSV command line toolkit written in Rust.

  • teip

    Masking tape to help commands "do one thing well"

  • citation-file-format

    The Citation File Format lets you provide citation metadata for software or datasets in plaintext files that are easy to read by both humans and machines.

  • I am somewhat tickled at the thought of citing everything in a malicious compliance kind of way. Given a Nix environment, it should be possible to pull down a list of every bit of code that was used to construct the OS. Would we have to differentiate between installed vs executed code? My LaTeX environment probably has thousands of packages, though I might directly only include a handful of them. Even if I include a LaTeX package, it might not get executed.

    The CITATION.cff format[0] is a newish format to solve the machine identification of citable works, but I suspect it is too new to see widespread adoption. It is going to take some backbreaking regexes to extract "How to Cite" sections embedded in READMEs and buried in the source.

    [0] https://citation-file-format.github.io/

Related posts

  • Anyone else feel like they are using Pandas as a crutch?

    1 project | /r/dataengineering | 5 Mar 2023
  • xsv

    1 project | /r/ITProTuesday | 3 Mar 2023
  • Using Commandline To Process CSV files

    1 project | /r/programming | 14 Dec 2022
  • How do I delete lines in a CSV using Sed based on condition?

    2 projects | /r/commandline | 26 Jul 2022
  • Write a program in Rust to read a CSV file and create two output CSV files – one file with odd rows and the other file with even rows from the input file

    1 project | /r/rust | 17 Jun 2022