-
hamilton
Discontinued A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton (by stitchfix)
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
If you're want to do more python https://github.com/stitchfix/hamilton allows you to model dependencies at a columnar (field) level.
There are specialized tools like DataHub (see this for columnar level reporting: https://feature-requests.datahubproject.io/roadmap/541 ) that would help. But really, in a good data platform, the orchestration layer should be aggregating metadata and giving you everything you need to trace lineage, A tool like Dagster does this well if you make full use of the Software Defined Assets capability, but that is fairly new so not so many people have embraced it yet.
There are specialized tools like DataHub (see this for columnar level reporting: https://feature-requests.datahubproject.io/roadmap/541 ) that would help. But really, in a good data platform, the orchestration layer should be aggregating metadata and giving you everything you need to trace lineage, A tool like Dagster does this well if you make full use of the Software Defined Assets capability, but that is fairly new so not so many people have embraced it yet.
Column-level lineage in OpenLineage is in its early days. There's support in the spec for it, and the integration with Spark currently emits column-level metadata. You can see the facet definition here.