Cool tools
A list of tools we’ve come across that look interesting, but that we haven’t assessed, tested, or deployed yet:
Visualization and data exploration
Dashboard-oriented
Notebook-oriented
- https://www.querybook.org/
- https://popsql.com/
- https://jupyterhub.readthedocs.io/en/stable/index.html
- And this distribution, The Littlest JupyterHub, seems a good fit for our needs: https://tljh.jupyter.org/en/latest/#
- https://evidence.dev/
- Open source, with cloud options. Markdown + SQL notebooks with cached data, built in batch. Integrates with dbt. Seems like a great chance to go cloud with them, knowing that we can always part ways if needed and self-host it.
- https://observablehq.com/
- Shared by Aled
Hybrid
dbt
- A tool to run quality checks on dbt as pre-commit hooks: https://github.com/dbt-checkpoint/dbt-checkpoint
- A tool to check and enforce dbt project conventions: https://github.com/godatadriven/dbt-bouncer
- https://datarecce.io/docs/get-started/ ← Data diff for PRs
- https://www.synq.io/integrations/dbt/
- Incident management for dbt tests, as well as data product definition and ownership.
- Get an alert, triage it, categorize it, and route it to the right owner according to what failed.
Semantic Layer
- A good overview of what it is and what it covers: https://airbyte.com/blog/the-rise-of-the-semantic-layer-metrics-on-the-fly
- The clearest option we have: https://cube.dev/
- A copy-cat: https://github.com/synmetrix/synmetrix
Data cataloguing, documentation, lineage
DWH
- https://pgt.dev/extensions/pgaudit — pgaudit, a Postgres extension that provides detailed audit logging of the SQL statements executed against the database
- https://github.com/citusdata/citus (an extension that could provide us with columnar storage)
- Side note, just for fun: a benchmark of a poor man’s columnar storage on Postgres: https://www.brianlikespostgres.com/poor-mans-column-oriented-database.html
- https://www.postgresql.org/docs/current/postgres-fdw.html
- The Foreign Data Wrappers extension. We could use this in our local dwh up environment so that we can read the sync schemas from production instead of having to clone the data.
- The pro is that the dev experience would be much smoother and faster: no more dumping and restoring, no weird inconsistencies.
- The cons are that we would depend on the production db to develop, and that we would consume resources from the production dwh for development, which is not ideal. Also, the dump-and-restore approach leaves room to hack on and safely manipulate the data you are working with, since it is only a local copy. A rough setup sketch is included at the end of this list.
- Postgres full text search: https://supabase.com/blog/postgres-full-text-search-vs-the-rest
- This could be interesting to support filtering situations in Dashboard where users struggle with the strictness of standard string matching (in grug speak: make it easy user find name, no care about upper/lower case, space, word order, etc.). A small example is sketched at the end of this list.
- More articles here: https://gist.github.com/cpursley/e3586382c3a42c54ca7f5fef1665be7b
- https://postgresql-anonymizer.readthedocs.io/en/stable/
- A Postgres extension for anonymizing data.
- It has an incredibly attractive feature: you can declare certain columns to be masked/faked, and mark a given Postgres role as not allowed to see the real values. That role can still query the table, but will see the masked/faked columns instead of the real data. A rough sketch of how this looks is included at the end of this list.
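For the postgres_fdw item above, a minimal sketch of what the local setup could look like. This is not a tested configuration; the server name, host, schema, and user below are hypothetical placeholders.

```sql
-- Minimal postgres_fdw sketch for the local dwh (all names and credentials
-- here are placeholders, not our real configuration).
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Point a foreign server at the production dwh.
CREATE SERVER prod_dwh
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'prod-dwh.example.internal', port '5432', dbname 'dwh');

-- Map the local development user to a read-only production user.
CREATE USER MAPPING FOR CURRENT_USER
    SERVER prod_dwh
    OPTIONS (user 'readonly_dev', password 'changeme');

-- Expose a production sync schema locally without copying any data.
CREATE SCHEMA IF NOT EXISTS sync_source;
IMPORT FOREIGN SCHEMA sync_source
    FROM SERVER prod_dwh
    INTO sync_source;

-- From here on, queries against sync_source.* run against production.
```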
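For the full text search item above, a minimal sketch of the kind of lenient matching it enables, using a hypothetical users table:

```sql
-- Hypothetical users table; to_tsvector/websearch_to_tsquery take care of
-- case, word order, punctuation and extra whitespace.
CREATE TABLE users (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

-- Optional GIN index so the search stays fast as the table grows.
CREATE INDEX users_name_fts_idx
    ON users
    USING gin (to_tsvector('simple', name));

-- 'maria garcia' matches 'GARCIA, Maria' regardless of case, word order or
-- punctuation (accent-insensitive matching would also need unaccent).
SELECT id, name
FROM users
WHERE to_tsvector('simple', name) @@ websearch_to_tsquery('simple', 'maria garcia');
```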
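For the postgresql_anonymizer item above, a rough sketch of how the dynamic masking feature looks based on its documentation. The table, column, and role names are hypothetical, and the exact setup steps depend on the extension version.

```sql
-- Rough sketch of postgresql_anonymizer dynamic masking (untested; names are
-- hypothetical and the exact calls depend on the extension version).
CREATE EXTENSION IF NOT EXISTS anon CASCADE;
SELECT anon.init();  -- load the default fake data sets

-- Declare which columns hold sensitive values and how to fake them.
SECURITY LABEL FOR anon ON COLUMN customers.email
    IS 'MASKED WITH FUNCTION anon.fake_email()';
SECURITY LABEL FOR anon ON COLUMN customers.last_name
    IS 'MASKED WITH FUNCTION anon.fake_last_name()';

-- Declare a role that must never see the real values.
SECURITY LABEL FOR anon ON ROLE analyst_ro IS 'MASKED';

-- Turn on dynamic masking: analyst_ro can still run
--   SELECT email, last_name FROM customers;
-- but only sees faked values in the masked columns.
SELECT anon.start_dynamic_masking();
```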
Other
Diagram tools
https://diagrams.mingrammer.com/
Mathesar
Mathesar is a web application that makes working with PostgreSQL databases both simple and powerful.