Cool tools
A list of tools we’ve come across that look interesting, but that we haven’t assessed, tested, or deployed yet:
Visualization and data exploration
Dashboard-oriented
Notebook-oriented
- https://www.querybook.org/
- https://popsql.com/
- https://jupyterhub.readthedocs.io/en/stable/index.html
- And this distribution, The Littlest JupyterHub, seems a good fit for our needs: https://tljh.jupyter.org/en/latest/#
- https://evidence.dev/
- Open source, with cloud options. Markdown + SQL notebooks with cached data, built in batch. Integrates with dbt. Seems like a great chance to go cloud with them, knowing that we can always part ways if needed and self-host it.
- https://observablehq.com/
- Shared by Aled
Hybrid
dbt
- A tool to run quality checks on dbt as pre-commit hooks: https://github.com/dbt-checkpoint/dbt-checkpoint
- A tool to check and enforce dbt project conventions: https://github.com/godatadriven/dbt-bouncer
- https://datarecce.io/docs/get-started/ ← Data diff for PRs
- https://www.synq.io/integrations/dbt/
- Incident management for dbt tests, as well as data product definition and ownership.
- Get an alert, triage it, categorize it, and route it to the right owner according to what failed.
Semantic Layer
- A good overview of what it is and what it covers: https://airbyte.com/blog/the-rise-of-the-semantic-layer-metrics-on-the-fly
- The clearest option we have: https://cube.dev/
- A copy-cat: https://github.com/synmetrix/synmetrix
Data cataloguing, documentation, lineage
DWH
- https://pgt.dev/extensions/pgaudit — pgaudit, a Postgres extension that provides detailed audit logging of the SQL statements executed against the database
- https://github.com/citusdata/citus (an extension that could provide us with columnar storage)
- Side note, just for fun: a benchmark of a poor man’s columnar storage on Postgres: https://www.brianlikespostgres.com/poor-mans-column-oriented-database.html
- https://www.postgresql.org/docs/current/postgres-fdw.html
- The Foreign Data Wrappers extension. We could use this in our local dwh up environment so that we can read the sync schemas from production instead of having to clone the data.
- The pro is that the dev experience would be much smoother and faster: no more dumping and restoring, no weird inconsistencies.
- The cons are that we would depend on the production db to develop, and that we would consume resources from the production dwh for development, which is not ideal. Also, the dump-and-restore approach leaves room to hack on and safely manipulate the data you are working with, since it is only a local copy. A rough setup sketch is included at the end of this list.
- Postgres full text search: https://supabase.com/blog/postgres-full-text-search-vs-the-rest
- This could be interesting to support filtering situations in Dashboard where users struggle with the strictness of standard string matching (in grug speak: make it easy user find name, no care about upper/lower case, space, word order, etc.). A small example is sketched at the end of this list.
- More articles here: https://gist.github.com/cpursley/e3586382c3a42c54ca7f5fef1665be7b
- https://postgresql-anonymizer.readthedocs.io/en/stable/
- A Postgres extension for anonymizing data.
- It has an incredibly attractive feature: you can declare certain columns to be masked/faked, and mark a given Postgres role as not allowed to see the real values. That role can still query the table, but will see the masked/faked columns instead of the real data. A rough sketch of how this looks is included at the end of this list.
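For the postgres_fdw item above, a minimal sketch of what the local setup could look like. This is not a tested configuration; the server name, host, schema, and user below are hypothetical placeholders.

```sql
-- Minimal postgres_fdw sketch for the local dwh (all names and credentials
-- here are placeholders, not our real configuration).
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Point a foreign server at the production dwh.
CREATE SERVER prod_dwh
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'prod-dwh.example.internal', port '5432', dbname 'dwh');

-- Map the local development user to a read-only production user.
CREATE USER MAPPING FOR CURRENT_USER
    SERVER prod_dwh
    OPTIONS (user 'readonly_dev', password 'changeme');

-- Expose a production sync schema locally without copying any data.
CREATE SCHEMA IF NOT EXISTS sync_source;
IMPORT FOREIGN SCHEMA sync_source
    FROM SERVER prod_dwh
    INTO sync_source;

-- From here on, queries against sync_source.* run against production.
```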
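For the full text search item above, a minimal sketch of the kind of lenient matching it enables, using a hypothetical users table:

```sql
-- Hypothetical users table; to_tsvector/websearch_to_tsquery take care of
-- case, word order, punctuation and extra whitespace.
CREATE TABLE users (
    id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name text NOT NULL
);

-- Optional GIN index so the search stays fast as the table grows.
CREATE INDEX users_name_fts_idx
    ON users
    USING gin (to_tsvector('simple', name));

-- 'maria garcia' matches 'GARCIA, Maria' regardless of case, word order or
-- punctuation (accent-insensitive matching would also need unaccent).
SELECT id, name
FROM users
WHERE to_tsvector('simple', name) @@ websearch_to_tsquery('simple', 'maria garcia');
```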
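For the postgresql_anonymizer item above, a rough sketch of how the dynamic masking feature looks based on its documentation. The table, column, and role names are hypothetical, and the exact setup steps depend on the extension version.

```sql
-- Rough sketch of postgresql_anonymizer dynamic masking (untested; names are
-- hypothetical and the exact calls depend on the extension version).
CREATE EXTENSION IF NOT EXISTS anon CASCADE;
SELECT anon.init();  -- load the default fake data sets

-- Declare which columns hold sensitive values and how to fake them.
SECURITY LABEL FOR anon ON COLUMN customers.email
    IS 'MASKED WITH FUNCTION anon.fake_email()';
SECURITY LABEL FOR anon ON COLUMN customers.last_name
    IS 'MASKED WITH FUNCTION anon.fake_last_name()';

-- Declare a role that must never see the real values.
SECURITY LABEL FOR anon ON ROLE analyst_ro IS 'MASKED';

-- Turn on dynamic masking: analyst_ro can still run
--   SELECT email, last_name FROM customers;
-- but only sees faked values in the masked columns.
SELECT anon.start_dynamic_masking();
```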
Other
Diagram tools
https://diagrams.mingrammer.com/
Mathesar
Mathesar is a web application that makes working with PostgreSQL databases both simple and powerful.