Thingies
This commit is contained in:
parent
7480222cc7
commit
1e53b3895c
2 changed files with 44 additions and 1 deletions
17
code_thingies/dbtlearn/snapshots/scd_raw_listings.sql
Normal file
17
code_thingies/dbtlearn/snapshots/scd_raw_listings.sql
Normal file
|
|
@ -0,0 +1,17 @@
|
|||
{% snapshot scd_raw_listings %}
|
||||
|
||||
{{
|
||||
config(
|
||||
target_schema = 'dev',
|
||||
unique_key = 'id',
|
||||
strategy = 'timestamp',
|
||||
updated_at = 'updated_at',
|
||||
invalidate_hard_deletes = True
|
||||
)
|
||||
}}
|
||||
|
||||
SELECT *
|
||||
FROM
|
||||
{{ source('airbnb', 'listings')}}
|
||||
|
||||
{% endsnapshot %}
|
||||
28
notes/8.md
28
notes/8.md
|
|
@ -47,4 +47,30 @@ Bear in mind that how to define the strategy to determine what should be loaded
|
|||
Seeds are local files that you upload to a DWH from dbt. You place them as CSVs in the `seeds` folder.
|
||||
|
||||
|
||||
Sources are an abstraction layer on top of the input tables. They are not strictly necessary, but can help make the project more structured. To create sources, you create a `sources.yml` file and place it in the `models` dir.
|
||||
Sources are an abstraction layer on top of the input tables. They are not strictly necessary, but can help make the project more structured. To create sources, you create a `sources.yml` file and place it in the `models` dir. Here, you can reference models created in the `models` dir to mark them as sources. You can reference sources in other models like this:
|
||||
|
||||
```python
|
||||
{{ source('domain_name', 'source_name')}}
|
||||
```
|
||||
|
||||
Sources can define _freshness_ constraints that will provide warnings or errors when there is a significant delay.
|
||||
|
||||
|
||||
## Snapshots
|
||||
|
||||
Snapshots are a way to build SCD2s. There are two strategies to get this done:
|
||||
- Timestamp: all records have a unique key and an `update_at` field. dbt will consider a new record is necessary in the SCD2 whenever the `updated_at` field increases.
|
||||
- Check: dbt will monitor a set of columns and consider any changes in any of the columns as a new version of the record.
|
||||
|
||||
Snapshots get defined with a sql file in the `snapshots` folder using the `snapshot` macro block.
|
||||
|
||||
Once snapshots are defined, "snapshooting" can be triggered at any time by running `dbt snapshot`. dbt will create the SCD tables in the defined schema and play the `valid_from`, `valid_to` game whenever changes are detected.
|
||||
|
||||
## Tests
|
||||
|
||||
There are two kinds of tests:
|
||||
|
||||
- Singular tests: you make any `SELECT` statement you want. If the `SELECT` statement is run and any data is found, the test is considered failed. If the statement is run and no rows are returned, the test is considered passed.
|
||||
- Built-in test: just a bunch of typical stuff: uniqueness, nullability, enum validations and relationship (referential integrity)
|
||||
|
||||
You can also define your own custom generic tests.
|
||||
Loading…
Add table
Add a link
Reference in a new issue