110 lines
4 KiB
Markdown
110 lines
4 KiB
Markdown
## Models
|
|
|
|
Models are the fundamental concept behind dbt.
|
|
|
|
They are stored as SQL files in the `models` folder.
|
|
|
|
Models can be related between themselves to map dependencies.
|
|
|
|
|
|
|
|
## Materializations
|
|
|
|
- Ways in which a model can be stored in the database. There are 4:
|
|
- View: it's just a view
|
|
- Table: the model gets stored as a table
|
|
- Incremental: also a table, but can only create new records, not update
|
|
- Ephemeral: it's actually NOT materializing. The model can be used by dependents, but it won't be materialized in the DB. It will truly only be a CTE that gets used by other models. Mostly for intermediate states in transformations.
|
|
|
|
Materializations can be defined at the model level, folder level and project level. This can be modified in the `dbt_project.yml` file, under the `models` key.
|
|
|
|
To set materialization config at the model level, one must make a jinja tag at the start of the file and call the `config` dbt function. See an example below:
|
|
|
|
```python
|
|
{{
|
|
config(
|
|
materialized = 'incremental',
|
|
on_schema_change = 'fail'
|
|
)
|
|
}}
|
|
```
|
|
|
|
Incremental materializations need to a block that defines the logic to apply in incremental loads (as opposed to the 'normal' logic, that gets apply on first runs). See below an example:
|
|
|
|
```SQL
|
|
[... rest of query ...]
|
|
WHERE
|
|
review_text IS NOT NULL
|
|
{% if is_incremental() %}
|
|
AND review_date > (SELECT MAX(review_date) FROM {{ this }})
|
|
{% endif %}
|
|
```
|
|
|
|
Bear in mind that how to define the strategy to determine what should be loaded is up to the engineer. Any SQL can be placed within the `if is_incremental()` block. In the example above, we have a date field that easily signals what's the most recent date the table has currently seen.
|
|
|
|
## Sources and seeds
|
|
|
|
Seeds are local files that you upload to a DWH from dbt. You place them as CSVs in the `seeds` folder.
|
|
|
|
|
|
Sources are an abstraction layer on top of the input tables. They are not strictly necessary, but can help make the project more structured. To create sources, you create a `sources.yml` file and place it in the `models` dir. Here, you can reference models created in the `models` dir to mark them as sources. You can reference sources in other models like this:
|
|
|
|
```python
|
|
{{ source('domain_name', 'source_name')}}
|
|
```
|
|
|
|
Sources can define _freshness_ constraints that will provide warnings or errors when there is a significant delay.
|
|
|
|
|
|
## Snapshots
|
|
|
|
Snapshots are a way to build SCD2s. There are two strategies to get this done:
|
|
- Timestamp: all records have a unique key and an `update_at` field. dbt will consider a new record is necessary in the SCD2 whenever the `updated_at` field increases.
|
|
- Check: dbt will monitor a set of columns and consider any changes in any of the columns as a new version of the record.
|
|
|
|
Snapshots get defined with a sql file in the `snapshots` folder using the `snapshot` macro block.
|
|
|
|
Once snapshots are defined, "snapshooting" can be triggered at any time by running `dbt snapshot`. dbt will create the SCD tables in the defined schema and play the `valid_from`, `valid_to` game whenever changes are detected.
|
|
|
|
## Tests
|
|
|
|
There are two kinds of tests:
|
|
|
|
- Singular tests: you make any `SELECT` statement you want. If the `SELECT` statement is run and any data is found, the test is considered failed. If the statement is run and no rows are returned, the test is considered passed.
|
|
- Built-in test: just a bunch of typical stuff: uniqueness, nullability, enum validations and relationship (referential integrity)
|
|
|
|
You can also define your own custom generic tests.
|
|
|
|
|
|
## Macros
|
|
|
|
- Macros are jinja templates.
|
|
- There are many built-in macros in dbt, but you can also use your own macros.
|
|
- dbt packages exist and you can use them to have more tests and macros that you can use.
|
|
|
|
|
|
## Documentation
|
|
|
|
- Documentation is kept in the repo (yay)
|
|
- Documentation can be defined in yaml files or in standalone markdown files. For example, the landing page can be customized with an `overview.md` file.
|
|
- Documentation can be quick-served with dbt, but ideally you should compile it and serve it with a regular web server, like Nginx.
|
|
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|