## Models
Models are the fundamental concept behind dbt.
They are stored as SQL files in the `models` folder.
Models can reference each other with the `ref()` function, which is how dbt maps the dependencies between them.
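
A minimal sketch of a model that depends on another one. It assumes a hypothetical upstream model `src_listings` with hypothetical columns. The `ref()` call resolves to the table or view built by that model and records the dependency.

```sql
-- models/dim_listings_cleansed.sql (hypothetical model)
-- ref('src_listings') resolves to the relation built by src_listings.sql
-- and tells dbt that src_listings must be built before this model.
WITH src_listings AS (
    SELECT * FROM {{ ref('src_listings') }}
)
SELECT
    listing_id,
    listing_name,
    room_type
FROM src_listings
```

Because the dependency is declared through `ref()` rather than a hard-coded table name, dbt can work out the build order on its own.
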
## Materializations
Materializations are the ways in which a model can be stored in the database. There are 4:
- View: the model is stored as a database view. This is the default.
- Table: the model is stored as a table, rebuilt on every run.
- Incremental: also a table, but after the first run dbt only adds new records instead of rebuilding the whole table. Updating existing records requires configuring a `unique_key`.
- Ephemeral: it's actually NOT materialized in the DB. Dependent models can still use it, but dbt inlines it into them as a CTE. Mostly used for intermediate steps in transformations.
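
To illustrate the ephemeral case, here is a sketch with two hypothetical models. `src_hosts` is configured with `config(materialized = 'ephemeral')`, and `dim_hosts` selects from it. The raw table name `raw_hosts` and the columns are assumptions. The compiled SQL in the trailing comments is approximate. dbt names the inlined CTE with a `__dbt__cte__` prefix, but the exact output depends on the dbt version and adapter.

```sql
-- models/src_hosts.sql (hypothetical): never created as a table or view
{{ config(materialized = 'ephemeral') }}
SELECT
    id   AS host_id,
    name AS host_name
FROM raw_hosts

-- models/dim_hosts.sql (hypothetical): depends on the ephemeral model
SELECT * FROM {{ ref('src_hosts') }}

-- Compiled SQL for dim_hosts, roughly:
-- WITH __dbt__cte__src_hosts AS (
--     SELECT id AS host_id, name AS host_name FROM raw_hosts
-- )
-- SELECT * FROM __dbt__cte__src_hosts
```
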
Materializations can be defined at the model, folder and project level. Folder- and project-level defaults go in the `dbt_project.yml` file, under the `models` key. A model-level config overrides them.
To set the materialization at the model level, add a Jinja block at the top of the model file that calls dbt's `config()` function. See an example below:
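```sql
{# Model-level config: overrides folder/project defaults from dbt_project.yml.
   on_schema_change = 'fail' makes the run fail if the model's columns no longer
   match the existing incremental table. #}
{{
  config(
    materialized = 'incremental',
    on_schema_change = 'fail'
  )
}}
```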
Incremental models need a block that defines the logic to apply on incremental loads (as opposed to the 'normal' logic, which is applied on the first run or on a full refresh). See an example below:
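```sql
[... rest of query ...]
WHERE
  review_text IS NOT NULL
{% if is_incremental() %}
  {# Incremental runs only: keep rows newer than the latest review_date already
     loaded. `this` refers to the table this model has already built. #}
  AND review_date > (SELECT MAX(review_date) FROM {{ this }})
{% endif %}
```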
Bear in mind that deciding which records to load is up to the engineer: any SQL can go inside the `if is_incremental()` block. In the example above, the `review_date` field signals the most recent date the table has already seen.
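
Putting the config block and the incremental filter together, a complete incremental model could look like the sketch below. The model names (`fct_reviews`, `src_reviews`) and the columns other than `review_text` and `review_date` are hypothetical.

```sql
-- models/fct_reviews.sql (hypothetical)
{{
  config(
    materialized = 'incremental',
    on_schema_change = 'fail'
  )
}}

WITH src_reviews AS (
    SELECT * FROM {{ ref('src_reviews') }}
)
SELECT
    listing_id,
    review_date,
    reviewer_name,
    review_text
FROM src_reviews
WHERE review_text IS NOT NULL
{% if is_incremental() %}
  -- Incremental runs only: skip rows not newer than what the table already holds
  AND review_date > (SELECT MAX(review_date) FROM {{ this }})
{% endif %}
```

On the first run, or with `dbt run --full-refresh`, `is_incremental()` is false. The whole query runs and the table is built from scratch. On later runs, only the rows passing the extra filter are added.
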
## Sources and seeds
Seeds are local CSV files that dbt uploads to the DWH. You place them in the `seeds` folder and load them with `dbt seed`.
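
A sketch of how a seed is used once loaded. It assumes a hypothetical `seeds/seed_full_moon_dates.csv` with a `full_moon_date` column and a hypothetical `fct_reviews` model. `dbt seed` creates a table named after the CSV file, and models reference it with `ref()` like any other model.

```sql
-- 1) Load seeds/seed_full_moon_dates.csv into the DWH (run in a shell): dbt seed
-- 2) Reference the resulting table from a model with ref(), using the file name
-- models/full_moon_reviews.sql (hypothetical)
SELECT
    r.review_date,
    r.review_text
FROM {{ ref('fct_reviews') }} AS r
JOIN {{ ref('seed_full_moon_dates') }} AS f
    ON CAST(r.review_date AS DATE) = f.full_moon_date
```
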
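Sources are an abstraction layer on top of the raw input tables that already live in the DWH. They are not strictly necessary, but they make the project more structured: models select from `{{ source('source_name', 'table_name') }}` instead of hard-coding table names. To create sources, you declare them in a `sources.yml` file placed in the `models` dir.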
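
A sketch of a model reading from a source. It assumes a hypothetical `sources.yml` that declares a source named `airbnb` with a table `listings`. `source()` takes the source name and the table name exactly as declared in that file. The model path and column names are hypothetical too.

```sql
-- models/src/src_listings.sql (hypothetical)
-- source('airbnb', 'listings') resolves to the raw table declared in sources.yml
-- and adds that source to the dependency graph, like ref() does for models.
WITH raw_listings AS (
    SELECT * FROM {{ source('airbnb', 'listings') }}
)
SELECT
    id   AS listing_id,
    name AS listing_name,
    room_type
FROM raw_listings
```

If the raw table moves, only the `sources.yml` entry needs to change. The models keep calling `source()` with the same names.
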