sh-notion/notion_data_team_no_files/How to SQL rollup 1600446ff9c9806c9fdac2ff48e48bd3.md

152 lines
4.4 KiB
Markdown
Raw Permalink Normal View History

2025-07-11 16:15:17 +02:00
# How to SQL rollup
# Introduction
Do you need to compute metric aggregations at different granularities and youre tired to create tons of code just to get a Global category?
Introducing SQL rollup!
Instead of doing this:
```sql
with booking_count_per_state as (
select
booking_state,
count(1) as booking_count
from
intermediate.int_core__bookings
group by
1
),
global_bookings as (
select
'Global' as booking_state,
sum(booking_count) as booking_count
from
booking_count_per_state
group by
1
)
select *
from global_bookings
union all
select *
from booking_count_per_state
```
To get:
![image.png](image%2041.png)
You can use `ROLLUP` to simplify the code:
```sql
select
coalesce(booking_state,'Global') as booking_state,
count(1) as booking_count
from
intermediate.int_core__bookings
group by
rollup(booking_state) -- NOT EQUIVALENT to group by rollup(1)!!!
```
and get the same results (note we didnt order by any field):
![image.png](image%2042.png)
Be aware that the use of numeric wildcards such as `group by rollup(1)` **is not advised**. Please read the next section to understand more.
# Step-by-step guide (to avoid common mistakes)
By default, when rolling up, SQL will create a category with the aggregated value with a null name. If we follow the example with Bookings, we would have:
```sql
select
booking_state,
count(1) as booking_count
from
intermediate.int_core__bookings
group by
rollup(1) -- equivalent to group by rollup(booking_state)
```
![image.png](image%2043.png)
Thats why in this case we used `COALESCE` to specify that the resulting value should be called Global, since it counts all Bookings.
However, **we should be very, very conscious of null values**. Lets go for another example, in this case, on guest payments. Lets count the total amount of payments per verification payment type. To start with, we can do:
```sql
-- SIMPLE QUERY
select
verification_payment_type,
count(1) as total_payments
from
intermediate.int_core__verification_payments icvp
group by
1 -- equivalent to group by verification_payment_type
order by
total_payments desc
```
![image.png](image%2044.png)
And we see we have 2k rows that do not have any verification payment type set. Since we know that rollup will create another category with NULL, first things first, we should tackle these verification payments that do not have a type set. Intuitively, we can name this category as `Unknown`:
```sql
-- QUERY SETTING UNKNOWN TO NULL VALUES
select
coalesce(verification_payment_type, 'Unknown') as verification_payment_type,
count(1) as total_payments
from
intermediate.int_core__verification_payments icvp
group by
1 -- equivalent to group by coalesce(verification_payment_type, 'Unknown')
```
![image.png](image%2045.png)
Now, if adding the rollup, we should get a new Null category with the total amount of payments:
```sql
-- QUERY SETTING UNKNOWN TO NULL VALUES WITH ROLLUP (1)
select
coalesce(verification_payment_type, 'Unknown') as verification_payment_type,
count(1) as total_payments
from
intermediate.int_core__verification_payments icvp
group by
rollup(1) -- equivalent to group by rollup(coalesce(verification_payment_type, 'Unknown'))
```
![image.png](image%2046.png)
Note that were using `group by rollup(1)`. This is equivalent of doing `group by rollup(coalesce(verification_payment_type, 'Unknown'))`
Cool but… how can we now re-name the resulting Null value to Global?
- We need to add another coalesce in the select clause,
- But we need to keep the previous aggregation `group by rollup(coalesce(verification_payment_type, 'Unknown'))`
In other words:
```sql
-- QUERY SETTING UNKNOWN TO NULL VALUES WITH WELL CREATED ROLLUP
select
coalesce(coalesce(verification_payment_type, 'Unknown'),'Global') as verification_payment_type,
count(1) as total_payments
from
intermediate.int_core__verification_payments icvp
group by
rollup(coalesce(verification_payment_type, 'Unknown')) -- NOT EQUIVALENT to group by rollup(1)!!!
```
![image.png](image%2047.png)
Et voilà!
The key aspect here is understanding that the field that were showing in the `select` clause is different from the field were using to conduct the `group by rollup`. Therefore, its extremely recommended to explicitly specify the `group by rollup` condition instead of using the numeric wildcards.
# Additional Links
https://neon.tech/postgresql/postgresql-tutorial/postgresql-rollup