152 lines
4.4 KiB
Markdown
152 lines
4.4 KiB
Markdown
|
|
# How to SQL rollup
|
|||
|
|
|
|||
|
|
# Introduction
|
|||
|
|
|
|||
|
|
Do you need to compute metric aggregations at different granularities and you’re tired to create tons of code just to get a Global category?
|
|||
|
|
|
|||
|
|
Introducing SQL rollup!
|
|||
|
|
|
|||
|
|
Instead of doing this:
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
with booking_count_per_state as (
|
|||
|
|
select
|
|||
|
|
booking_state,
|
|||
|
|
count(1) as booking_count
|
|||
|
|
from
|
|||
|
|
intermediate.int_core__bookings
|
|||
|
|
group by
|
|||
|
|
1
|
|||
|
|
),
|
|||
|
|
global_bookings as (
|
|||
|
|
select
|
|||
|
|
'Global' as booking_state,
|
|||
|
|
sum(booking_count) as booking_count
|
|||
|
|
from
|
|||
|
|
booking_count_per_state
|
|||
|
|
group by
|
|||
|
|
1
|
|||
|
|
)
|
|||
|
|
select *
|
|||
|
|
from global_bookings
|
|||
|
|
union all
|
|||
|
|
select *
|
|||
|
|
from booking_count_per_state
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
To get:
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
You can use `ROLLUP` to simplify the code:
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
select
|
|||
|
|
coalesce(booking_state,'Global') as booking_state,
|
|||
|
|
count(1) as booking_count
|
|||
|
|
from
|
|||
|
|
intermediate.int_core__bookings
|
|||
|
|
group by
|
|||
|
|
rollup(booking_state) -- NOT EQUIVALENT to group by rollup(1)!!!
|
|||
|
|
```
|
|||
|
|
|
|||
|
|
and get the same results (note we didn’t order by any field):
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
Be aware that the use of numeric wildcards such as `group by rollup(1)` **is not advised**. Please read the next section to understand more.
|
|||
|
|
|
|||
|
|
# Step-by-step guide (to avoid common mistakes)
|
|||
|
|
|
|||
|
|
By default, when rolling up, SQL will create a category with the aggregated value with a null name. If we follow the example with Bookings, we would have:
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
select
|
|||
|
|
booking_state,
|
|||
|
|
count(1) as booking_count
|
|||
|
|
from
|
|||
|
|
intermediate.int_core__bookings
|
|||
|
|
group by
|
|||
|
|
rollup(1) -- equivalent to group by rollup(booking_state)
|
|||
|
|
```
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
That’s why in this case we used `COALESCE` to specify that the resulting value should be called Global, since it counts all Bookings.
|
|||
|
|
|
|||
|
|
However, **we should be very, very conscious of null values**. Let’s go for another example, in this case, on guest payments. Let’s count the total amount of payments per verification payment type. To start with, we can do:
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
-- SIMPLE QUERY
|
|||
|
|
select
|
|||
|
|
verification_payment_type,
|
|||
|
|
count(1) as total_payments
|
|||
|
|
from
|
|||
|
|
intermediate.int_core__verification_payments icvp
|
|||
|
|
group by
|
|||
|
|
1 -- equivalent to group by verification_payment_type
|
|||
|
|
order by
|
|||
|
|
total_payments desc
|
|||
|
|
```
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
And we see we have 2k rows that do not have any verification payment type set. Since we know that rollup will create another category with NULL, first things first, we should tackle these verification payments that do not have a type set. Intuitively, we can name this category as `Unknown`:
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
-- QUERY SETTING UNKNOWN TO NULL VALUES
|
|||
|
|
select
|
|||
|
|
coalesce(verification_payment_type, 'Unknown') as verification_payment_type,
|
|||
|
|
count(1) as total_payments
|
|||
|
|
from
|
|||
|
|
intermediate.int_core__verification_payments icvp
|
|||
|
|
group by
|
|||
|
|
1 -- equivalent to group by coalesce(verification_payment_type, 'Unknown')
|
|||
|
|
```
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
Now, if adding the rollup, we should get a new Null category with the total amount of payments:
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
-- QUERY SETTING UNKNOWN TO NULL VALUES WITH ROLLUP (1)
|
|||
|
|
select
|
|||
|
|
coalesce(verification_payment_type, 'Unknown') as verification_payment_type,
|
|||
|
|
count(1) as total_payments
|
|||
|
|
from
|
|||
|
|
intermediate.int_core__verification_payments icvp
|
|||
|
|
group by
|
|||
|
|
rollup(1) -- equivalent to group by rollup(coalesce(verification_payment_type, 'Unknown'))
|
|||
|
|
```
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
Note that we’re using `group by rollup(1)`. This is equivalent of doing `group by rollup(coalesce(verification_payment_type, 'Unknown'))`
|
|||
|
|
|
|||
|
|
Cool but… how can we now re-name the resulting Null value to Global?
|
|||
|
|
|
|||
|
|
- We need to add another coalesce in the select clause,
|
|||
|
|
- But we need to keep the previous aggregation `group by rollup(coalesce(verification_payment_type, 'Unknown'))`
|
|||
|
|
|
|||
|
|
In other words:
|
|||
|
|
|
|||
|
|
```sql
|
|||
|
|
-- QUERY SETTING UNKNOWN TO NULL VALUES WITH WELL CREATED ROLLUP
|
|||
|
|
select
|
|||
|
|
coalesce(coalesce(verification_payment_type, 'Unknown'),'Global') as verification_payment_type,
|
|||
|
|
count(1) as total_payments
|
|||
|
|
from
|
|||
|
|
intermediate.int_core__verification_payments icvp
|
|||
|
|
group by
|
|||
|
|
rollup(coalesce(verification_payment_type, 'Unknown')) -- NOT EQUIVALENT to group by rollup(1)!!!
|
|||
|
|
```
|
|||
|
|
|
|||
|
|

|
|||
|
|
|
|||
|
|
Et voilà!
|
|||
|
|
|
|||
|
|
The key aspect here is understanding that the field that we’re showing in the `select` clause is different from the field we’re using to conduct the `group by rollup`. Therefore, it’s extremely recommended to explicitly specify the `group by rollup` condition instead of using the numeric wildcards.
|
|||
|
|
|
|||
|
|
# Additional Links
|
|||
|
|
|
|||
|
|
https://neon.tech/postgresql/postgresql-tutorial/postgresql-rollup
|