# Database
The course is designed to be done on Snowflake. But I am a stubborn idiot, and I want to try dbt with PostgreSQL, so I'll just do that.
This dir contains some useful bits to spin up a local PostgreSQL instance with Docker that resembles the Snowflake environment used in the course as closely as possible.
## Setup steps
- Run `docker compose up` with the YAML file in this dir.
- Run the following commands to get the database into its starting state (see the sketch after the SQL block for one way to open a `psql` shell inside the container):
```sql
CREATE DATABASE airbnb;
-- Connect to your newly created `airbnb` database for the next commands.
CREATE SCHEMA raw;
-- The following tables should be created in the `raw` schema
CREATE TABLE raw.raw_listings (
id INTEGER,
listing_url VARCHAR(1000),
name VARCHAR(256),
room_type VARCHAR(256),
minimum_nights INTEGER,
host_id INTEGER,
price VARCHAR(256),
created_at TIMESTAMP,
updated_at TIMESTAMP
);
CREATE TABLE raw.raw_reviews (
listing_id INTEGER,
date TIMESTAMP,
reviewer_name VARCHAR(256),
comments TEXT,
sentiment TEXT
);
CREATE TABLE raw.raw_hosts (
id INTEGER,
name VARCHAR(256),
is_superhost VARCHAR(256),
created_at TIMESTAMP,
updated_at TIMESTAMP
);
CREATE SCHEMA dev;
-- Create a user for dbt activity
CREATE USER transformation_user WITH ENCRYPTED PASSWORD 'transformation_user_password';
-- Allow dbt user to read from raw schema
GRANT CONNECT ON DATABASE airbnb TO transformation_user;
GRANT USAGE ON SCHEMA raw TO transformation_user;
GRANT SELECT ON ALL TABLES IN SCHEMA raw TO transformation_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA raw GRANT SELECT ON TABLES TO transformation_user;
GRANT ALL ON SCHEMA dev TO transformation_user;
ALTER SCHEMA dev OWNER TO transformation_user;
```
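One convenient way to run the SQL above is a `psql` shell inside the running container. A minimal sketch, assuming the compose service is named `postgres` and the default `postgres` superuser (adjust both to whatever the YAML file in this dir actually defines):

```bash
# Open an interactive psql session inside the running container.
# Service and user names are assumptions -- check the compose file.
docker compose exec postgres psql -U postgres
# Inside psql, run CREATE DATABASE airbnb; then \c airbnb before the remaining commands.

# Once everything is created, check that the dbt user can read the raw tables:
docker compose exec postgres psql -U transformation_user -d airbnb \
  -c "SELECT count(*) FROM raw.raw_listings;"
```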
Next, you will have to download some CSV files with the data to populate the database. The AWS CLI commands below will download them for you:
```bash
aws s3 cp s3://dbtlearn/listings.csv listings.csv
aws s3 cp s3://dbtlearn/reviews.csv reviews.csv
aws s3 cp s3://dbtlearn/hosts.csv hosts.csv
```
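If you have no AWS credentials configured, `aws s3 cp` will complain about missing credentials; since the bucket is meant to be publicly readable, anonymous access with `--no-sign-request` may work (whether the bucket accepts unsigned requests is an assumption here):

```bash
# Anonymous download, assuming the bucket allows unsigned requests.
aws s3 cp --no-sign-request s3://dbtlearn/listings.csv listings.csv
```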
How to load the data into the database is up to you. I've done it successfully using the import functionality of DBeaver; a `psql` alternative is sketched below.
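For reference, a minimal `psql` loading sketch. It assumes the connection details of your compose file (host `localhost`, superuser `postgres`), that the CSV files sit in the current directory, and that their columns match the order of the DDL above:

```bash
# Client-side \copy keeps the files on your machine; the server never needs
# to see them. Connection details are assumptions -- adjust to your compose file.
psql -h localhost -U postgres -d airbnb <<'SQL'
\copy raw.raw_listings FROM 'listings.csv' WITH (FORMAT csv, HEADER true)
\copy raw.raw_reviews  FROM 'reviews.csv'  WITH (FORMAT csv, HEADER true)
\copy raw.raw_hosts    FROM 'hosts.csv'    WITH (FORMAT csv, HEADER true)
SQL
```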
## Known issues
- Originally, the image `postgres:16` was used, but some local setups had a quirky issue running it. The solution was to switch to `postgres:16