Database

The course is designed to be done on Snowflake. But I am a stubborn idiot, and I want to try dbt with PostgreSQL, so I'll just do that.

This dir contains some useful bits to spin up a local PostgreSQL instance with Docker that resembles the Snowflake environment proposed in the course as closely as possible.
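For reference, the compose setup boils down to something like the sketch below (the actual docker-compose.yaml in this dir is the source of truth; the image tag, password, and volume name here are assumptions):

```yaml
# Sketch only — see the real docker-compose.yaml in this dir.
services:
  postgres:
    image: postgres:16          # assumed tag
    environment:
      POSTGRES_PASSWORD: postgres   # assumed credential
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```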

Setup steps

  • Run `docker compose up` with the YAML file in this dir.
  • Run the following commands to get the database into its starting state:
CREATE DATABASE airbnb;

-- Connect to your newly created `airbnb` database for the next commands.
    
CREATE SCHEMA raw;

-- The following tables should be created in the `raw` schema

CREATE TABLE raw.raw_listings (
    id INTEGER,
    listing_url VARCHAR(1000),
    name VARCHAR(256),
    room_type VARCHAR(256),
    minimum_nights INTEGER,
    host_id INTEGER,
    price VARCHAR(256),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE raw.raw_reviews (
    listing_id INTEGER,
    date TIMESTAMP,
    reviewer_name VARCHAR(256),
    comments TEXT,
    sentiment TEXT
);

CREATE TABLE raw.raw_hosts (
    id INTEGER,
    name VARCHAR(256),
    is_superhost VARCHAR(256),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);


CREATE SCHEMA dev;


-- Create a user for dbt activity
CREATE USER transformation_user WITH ENCRYPTED PASSWORD 'transformation_user_password';

-- Allow dbt user to read from raw schema
GRANT CONNECT ON DATABASE airbnb TO transformation_user;
GRANT USAGE ON SCHEMA raw TO transformation_user;
GRANT SELECT ON ALL TABLES IN SCHEMA raw TO transformation_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA raw GRANT SELECT ON TABLES TO transformation_user;

GRANT ALL ON SCHEMA dev TO transformation_user;
ALTER SCHEMA dev OWNER TO transformation_user;
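One way to apply all of the above is with `psql`. The snippet below just prints the two commands to run (a sketch; the `postgres` superuser on localhost:5432 is an assumption about the compose file, and `init.sql` is a hypothetical file holding every statement after `CREATE DATABASE`):

```shell
# Print the two psql invocations (sketch; "postgres" superuser and the
# init.sql filename are assumptions, not things this repo ships).
create_db="psql -h localhost -U postgres -c 'CREATE DATABASE airbnb;'"
run_init="psql -h localhost -U postgres -d airbnb -f init.sql"
echo "$create_db"
echo "$run_init"
```

`CREATE DATABASE` has to run against the default database first; the rest runs connected to `airbnb`, which is why the commands are split in two.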

Afterwards, you will need to download some CSV files with the data to populate the database. The AWS CLI commands below will download them for you:

aws s3 cp s3://dbtlearn/listings.csv listings.csv
aws s3 cp s3://dbtlearn/reviews.csv reviews.csv
aws s3 cp s3://dbtlearn/hosts.csv hosts.csv

How you put the data into the database is up to you. I've done it successfully using the import functionality of DBeaver.
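If you'd rather stay on the command line, `psql`'s `\copy` works too. The loop below only prints the commands (a sketch; it assumes the CSVs sit in the current directory, that they have header rows, and that you connect as a user with write access to the `raw` schema):

```shell
# Build and print one \copy invocation per table. Paste them into a psql
# session, or wrap each in `psql -h localhost -U postgres -d airbnb -c ...`
# (connection details are assumptions) to execute them.
for t in listings reviews hosts; do
  copy_cmd="\\copy raw.raw_${t} FROM '${t}.csv' WITH (FORMAT csv, HEADER true)"
  echo "$copy_cmd"
done
```

Unlike server-side `COPY`, `\copy` reads the file from the client machine, so it works even though the CSVs live outside the container.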