Database

The course is designed to be done on Snowflake. But I am a stubborn idiot, and I want to try dbt with PostgreSQL, so I'll just do that.

This directory contains some useful bits to spin up a local PostgreSQL instance with Docker that resembles the Snowflake environment proposed in the course as closely as possible.
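The actual compose file lives in this directory; purely for orientation, a minimal sketch of what such a setup looks like (image tag, credentials and port below are placeholders, use whatever the real file defines):

# docker-compose.yml (sketch only, not necessarily identical to the file in this dir)
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_USER: postgres          # superuser used for the initial setup
      POSTGRES_PASSWORD: postgres      # placeholder, change as needed
      POSTGRES_DB: postgres
    ports:
      - "5432:5432"                    # expose Postgres on localhost
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata: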

Setup steps

  • Run docker compose up with the YAML file in this directory.
  • Run the following commands to get the database into its starting state:
CREATE USER transformation_user WITH ENCRYPTED PASSWORD 'transformation_user_password';

CREATE DATABASE airbnb;

-- Connect to your newly created `airbnb` database for the next commands.
    
CREATE SCHEMA raw;

CREATE TABLE raw.raw_listings (
    id INTEGER,
    listing_url VARCHAR(1000),
    name VARCHAR(256),
    room_type VARCHAR(256),
    minimum_nights INTEGER,
    host_id INTEGER,
    price VARCHAR(256),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE raw.raw_reviews (
    listing_id INTEGER,
    date TIMESTAMP,
    reviewer_name VARCHAR(256),
    comments TEXT,
    sentiment TEXT
);

CREATE TABLE raw.raw_hosts (
    id INTEGER,
    name VARCHAR(256),
    is_superhost VARCHAR(256),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);
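
Note that transformation_user is created above without any privileges. Depending on how you plan to connect from dbt, you will probably also want grants along these lines (a sketch, run while connected to the airbnb database, and tighten to your own needs):

GRANT CONNECT ON DATABASE airbnb TO transformation_user;
GRANT CREATE ON DATABASE airbnb TO transformation_user;   -- lets dbt create its own schemas
GRANT USAGE ON SCHEMA raw TO transformation_user;
GRANT SELECT ON ALL TABLES IN SCHEMA raw TO transformation_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA raw GRANT SELECT ON TABLES TO transformation_user;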

Afterwards, you will need to download the CSV files with the data to populate the database. The AWS CLI commands below will fetch them for you:

aws s3 cp s3://dbtlearn/listings.csv listings.csv
aws s3 cp s3://dbtlearn/reviews.csv reviews.csv
aws s3 cp s3://dbtlearn/hosts.csv hosts.csv

How you load the data into the database is up to you. I've done it successfully using DBeaver's import functionality.
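If you prefer the command line over a GUI, psql's \copy should also do the job. A sketch, assuming the CSV files sit in the current directory, have a header row, and their column order matches the table definitions above:

psql -h localhost -U postgres -d airbnb \
  -c "\copy raw.raw_listings FROM 'listings.csv' WITH (FORMAT csv, HEADER true)"
psql -h localhost -U postgres -d airbnb \
  -c "\copy raw.raw_reviews FROM 'reviews.csv' WITH (FORMAT csv, HEADER true)"
psql -h localhost -U postgres -d airbnb \
  -c "\copy raw.raw_hosts FROM 'hosts.csv' WITH (FORMAT csv, HEADER true)"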

Introduction and Environment Setup

Snowflake user creation

Copy these SQL statements into a Snowflake Worksheet, select them all, and execute them (i.e. by pressing the play button).
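
The user-creation statements themselves are not reproduced here. Purely as orientation, a Snowflake setup matching the defaults used in the import section below (COMPUTE_WH, the airbnb database and its RAW schema) would look roughly like this; names and password are placeholders, not the course's exact script:

-- Sketch only; adjust names, password and grants to your account.
USE ROLE ACCOUNTADMIN;

CREATE WAREHOUSE IF NOT EXISTS COMPUTE_WH;
CREATE DATABASE IF NOT EXISTS AIRBNB;
CREATE SCHEMA IF NOT EXISTS AIRBNB.RAW;

CREATE ROLE IF NOT EXISTS transform;
GRANT ALL ON WAREHOUSE COMPUTE_WH TO ROLE transform;
GRANT ALL ON DATABASE AIRBNB TO ROLE transform;
GRANT ALL ON ALL SCHEMAS IN DATABASE AIRBNB TO ROLE transform;

CREATE USER IF NOT EXISTS dbt
  PASSWORD = 'change_me'             -- placeholder
  DEFAULT_WAREHOUSE = 'COMPUTE_WH'
  DEFAULT_ROLE = 'transform';
GRANT ROLE transform TO USER dbt;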

Snowflake data import

Copy these SQL statements into a Snowflake Worksheet, select them all, and execute them (i.e. by pressing the play button).

-- Set up the defaults
USE WAREHOUSE COMPUTE_WH;
USE DATABASE airbnb;
USE SCHEMA RAW;

-- Create our three tables and import the data from S3
CREATE OR REPLACE TABLE raw_listings (
    id INTEGER,
    listing_url STRING,
    name STRING,
    room_type STRING,
    minimum_nights INTEGER,
    host_id INTEGER,
    price STRING,
    created_at DATETIME,
    updated_at DATETIME
);

COPY INTO raw_listings (id, listing_url, name, room_type, minimum_nights,
                        host_id, price, created_at, updated_at)
FROM 's3://dbtlearn/listings.csv'
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');


CREATE OR REPLACE TABLE raw_reviews (
    listing_id INTEGER,
    date DATETIME,
    reviewer_name STRING,
    comments STRING,
    sentiment STRING
);

COPY INTO raw_reviews (listing_id, date, reviewer_name, comments, sentiment)
FROM 's3://dbtlearn/reviews.csv'
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');


CREATE OR REPLACE TABLE raw_hosts (
    id INTEGER,
    name STRING,
    is_superhost STRING,
    created_at DATETIME,
    updated_at DATETIME
);

COPY INTO raw_hosts (id, name, is_superhost, created_at, updated_at)
FROM 's3://dbtlearn/hosts.csv'
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');
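
Once the three COPY INTO statements have run, a quick row-count check confirms the load (exact counts depend on the dataset version):

SELECT 'raw_listings' AS table_name, COUNT(*) AS row_count FROM raw_listings
UNION ALL
SELECT 'raw_reviews', COUNT(*) FROM raw_reviews
UNION ALL
SELECT 'raw_hosts', COUNT(*) FROM raw_hosts;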