
Database

The course is designed to be done on Snowflake. But I am a stubborn idiot, and I want to try dbt with PostgreSQL, so I'll just do that.

This dir contains some useful bits to spin up a local PostgreSQL instance with Docker that resembles the Snowflake environment proposed in the course as closely as possible.

Setup steps

  • Run docker compose up with the docker-compose.yaml file in this dir.
  • Run the following commands to get the database into its starting state:
CREATE USER transformation_user WITH ENCRYPTED PASSWORD 'transformation_user_password';

CREATE DATABASE airbnb;

-- Connect to your newly created `airbnb` database for the next commands
-- (in psql: \c airbnb).

CREATE SCHEMA raw;

-- Create the tables inside the raw schema, mirroring Snowflake's AIRBNB.RAW.
-- Without the schema prefix they would land in public instead.
CREATE TABLE raw.raw_listings (
    id INTEGER,
    listing_url VARCHAR(1000),
    name VARCHAR(256),
    room_type VARCHAR(256),
    minimum_nights INTEGER,
    host_id INTEGER,
    price VARCHAR(256),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE raw.raw_reviews (
    listing_id INTEGER,
    date TIMESTAMP,
    reviewer_name VARCHAR(256),
    comments TEXT,
    sentiment TEXT
);

CREATE TABLE raw.raw_hosts (
    id INTEGER,
    name VARCHAR(256),
    is_superhost VARCHAR(256),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);
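The statements above create transformation_user but never give it access to anything, and the tables end up owned by whichever user ran them. A minimal sketch of the missing grants follows, assuming dbt will connect as transformation_user, that the raw_* tables live in the raw schema, and that models get built in a schema named dev (a hypothetical name, not something this README defines):

```sql
-- Run while connected to the airbnb database as the superuser
-- (assumed to be the same user that ran the statements above).

-- Let transformation_user see the raw schema and read the source tables.
GRANT USAGE ON SCHEMA raw TO transformation_user;
GRANT SELECT ON ALL TABLES IN SCHEMA raw TO transformation_user;

-- Hypothetical target schema for dbt models, owned by transformation_user
-- so it can create, replace and drop tables and views in it.
CREATE SCHEMA dev AUTHORIZATION transformation_user;
```

ON ALL TABLES only covers tables that already exist; tables added to raw later would need another GRANT or an ALTER DEFAULT PRIVILEGES rule.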

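Afterwards, you will have to download the CSV files with the data that populates the database. The AWS CLI commands below will do that; if you have no AWS credentials configured, adding --no-sign-request to each command should work, assuming the dbtlearn bucket is still publicly readable: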

aws s3 cp s3://dbtlearn/listings.csv listings.csv
aws s3 cp s3://dbtlearn/reviews.csv reviews.csv
aws s3 cp s3://dbtlearn/hosts.csv hosts.csv
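The commands above only download the files. One way to load them, sketched here as an assumption rather than the course's method, is psql's client-side \copy, which reads the CSVs from the machine running psql, so they don't have to be copied into the container. HEADER true plays the role of Snowflake's skip_header = 1, and the CSV format's default quote character (") matches FIELD_OPTIONALLY_ENCLOSED_BY = '"'. The sketch assumes psql was started from the directory holding the downloaded files and is connected to the airbnb database:

```sql
-- Putting raw first in the search path resolves the table names
-- whether the tables were created in raw or in public.
SET search_path TO raw, public;

-- Each \copy must stay on a single line. The column lists mirror the
-- Snowflake COPY INTO statements further down.
\copy raw_listings (id, listing_url, name, room_type, minimum_nights, host_id, price, created_at, updated_at) FROM 'listings.csv' WITH (FORMAT csv, HEADER true)
\copy raw_reviews (listing_id, date, reviewer_name, comments, sentiment) FROM 'reviews.csv' WITH (FORMAT csv, HEADER true)
\copy raw_hosts (id, name, is_superhost, created_at, updated_at) FROM 'hosts.csv' WITH (FORMAT csv, HEADER true)

-- Quick sanity check that rows arrived in each table.
SELECT 'raw_listings' AS table_name, COUNT(*) FROM raw_listings
UNION ALL
SELECT 'raw_reviews', COUNT(*) FROM raw_reviews
UNION ALL
SELECT 'raw_hosts', COUNT(*) FROM raw_hosts;
```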

Introduction and Environment Setup (original Snowflake instructions from the course)

Snowflake user creation

Copy these SQL statements into a Snowflake Worksheet, select all and execute them (e.g. by pressing the play button).

Snowflake data import

Copy these SQL statements into a Snowflake Worksheet, select all and execute them (e.g. by pressing the play button).

-- Set up the defaults
USE WAREHOUSE COMPUTE_WH;
USE DATABASE airbnb;
USE SCHEMA RAW;

-- Create our three tables and import the data from S3
CREATE OR REPLACE TABLE raw_listings (
    id integer,
    listing_url string,
    name string,
    room_type string,
    minimum_nights integer,
    host_id integer,
    price string,
    created_at datetime,
    updated_at datetime
);

COPY INTO raw_listings (id,
                        listing_url,
                        name,
                        room_type,
                        minimum_nights,
                        host_id,
                        price,
                        created_at,
                        updated_at)
    FROM 's3://dbtlearn/listings.csv'
    FILE_FORMAT = (type = 'CSV' skip_header = 1
                   FIELD_OPTIONALLY_ENCLOSED_BY = '"');

CREATE OR REPLACE TABLE raw_reviews (
    listing_id integer,
    date datetime,
    reviewer_name string,
    comments string,
    sentiment string
);

COPY INTO raw_reviews (listing_id, date, reviewer_name, comments, sentiment)
    FROM 's3://dbtlearn/reviews.csv'
    FILE_FORMAT = (type = 'CSV' skip_header = 1
                   FIELD_OPTIONALLY_ENCLOSED_BY = '"');

CREATE OR REPLACE TABLE raw_hosts (
    id integer,
    name string,
    is_superhost string,
    created_at datetime,
    updated_at datetime
);

COPY INTO raw_hosts (id, name, is_superhost, created_at, updated_at)
    FROM 's3://dbtlearn/hosts.csv'
    FILE_FORMAT = (type = 'CSV' skip_header = 1
                   FIELD_OPTIONALLY_ENCLOSED_BY = '"');
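Since the point of the PostgreSQL setup is to mirror these Snowflake tables, a query along the following lines lists the columns and types PostgreSQL actually created, for a side-by-side check against the Snowflake DDL. Snowflake's datetime is an alias for TIMESTAMP_NTZ, which lines up with PostgreSQL's timestamp (without time zone), and string corresponds to the varchar/text columns used in the PostgreSQL version. This is a sketch to run in psql against the airbnb database:

```sql
-- List the columns of the three raw tables in declaration order.
-- table_schema is included so it's obvious whether they landed in raw or public.
SELECT table_schema,
       table_name,
       column_name,
       data_type
FROM information_schema.columns
WHERE table_name IN ('raw_listings', 'raw_reviews', 'raw_hosts')
ORDER BY table_schema, table_name, ordinal_position;
```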