# Database

The course is designed to be done on Snowflake. But I am a stubborn idiot and I want to try dbt with PostgreSQL, so I'll just do that. This dir contains some useful bits to raise a local PostgreSQL instance with Docker that resembles, as closely as possible, the Snowflake environment proposed in the course.

## Setup steps

- Run `docker compose up` with the YAML file of this dir.
- Run the following commands to get the database ready in its starting state:

```SQL
CREATE USER transformation_user WITH ENCRYPTED PASSWORD 'transformation_user_password';
CREATE DATABASE airbnb;

-- Connect to your newly created `airbnb` database for the next commands.
CREATE SCHEMA raw;

-- Qualify the tables with the schema so they land in `raw`, not `public`.
CREATE TABLE raw.raw_listings (
    id INTEGER,
    listing_url VARCHAR(1000),
    name VARCHAR(256),
    room_type VARCHAR(256),
    minimum_nights INTEGER,
    host_id INTEGER,
    price VARCHAR(256),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE raw.raw_reviews (
    listing_id INTEGER,
    date TIMESTAMP,
    reviewer_name VARCHAR(256),
    comments TEXT,
    sentiment TEXT
);

CREATE TABLE raw.raw_hosts (
    id INTEGER,
    name VARCHAR(256),
    is_superhost VARCHAR(256),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);
```

Afterwards, you will have to download some CSV files with the data to populate the database. The AWS CLI commands below will do that:

```bash
aws s3 cp s3://dbtlearn/listings.csv listings.csv
aws s3 cp s3://dbtlearn/reviews.csv reviews.csv
aws s3 cp s3://dbtlearn/hosts.csv hosts.csv
```

# Introduction and Environment Setup

## Snowflake user creation

Copy these SQL statements into a Snowflake Worksheet, select all and execute them (i.e. press the play button).

## Snowflake data import

Copy these SQL statements into a Snowflake Worksheet, select all and execute them (i.e. press the play button).
```sql
-- Set up the defaults
USE WAREHOUSE COMPUTE_WH;
USE DATABASE airbnb;
USE SCHEMA RAW;

-- Create our three tables and import the data from S3
CREATE OR REPLACE TABLE raw_listings (
    id INTEGER,
    listing_url STRING,
    name STRING,
    room_type STRING,
    minimum_nights INTEGER,
    host_id INTEGER,
    price STRING,
    created_at DATETIME,
    updated_at DATETIME
);

COPY INTO raw_listings (id, listing_url, name, room_type, minimum_nights, host_id, price, created_at, updated_at)
FROM 's3://dbtlearn/listings.csv'
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');

CREATE OR REPLACE TABLE raw_reviews (
    listing_id INTEGER,
    date DATETIME,
    reviewer_name STRING,
    comments STRING,
    sentiment STRING
);

COPY INTO raw_reviews (listing_id, date, reviewer_name, comments, sentiment)
FROM 's3://dbtlearn/reviews.csv'
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');

CREATE OR REPLACE TABLE raw_hosts (
    id INTEGER,
    name STRING,
    is_superhost STRING,
    created_at DATETIME,
    updated_at DATETIME
);

COPY INTO raw_hosts (id, name, is_superhost, created_at, updated_at)
FROM 's3://dbtlearn/hosts.csv'
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');
```
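Back on the PostgreSQL side, Snowflake's `COPY INTO ... FROM 's3://...'` has no direct equivalent, so the downloaded CSV files still need to be loaded into the tables by hand. A minimal sketch using psql's `\copy` meta-command, assuming the CSVs sit in the directory you run psql from and that the connection details match your Docker setup:

```SQL
-- Run these from a psql session connected to the airbnb database, e.g.:
--   psql -h localhost -U postgres -d airbnb
-- Adjust the schema qualification (or search_path) to wherever the raw_* tables were created.
-- Note: each \copy must stay on a single line; it streams the local file through the client connection.
\copy raw_listings (id, listing_url, name, room_type, minimum_nights, host_id, price, created_at, updated_at) FROM 'listings.csv' WITH (FORMAT csv, HEADER true)
\copy raw_reviews (listing_id, date, reviewer_name, comments, sentiment) FROM 'reviews.csv' WITH (FORMAT csv, HEADER true)
\copy raw_hosts (id, name, is_superhost, created_at, updated_at) FROM 'hosts.csv' WITH (FORMAT csv, HEADER true)
```

`\copy` (client-side) is used instead of the server-side `COPY` statement so the files only need to exist on your machine, not inside the Docker container.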