# Database
The course is designed to be done on Snowflake. But I am a stubborn idiot, and I want to try dbt with PostgreSQL, so I'll just do that.
This dir contains some useful bits to spin up a local PostgreSQL instance with Docker that resembles, as closely as possible, the Snowflake environment proposed in the course.
## Setup steps
- Run `docker compose up` with the YAML file in this dir (a rough sketch of such a file follows this list).
- Then run the SQL commands in the block after the sketch to get the database ready in its starting state.
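
The compose file itself lives in this dir; for orientation, here is a minimal sketch of what such a file can look like. The image tag, port, and password below are assumptions, so defer to the actual file:

```yaml
# A sketch, not the real file in this dir; match that file for the actual values.
services:
  postgres:
    image: postgres:16              # any recent Postgres should do for the course
    environment:
      POSTGRES_PASSWORD: postgres   # superuser password (placeholder)
    ports:
      - "5432:5432"                 # expose Postgres on the default port
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```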
```sql
-- The user the dbt transformations will connect as.
CREATE USER transformation_user WITH ENCRYPTED PASSWORD 'transformation_user_password';

CREATE DATABASE airbnb;

-- Connect to your newly created `airbnb` database for the next commands.

CREATE SCHEMA raw;

-- Schema-qualify the tables so they land in `raw` rather than `public`,
-- mirroring the RAW schema of the Snowflake setup.
CREATE TABLE raw.raw_listings (
    id INTEGER,
    listing_url VARCHAR(1000),
    name VARCHAR(256),
    room_type VARCHAR(256),
    minimum_nights INTEGER,
    host_id INTEGER,
    price VARCHAR(256),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);

CREATE TABLE raw.raw_reviews (
    listing_id INTEGER,
    date TIMESTAMP,
    reviewer_name VARCHAR(256),
    comments TEXT,
    sentiment TEXT
);

CREATE TABLE raw.raw_hosts (
    id INTEGER,
    name VARCHAR(256),
    is_superhost VARCHAR(256),
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);
```
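
One thing the block above doesn't cover is privileges: depending on how your dbt profile connects, `transformation_user` will likely need to read `raw` and create its own schemas. A minimal sketch (the exact grants are an assumption, adjust to your setup):

```sql
-- Run these while connected to the airbnb database (grants are assumptions).
GRANT USAGE ON SCHEMA raw TO transformation_user;                 -- let the user see the schema
GRANT SELECT ON ALL TABLES IN SCHEMA raw TO transformation_user;  -- read the raw tables
GRANT CREATE ON DATABASE airbnb TO transformation_user;           -- let dbt create its target schemas
```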
After that, you will have to download the CSV files with the data that populates the database. The AWS CLI commands below will do that:
```bash
aws s3 cp s3://dbtlearn/listings.csv listings.csv
aws s3 cp s3://dbtlearn/reviews.csv reviews.csv
aws s3 cp s3://dbtlearn/hosts.csv hosts.csv
```
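
The steps above stop at downloading; to actually get the CSVs into the tables, psql's `\copy` does the job. A sketch, assuming the files sit in your current directory and the CSV column order matches the table definitions:

```sql
-- Run inside psql while connected to the airbnb database,
-- e.g. psql -h localhost -U postgres -d airbnb (invocation is an assumption).
\copy raw.raw_listings FROM 'listings.csv' WITH (FORMAT csv, HEADER true)
\copy raw.raw_reviews  FROM 'reviews.csv'  WITH (FORMAT csv, HEADER true)
\copy raw.raw_hosts    FROM 'hosts.csv'    WITH (FORMAT csv, HEADER true)
```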
# Introduction and Environment Setup
## Snowflake user creation
Copy these SQL statements into a Snowflake Worksheet, select all and execute them (i.e. by pressing the play button).
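The course's exact script isn't reproduced here; as a stand-in, this sketch creates the objects the import section below relies on (the COMPUTE_WH warehouse and the AIRBNB database with its RAW schema). The role, user name, and password are assumptions, not the course's values:

```sql
-- A sketch only: role, user, and password are made-up placeholders.
USE ROLE ACCOUNTADMIN;

-- Role and user for dbt to connect as
CREATE ROLE IF NOT EXISTS transform;
GRANT ROLE transform TO ROLE ACCOUNTADMIN;
CREATE USER IF NOT EXISTS dbt
  PASSWORD = 'pick-a-password'      -- placeholder
  DEFAULT_WAREHOUSE = 'COMPUTE_WH'
  DEFAULT_ROLE = transform;
GRANT ROLE transform TO USER dbt;

-- The warehouse, database, and schema used by the import below
CREATE WAREHOUSE IF NOT EXISTS COMPUTE_WH;
CREATE DATABASE IF NOT EXISTS airbnb;
CREATE SCHEMA IF NOT EXISTS airbnb.raw;

-- Give the role what it needs to work in them
GRANT ALL ON WAREHOUSE COMPUTE_WH TO ROLE transform;
GRANT ALL ON DATABASE airbnb TO ROLE transform;
GRANT ALL ON ALL SCHEMAS IN DATABASE airbnb TO ROLE transform;
```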
## Snowflake data import
Copy these SQL statements into a Snowflake Worksheet, select all and execute them (i.e. by pressing the play button).
```sql
-- Set up the defaults
USE WAREHOUSE COMPUTE_WH;
USE DATABASE airbnb;
USE SCHEMA RAW;

-- Create our three tables and import the data from S3
CREATE OR REPLACE TABLE raw_listings
                    (id integer,
                     listing_url string,
                     name string,
                     room_type string,
                     minimum_nights integer,
                     host_id integer,
                     price string,
                     created_at datetime,
                     updated_at datetime);

COPY INTO raw_listings (id,
                        listing_url,
                        name,
                        room_type,
                        minimum_nights,
                        host_id,
                        price,
                        created_at,
                        updated_at)
   from 's3://dbtlearn/listings.csv'
    FILE_FORMAT = (type = 'CSV' skip_header = 1
    FIELD_OPTIONALLY_ENCLOSED_BY = '"');


CREATE OR REPLACE TABLE raw_reviews
                    (listing_id integer,
                     date datetime,
                     reviewer_name string,
                     comments string,
                     sentiment string);

COPY INTO raw_reviews (listing_id, date, reviewer_name, comments, sentiment)
   from 's3://dbtlearn/reviews.csv'
    FILE_FORMAT = (type = 'CSV' skip_header = 1
    FIELD_OPTIONALLY_ENCLOSED_BY = '"');


CREATE OR REPLACE TABLE raw_hosts
                    (id integer,
                     name string,
                     is_superhost string,
                     created_at datetime,
                     updated_at datetime);

COPY INTO raw_hosts (id, name, is_superhost, created_at, updated_at)
   from 's3://dbtlearn/hosts.csv'
    FILE_FORMAT = (type = 'CSV' skip_header = 1
    FIELD_OPTIONALLY_ENCLOSED_BY = '"');
```
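
To sanity-check the import, a quick row count per table helps (this works on both the Snowflake and the PostgreSQL setup; on Postgres, prefix the tables with `raw.`):

```sql
-- Each row of the result is one table with its row count.
SELECT 'raw_listings' AS tbl, COUNT(*) AS row_count FROM raw_listings
UNION ALL
SELECT 'raw_reviews', COUNT(*) FROM raw_reviews
UNION ALL
SELECT 'raw_hosts', COUNT(*) FROM raw_hosts;
```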