diff --git a/human-script.md b/human-script.md
index c262f02..6a23fad 100644
--- a/human-script.md
+++ b/human-script.md
@@ -437,6 +437,7 @@ Follow this to deploy the entire data infra.
 - A user `powerbi_user`, with `consumer` role.
 - A user `airbyte user`, with permission to create new schemas.
 - A user `billingdb_reader`, with permission to read some tables from the reporting schema.
+- A user `ci_reader`, with `modeler` role.
 - *Note: replace the password fields with serious passwords and note them down.*
 - *Note: replace the name of the admin user*
 
@@ -468,6 +469,10 @@ Follow this to deploy the entire data infra.
 CREATE ROLE billingdb_reader LOGIN PASSWORD 'password' VALID UNTIL 'infinity';
 
 CREATE ROLE modeler INHERIT;
+
+CREATE ROLE ci_reader LOGIN PASSWORD 'password' VALID UNTIL 'infinity';
+GRANT modeler TO ci_reader;
+
 -- You might want to create a first personal user with modeler role here
 
 -- Login as airbyte_user
@@ -764,6 +769,123 @@ We will deploy a dedicated VM to act as a web server for internal services.
 - Our dbt project () can be deployed on any linux VM within the virtual network. The instructions on how to deploy and schedule it are in the project repository.
 - You can opt to deploy it in the same machine where airbyte is stored, since that machine is probably fairly underutilized.
 
+### 080.1 dbt CI server
+
+Having CI pipelines in the dbt git project is a great way to automate certain quality checks around the DWH code. The way our CI strategy is designed, you need to prepare a VM within our Data private network where the CI jobs will run. This section explains how to set up that VM. Note that we will only cover infrastructure topics here: you'll have to check the dbt repository for the full story on how to set up the CI. We recommend covering the steps described here before jumping into the dbt-specific part of things.
+
+#### 080.1.1 Deploying the CI VM
+
+- We will have a dedicated VM for the CI pipelines. The pipelines can be resource-hungry at times, so I recommend keeping this VM dedicated rather than sharing it with other workloads, so you can size it adequately and avoid resource competition with other services.
+- Create a new VM following these steps (a rough Azure CLI sketch of the same configuration follows this list, for reference).
+  - Basic settings
+    - Name it: `pipeline-host-`
+    - Use Ubuntu Server 22.04
+    - Size should be adjusted to the needs of the dbt project. I suggest starting with a `B2s` instance and driving upgrade decisions based on what you observe during normal usage.
+    - Use username: `azureuser`
+    - Use the SSH Key: `superhog-data-general-ssh-`
+    - Select the option `None` for Public inbound ports.
+  - Disk settings
+    - Disk requirements will vary depending on the nature of the dbt project state and the PRs. I suggest starting with the default 30 GB and monitoring usage. If you see spikes that get close to 100%, increase the size to prevent a particularly heavy PR from consuming all space.
+  - Networking
+    - Attach to the virtual network `superhog-data-vnet-`
+    - Attach to the subnet `services-subnet`
+    - Assign no public IP.
+    - For the setting `NIC network security group`, select the option `None`.
+  - Management settings
+    - Defaults are fine.
+  - Monitoring
+    - Defaults are fine.
+  - Advanced
+    - Defaults are fine.
+  - Add tags:
+    - `team: data`
+    - `environment: `
+    - `project: dbt`
+- Once the VM is running, you should be able to SSH into the machine while your VPN is active.
+
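+For reference, a rough `az vm create` equivalent of the portal steps above might look like the sketch below. Treat it as a starting point rather than a tested command: `<resource-group>` and `<environment>` are placeholders for the values this guide leaves blank, and you should double-check every option against the list above before running it.
+
+```bash
+# Rough sketch of the portal configuration above, not a tested command.
+# <resource-group> and <environment> are placeholders: fill in your own values.
+az vm create \
+  --resource-group "<resource-group>" \
+  --name "pipeline-host-<environment>" \
+  --image Ubuntu2204 \
+  --size Standard_B2s \
+  --os-disk-size-gb 30 \
+  --admin-username azureuser \
+  --ssh-key-name "superhog-data-general-ssh-<environment>" \
+  --vnet-name "superhog-data-vnet-<environment>" \
+  --subnet services-subnet \
+  --public-ip-address "" \
+  --nsg "" \
+  --tags team=data environment=<environment> project=dbt
+```
+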
+#### 080.1.2 Install docker and docker compose
+
+- We will use docker and docker compose to run a dockerized Postgres server in the VM.
+- You can install docker and docker compose by placing the following code in a script and running it:
+```bash
+#!/bin/bash
+set -e # Exit on error
+
+echo "🔄 Updating system packages..."
+sudo apt update && sudo apt upgrade -y
+
+echo "📦 Installing dependencies..."
+sudo apt install -y \
+    apt-transport-https \
+    ca-certificates \
+    curl \
+    software-properties-common \
+    lsb-release \
+    gnupg2 \
+    jq
+
+echo "🔑 Adding Docker GPG key..."
+curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
+
+echo "🖋️ Adding Docker repository..."
+echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
+
+echo "📦 Installing Docker..."
+sudo apt update
+sudo apt install -y docker-ce docker-ce-cli containerd.io
+
+echo "✅ Docker installed successfully!"
+
+echo "🔧 Enabling Docker to start on boot..."
+sudo systemctl enable docker
+
+echo "🔄 Installing Docker Compose..."
+DOCKER_COMPOSE_VERSION=$(curl -s https://api.github.com/repos/docker/compose/releases/latest | jq -r .tag_name)
+sudo curl -L "https://github.com/docker/compose/releases/download/${DOCKER_COMPOSE_VERSION}/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
+
+echo "📂 Setting permissions for Docker Compose..."
+sudo chmod +x /usr/local/bin/docker-compose
+
+echo "✅ Docker Compose installed successfully!"
+
+# Verifying installation
+echo "🔍 Verifying Docker and Docker Compose versions..."
+docker --version
+docker-compose --version
+
+# Allow the current user to run docker without sudo.
+# Log out and back in (or run 'newgrp docker' in an interactive shell) for this to take effect.
+sudo usermod -aG docker "$USER"
+
+echo "✅ Docker and Docker Compose installation completed!"
+```
+
+#### 080.1.3 Install psql
+
+- CI pipelines require `psql`, the Postgres CLI client, to be available.
+- You can install it with `sudo apt-get install postgresql-client-16`. Note that Ubuntu 22.04 ships an older client by default, so you may need to add the PostgreSQL APT (PGDG) repository first for version 16 to be available.
+
+#### 080.1.4 Create user in DWH
+
+- The CI Postgres will use some Foreign Data Wrappers (FDW) pointing at the DWH. We need a dedicated user in the DWH instance to control the permissions granted to the CI server.
+- The section of this guide dedicated to setting up the DWH explains how to create this user. If you have followed it, you may have already created the user; otherwise, head there to complete this part.
+
+#### 080.1.5 Install the Azure DevOps agent and sync with DevOps
+
+- The VM needs the Microsoft-provided Azure Pipelines agent installed to be reachable by Azure DevOps. This agent listens for requests from Azure DevOps, basically allowing DevOps to execute jobs on the VM.
+- Some configuration also needs to be done in the Azure DevOps project to allow Azure DevOps to reach the VM.
+- You can find how to set this up on Ubuntu in these links (a rough command sketch follows this list as well):
+  - Official MSFT docs: https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/linux-agent?view=azure-devops
+  - Helpful walkthrough video: https://www.youtube.com/watch?v=Hy6fne9oQJM
+
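+As a rough orientation, the agent installation on the VM usually boils down to the commands sketched below. The agent version in the download URL, the organization URL, the PAT and the pool name are placeholders: grab the current Linux x64 download link from the Agent pools page in Azure DevOps and follow the official docs above for the authoritative steps.
+
+```bash
+# Sketch only: replace the agent version, organization URL, PAT and pool name with your own values.
+mkdir ~/azagent && cd ~/azagent
+curl -LO https://vstsagentpackage.azureedge.net/agent/3.236.1/vsts-agent-linux-x64-3.236.1.tar.gz  # placeholder version
+tar zxvf vsts-agent-linux-x64-3.236.1.tar.gz
+
+# Interactive configuration: it asks for the server URL (https://dev.azure.com/<your-org>),
+# PAT authentication, the agent pool and an agent name.
+./config.sh
+
+# Run the agent as a systemd service under the azureuser account.
+sudo ./svc.sh install azureuser
+sudo ./svc.sh start
+```
+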
+#### 080.1.6 Clone the project and further steps
+
+- We are going to need a local clone of the git repository to perform some setup steps, as well as for business-as-usual execution.
+- To do this:
+  - Use or create an SSH key with access to clone repos from Azure DevOps. This could be the key `superhog-data-general-ssh-` or some other key; this guide leaves that detail up to you. You can read more on how to use SSH keys with Azure DevOps here: https://learn.microsoft.com/en-us/azure/devops/repos/git/use-ssh-keys-to-authenticate?view=azure-devops.
+  - Once the CI VM can authenticate against Azure DevOps, clone the dbt project into the `azureuser` home dir.
+  - There are several steps after this, for which you should find instructions in the dbt repository itself.
+
 ## 090. Monitoring
 
 ### 090.1 Infra monitoring