tiny changes

Pablo Martin 2024-02-21 09:15:07 +01:00
parent 4c55aef9e1
commit 72a99cf315


@ -23,7 +23,7 @@ Follow this to deploy the entire data infra.
- We will create an SSH Keypair for this deployment. It will be used to access VMs, Git repos and other services.
- Create the SSH Key pair
- Name the key: `superhog-data-<your-env>-general-ssh`
- Name the key: `superhog-data-general-ssh-<your-env>`
- Add tags:
- `team: data`
- `environment: <your-env>`
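
Since the key name is referenced again when creating each VM, the convention can be captured in a small shell helper. This is only a sketch: `ssh_key_name` is a hypothetical function name, not part of the original instructions.

```bash
# Hypothetical helper (not from the original guide): build the SSH key name
# for an environment, following `superhog-data-general-ssh-<your-env>`.
ssh_key_name() {
    local env="$1"
    if [ -z "$env" ]; then
        echo "usage: ssh_key_name <your-env>" >&2
        return 1
    fi
    printf 'superhog-data-general-ssh-%s\n' "$env"
}
```

Calling `ssh_key_name <your-env>` prints the key name for that environment, and calling it without an argument fails with a usage message.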
@ -106,7 +106,7 @@ Follow this to deploy the entire data infra.
- Protocol: TCP
- Action: Allow
- Priority: 100
- SSH Rule
- RDP Rule
- Name: AllowRDPFromJumphostInbound
- Source: the address range for the `jumphost-subnet`. In this example, `10.69.0.0/29`.
- Source port ranges: *
@ -116,7 +116,7 @@ Follow this to deploy the entire data infra.
- Action: Allow
- Priority: 110
- Airbyte web rule
- Name: AllowAirbyteWebFromJumphostInbound
- Name: AllowWebFromJumphostInbound
- Source: the address range for the `jumphost-subnet`. In this example, `10.69.0.0/29`.
- Source port ranges: *
- Destination: the address range for the `services-subnet`. In this example, `10.69.0.64/26`.
@ -208,7 +208,7 @@ Follow this to deploy the entire data infra.
- Use Ubuntu Server 22.04
- Use Size: `Standard_B1s`
- Use username: `azureuser`
- Use the SSH Key: `superhog-data-<your-env>-general-ssh`
- Use the SSH Key: `superhog-data-general-ssh-<your-env>`
- Select the option `None` for Public inbound ports.
- Disk settings
- Defaults are fine. This barely needs any disk.
@ -245,8 +245,12 @@ Follow this to deploy the entire data infra.
- Run the following script (requires `sudo`) to install wireguard and configure it
- Pay attention: you need to fill in the public IP manually, as well as the network mask of the virtual network
- *Note: the IPs chosen for the VPN can absolutely be changed. Just make sure they are consistent across the server and client configurations of the VPN.*
- *Note: you need to input the public IP and the VNET network mask manually at the top of the script.*
```bash
JUMPHOST_PUBLIC_IP=<write-the-public-ip-here>
NETWORK_MASK_FOR_VNET=<write-the-network-mask-here>
echo "Installing Wireguard."
apt update
apt install wireguard -y
@ -295,8 +299,8 @@ Follow this to deploy the entire data infra.
[Peer]
PublicKey = ${SERVER_PUBLIC_KEY}
AllowedIPs = 192.168.69.1/32,<network-mask-for-vnet>
Endpoint = <fill-public-ip-here>:52420
AllowedIPs = 192.168.69.1/32,${NETWORK_MASK_FOR_VNET}
Endpoint = ${JUMPHOST_PUBLIC_IP}:52420
##############################
EOF
@ -427,9 +431,9 @@ Follow this to deploy the entire data infra.
- An airbyte user, with permission to create new schemas.
- A Power BI user, with `consumer` role.
- *Note: replace the password fields with serious passwords and note them down.*
- *Note: replace the name of the admin user (`dwh_admin_infratest` in this example) with the admin user of your environment.*
```sql
GRANT pg_read_all_data TO dwh_admin_infratest;
CREATE ROLE airbyte_user LOGIN PASSWORD 'password' VALID UNTIL 'infinity';
GRANT CREATE ON DATABASE dwh TO airbyte_user;
@ -441,6 +445,8 @@ Follow this to deploy the entire data infra.
GRANT ALL ON ALL TABLES IN SCHEMA staging TO modeler;
GRANT ALL ON ALL TABLES IN SCHEMA intermediate TO modeler;
GRANT ALL ON ALL TABLES IN SCHEMA reporting TO modeler;
GRANT modeler TO dwh_admin_<your-env>;
ALTER SCHEMA staging OWNER TO modeler;
ALTER SCHEMA intermediate OWNER TO modeler;
ALTER SCHEMA reporting OWNER TO modeler;
@ -467,9 +473,9 @@ Follow this to deploy the entire data infra.
- Basic settings
- Name it: `airbyte-<your-env>`
- Use Ubuntu Server 22.04
- Use Size: `Standard_DS1_v2`
- Suggested size for testing: `Standard_DS1_v2`. For production, pick a larger size.
- Use username: `azureuser`
- Use the SSH Key: `superhog-data-<your-env>-general-ssh`
- Use the SSH Key: `superhog-data-general-ssh-<your-env>`
- Select the option `None` for Public inbound ports.
- Disk settings
- Increasing the data disk to at least 64 GB as a starting point is recommended. Airbyte can be a bit of a disk hog, and running low on space can lead to obscure errors. Start with 64 GB and monitor disk usage as your workload grows.
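
One way to monitor disk usage on the Airbyte VM is a small shell check like the sketch below. The function names and the 80% threshold are assumptions made for illustration, not part of the original setup; adjust the path to wherever the data disk is mounted.

```bash
# Hypothetical helpers (not from the original guide) to watch disk usage.

# Print the used-space percentage (integer, without the % sign) of the
# filesystem that holds the given path. Uses the POSIX `df -P` layout,
# where the fifth column is the capacity in percent.
disk_used_percent() {
    df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Succeed if the used percentage is below the threshold, fail otherwise.
disk_usage_ok() {
    local used="$1" threshold="$2"
    [ "$used" -lt "$threshold" ]
}

# Example usage (assumed threshold of 80%):
#   disk_usage_ok "$(disk_used_percent /)" 80 || echo "Disk is filling up" >&2
```

Such a check could be run periodically (for example from cron) so that low disk space is noticed before it causes errors.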
@ -608,7 +614,7 @@ Follow this to deploy the entire data infra.
## 7. dbt
- Our dbt project (https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project) can be deployed on any linux VM within the virtual network. The instructions on how to deploy and schedule it are in the project repository.
- Our dbt project (<https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project>) can be deployed on any linux VM within the virtual network. The instructions on how to deploy and schedule it are in the project repository.
- You can opt to deploy it on the same machine that runs Airbyte, since that machine is probably fairly underutilized.
## 8. Monitoring