# Human Script

Follow this to deploy the entire data infra.

## 0. Pre-requisites and conventions

- You need an Azure subscription and a user with administrator rights in it.
- Whenever you see `<your-env>`, replace it with `dev`, `uat`, `prd` or whatever fits your environment.
- We traditionally deploy resources in the `UK South` region. Unless stated otherwise, you should deploy resources there.
- You have an SSH key pair ready to use for access to the different machines. You can always add more pairs later.

## 1. Resource group and SSH Keypair

### 1.1 Create Resource Group

- Create a resource group. This resource group will hold all the resources. For the rest of this guide, assume this is the resource group where you must create resources.
  - Name it: `superhog-data-rg-<your-env>`
  - Add tags:
    - `team: data`
    - `environment: <your-env>`

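If you prefer the command line over the portal, the step above can be sketched with the Azure CLI. This is a minimal sketch, assuming the `az` CLI is installed and you have already run `az login`. The `ENV_NAME` variable and the `create_resource_group` function are illustrative names, not part of the original procedure.

```bash
# Hypothetical helper variables; set ENV_NAME to dev, uat, prd, etc.
ENV_NAME="${ENV_NAME:-dev}"
LOCATION="uksouth"   # Azure CLI name for the UK South region
RG_NAME="superhog-data-rg-${ENV_NAME}"

# Wrapped in a function so that sourcing this snippet creates nothing
# until you call it explicitly.
create_resource_group() {
  az group create \
    --name "$RG_NAME" \
    --location "$LOCATION" \
    --tags team=data "environment=${ENV_NAME}"
}
```

Source the snippet and run `create_resource_group` once you are logged in to the right subscription.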
### 1.2 SSH Keypair

- We will create an SSH Keypair for this deployment. It will be used to access VMs, Git repos and other services.
- Create the SSH Key pair
  - Name the key: `superhog-data-<your-env>-general-ssh`
  - Add tags:
    - `team: data`
    - `environment: <your-env>`
  - Pay attention when storing the private key. You probably want to store it in a safe password manager, like Keeper.
  - Optionally, if you want to be extra paranoid, generate the SSH key locally and upload only the public key to Azure. Up to you.

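The "extra paranoid" option could look roughly like the sketch below: the key pair is generated on your machine and only the `.pub` file is sent to Azure. The local path in `KEY_FILE`, the RSA key type and the function names are assumptions made for illustration; adjust them to your own conventions.

```bash
ENV_NAME="${ENV_NAME:-dev}"
RG_NAME="superhog-data-rg-${ENV_NAME}"
KEY_NAME="superhog-data-${ENV_NAME}-general-ssh"
KEY_FILE="${HOME}/.ssh/${KEY_NAME}"   # hypothetical local path

# Generate the key pair locally; the private key never leaves your machine.
generate_local_keypair() {
  ssh-keygen -t rsa -b 4096 -f "$KEY_FILE" -C "$KEY_NAME"
}

# Upload only the public half to Azure as an SSH key resource.
# The "@file" form asks the CLI to read the value from that file.
upload_public_key() {
  az sshkey create \
    --name "$KEY_NAME" \
    --resource-group "$RG_NAME" \
    --public-key "@${KEY_FILE}.pub" \
    --tags team=data "environment=${ENV_NAME}"
}
```

Run `generate_local_keypair` first, then `upload_public_key`. The private key file still belongs in your password manager afterwards.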
## 2. Networking

### 2.1 VNET

- Create a virtual network. This virtual network is where all our infra will live. For the rest of this guide, assume this is the network where you must connect services.
  - Name it: `superhog-data-vnet-<your-env>`
  - You need to decide what the network range should be. For example, the entire vnet can be contained within a single `/24` space (256 addresses), which should be enough since only a handful of network interfaces will connect to it.
  - As an example, we will use `10.69.0.0/24`. This link might be helpful: <https://www.davidc.net/sites/default/subnets/subnets.html?network=10.69.0.0&mask=24&division=11.f10>
  - You need to add three subnets:
    - Do not add network security groups to any of the subnets yet. We will create those later.
    - Jumphost subnet
      - This subnet is where jumphost boxes will live.
      - It will be the only subnet where we allow inbound connections from WAN.
      - Name it `jumphost-subnet`.
      - For our example, we will make it `10.69.0.0/29` (8 addresses).
    - Database subnet
      - This subnet is where the DWH database will live.
      - Inbound traffic will be allowed from both the jumphost subnet and the services subnet.
      - Name it `database-subnet`.
      - For our example, we will make it `10.69.0.8/29` (8 addresses).
    - Services subnet
      - This subnet is where most VMs dedicated to data services live (Airbyte, dbt, PBI Data Gateway, etc).
      - Inbound traffic will only be allowed from the jumphost subnet.
      - Name it `services-subnet`.
      - For our example, we will make it `10.69.0.64/26` (64 addresses).
  - Add tags:
    - `team: data`
    - `environment: <your-env>`
    - `project: network`

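If you pick different ranges than the example, it is worth checking that every subnet fits inside the vnet and that no two subnets overlap. The sketch below does that arithmetic in plain shell. All function names are made up for this guide, and it assumes well-formed, network-aligned CIDR blocks. It also accounts for the fact that Azure reserves 5 addresses in every subnet, so the usable count is lower than the raw size.

```bash
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( $1 * 16777216 + $2 * 65536 + $3 * 256 + $4 ))
}

# Print "first last" (as integers) for a CIDR block such as 10.69.0.0/24.
cidr_bounds() {
  local base="${1%/*}" bits="${1#*/}" first
  first=$(ip_to_int "$base")
  echo "$first $(( first + (1 << (32 - bits)) - 1 ))"
}

# Succeed if the second block lies entirely within the first block.
cidr_contains() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  [ "$3" -ge "$1" ] && [ "$4" -le "$2" ]
}

# Succeed if the two blocks share at least one address.
cidr_overlaps() {
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Addresses left for our own use once Azure's 5 reserved ones are removed.
usable_in_azure() {
  local bits="${1#*/}"
  echo $(( (1 << (32 - bits)) - 5 ))
}

# Check the example layout from this section.
VNET="10.69.0.0/24"
for subnet in 10.69.0.0/29 10.69.0.8/29 10.69.0.64/26; do
  if cidr_contains "$VNET" "$subnet"; then
    echo "$subnet fits in $VNET, usable addresses: $(usable_in_azure "$subnet")"
  else
    echo "$subnet is outside $VNET"
  fi
done
```

With the example ranges, each `/29` leaves 3 usable addresses and the `/26` leaves 59, so keep that in mind before putting more than a few machines in the jumphost or database subnets.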
### 2.2 Network security groups

- You will create three network security groups (NSG)
  - Jumphost NSG
    - Name it: `superhog-data-nsg-jumphost-<your-env>`
    - Purpose: only allow connecting to the VPN server. We deny absolutely any other inbound traffic.
    - Add tags:
      - `team: data`
      - `environment: <your-env>`
      - `project: network`
    - Add the following inbound rules
      - VPN Rule
        - Name: AllowWireguardInbound
        - Source: Any
        - Source port ranges: *
        - Destination: the address range for the `jumphost-subnet`. In this example, `10.69.0.0/29`.
        - Destination port ranges: 52420 (this must match the `ListenPort` of the Wireguard server configured in section 3.2)
        - Protocol: UDP
        - Action: Allow
        - Priority: 100
      - Deny Rule
        - Name: DenyAllInbound
        - Source: Any
        - Source port ranges: *
        - Destination: Any
        - Destination port ranges: *
        - Protocol: Any
        - Action: Deny
        - Priority: 1000
  - Services NSG
    - Name it: `superhog-data-nsg-services-<your-env>`
    - Purpose: only allow the service VMs to be reached from our jumphost subnet. We deny absolutely any other inbound traffic.
    - Add tags:
      - `team: data`
      - `environment: <your-env>`
      - `project: network`
    - Add the following inbound rules
      - SSH Rule
        - Name: AllowSSHFromJumphostInbound
        - Source: the address range for the `jumphost-subnet`. In this example, `10.69.0.0/29`.
        - Source port ranges: *
        - Destination: the address range for the `services-subnet`. In this example, `10.69.0.64/26`.
        - Destination port ranges: 22
        - Protocol: TCP
        - Action: Allow
        - Priority: 100
      - RDP Rule
        - Name: AllowRDPFromJumphostInbound
        - Source: the address range for the `jumphost-subnet`. In this example, `10.69.0.0/29`.
        - Source port ranges: *
        - Destination: the address range for the `services-subnet`. In this example, `10.69.0.64/26`.
        - Destination port ranges: 3389
        - Protocol: TCP
        - Action: Allow
        - Priority: 110
      - Airbyte web rule
        - Name: AllowAirbyteWebFromJumphostInbound
        - Source: the address range for the `jumphost-subnet`. In this example, `10.69.0.0/29`.
        - Source port ranges: *
        - Destination: the address range for the `services-subnet`. In this example, `10.69.0.64/26`.
        - Destination port ranges: 80
        - Protocol: TCP
        - Action: Allow
        - Priority: 120
      - Deny Rule
        - Name: DenyAllInbound
        - Source: Any
        - Source port ranges: *
        - Destination: Any
        - Destination port ranges: *
        - Protocol: Any
        - Action: Deny
        - Priority: 1000
  - Database NSG
    - Name it: `superhog-data-nsg-database-<your-env>`
    - Purpose: make the database subnet reachable only from our services subnet and from our jumphost subnet.
    - Add tags:
      - `team: data`
      - `environment: <your-env>`
      - `project: network`
    - Add the following inbound rules
      - Postgres Jumphost Rule
        - Name: AllowPostgresFromJumphostInbound
        - Source: the address range for the `jumphost-subnet`. In this example, `10.69.0.0/29`.
        - Source port ranges: *
        - Destination: the address range for the `database-subnet`. In this example, `10.69.0.8/29`.
        - Destination port ranges: 5432
        - Protocol: TCP
        - Action: Allow
        - Priority: 100
      - Postgres Services Rule
        - Name: AllowPostgresFromServicesInbound
        - Source: the address range for the `services-subnet`. In this example, `10.69.0.64/26`.
        - Source port ranges: *
        - Destination: the address range for the `database-subnet`. In this example, `10.69.0.8/29`.
        - Destination port ranges: 5432
        - Protocol: TCP
        - Action: Allow
        - Priority: 110
      - Deny Rule
        - Name: DenyAllInbound
        - Source: Any
        - Source port ranges: *
        - Destination: Any
        - Destination port ranges: *
        - Protocol: Any
        - Action: Deny
        - Priority: 1000
- Finally, you need to attach each NSG to the related subnet
  - Visit the virtual network page and look for the subnets list
  - For each subnet, select its NSG and attach it

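As an illustration of how one of these NSGs maps to the Azure CLI, here is a hedged sketch for the jumphost NSG only. The variable and function names are invented for this guide, the address range is the example one from section 2.1, and the port is taken from the `ListenPort` of the Wireguard script in section 3.2. The other two NSGs would follow the same pattern with their own rules.

```bash
ENV_NAME="${ENV_NAME:-dev}"
RG_NAME="superhog-data-rg-${ENV_NAME}"
VNET_NAME="superhog-data-vnet-${ENV_NAME}"
NSG_NAME="superhog-data-nsg-jumphost-${ENV_NAME}"
JUMPHOST_RANGE="10.69.0.0/29"   # example range from section 2.1
WG_PORT="52420"                 # must match ListenPort in the Wireguard config

create_jumphost_nsg() {
  az network nsg create \
    --name "$NSG_NAME" --resource-group "$RG_NAME" \
    --tags team=data "environment=${ENV_NAME}" project=network

  # Allow only Wireguard (UDP) into the jumphost subnet.
  az network nsg rule create \
    --nsg-name "$NSG_NAME" --resource-group "$RG_NAME" \
    --name AllowWireguardInbound --priority 100 \
    --direction Inbound --access Allow --protocol Udp \
    --source-address-prefixes '*' --source-port-ranges '*' \
    --destination-address-prefixes "$JUMPHOST_RANGE" \
    --destination-port-ranges "$WG_PORT"

  # Catch-all rule that denies everything not allowed above.
  az network nsg rule create \
    --nsg-name "$NSG_NAME" --resource-group "$RG_NAME" \
    --name DenyAllInbound --priority 1000 \
    --direction Inbound --access Deny --protocol '*' \
    --source-address-prefixes '*' --source-port-ranges '*' \
    --destination-address-prefixes '*' --destination-port-ranges '*'
}

# Attach the NSG to its subnet once both exist.
attach_jumphost_nsg() {
  az network vnet subnet update \
    --name jumphost-subnet --vnet-name "$VNET_NAME" \
    --resource-group "$RG_NAME" \
    --network-security-group "$NSG_NAME"
}
```

NSG rules are evaluated in priority order, lowest number first, which is why the allow rules sit at 100 to 120 and the catch-all deny sits at 1000.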
### 2.3 Private DNS Zone

- We will set up a private DNS Zone to avoid using hardcoded IPs to refer to services within the virtual network. This makes integrations more resilient because a service can change its IP and still be reached by other services (as long as other network configs like firewalls are still fine).
- Create the Private DNS Zone
  - Name it: `<your-env>.data.superhog.com`
  - Add tags:
    - `team: data`
    - `environment: <your-env>`
    - `project: network`
- Add a new virtual network link to the zone
  - Name it: `privatelink-<your-env>.data.superhog.com`
  - Associate it to the virtual network.
  - Enable autoregistration

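The zone and its link can also be created from the Azure CLI. The following is a sketch under the same assumptions as the rest of this guide (resource group and vnet already exist, `az login` done); the variable and function names are illustrative.

```bash
ENV_NAME="${ENV_NAME:-dev}"
RG_NAME="superhog-data-rg-${ENV_NAME}"
VNET_NAME="superhog-data-vnet-${ENV_NAME}"
ZONE_NAME="${ENV_NAME}.data.superhog.com"
LINK_NAME="privatelink-${ENV_NAME}.data.superhog.com"

create_private_dns_zone() {
  az network private-dns zone create \
    --name "$ZONE_NAME" --resource-group "$RG_NAME" \
    --tags team=data "environment=${ENV_NAME}" project=network

  # Link the zone to the vnet; with registration enabled, VMs in the
  # vnet get their records created in the zone automatically.
  az network private-dns link vnet create \
    --name "$LINK_NAME" --resource-group "$RG_NAME" \
    --zone-name "$ZONE_NAME" \
    --virtual-network "$VNET_NAME" \
    --registration-enabled true
}
```

With autoregistration on, a VM named `jumphost` should end up resolvable inside the vnet as `jumphost.<your-env>.data.superhog.com`.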
### 2.4 Public IP

- We will need a public IP for the jumphost.
- Create the public IP
  - Name it: `superhog-data-jumphost-ip-<your-env>`
  - For setting `Routing preference` select option: `Microsoft Network`
  - Add tags:
    - `team: data`
    - `environment: <your-env>`
    - `project: network`

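A CLI equivalent might look like the sketch below. It sets no routing preference at all, on the assumption that `Microsoft Network` is what Azure applies when none is specified; double-check the resulting resource in the portal. Names other than the IP name itself are illustrative.

```bash
ENV_NAME="${ENV_NAME:-dev}"
RG_NAME="superhog-data-rg-${ENV_NAME}"
IP_NAME="superhog-data-jumphost-ip-${ENV_NAME}"

create_jumphost_public_ip() {
  # No routing preference flag is passed here (assumption: the default
  # is the Microsoft network). SKU and allocation are left at CLI defaults.
  az network public-ip create \
    --name "$IP_NAME" --resource-group "$RG_NAME" \
    --location uksouth \
    --tags team=data "environment=${ENV_NAME}" project=network
}
```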
## 3. Jumphost

### 3.1 Deploy Jumphost VM

- The first VM we must deploy is a jumphost, since that will be our door to all other services inside the virtual network.
- Create the VM
  - Basic settings
    - Name it: `jumphost`
    - Use Ubuntu Server 22.04
    - Use Size: `Standard_B1s`
    - Use username: `azureuser`
    - Use the SSH Key: `superhog-data-<your-env>-general-ssh`
    - Select the option `None` for Public inbound ports.
  - Disk settings
    - Defaults are fine. This barely needs any disk.
  - Networking
    - Attach to the virtual network `superhog-data-vnet-<your-env>`
    - Attach to the subnet `jumphost-subnet`
    - Attach the public ip `superhog-data-jumphost-ip-<your-env>`
    - For setting `NIC network security group` select option `None`
  - Management settings
    - Defaults are fine.
  - Monitoring
    - Defaults are fine.
  - Advanced
    - Defaults are fine.
  - Add tags:
    - `team: data`
    - `environment: <your-env>`
    - `project: network`

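For reference, the portal settings above translate roughly to a single `az vm create` call. This is a sketch, not a tested deployment: it assumes the resource group, vnet, subnet, public IP and SSH key resource from the previous sections already exist, and the variable and function names are made up for illustration.

```bash
ENV_NAME="${ENV_NAME:-dev}"
RG_NAME="superhog-data-rg-${ENV_NAME}"
VNET_NAME="superhog-data-vnet-${ENV_NAME}"
KEY_NAME="superhog-data-${ENV_NAME}-general-ssh"
IP_NAME="superhog-data-jumphost-ip-${ENV_NAME}"

create_jumphost_vm() {
  # --image Ubuntu2204 is the CLI alias for Ubuntu Server 22.04.
  # --nsg "" asks the CLI not to attach a NIC-level NSG, since the
  # subnet-level NSG already covers this VM.
  az vm create \
    --name jumphost \
    --resource-group "$RG_NAME" \
    --image Ubuntu2204 \
    --size Standard_B1s \
    --admin-username azureuser \
    --ssh-key-name "$KEY_NAME" \
    --vnet-name "$VNET_NAME" \
    --subnet jumphost-subnet \
    --public-ip-address "$IP_NAME" \
    --nsg "" \
    --tags team=data "environment=${ENV_NAME}" project=network
}
```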
### 3.2 Configure a VPN Server

- The jumphost we just created is not accessible via SSH from WAN due to the NSG set in the jumphost subnet.
- To make it accessible, temporarily create a new rule like this in the NSG `superhog-data-nsg-jumphost-<your-env>`.
  - Name: AllowSSHInboundTemporarily
  - Source: your IP.
  - Source port ranges: *
  - Destination: the address range for the `jumphost-subnet`. In this example, `10.69.0.0/29`.
  - Destination port ranges: 22
  - Protocol: TCP
  - Action: Allow
  - Priority: 110
- Connect through SSH
- We will now set up a VPN server and client with Wireguard
  - Run the following script (requires `sudo`) to install Wireguard and configure it
  - *Note: the IPs chosen for the VPN can absolutely be changed. Just make sure they are consistent across the server and client configurations of the VPN.*

```bash
echo "Installing Wireguard."
apt update
apt install wireguard -y
echo "Wireguard installed."

echo "Creating keys."
SERVER_PRIVATE_KEY=$(wg genkey)
SERVER_PUBLIC_KEY=$(echo "$SERVER_PRIVATE_KEY" | wg pubkey)

CLIENT_PRIVATE_KEY=$(wg genkey)
CLIENT_PUBLIC_KEY=$(echo "$CLIENT_PRIVATE_KEY" | wg pubkey)
echo "Keys created."

echo "Writing server config file."
touch /etc/wireguard/wg0.conf
# The file holds the server private key, so keep it readable by root only.
chmod 600 /etc/wireguard/wg0.conf
cat > /etc/wireguard/wg0.conf << EOL
[Interface]
PrivateKey = ${SERVER_PRIVATE_KEY}
Address = 192.168.69.1/32
# Must match the destination port of the AllowWireguardInbound NSG rule.
ListenPort = 52420

# IP forwarding
PreUp = sysctl -w net.ipv4.ip_forward=1
# IP masquerading
PreUp = iptables -t mangle -A PREROUTING -i wg0 -j MARK --set-mark 0x30
PreUp = iptables -t nat -A POSTROUTING ! -o wg0 -m mark --mark 0x30 -j MASQUERADE
PostDown = iptables -t mangle -D PREROUTING -i wg0 -j MARK --set-mark 0x30
PostDown = iptables -t nat -D POSTROUTING ! -o wg0 -m mark --mark 0x30 -j MASQUERADE

[Peer]
PublicKey = ${CLIENT_PUBLIC_KEY}
AllowedIPs = 192.168.70.1/32

EOL
echo "Server config file written."

echo "Configuration for client, copy paste in your machine."
cat << EOF
[Interface]
# Jumphost VPN
PrivateKey = ${CLIENT_PRIVATE_KEY}
Address = 192.168.70.1/32
# Uncomment when DNS Server is ready
# DNS = 192.168.69.1

[Peer]
PublicKey = ${SERVER_PUBLIC_KEY}
AllowedIPs = 192.168.69.1/32
Endpoint = <fill-public-ip-here>:52420

EOF

echo "Finished."
```

- CONTINUE HERE, INSTRUCTIONS ON HOW TO RAISE WG DAEMONS AND TEST

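Until this step is written up properly, here is one plausible way to bring the tunnel up and check it. Treat it as a sketch: it assumes the server config lives at `/etc/wireguard/wg0.conf` as written by the script above, that the client config was saved under the same name on your machine, and the function names are invented for this guide.

```bash
# On the jumphost: start the tunnel now and on every boot.
start_vpn_server() {
  sudo systemctl enable --now wg-quick@wg0
}

# On the jumphost: show the interface, listening port, peers and handshakes.
show_vpn_status() {
  sudo wg show wg0
}

# On your machine: bring the client side of the tunnel up.
start_vpn_client() {
  sudo wg-quick up wg0
}

# On your machine: the server's VPN address should answer once connected.
check_vpn_from_client() {
  ping -c 3 192.168.69.1
}
```

If `show_vpn_status` never reports a handshake for the peer, the usual suspects are the NSG port not matching `ListenPort` or a wrong public IP in the client's `Endpoint`.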
### 3.3 Configure a DNS Server

### 3.4 Harden the VM

- First, remove the AllowSSHInboundTemporarily rule that you added

## 4. DWH

## 5. Airbyte

## 6. Power BI

## 7. dbt

## 8. Status monitoring

## 9. Backups

- If you are working on a dev or staging environment, you might want to skip this section.