Pablo Martin 2024-11-14 16:19:22 +01:00
parent 409c23691b
commit c986715e98

@ -115,7 +115,7 @@ Follow this to deploy the entire data infra.
- Protocol: TCP - Protocol: TCP
- Action: Allow - Action: Allow
- Priority: 110 - Priority: 110
- Airbyte web rule - Web server Rule
- Name: AllowWebFromJumphostInbound - Name: AllowWebFromJumphostInbound
- Source: the address range for the `jumphost-subnet`. In this example, `10.69.0.0/29`. - Source: the address range for the `jumphost-subnet`. In this example, `10.69.0.0/29`.
- Source port ranges: * - Source port ranges: *
@ -515,6 +515,62 @@ Follow this to deploy the entire data infra.
ALTER DEFAULT PRIVILEGES IN SCHEMA sync_<some-new-source> GRANT SELECT ON TABLES TO modeler; ALTER DEFAULT PRIVILEGES IN SCHEMA sync_<some-new-source> GRANT SELECT ON TABLES TO modeler;
``` ```
## 5. Web Gateway
We will deploy a dedicated VM to act as a web server for internal services.
### 5.1 Deploy Web Gateway VM
- Create a new VM following these steps.
- Basic settings
- Name it: `web-gateway-<your-env>`
- Use Ubuntu Server 22.04
- Use size: `Standard_B1s`
- Use username: `azureuser`
- Use the SSH Key: `superhog-data-general-ssh-<your-env>`
- Select the option `None` for Public inbound ports.
- Disk settings
- Defaults are fine. This barely needs any disk.
- Networking
- Attach to the virtual network `superhog-data-vnet-<your-env>`
- Attach to the subnet `services-subnet`
- Assign no public IP.
- For setting `NIC network security group` select option `None`
- Management settings
- Defaults are fine.
- Monitoring
- Defaults are fine.
- Advanced
- Defaults are fine.
- Add tags:
- `team: data`
- `environment: <your-env>`
- `project: network`
- Once the VM is running, you should be able to ssh into the machine when your VPN is active.
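The portal steps above can also be scripted. The following is a minimal sketch using the Azure CLI, not a tested replacement for the portal flow. The resource group name is an assumption (it is not named in this section), and `create_web_gateway_vm` is a hypothetical helper name; the other values mirror the settings listed above.

```bash
#!/usr/bin/env bash
# Sketch: create the web gateway VM with the Azure CLI instead of the portal.
# ASSUMPTION: the resource group name is supplied by the caller.
create_web_gateway_vm() {
  local env="$1"             # e.g. dev, prd
  local resource_group="$2"  # hypothetical: resource group holding the data infra

  # Empty strings for --public-ip-address and --nsg mean "none".
  az vm create \
    --resource-group "$resource_group" \
    --name "web-gateway-${env}" \
    --image Ubuntu2204 \
    --size Standard_B1s \
    --admin-username azureuser \
    --ssh-key-name "superhog-data-general-ssh-${env}" \
    --vnet-name "superhog-data-vnet-${env}" \
    --subnet services-subnet \
    --public-ip-address "" \
    --nsg "" \
    --tags team=data "environment=${env}" project=network
}

# Usage (requires `az login` first):
#   create_web_gateway_vm dev my-resource-group
```

Wrapping the call in a function keeps the environment name in one place, so the same sketch can be reused for each environment.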
### 5.2 Deploying Caddy
- We need to install caddy in the VM. You can do so with the following commands:
```bash
sudo apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
sudo apt update
sudo apt install caddy
```
- After the previous commands, you can verify that caddy is running properly as a systemd service with: `systemctl status caddy`
- You can also verify that Caddy is reachable by running the following command from your laptop while connected to the VPN: `curl web-gateway-<your-env>.<your-env>.data.superhog.com`. If you see a wall of HTML that looks like Caddy's demo page, Caddy is working as expected.
### 5.3 Pointing Caddy to internal services
- Caddy will need to be configured to act as the web server or reverse proxy of the different services within the services subnet. The details of these configurations are defined in sections below.
- As a general note, the pattern will be:
- You will need to include the right entry in the `Caddyfile` at `/etc/caddy/Caddyfile`.
- You will need to reload caddy with `sudo systemctl reload caddy.service`.
- If the web server needs to reach a specific port in some other VM, you will need to sort networking security out. If the VM you need to reach from the web server is within the internal services subnet, you'll have to add the necessary Inbound rules in the NSG `superhog-data-nsg-services-<your-env>`.
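As a rough illustration of the pattern above, the sketch below appends a reverse-proxy entry to a Caddyfile. The helper name `add_caddy_site` and the hostname and upstream in the usage comment are hypothetical; the real entry for each service is defined in that service's section.

```bash
#!/usr/bin/env bash
# Sketch: append a reverse-proxy site block to a Caddyfile.
# add_caddy_site is a hypothetical helper, not something Caddy ships.
add_caddy_site() {
  local site_host="$1"   # hostname users will browse to
  local upstream="$2"    # host:port of the internal service
  local caddyfile="${3:-/etc/caddy/Caddyfile}"

  # Append (>>) so existing entries for other services are kept.
  cat >> "$caddyfile" << EOL

# Reverse proxy for ${site_host}
http://${site_host} {
    reverse_proxy ${upstream}
}
EOL
}

# Usage on the web gateway VM (run as root, then reload Caddy):
#   add_caddy_site some-service.<your-env>.data.superhog.com some-vm:8000
#   systemctl reload caddy.service
```

Appending rather than overwriting matters once the gateway serves more than one service, since each service contributes its own block to the same file.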
## 5. Airbyte ## 5. Airbyte
### 5.1 Deploying Airbyte VM ### 5.1 Deploying Airbyte VM
@ -556,8 +612,6 @@ Follow this to deploy the entire data infra.
AIRBYTE_ADMIN_USER=your-user-here AIRBYTE_ADMIN_USER=your-user-here
AIRBYTE_ADMIN_PASSWORD=your-password-here AIRBYTE_ADMIN_PASSWORD=your-password-here
YOUR_ENV=<your-env>
PRIVATE_DNS_ZONE_NAME=${YOUR_ENV}.data.superhog.com
echo "Installing docker." echo "Installing docker."
apt-get update -y apt-get update -y
@ -585,34 +639,64 @@ Follow this to deploy the entire data infra.
echo "Restarting Airbyte." echo "Restarting Airbyte."
docker compose down; docker compose up -d docker compose down; docker compose up -d
echo "Deploying Caddy Webserver" echo "You can now access at http://localhost:8000"
apt install -y debian-keyring debian-archive-keyring apt-transport-https curl
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
apt update
apt install caddy
echo "Write caddyfile"
touch /etc/caddy/Caddyfile
cat > /etc/caddy/Caddyfile << EOL
# Airbyte web UI
http://airbyte-${YOUR_ENV}.${PRIVATE_DNS_ZONE_NAME} {
reverse_proxy localhost:8000
}
EOL
echo "Restart caddy"
systemctl restart caddy
echo "You can now access at http://airbyte-${YOUR_ENV}.${PRIVATE_DNS_ZONE_NAME}"
echo "Finished." echo "Finished."
``` ```
- Visit http://airbyte-<your-env>.<your-env>.data.superhog.com. If you are prompted for user and password, it means Airbyte is running properly and is reachable. - To check that Airbyte is running fine, run this command from a terminal within the Airbyte VM: `curl localhost:8000`. You should see some HTML for Airbyte's access denied page.
### 5.3 Making Airbyte Web UI reachable
- To provide access to the Airbyte UI, we will have to integrate it with the web gateway and our networking configurations.
- First, we need to allow the web gateway to reach Airbyte's locally served web server.
- Use the Azure portal to navigate to the NSG `superhog-data-nsg-services-<your-env>` page.
- Add a new Inbound rule with the following details:
- Name: `Allow8000TCPWithinSubnet`
- Source: the address range for the `services-subnet`. In this example, `10.69.0.64/26`.
- Source port ranges: *
- Destination: the address range for the `services-subnet`. In this example, `10.69.0.64/26`.
- Destination port ranges: 8000
- Protocol: TCP
- Action: Allow
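- Priority: Pick a number higher than those of the existing Allow rules, but lower than those of the `DenyAllInbound` rules, so that this rule is evaluated before the deny rules (Azure processes lower numbers first).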
- Next, we need to set a DNS entry to generate the URL that will be used to navigate to the Airbyte UI.
- Use the Azure portal to navigate to the Private DNS Zone `<your-env>.data.superhog.com` page.
- Create a new record with the following details:
- Name: `airbyte`
- Type: `A`
- IP Address: Look for the private IP address that was assigned to the VM `web-gateway-<your-env>` and place it here.
- Finally, we must create an entry in caddy's config file.
- SSH into the web gateway VM.
- Make a script with these commands and run it as root (for example, with `sudo`), since it writes to `/etc/caddy` and restarts a systemd service:
```bash
YOUR_ENV=<your-env>
PRIVATE_DNS_ZONE_NAME=${YOUR_ENV}.data.superhog.com
AIRBYTE_SUBDOMAIN=airbyte # If you followed this guide for the DNS bit, leave this value. If you chose a different subdomain, adjust accordingly
# Hostname only: the http:// scheme is added where the value is used below.
FULL_AIRBYTE_URL=${AIRBYTE_SUBDOMAIN}.${PRIVATE_DNS_ZONE_NAME}
echo "Write caddyfile"
touch /etc/caddy/Caddyfile
# Note: this overwrites the whole Caddyfile, including any existing entries.
cat > /etc/caddy/Caddyfile << EOL
# Airbyte web UI
http://${FULL_AIRBYTE_URL} {
reverse_proxy http://airbyte-${YOUR_ENV}.${PRIVATE_DNS_ZONE_NAME}:8000
}
EOL
echo "Restart caddy"
systemctl restart caddy
echo "You can now access at http://${FULL_AIRBYTE_URL}"
```
- If everything is working properly, you should now be able to reach Airbyte at the printed URL.
- If something doesn't work, I would advise troubleshooting through the chain of VMs to find where the connection is breaking down.
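The NSG rule and the DNS record from the steps above can also be created from the command line. This is a sketch with the Azure CLI, assuming you have already run `az login`. The resource group name, the priority passed by the caller, and the two function names are assumptions for illustration; the web gateway's private IP still has to be looked up as described above.

```bash
#!/usr/bin/env bash
# Sketch: CLI equivalents of the two portal steps above.

# Allow TCP 8000 between machines inside the services subnet.
allow_airbyte_port_within_subnet() {
  local env="$1"
  local resource_group="$2"   # ASSUMPTION: not named in this guide section
  local subnet_range="$3"     # e.g. 10.69.0.64/26
  local priority="$4"         # ASSUMPTION: pick one that fits your existing rules

  az network nsg rule create \
    --resource-group "$resource_group" \
    --nsg-name "superhog-data-nsg-services-${env}" \
    --name Allow8000TCPWithinSubnet \
    --direction Inbound \
    --access Allow \
    --protocol Tcp \
    --priority "$priority" \
    --source-address-prefixes "$subnet_range" \
    --source-port-ranges '*' \
    --destination-address-prefixes "$subnet_range" \
    --destination-port-ranges 8000
}

# Point airbyte.<env>.data.superhog.com at the web gateway's private IP.
add_airbyte_dns_record() {
  local env="$1"
  local resource_group="$2"   # ASSUMPTION, as above
  local gateway_ip="$3"       # private IP of web-gateway-<env>

  az network private-dns record-set a add-record \
    --resource-group "$resource_group" \
    --zone-name "${env}.data.superhog.com" \
    --record-set-name airbyte \
    --ipv4-address "$gateway_ip"
}

# Usage:
#   allow_airbyte_port_within_subnet dev my-resource-group 10.69.0.64/26 200
#   add_airbyte_dns_record dev my-resource-group <web-gateway-private-ip>
```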
#TODO CONTINUE HERE
## 6. Power BI ## 6. Power BI
@ -677,24 +761,53 @@ WIP: we are planning on using Azure Dashboards with metrics.
WIP: we need support to learn how to use statuspage.io WIP: we need support to learn how to use statuspage.io
## 9. Backups ### 9.3 Configuring Caddy
- Now that caddy is running, you can configure it to serve whatever you need.
- This instance is designed to be the external entrypoint to serve any internal webpages to users of web services within the data virtual network. It's possible that, by the time you are reading this, there are more services than we planned originally.
- As an example, we will now show how to reverse proxy the Airbyte UI. For other services, you can follow a similar pattern.
- Edit the caddy config file with `sudo nano /etc/caddy/Caddyfile`
- To add a reverse proxy for Airbyte, add this entry:
```bash
http://airbyte.<your-env>.data.superhog.com {
reverse_proxy http://airbyte-<your-env>.<your-env>.data.superhog.com {
#reverse_proxy http://10.69.0.68:80 {
header_up Cookie {>Cookie}
header_up Host airbyte-<your-env>.<your-env>.data.superhog.com
header_up X-Real-IP {remote}
header_up X-Forwarded-For {remote}
header_up X-Forwarded-Proto {scheme}
}
}
```
- Note that, if you need to do more changes in configuration, you can have Caddy pick up the changes by running `sudo systemctl reload caddy`. This will reload the configuration without incurring any downtime, as `stop` and `start` would.
### 9.4 Additional networking actions
- Allow internal service VMs to reach each other on port 80 by adding the corresponding Inbound rule to the NSG for the services subnet.
## 10. Backups
- If you are working on a dev or staging environment, you might want to skip this section. - If you are working on a dev or staging environment, you might want to skip this section.
### 9.1 DWH ### 10.1 DWH
- Backups are managed with Azure. In the Azure Portal page for the PostgreSQL service, visit section `Backup and restore`. Production servers should have 14 days as a retention period. - Backups are managed with Azure. In the Azure Portal page for the PostgreSQL service, visit section `Backup and restore`. Production servers should have 14 days as a retention period.
### 9.2 Jumphost ### 10.2 Jumphost
- Jumphosts barely hold any data at all. Although it's quite tempting to forget about this and simply raise another VM if something goes wrong, it would be annoying to have to regenerate the keys of both the VPN server and other clients. - Jumphosts barely hold any data at all. Although it's quite tempting to forget about this and simply raise another VM if something goes wrong, it would be annoying to have to regenerate the keys of both the VPN server and other clients.
- To solve this, make a habit of making regular copies of the Wireguard config file in another machine. Theoretically, only making a copy everytime it gets modified should be enough. - To solve this, make a habit of making regular copies of the Wireguard config file in another machine. Theoretically, only making a copy everytime it gets modified should be enough.
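One way to make this habit cheap is a small script that copies the config only when it has changed. This is a sketch under assumptions: the helper name `backup_if_changed` is hypothetical, and the config path in the usage comment (`/etc/wireguard/wg0.conf`) is an assumption, not something this guide specifies. Getting the copies onto another machine (for example with `scp`) is left out.

```bash
#!/usr/bin/env bash
# Sketch: keep timestamped copies of a config file, but only when it changes.
# backup_if_changed is a hypothetical helper name.
backup_if_changed() {
  local src="$1"        # file to back up
  local dest_dir="$2"   # directory that holds the copies
  local latest="${dest_dir}/latest.conf"

  mkdir -p "$dest_dir"
  # cmp -s exits 0 when both files are identical; skip the copy in that case.
  if [ -f "$latest" ] && cmp -s "$src" "$latest"; then
    echo "unchanged"
    return 0
  fi
  cp "$src" "${dest_dir}/$(date +%Y%m%d-%H%M%S).conf"
  cp "$src" "$latest"
  echo "backed up"
}

# Usage (ASSUMPTION: the Wireguard config lives at this path):
#   backup_if_changed /etc/wireguard/wg0.conf /var/backups/wireguard
```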
### 9.3 Airbyte ### 10.3 Airbyte
- Our strategy for backing up Airbyte is to backup the entire VM. - Our strategy for backing up Airbyte is to backup the entire VM.
- WIP - WIP
### 9.4 PBI Gateway ### 10.4 PBI Gateway
- The PBI Gateway is pretty much stateless. Given this, if there are any issues or disasters on the current VM, simply create another one and set up the gateway again. - The PBI Gateway is pretty much stateless. Given this, if there are any issues or disasters on the current VM, simply create another one and set up the gateway again.