tiny changes
parent 4c55aef9e1
commit 72a99cf315
1 changed file with 16 additions and 10 deletions
@@ -23,7 +23,7 @@ Follow this to deploy the entire data infra.

- We will create an SSH Keypair for this deployment. It will be used to access VMs, Git repos and other services.
- Create the SSH Key pair
- Name the key: `superhog-data-<your-env>-general-ssh`
- Name the key: `superhog-data-general-ssh-<your-env>`
- Add tags:
- `team: data`
- `environment: <your-env>`
@@ -106,7 +106,7 @@ Follow this to deploy the entire data infra.
- Protocol: TCP
- Action: Allow
- Priority: 100
- SSH Rule
- RDP Rule
- Name: AllowRDPFromJumphostInbound
- Source: the address range for the `jumphost-subnet`. In this example, `10.69.0.0/29`.
- Source port ranges: *
@@ -116,7 +116,7 @@ Follow this to deploy the entire data infra.
- Action: Allow
- Priority: 110
- Airbyte web rule
- Name: AllowAirbyteWebFromJumphostInbound
- Name: AllowWebFromJumphostInbound
- Source: the address range for the `jumphost-subnet`. In this example, `10.69.0.0/29`.
- Source port ranges: *
- Destination: the address range for the `services-subnet`. In this example, `10.69.0.64/26`.
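The two rules above reference the example ranges `10.69.0.0/29` (`jumphost-subnet`) and `10.69.0.64/26` (`services-subnet`). As a quick sanity check before typing them into the rules, the sketch below uses Python's standard `ipaddress` module to confirm that both ranges are valid CIDR blocks and do not overlap. The helper names `describe` and `check_no_overlap` are illustrative and not part of the deployment guide.

```python
import ipaddress

# Example ranges quoted in the rules above.
jumphost_subnet = ipaddress.ip_network("10.69.0.0/29")
services_subnet = ipaddress.ip_network("10.69.0.64/26")


def describe(name, net):
    """Return a one-line summary: name, CIDR, first and last address, size."""
    return f"{name}: {net} ({net[0]} to {net[-1]}, {net.num_addresses} addresses)"


def check_no_overlap(a, b):
    """Raise ValueError if the two networks share any address."""
    if a.overlaps(b):
        raise ValueError(f"{a} overlaps {b}")


check_no_overlap(jumphost_subnet, services_subnet)
print(describe("jumphost-subnet", jumphost_subnet))
print(describe("services-subnet", services_subnet))
```

If you pick different ranges for your environment, swapping the two strings at the top is enough. `ip_network` rejects a value whose host bits are set, which catches the most common typo in a CIDR range.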
@@ -208,7 +208,7 @@ Follow this to deploy the entire data infra.
- Use Ubuntu Server 22.04
- Use Size: `Standard_B1s`
- Use username: `azureuser`
- Use the SSH Key: `superhog-data-<your-env>-general-ssh`
- Use the SSH Key: `superhog-data-general-ssh-<your-env>`
- Select the option `None` for Public inbound ports.
- Disk settings
- Defaults are fine. This barely needs any disk.
@@ -245,8 +245,12 @@ Follow this to deploy the entire data infra.
- Run the following script (requires `sudo`) to install wireguard and configure it
- Pay attention: you need to fill in the public IP manually, as well as the network mask of the virtual network
- *Note: the IPs chosen for the VPN can absolutely be changed. Just make sure they are consistent across the server and client configurations of the VPN.*
- *Note: you need to input the public IP and the VNET network mask manually at the top of the script.*

```bash
JUMPHOST_PUBLIC_IP=<write-the-public-ip-here>
NETWORK_MASK_FOR_VNET=<write-the-network-mask-here>

echo "Installing Wireguard."
apt update
apt install wireguard -y
@@ -295,8 +299,8 @@ Follow this to deploy the entire data infra.

[Peer]
PublicKey = ${SERVER_PUBLIC_KEY}
AllowedIPs = 192.168.69.1/32,<network-mask-for-vnet>
Endpoint = <fill-public-ip-here>:52420
AllowedIPs = 192.168.69.1/32,${NETWORK_MASK_FOR_VNET}
Endpoint = ${JUMPHOST_PUBLIC_IP}:52420
##############################

EOF
@@ -427,9 +431,9 @@ Follow this to deploy the entire data infra.
- An airbyte user, with permission to create new schemas.
- A Power BI user, with `consumer` role.
- *Note: replace the password fields with serious passwords and note them down.*
- *Note: replace the name of the admin user*

```sql
GRANT pg_read_all_data TO dwh_admin_infratest;

CREATE ROLE airbyte_user LOGIN PASSWORD 'password' VALID UNTIL 'infinity';
GRANT CREATE ON DATABASE dwh TO airbyte_user;
@@ -441,6 +445,8 @@ Follow this to deploy the entire data infra.
GRANT ALL ON ALL TABLES IN SCHEMA staging TO modeler;
GRANT ALL ON ALL TABLES IN SCHEMA intermediate TO modeler;
GRANT ALL ON ALL TABLES IN SCHEMA reporting TO modeler;

GRANT modeler TO dwh_admin_<your-env>;
ALTER SCHEMA staging OWNER TO modeler;
ALTER SCHEMA intermediate OWNER TO modeler;
ALTER SCHEMA reporting OWNER TO modeler;
@@ -467,9 +473,9 @@ Follow this to deploy the entire data infra.
- Basic settings
- Name it: `airbyte-<your-env>`
- Use Ubuntu Server 22.04
- Use Size: `Standard_DS1_v2`
- For testing, `Standard_DS1_v2` is a reasonable size. For production, get something beefier.
- Use username: `azureuser`
- Use the SSH Key: `superhog-data-<your-env>-general-ssh`
- Use the SSH Key: `superhog-data-general-ssh-<your-env>`
- Select the option `None` for Public inbound ports.
- Disk settings
- Increasing the data disk to at least 64 GB as a starting point is recommended. Airbyte can be a bit of a disk hog, and running low on space might lead to obscure errors. Start with 64 GB and monitor as you increase usage.
@@ -608,7 +614,7 @@ Follow this to deploy the entire data infra.

## 7. dbt

- Our dbt project (https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project) can be deployed on any Linux VM within the virtual network. The instructions on how to deploy and schedule it are in the project repository.
- Our dbt project (<https://guardhog.visualstudio.com/Data/_git/data-dwh-dbt-project>) can be deployed on any Linux VM within the virtual network. The instructions on how to deploy and schedule it are in the project repository.
- You can opt to deploy it on the same machine that hosts Airbyte, since that machine is probably fairly underutilized.

## 8. Monitoring
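The WireGuard hunks above move the jumphost public IP and the VNET network mask into two variables that feed the client `[Peer]` block. A minimal sketch of the same substitution in Python, with validation of the two manual inputs, might look like the following. The function `render_peer`, the VNET range `10.69.0.0/24`, the address `203.0.113.10` (from a documentation-only range), and the key string are all hypothetical values for illustration; only `192.168.69.1/32` and port `52420` come from the diff.

```python
import ipaddress


def render_peer(server_public_key, jumphost_public_ip, network_mask_for_vnet,
                vpn_server_ip="192.168.69.1/32", port=52420):
    """Build the client [Peer] section after validating the two manual inputs.

    Raises ValueError if the public IP is not a valid IPv4 address or if the
    VNET range is not a valid CIDR block (for example, host bits set).
    """
    endpoint_ip = ipaddress.IPv4Address(jumphost_public_ip)
    vnet = ipaddress.ip_network(network_mask_for_vnet)
    return "\n".join([
        "[Peer]",
        f"PublicKey = {server_public_key}",
        f"AllowedIPs = {vpn_server_ip},{vnet}",
        f"Endpoint = {endpoint_ip}:{port}",
    ])


# Hypothetical inputs: replace with the values for your environment.
peer_block = render_peer(
    server_public_key="SERVER_PUBLIC_KEY_PLACEHOLDER",
    jumphost_public_ip="203.0.113.10",
    network_mask_for_vnet="10.69.0.0/24",
)
print(peer_block)
```

Validating up front means a forgotten or mistyped value fails immediately with a `ValueError`, instead of producing a client config that silently routes nothing.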