Commit 1e9be5c3a8 (parent c8ca05b1b5): 3 changed files with 76 additions and 16 deletions.

README.md
# Data Infra Script

This repository contains our documentation and scripts for deploying the Data team's infrastructure stack.

Content is structured as follows:

- `human-script.md` is a deployment script for you to read and follow. It guides you through every action you need to take.
- `architecture-overview.md` is an overview of the architecture that results from following the human script. If you are not familiar with our architecture, read this first so you know what you are setting up.
- `./templates` contains a set of Azure templates to deploy the different services in a subscription. `human-script.md` tells you when to run each one.
- `monitoring-and-administration.md` contains guidelines on how to keep the lights on for several of the components.

The following contents are not covered in this repository:

- Application-level configuration for DWH, Airbyte, Power BI, dbt, etc.
- Tuning of instance types, disk sizes and other settings. These come with defaults, but you should adapt the values to each deployment's data volume and needs, and they will probably change over time. Treat the defaults with skepticism.
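One way to act on the advice about defaults is to keep them in one place and write an explicit override per deployment. The following is a minimal sketch that assumes the `./templates` are ARM JSON templates deployed with parameter files. The parameter names (`vmSize`, `osDiskSizeGB`) and their default values are hypothetical and should be checked against the actual templates.

```python
import json

# Hypothetical defaults, standing in for whatever the templates actually ship with.
TEMPLATE_DEFAULTS = {
    "vmSize": "Standard_B2s",
    "osDiskSizeGB": 64,
}


def build_parameters_file(defaults: dict, overrides: dict) -> dict:
    """Merge per-deployment overrides into the defaults and wrap the result
    in the ARM deployment-parameters file layout."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Fail loudly on typos (e.g. "vmsize") instead of silently ignoring them.
        raise KeyError(f"Unknown parameters: {sorted(unknown)}")
    merged = {**defaults, **overrides}
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {name: {"value": value} for name, value in merged.items()},
    }


if __name__ == "__main__":
    # Example: a deployment whose data volume needs a bigger disk than the default.
    params = build_parameters_file(TEMPLATE_DEFAULTS, {"osDiskSizeGB": 256})
    print(json.dumps(params, indent=2))
```

You could then pass the resulting file to the Azure CLI, for example `az deployment group create --resource-group <rg> --template-file <template> --parameters @params.json`, with every placeholder filled in for your deployment.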

architecture-overview.md
# Architecture Overview

Our infrastructure is designed to run on Azure.

The data infrastructure provides the following services:

- A PostgreSQL server that acts as the data warehouse (DWH).
- A self-hosted Airbyte service that acts as the data integration tool (the E and L of ELT).
- A Power BI data gateway that allows the Power BI service to read from the DWH.
- A Power BI service environment where we build reports and apps for our users.
- A simple scheduled dbt run for a dbt project that runs on top of the DWH.
- A VPN server and DNS resolution so that developers and power users can access the different services.

The infrastructure serves Superhog as follows:

- Data is ingested from several sources into our DWH.
- We clean and model the data inside the DWH with dbt. This produces tables in a reporting schema that support our data needs.
- Data team members and power users build Power BI reports and other data products on top of the reporting schema.
- Data team members and other analysts can also access the DWH directly for ad hoc analysis and any other data needs that go beyond Power BI reports.
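The flow above follows the ELT pattern. Airbyte lands raw data, and dbt transforms it inside the DWH into a reporting schema that consumers query. The toy example below illustrates that split. It uses SQLite from the Python standard library as a stand-in for PostgreSQL, and all table and column names are hypothetical. In the real stack, the load step is Airbyte and the transform is a dbt model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no CREATE SCHEMA; attaching extra in-memory databases emulates
# separate "raw" and "reporting" schemas in the DWH.
conn.execute("ATTACH DATABASE ':memory:' AS raw")
conn.execute("ATTACH DATABASE ':memory:' AS reporting")

# E + L (Airbyte's job): land source rows as-is, messy values included.
conn.execute("CREATE TABLE raw.bookings (id INTEGER, country TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw.bookings VALUES (?, ?, ?)",
    [(1, " uk", "100.50"), (2, "UK", "20"), (3, "es ", "75")],
)

# T (dbt's job): clean and model the data inside the warehouse with SQL.
conn.execute(
    """
    CREATE TABLE reporting.revenue_by_country AS
    SELECT UPPER(TRIM(country)) AS country,
           SUM(CAST(amount AS REAL)) AS revenue
    FROM raw.bookings
    GROUP BY UPPER(TRIM(country))
    """
)

# Consumers (Power BI, analysts) read only from the reporting layer.
rows = conn.execute(
    "SELECT country, revenue FROM reporting.revenue_by_country ORDER BY country"
).fetchall()
print(rows)
```

The reporting table holds one clean row per country, while the raw table keeps the data exactly as it arrived. This makes it possible to rerun the transformation from scratch at any time.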

The data infrastructure relies on the following main components:

- A subscription to hold everything.
- A resource group to hold all resources.
- A private network.
- Three subnets.
- A private DNS zone.
- A managed PostgreSQL server.
- Three VMs.
- Repositories in Azure DevOps.

Some of these components also come with more granular resources (network security groups, disks, network interfaces, etc.).
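The three subnets need non-overlapping ranges inside the private network's address space. The following is a minimal sketch using Python's `ipaddress` module. It assumes a hypothetical `10.0.0.0/24` virtual network and hypothetical subnet names; the real ranges and names come from the templates. The sketch also accounts for Azure reserving five addresses in every subnet.

```python
import ipaddress

# Azure reserves the first four addresses and the last address of each subnet.
AZURE_RESERVED_PER_SUBNET = 5


def plan_subnets(vnet_cidr: str, names: list[str], new_prefix: int) -> dict:
    """Carve equally sized, non-overlapping subnets out of a VNet range."""
    vnet = ipaddress.ip_network(vnet_cidr)
    candidates = list(vnet.subnets(new_prefix=new_prefix))
    if len(candidates) < len(names):
        raise ValueError(f"{vnet_cidr} only fits {len(candidates)} /{new_prefix} subnets")
    # Assign the first len(names) candidates in order; any extras stay free.
    return dict(zip(names, candidates))


if __name__ == "__main__":
    # Hypothetical layout: one subnet per role.
    plan = plan_subnets("10.0.0.0/24", ["jumphost", "database", "services"], 26)
    for name, net in plan.items():
        usable = net.num_addresses - AZURE_RESERVED_PER_SUBNET
        print(f"{name:<10} {net}  usable hosts: {usable}")
```

With a /26 per subnet, each subnet has 64 addresses, of which 59 are usable in Azure. A /24 leaves one /26 free for later growth.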

The following elements are external to the data infrastructure but still important:

- Superhog's application SQL Server database, plus the networking settings that make it reachable from Airbyte.
- Superhog's service status.
- VPN configurations on our laptops to access the private data network.
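The SQL Server database sits outside the data infrastructure, so it helps to verify from the Airbyte VM that the network path is actually open before configuring the source. The following is a minimal sketch that assumes a plain TCP check is enough: it proves reachability, not valid credentials. Port 1433 is SQL Server's default, but your instance may use a different one.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

For example, `is_reachable("<sql-server-host>", 1433)` run on the Airbyte VM returns `False` when a firewall rule or network security group blocks the path. In that case, fix the networking settings before touching the Airbyte connector.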

human-script.md
# Human Script

Follow this script to deploy the entire data infrastructure.

## 0. Prerequisites

- You need an Azure subscription and a user with administrator rights in it.

## 1. Resource group

## 2. Networking

## 3. Jumphost

## 4. DWH

## 5. Airbyte

## 6. Power BI

## 7. dbt

## 8. Status monitoring

## 9. Backups

- If you are working on a dev or staging environment, you might want to skip this section.
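You follow the script in order, with one conditional step. The following is a minimal sketch that encodes the section order above and drops the backups section for dev and staging environments. The environment names are assumptions, and skipping backups remains a judgment call rather than a rule.

```python
# Section order of the human script; each step assumes the previous ones are done.
STEPS = [
    "Prerequisites",
    "Resource group",
    "Networking",
    "Jumphost",
    "DWH",
    "Airbyte",
    "Power BI",
    "dbt",
    "Status monitoring",
    "Backups",
]

# Assumed names of environments where backups can usually be skipped.
NON_PRODUCTION = {"dev", "staging"}


def deployment_plan(environment: str) -> list[str]:
    """Return the sections to follow, in order, keeping the script's numbering."""
    skip = {"Backups"} if environment in NON_PRODUCTION else set()
    return [f"{i}. {step}" for i, step in enumerate(STEPS) if step not in skip]


if __name__ == "__main__":
    for line in deployment_plan("staging"):
        print(line)
```

Numbering comes from each step's position in `STEPS`, so a skipped section leaves no renumbering behind, and the plan still matches the headings in the script.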