# xexe
`xexe` is Superhog's tool to ingest currency rates from xe.com into our DWH. `xexe` is a Python CLI application, and this is the repository where it lives.
## How to use the tool
*Note: the app has only been used so far in a Linux environment. Windows support is dubious.*
### Install
- Ensure you have Python 3.10 or later and `poetry` installed.
- Run `poetry install` to install dependencies.
- Activate the project's virtual environment. You can use `poetry shell`.
- Test that everything is working by running `xexe smoke-test`. You should see a happy pig.
### Set up credentials
To use `xexe`, you will need to have credentials for the `xe.com` API. Specifically, you need an account id and its matching API key.
To write into the DWH, you will also need to pass credentials to connect to it.
To set up your environment, you should create a `.env` file and place it in `~/.xexe/.env`. You will have to run `xexe` as the right user to ensure the `.env` file is found. You can use the `.env-example` file as a reference. We also recommend running `chmod 400` or `chmod 600` on it for safety.
Once you have done this, you can run:
- `xexe xe-healthcheck` to validate the connection to the xe.com API. If the connection to the API was successful, you will see some output telling you so.
- `xexe dwh-healthcheck` to validate that the DWH is reachable. Again, you will see some happy output if things work.
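If a healthcheck fails, it can help to first confirm that the `.env` file is where `xexe` expects it and that its permissions are as tight as recommended above. The snippet below is a minimal sketch of such a check. The `check_env_file` helper is hypothetical and not part of `xexe`; only the `~/.xexe/.env` path and the `400`/`600` modes come from this readme.

```python
import stat
from pathlib import Path

# Path taken from this readme. The helper itself is hypothetical.
ENV_PATH = Path.home() / ".xexe" / ".env"
SAFE_MODES = {0o400, 0o600}


def check_env_file(path: Path = ENV_PATH) -> list[str]:
    """Return a list of human-readable problems. Empty list means all good."""
    if not path.is_file():
        return [f"{path} does not exist"]
    # Keep only the permission bits (e.g. 0o600) from the full file mode.
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode not in SAFE_MODES:
        return [f"{path} has mode {oct(mode)}, expected 0o400 or 0o600"]
    return []


if __name__ == "__main__":
    for problem in check_env_file():
        print(problem)
```

Remember to run it as the same user that runs `xexe`, since `~` resolves per user.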
### DWH pre-requisites
To be able to write rates into the DWH, take these points into consideration:
- `xexe` expects to find a database called `dwh` with a schema called `sync_xedotcom_currencies`. These should already exist before `xexe` runs.
- `xexe` should run with a user that has permission to write into `dwh/sync_xedotcom_currencies` and to create tables. It will create the right tables if it can't find them.
- These details are hardcoded in the `constants` module. You might want to refactor them into run-time configuration options if you find yourself having to change them often.
### General Usage
Remember to activate the project virtual environment.
You can use `xexe` to get rates and store them locally as a `.csv` file like this:
```bash
xexe get-rates --start-date "2024-01-01" --end-date "2024-01-10" --output my_rates.csv
```
By default, `xexe` runs against a mock rate generator. To get real rates from xe.com, you need to set `--rates-source` to `xe` like this:
```bash
xexe get-rates --rates-source xe --output my_xe_rates.csv
```
You can also be explicit about wanting mock rates like this:
```bash
xexe get-rates --rates-source mock --output my_mock_rates.csv
```
If you want to write to the DWH instead of a local file, pass `dwh` as the output:
```bash
xexe get-rates --output dwh
```
If you don't want to write the rates anywhere, activate the `--dry-run` flag. You still need to specify some output.
```bash
xexe get-rates --dry-run --output my_rates.csv
```
You can also run without specifying dates. Not specifying `end-date` will get rates up to today. Not specifying `start-date` will get rates starting from last week.
```bash
xexe get-rates --output my_rates.csv
```
`xexe` comes with a set of default currencies, but you can also specify the currencies you want to get data for by passing them like this:
```bash
# Currencies must be valid ISO 4217 codes and be comma-separated
xexe get-rates --currencies USD,EUR,GBP --output my_rates.csv
```
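If you build the `--currencies` value in a script, you may want to sanity-check it before calling the tool. The following is a minimal sketch, not `xexe`'s own validation: `parse_currencies` is a hypothetical helper, and it only checks that each code looks like three letters. It does not check the code against the actual ISO 4217 list, and upper-casing the input is a choice of this sketch.

```python
import re

# Shape check only: three ASCII letters. Not a lookup in the ISO 4217 list.
_CODE_PATTERN = re.compile(r"[A-Z]{3}")


def parse_currencies(raw: str) -> list[str]:
    """Split a comma-separated string into upper-cased, de-duplicated codes."""
    codes: list[str] = []
    for chunk in raw.split(","):
        code = chunk.strip().upper()
        if not _CODE_PATTERN.fullmatch(code):
            raise ValueError(f"Not a three-letter currency code: {chunk!r}")
        if code not in codes:
            codes.append(code)
    return codes


if __name__ == "__main__":
    # Re-join the cleaned codes into the format the CLI expects.
    print(",".join(parse_currencies("usd, EUR,GBP,eur")))
```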
The output file for `.csv` outputs will follow this schema:
- `from_currency`
- `to_currency`
- `rate`
- `rate_date`
- `exported_at`
The file will contain all the combinations of the different currencies and dates passed. This includes inverse and equal rates.
This is better understood with an example. Find below a real call and its real CSV output:
```bash
xexe get-rates --start-date 2024-01-01 --end-date 2024-01-03 --rates-source xe --currencies EUR,USD,GBP --output file.csv
```
```csv
from_currency,to_currency,rate,rate_date,exported_at
GBP,EUR,1.15,2024-01-01,2024-06-12T16:24:38
GBP,USD,1.27,2024-01-01,2024-06-12T16:24:38
EUR,USD,1.10,2024-01-01,2024-06-12T16:24:38
GBP,EUR,1.15,2024-01-02,2024-06-12T16:24:38
GBP,USD,1.27,2024-01-02,2024-06-12T16:24:38
EUR,USD,1.10,2024-01-02,2024-06-12T16:24:38
GBP,EUR,1.15,2024-01-03,2024-06-12T16:24:38
GBP,USD,1.26,2024-01-03,2024-06-12T16:24:38
EUR,USD,1.09,2024-01-03,2024-06-12T16:24:38
EUR,GBP,0.87,2024-01-01,2024-06-12T16:24:38
USD,GBP,0.79,2024-01-01,2024-06-12T16:24:38
USD,EUR,0.91,2024-01-01,2024-06-12T16:24:38
EUR,GBP,0.87,2024-01-02,2024-06-12T16:24:38
USD,GBP,0.79,2024-01-02,2024-06-12T16:24:38
USD,EUR,0.91,2024-01-02,2024-06-12T16:24:38
EUR,GBP,0.87,2024-01-03,2024-06-12T16:24:38
USD,GBP,0.79,2024-01-03,2024-06-12T16:24:38
USD,EUR,0.92,2024-01-03,2024-06-12T16:24:38
GBP,GBP,1,2024-01-01,2024-06-12T16:24:38
USD,USD,1,2024-01-01,2024-06-12T16:24:38
EUR,EUR,1,2024-01-01,2024-06-12T16:24:38
GBP,GBP,1,2024-01-03,2024-06-12T16:24:38
USD,USD,1,2024-01-03,2024-06-12T16:24:38
EUR,EUR,1,2024-01-03,2024-06-12T16:24:38
GBP,GBP,1,2024-01-02,2024-06-12T16:24:38
USD,USD,1,2024-01-02,2024-06-12T16:24:38
EUR,EUR,1,2024-01-02,2024-06-12T16:24:38
```
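To make the "all combinations" idea concrete, here is a minimal sketch of how three direct rates for one date expand into the nine rows per date seen in the output above. This is an illustration only: `expand_rates` is a hypothetical function, and this readme does not state how `xexe` obtains inverse rates internally. Inverting the direct rate and rounding to two decimals are assumptions made to mirror the example.

```python
from decimal import Decimal

TWO_PLACES = Decimal("0.01")
RatePairs = dict[tuple[str, str], Decimal]


def expand_rates(direct_rates: RatePairs) -> RatePairs:
    """Add inverse and same-currency pairs to the direct rates of one date."""
    expanded = dict(direct_rates)
    for (from_ccy, to_ccy), rate in direct_rates.items():
        # Inverse pair, e.g. GBP->EUR 1.15 gives EUR->GBP 0.87.
        expanded.setdefault((to_ccy, from_ccy), (1 / rate).quantize(TWO_PLACES))
        # Same-currency pairs always have a rate of 1.
        expanded.setdefault((from_ccy, from_ccy), Decimal(1))
        expanded.setdefault((to_ccy, to_ccy), Decimal(1))
    return expanded


if __name__ == "__main__":
    rates_2024_01_01 = {
        ("GBP", "EUR"): Decimal("1.15"),
        ("GBP", "USD"): Decimal("1.27"),
        ("EUR", "USD"): Decimal("1.10"),
    }
    for (from_ccy, to_ccy), rate in expand_rates(rates_2024_01_01).items():
        print(from_ccy, to_ccy, rate)
```

With three currencies you get three direct pairs, three inverse pairs and three equal pairs per date, which is why the three-day example above has 27 rows.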
A few more details:
- Running `get-rates` with an `end-date` beyond the current date will ignore the future dates. The run will behave as if you had specified today as the `end-date`.
- Trying to place an `end-date` before a `start-date` will cause an exception.
- Running with the option `--dry-run` will run against a mock of the xe.com API. Format will be valid, but all rates will be fixed. This is for testing purposes.
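The date rules described in this section can be summarised with a small sketch. This is not the code `xexe` uses: `resolve_date_range` is a hypothetical function, the default start of exactly seven days before today is an assumption based on the "last week" wording, and the order in which clamping and validation happen is a guess.

```python
from datetime import date, timedelta
from typing import Optional


def resolve_date_range(
    start: Optional[date] = None,
    end: Optional[date] = None,
    today: Optional[date] = None,
) -> tuple[date, date]:
    """Apply defaults, clamp future end dates and reject inverted ranges."""
    today = today or date.today()
    # A missing end date means "up to today". Future end dates are clamped.
    end = min(end or today, today)
    # Assumption: a missing start date means seven days before today.
    start = start or (today - timedelta(days=7))
    if end < start:
        raise ValueError(f"end-date {end} is before start-date {start}")
    return start, end


if __name__ == "__main__":
    print(resolve_date_range(start=date(2024, 1, 1), end=date(2024, 1, 10)))
```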
### Deploying for Superhog infra
This tool was made specifically to feed our DWH. These are the steps to perform to deploy it from scratch.
- Setup
  - Prepare a Linux VM.
  - Run the steps described in this readme in sections:
    - `How to use the tool > Install`
    - `How to use the tool > Set up credentials`
    - `How to use the tool > DWH pre-requisites`
- Schedule
  - Finally, schedule the execution in the Linux VM to fit your needs.
  - Specifics are up to you and your circumstances.
  - A general pattern would be to create a little bash script that calls the tool with the right parameters.
  - Remember to use the `--ignore-warnings` flag if necessary to allow large, automated runs without manual interaction.
- Backfilling
  - If you are loading rates for the first time, you might need to backfill long periods of time manually at first.
  - The tool is flexible enough. You can probably figure out the right commands by taking a look at `How to use the tool > General Usage`.
  - Be careful since you might hit consumption limits set by xe.com.
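For backfilling, one option is to split a long period into smaller runs so you can pace them and keep an eye on xe.com consumption. The sketch below only prints the commands instead of running them. `chunk_date_range` and `backfill_commands` are hypothetical helpers, and the 30-day chunk size is an arbitrary assumption rather than a known xe.com limit. The flags are the ones documented in this readme, and date ranges are treated as inclusive, as in the CSV example.

```python
from datetime import date, timedelta
from typing import Iterator


def chunk_date_range(
    start: date, end: date, max_days: int = 30
) -> Iterator[tuple[date, date]]:
    """Yield inclusive (chunk_start, chunk_end) pairs covering start..end."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + timedelta(days=1)


def backfill_commands(start: date, end: date, max_days: int = 30) -> list[list[str]]:
    """Build one `xexe get-rates` command per chunk, writing to the DWH."""
    return [
        [
            "xexe", "get-rates",
            "--start-date", chunk_start.isoformat(),
            "--end-date", chunk_end.isoformat(),
            "--rates-source", "xe",
            "--output", "dwh",
        ]
        for chunk_start, chunk_end in chunk_date_range(start, end, max_days)
    ]


if __name__ == "__main__":
    for command in backfill_commands(date(2024, 1, 1), date(2024, 3, 1)):
        print(" ".join(command))
```

You can paste the printed commands into a bash script, adding `--ignore-warnings` if your runs need it.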
## Testing
This CLI app has three groups of tests:
- `tests_cli`: simulate calling the CLI to check proper calls, not much attention paid to actual results.
- `tests_unit`: unit tests for some domain-heavy parts of the codebase.
- `tests_integration`: full executions that assert the end result of the calls to be as expected.
You can run everything with `pytest tests`, or narrow it down more if you want to.
There is a special test in `tests_integration` that runs against the real xe.com API. This test is commented out to avoid repeatedly consuming API hits. You can use it by uncommenting it manually. I know it's annoying, but then again, it shouldn't be very annoying since you should only use that test sparingly.