xexe

xexe is Superhog's tool to ingest currency rates from xe.com into our DWH. xexe is a Python CLI application, and this is the repository where it lives.

How to use the tool

Note: the app has only been used so far in a Linux environment. Windows support is dubious.

Install

  • Ensure you have Python 3.10 or newer and Poetry installed.
  • Run poetry install to install dependencies.
  • Activate the project's virtual environment. You can use poetry shell.
  • Test that everything is working by running xexe smoke-test. You should see a happy pig. The same steps are shown as commands below.
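
Putting those steps together, the whole install sequence looks like this (run from the repository root):

poetry install
poetry shell
xexe smoke-test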

Set up credentials

To use xexe, you will need credentials for the xe.com API. Specifically, you need an account ID and its matching API key.

To write into the DWH, you will also need to pass credentials to connect to it.

To set up your environment, you should create a .env file and place it in ~/.xexe/.env. You will have to run xexe as the right user to ensure the .env file is found. You can use the .env-example file as a reference. We also recommend running chmod 400 or chmod 600 on it for safety.
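
A minimal sketch of that setup, assuming you start from the .env-example in the repository root and run xexe as your current user:

mkdir -p ~/.xexe
cp .env-example ~/.xexe/.env
# fill in the xe.com and DWH credentials in the copied file, then lock it down
chmod 600 ~/.xexe/.env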

Once you have done this, you can run:

  • xexe xe-healthcheck to validate the connection to the xe.com API. If the connection to the API was successful, you will see some output telling you so.
  • xexe dwh-healthcheck to validate that the DWH is reachable. Again, you will see some happy output if things work.

DWH pre-requisites

To be able to write rates into the DWH, take these points into consideration:

  • xexe expects to find a database called dwh with a schema called sync_xedotcom_currency_rates. Both should already exist before xexe runs. You should probably create the schema with the same user that you will use to run xexe regularly (see the sketch after this list).
  • xexe should run with a user that has permission to write into dwh/sync_xedotcom_currency_rates and to create tables. It will create the right tables if it can't find them.
  • These details are hardcoded in the constants module. You might want to refactor them into run-time configuration options if you find yourself having to change them often.
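
As a minimal sketch of that preparation, assuming the DWH is Postgres (see the note in the Testing section) and that the dwh database already exists; the host and user below are placeholders for your own values:

# connect as the user that will run xexe and create the schema it expects
psql -h <dwh-host> -U <xexe-user> -d dwh -c "CREATE SCHEMA IF NOT EXISTS sync_xedotcom_currency_rates;"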

General Usage

Remember to activate the project virtual environment.

You can use xexe to get rates and store them locally as a .csv file like this:

xexe get-rates --start-date "2024-01-01" --end-date "2024-01-10" --output my_rates.csv

By default, xexe runs against a mock rate generator. To get real rates from xe.com, you need to set --rates-source to xe like this:

xexe get-rates --rates-source xe --output my_xe_rates.csv

You can also be explicit about wanting mock rates like this:

xexe get-rates --rates-source mock --output my_mock_rates.csv

If you want to write to the DWH instead of a local file, set the output to dwh:

xexe get-rates --output dwh

If you don't want to write the rates anywhere, activate the --dry-run flag. You still need to specify some output.

xexe get-rates --dry-run --output my_rates.csv

You can also run without specifying dates. Not specifying end-date will get rates up to today. Not specifying start-date will get rates starting from last week.

xexe get-rates --output my_rates.csv

xexe comes with a set of default currencies, but you can also specify the currencies you want to get data for by passing them like this:

# Currencies must be valid ISO 4217 codes and be comma-separated
xexe get-rates --currencies USD,EUR,GBP --output my_rates.csv

The output file for .csv outputs will follow this schema:

  • from_currency
  • to_currency
  • rate
  • rate_date
  • exported_at

The file will contain all the combinations of the different currencies and dates passed. This includes inverse and equal rates.

This is better understood with an example. Find below a real call and its real CSV output:

xexe get-rates --start-date 2024-01-01 --end-date 2024-01-03 --rates-source xe --currencies EUR,USD,GBP --output file.csv

The resulting file.csv:

from_currency,to_currency,rate,rate_date,exported_at
GBP,EUR,1.15,2024-01-01,2024-06-12T16:24:38
GBP,USD,1.27,2024-01-01,2024-06-12T16:24:38
EUR,USD,1.10,2024-01-01,2024-06-12T16:24:38
GBP,EUR,1.15,2024-01-02,2024-06-12T16:24:38
GBP,USD,1.27,2024-01-02,2024-06-12T16:24:38
EUR,USD,1.10,2024-01-02,2024-06-12T16:24:38
GBP,EUR,1.15,2024-01-03,2024-06-12T16:24:38
GBP,USD,1.26,2024-01-03,2024-06-12T16:24:38
EUR,USD,1.09,2024-01-03,2024-06-12T16:24:38
EUR,GBP,0.87,2024-01-01,2024-06-12T16:24:38
USD,GBP,0.79,2024-01-01,2024-06-12T16:24:38
USD,EUR,0.91,2024-01-01,2024-06-12T16:24:38
EUR,GBP,0.87,2024-01-02,2024-06-12T16:24:38
USD,GBP,0.79,2024-01-02,2024-06-12T16:24:38
USD,EUR,0.91,2024-01-02,2024-06-12T16:24:38
EUR,GBP,0.87,2024-01-03,2024-06-12T16:24:38
USD,GBP,0.79,2024-01-03,2024-06-12T16:24:38
USD,EUR,0.92,2024-01-03,2024-06-12T16:24:38
GBP,GBP,1,2024-01-01,2024-06-12T16:24:38
USD,USD,1,2024-01-01,2024-06-12T16:24:38
EUR,EUR,1,2024-01-01,2024-06-12T16:24:38
GBP,GBP,1,2024-01-03,2024-06-12T16:24:38
USD,USD,1,2024-01-03,2024-06-12T16:24:38
EUR,EUR,1,2024-01-03,2024-06-12T16:24:38
GBP,GBP,1,2024-01-02,2024-06-12T16:24:38
USD,USD,1,2024-01-02,2024-06-12T16:24:38
EUR,EUR,1,2024-01-02,2024-06-12T16:24:38

A few more details:

  • Running get-rates with an end-date beyond the current date will ignore the future dates. The run will behave as if you had specified today as the end-date.
  • Trying to place an end-date before a start-date will cause an exception.
  • Running with the option --dry-run will run against a mock of the xe.com API. Format will be valid, but all rates will be fixed. This is for testing purposes.

Deploying for Superhog infra

This tool was made specifically to feed our DWH. These are the steps to perform to deploy it from scratch.

  • Setup
    • Prepare a Linux VM.
    • Run the steps described in this readme, in these sections:
      • How to use the tool > Install
      • How to use the tool > Set up credentials
      • How to use the tool > DWH pre-requisites
  • Schedule
    • Next, schedule the execution in the Linux VM to fit your needs.
    • Specifics are up to you and your circumstances.
    • A general pattern would be to create a small bash script that calls the tool with the right parameters. You can find an example that I like in the root of this repo, named run_xexe.sh, but it is opinionated and adjusted to my needs at the time of writing. Adapt it to your environment or start from scratch if necessary. The script is designed to be placed at ~/run_xexe.sh (see the sketch after this list).
    • Remember to use the --ignore-warnings flag if necessary to allow large, automated runs without manual interaction.
    • The script is designed to send both success and failure messages to Slack channels upon completion. To set this up properly, you will need to place a file called slack_webhook_urls.txt on the same path where you drop run_xexe.sh. The file should have two lines: SLACK_ALERT_WEBHOOK_URL=<url-of-webhook-for-failures> and SLACK_RECEIPT_WEBHOOK_URL=<url-of-webhook-for-successful-runs>. Setting up the Slack channels and webhooks is outside the scope of this readme.
    • Create a cron entry with crontab -e that runs the script. For example: 0 2 * * * /bin/bash /home/azureuser/run_xexe.sh to run xexe every day at 2 AM.
  • Backfilling
    • If you are loading rates for the first time, you might need to backfill long periods of time manually at first.
    • The tool is flexible enough. You can probably figure out the right commands by taking a look at How to use the tool > General Usage (there is also a backfill example after this list).
    • Be careful since you might hit consumption limits set by xe.com.
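
For reference, here is a minimal sketch of what a wrapper script like run_xexe.sh could look like. This is not the actual script in the repo, just an illustration: the curl calls, the Slack payloads, and the assumption that xexe is on the PATH of the cron user are all placeholders to adapt.

#!/usr/bin/env bash
# Hypothetical sketch of a wrapper for scheduled runs; see run_xexe.sh in the repo for the real thing.
set -euo pipefail
cd "$(dirname "$0")"

# slack_webhook_urls.txt sits next to this script and defines
# SLACK_ALERT_WEBHOOK_URL=... and SLACK_RECEIPT_WEBHOOK_URL=...
source ./slack_webhook_urls.txt

# Assumes the project's virtual environment is active or xexe is otherwise on PATH.
if xexe get-rates --rates-source xe --output dwh --ignore-warnings; then
    curl -s -X POST -H 'Content-type: application/json' \
        --data '{"text":"xexe run succeeded"}' "$SLACK_RECEIPT_WEBHOOK_URL"
else
    curl -s -X POST -H 'Content-type: application/json' \
        --data '{"text":"xexe run FAILED"}' "$SLACK_ALERT_WEBHOOK_URL"
    exit 1
fi

For a first backfill, a one-off call along these lines should work (the dates here are only an example, and you may want to split long ranges into smaller chunks to stay within the xe.com consumption limits mentioned above):

xexe get-rates --start-date "2023-01-01" --end-date "2023-12-31" --rates-source xe --output dwh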

Testing

This CLI app has three groups of automatic tests:

  • tests_cli: simulate calling the CLI to check that the commands are wired up properly; not much attention is paid to the actual results.
  • tests_unit: unit tests for some domain-heavy parts of the codebase.
  • tests_integration: full executions that assert the end result of the calls to be as expected.

You can run everything with pytest tests, or narrow it down more if you want to.
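
For example, assuming the three groups live as directories under tests/ (check the repo layout if this differs):

# run everything
pytest tests
# or just one group
pytest tests/tests_unit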

There is a special test in tests_integration that runs against the real xe.com API. This test is commented out to avoid repeatedly consuming API hits. You can use it by uncommenting it manually. I know it's annoying, but then again, it shouldn't be very annoying since you should only use that test sparingly. Also, no pressure, but if you leave it uncommented, you might end up creating a massive problem and breaking production.

Also, FYI, there are currently no automated tests for writing against a Postgres database.