Pablo Martin 2024-08-16 12:29:47 +02:00
parent d8dc1c917a
commit 5083cbd35b

@ -35,7 +35,7 @@ Syncs are incremental in nature. `anaxi` keeps track of what's the most up to da
Different streams are independent of each other. Running one stream does not affect the others in any way.
To set up a stream, `anaxi` expects to find a file called `streams.yml` in the path `~/.anaxi/streams.yml`. You can check the example file in this repo named `example-streams.yml` to understand how to build this file. Each entry in the file represents one stream. The `cosmos_database_id` field and `postgres_database` field in each stream entry should be filled in with values that you have defined in the `cosmos-db.yml` and `postgres.yml` files.
To set up a stream, `anaxi` expects to find a file called `streams.yml` in the path `~/.anaxi/streams.yml`. You can check the example file in this repo named `example-streams.yml` to understand how to build this file. Each entry in the file represents one stream. The `cosmos_database_id` field and `postgres_database` field in each stream entry should be filled in with values that you have defined in the `cosmos-db.yml` and `postgres.yml` files. The `cutoff_timestamp` field lets you specify a timestamp (ISO 8601) to use as the starting point for reading data when no checkpoint is available. You can leave it empty to read all records from the start of the container history.
Also, you will need to create a folder named `checkpoints` in the path `~/.anaxi/checkpoints`. The state of the checkpoints for each stream will be kept there in different files.
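The starting point for a sync, as described above, follows a simple order of precedence: an existing checkpoint wins, then `cutoff_timestamp`, and otherwise the full container history is read. The sketch below only illustrates that order. It is not `anaxi`'s actual code, and the function name `resolve_start_timestamp` and its parameters are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_start_timestamp(
    checkpoint: Optional[str], cutoff_timestamp: Optional[str]
) -> Optional[datetime]:
    """Pick where a sync should start reading (hypothetical helper).

    Order of precedence, as described in the text:
      1. the stream's checkpoint, if one exists;
      2. the stream's `cutoff_timestamp`, if one is set;
      3. None, meaning "read the full container history".
    """
    for candidate in (checkpoint, cutoff_timestamp):
        if candidate:  # skips both None and empty strings
            return datetime.fromisoformat(candidate)
    return None


if __name__ == "__main__":
    # No checkpoint yet, so the cutoff decides the starting point.
    start = resolve_start_timestamp(None, "2024-08-01T00:00:00+00:00")
    print(start == datetime(2024, 8, 1, tzinfo=timezone.utc))
```

Returning `None` for the "read everything" case keeps the caller's logic explicit: it can decide how to query the container when no lower bound applies.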
@ -76,8 +76,8 @@ anaxi sync-stream --stream-id <your-stream-name>
- Create a cron entry with `crontab -e` that runs the script. For example: `0 2 * * * /bin/bash /home/azureuser/run_anaxi.sh` to run syncs every day at 2AM.
- If you want to run syncs at different frequencies, you can make different copies of `run_anaxi.sh` and schedule them independently.
- Backfilling and first runs
- When running the first sync for a stream, `anaxi` will by default start reading records since the start of the source Cosmos DB container. In some cases, this is probably what you want. You don't need to take any special action.
- On the other hand, if you want to sync only from a specific point in time, you can achieve this by creating a file in the checkpoints folder. The file should be named `<name-of-your-stream>.yml` and contain a single key named `highest_synced_timestamp`, whose value is the timestamp from which you want the sync to begin (in UTC!). For example, if I wanted to start at the time I'm writing this, this would be the content of the file:
- When running the first sync for a stream, `anaxi` will by default start reading records from the `cutoff_timestamp` specified for that stream.
- If the value is not provided, `anaxi` will read the container's full history from the very beginning.
```yml
highest_synced_timestamp: '2024-08-16T09:02:23+00:00'