Augur Migration
This guide covers migrating from an existing Augur installation to Aveloxis. Aveloxis is designed to coexist with Augur in the same database, so you can run both systems side-by-side.
Overview
Migrating from Augur involves four steps:
Point Aveloxis at your existing Augur database
Run
aveloxis migrateto create the Aveloxis schemasImport your API keys
Import your repos
No data in Augur’s schemas is modified or deleted. Aveloxis creates its own schemas (aveloxis_data and aveloxis_ops) and operates independently.
Step 1: Point at the existing Augur database
Create aveloxis.json with your Augur database connection:
{
"database": {
"host": "localhost",
"port": 5432,
"user": "augur",
"password": "your-augur-db-password",
"dbname": "augur",
"sslmode": "prefer"
}
}
Use the same host, port, user, password, and dbname that Augur uses. Aveloxis creates its own schemas and does not touch augur_data or augur_operations.
Step 2: Create the Aveloxis schemas
aveloxis migrate
This creates:
aveloxis_data– 84 tables + 19 materialized views for collected dataaveloxis_ops– 24 tables for operational state (queue, staging, credentials, etc.)
The migration uses CREATE ... IF NOT EXISTS throughout, so it is safe to run repeatedly. It never modifies or reads from augur_data or augur_operations schemas.
Step 3: Import API keys
aveloxis add-key --from-augur
This copies all API tokens from augur_operations.worker_oauth into aveloxis_ops.worker_oauth. Duplicate keys (by token value) are skipped via ON CONFLICT DO NOTHING.
After import, keys are stored in the Aveloxis table and loaded automatically on every run. You do not need the --augur-keys flag going forward.
Tip
If you want to temporarily use Augur’s keys without copying them, pass the --augur-keys flag to serve or collect instead. This reads directly from augur_operations.worker_oauth at startup.
Step 4: Import repos
aveloxis add-repo --from-augur
This reads every repository URL from augur_data.repo and adds it to the Aveloxis collection queue. Each URL is verified via an HTTP HEAD request against the forge before being added:
200 OK – repo is added to the queue
301/302 redirect – the canonical URL is used instead
404/410 – repo is skipped (dead, private, or DMCA’d)
This verification ensures you do not import stale or dead repos that would waste API calls.
Note
For large Augur installations (tens of thousands of repos), the import can take several minutes due to URL verification. Progress is logged at INFO level.
Schema coexistence
Aveloxis and Augur use completely separate schemas in the same PostgreSQL database:
Schema |
Owner |
Purpose |
|---|---|---|
|
Augur |
Augur’s collected data |
|
Augur |
Augur’s operational tables |
|
Aveloxis |
Aveloxis collected data (84 tables + 19 matviews) |
|
Aveloxis |
Aveloxis operational tables (24 tables) |
Key points:
Aveloxis never reads from or writes to
augur_dataoraugur_operations(except during explicit--from-augurimports).Augur never reads from or writes to
aveloxis_dataoraveloxis_ops.Both systems can collect from the same repos simultaneously without interference.
Schema names are hardcoded, so there is no risk of accidental cross-contamination.
Contributor ID compatibility
Aveloxis generates deterministic cntrb_id UUIDs using the same scheme as Augur. The UUID encodes:
Byte 0: platform ID (
1for GitHub,2for GitLab)Bytes 1-4:
gh_user_id(big-endian)
This means:
The same GitHub user always gets the same
cntrb_idin both Augur and AveloxisUUIDs are byte-compatible between the two systems
Analytics queries that join on
cntrb_idwork across both schemasIf you later consolidate data, contributor identities match
This deterministic ID scheme is called GithubUUID internally.
Running both systems side-by-side
You can run Augur and Aveloxis simultaneously against the same database. Common scenarios:
Gradual migration
Keep Augur running for repos already in its queue
Add new repos only to Aveloxis
Compare data quality between the two systems
Once satisfied, stop Augur and let Aveloxis handle everything
Parallel collection for validation
Have both systems collect the same repos
Compare issue counts, PR counts, commit counts across schemas
Verify contributor resolution quality
Resource considerations
When running both systems:
Database connections: Both systems maintain connection pools. Ensure your PostgreSQL
max_connectionsis high enough (Aveloxis uses up to 20 connections).API rate limits: Both systems consume API rate limits from their respective key pools. Do not share the same API tokens between both systems, or you will see rate limit errors.
Disk space: Both systems maintain their own bare clones. The
repo_clone_dirsettings should point to different directories.CPU/memory: Aveloxis is written in Go and typically uses less memory than Augur’s Python workers.
Differences from Augur
After migration, you will notice several improvements:
Area |
Augur |
Aveloxis |
|---|---|---|
Dead repos |
Retried every cycle |
Permanently sidelined (data preserved) |
Repo renames |
Not detected |
Detected and URLs auto-updated |
Duplicate repos |
Not detected |
Detected via redirect resolution |
Monitoring |
Flower (separate service) |
Built-in dashboard at |
Queue management |
Opaque Celery state |
SQL-queryable priority queue |
Priority override |
Not supported |
|
Cleanup (optional)
Once you are confident in Aveloxis, you can optionally remove Augur’s schemas:
Warning
This permanently deletes all Augur data. Only do this after verifying that Aveloxis has collected everything you need.
-- DESTRUCTIVE: Only run after verifying Aveloxis data
DROP SCHEMA augur_data CASCADE;
DROP SCHEMA augur_operations CASCADE;
DROP SCHEMA spdx CASCADE; -- if present
Next steps
Quick Start – verify collection is working
Monitoring – use the dashboard to track progress
Scaling – configure workers and keys for large instances