Augur Migration

This guide covers migrating from an existing Augur installation to Aveloxis. Aveloxis is designed to coexist with Augur in the same database, so you can run both systems side-by-side.


Overview

Migrating from Augur involves four steps:

  1. Point Aveloxis at your existing Augur database

  2. Run aveloxis migrate to create the Aveloxis schemas

  3. Import your API keys

  4. Import your repos

No data in Augur’s schemas is modified or deleted. Aveloxis creates its own schemas (aveloxis_data and aveloxis_ops) and operates independently.


Step 1: Point at the existing Augur database

Create aveloxis.json with your Augur database connection:

{
  "database": {
    "host": "localhost",
    "port": 5432,
    "user": "augur",
    "password": "your-augur-db-password",
    "dbname": "augur",
    "sslmode": "prefer"
  }
}

Use the same host, port, user, password, and dbname that Augur uses. Aveloxis creates its own schemas and does not touch augur_data or augur_operations.


Step 2: Create the Aveloxis schemas

aveloxis migrate

This creates:

  • aveloxis_data – 84 tables + 19 materialized views for collected data

  • aveloxis_ops – 24 tables for operational state (queue, staging, credentials, etc.)

The migration uses CREATE ... IF NOT EXISTS throughout, so it is safe to run repeatedly. It never modifies or reads from augur_data or augur_operations schemas.


Step 3: Import API keys

aveloxis add-key --from-augur

This copies all API tokens from augur_operations.worker_oauth into aveloxis_ops.worker_oauth. Duplicate keys (by token value) are skipped via ON CONFLICT DO NOTHING.

After import, keys are stored in the Aveloxis table and loaded automatically on every run. You do not need the --augur-keys flag going forward.

Tip

If you want to temporarily use Augur’s keys without copying them, pass the --augur-keys flag to serve or collect instead. This reads directly from augur_operations.worker_oauth at startup.


Step 4: Import repos

aveloxis add-repo --from-augur

This reads every repository URL from augur_data.repo and adds it to the Aveloxis collection queue. Each URL is verified via an HTTP HEAD request against the forge before being added:

  • 200 OK – repo is added to the queue

  • 301/302 redirect – the canonical URL is used instead

  • 404/410 – repo is skipped (dead, private, or DMCA’d)

This verification ensures you do not import stale or dead repos that would waste API calls.

Note

For large Augur installations (tens of thousands of repos), the import can take several minutes due to URL verification. Progress is logged at INFO level.


Schema coexistence

Aveloxis and Augur use completely separate schemas in the same PostgreSQL database:

Schema

Owner

Purpose

augur_data

Augur

Augur’s collected data

augur_operations

Augur

Augur’s operational tables

aveloxis_data

Aveloxis

Aveloxis collected data (84 tables + 19 matviews)

aveloxis_ops

Aveloxis

Aveloxis operational tables (24 tables)

Key points:

  • Aveloxis never reads from or writes to augur_data or augur_operations (except during explicit --from-augur imports).

  • Augur never reads from or writes to aveloxis_data or aveloxis_ops.

  • Both systems can collect from the same repos simultaneously without interference.

  • Schema names are hardcoded, so there is no risk of accidental cross-contamination.


Contributor ID compatibility

Aveloxis generates deterministic cntrb_id UUIDs using the same scheme as Augur. The UUID encodes:

  • Byte 0: platform ID (1 for GitHub, 2 for GitLab)

  • Bytes 1-4: gh_user_id (big-endian)

This means:

  • The same GitHub user always gets the same cntrb_id in both Augur and Aveloxis

  • UUIDs are byte-compatible between the two systems

  • Analytics queries that join on cntrb_id work across both schemas

  • If you later consolidate data, contributor identities match

This deterministic ID scheme is called GithubUUID internally.


Running both systems side-by-side

You can run Augur and Aveloxis simultaneously against the same database. Common scenarios:

Gradual migration

  1. Keep Augur running for repos already in its queue

  2. Add new repos only to Aveloxis

  3. Compare data quality between the two systems

  4. Once satisfied, stop Augur and let Aveloxis handle everything

Parallel collection for validation

  1. Have both systems collect the same repos

  2. Compare issue counts, PR counts, commit counts across schemas

  3. Verify contributor resolution quality

Resource considerations

When running both systems:

  • Database connections: Both systems maintain connection pools. Ensure your PostgreSQL max_connections is high enough (Aveloxis uses up to 20 connections).

  • API rate limits: Both systems consume API rate limits from their respective key pools. Do not share the same API tokens between both systems, or you will see rate limit errors.

  • Disk space: Both systems maintain their own bare clones. The repo_clone_dir settings should point to different directories.

  • CPU/memory: Aveloxis is written in Go and typically uses less memory than Augur’s Python workers.


Differences from Augur

After migration, you will notice several improvements:

Area

Augur

Aveloxis

Dead repos

Retried every cycle

Permanently sidelined (data preserved)

Repo renames

Not detected

Detected and URLs auto-updated

Duplicate repos

Not detected

Detected via redirect resolution

Monitoring

Flower (separate service)

Built-in dashboard at /

Queue management

Opaque Celery state

SQL-queryable priority queue

Priority override

Not supported

aveloxis prioritize or dashboard Boost


Cleanup (optional)

Once you are confident in Aveloxis, you can optionally remove Augur’s schemas:

Warning

This permanently deletes all Augur data. Only do this after verifying that Aveloxis has collected everything you need.

-- DESTRUCTIVE: Only run after verifying Aveloxis data
DROP SCHEMA augur_data CASCADE;
DROP SCHEMA augur_operations CASCADE;
DROP SCHEMA spdx CASCADE;           -- if present

Next steps

  • Quick Start – verify collection is working

  • Monitoring – use the dashboard to track progress

  • Scaling – configure workers and keys for large instances