Schema-change verification with aveloxis data-test

aveloxis data-test is an operator-driven verification harness that catches data-loss regressions before they reach production. It compares a tagged release binary against the local working-tree binary by collecting the same repo into two scratch PostgreSQL databases and diffing row counts table-by-table.

Shipped in v0.22.8. Designed primarily for contributors and maintainers who are about to ship a release that touches schema.sql or migrate.go.

When to use it

  • Contributors — before submitting a PR that adds a column, changes a foreign-key declaration, alters a constraint, or modifies any DDL in internal/db/schema.sql or internal/db/migrate.go.

  • Maintainers — before merging a schema-touching PR, as a release gate. The harness exit code is 0 on PASS / FLAG-only and 1 on any FAIL (row loss), so it can drive CI.

  • Operators — before deploying a release with schema changes to a production fleet. The CLAUDE.md changelog for each release notes whether schema changed; if it did, run the harness against the prior released tag first.

You do not need to run it for changes that don’t touch the schema (API-layer changes, collector logic refactors, web GUI changes, etc.).

What it actually verifies

For each table in aveloxis_data and aveloxis_ops, the harness classifies the result:

  • PASS — equal row counts. The schema change is neutral for this table.

  • FLAG — the new schema captures more rows than the released schema. Usually means the change added new collection coverage (e.g., a previously-missed column now populates a child table). Manual review recommended but generally fine.

  • FAIL — the released schema captured rows that the new schema rejects. This is the regression signal. Common causes:

    • A new FK constraint rejects INSERTs that previously succeeded because no FK was enforced.

    • A new NOT NULL column on a table the collection path doesn’t populate.

    • A reordered or renamed column the collector still writes by the old name.

The harness does NOT do byte-for-byte content comparison. For deep semantic diff (per-row, per-column) on PR/issue tables, see the older aveloxis shadow-diff subcommand. Operators may run both: data-test first as a coarse filter, shadow-diff for deep inspection of any FLAGGED table.

Prerequisites

  • A local clone of the aveloxis repository with the tag you want to test against. If the tag is missing locally: git fetch --tags.

  • PostgreSQL access via the operator’s existing aveloxis.json credentials. The user must have CREATEDB privilege on the database server — required because the harness creates and drops scratch databases. Standard for development-mode aveloxis deployments.

  • API keys already loaded into the operator’s primary aveloxis_ops.api_keys table. The harness copies keys into both scratch DBs automatically — you don’t re-paste tokens.

  • ~1.5 hours of wall-clock time for a moderate test repo. Most of this is sequential full-history collection passes (two of them, ~30–45 min each on augurlabs/augur with a 73-token GitHub pool). The harness always passes --full to aveloxis collect so both scratch DBs receive complete data with all parent records present. This is essential — see “Why full collection” below.

  • ~5 GB of free disk space for the two scratch databases.

Running the harness

aveloxis data-test \
  --released-tag 0.22.6 \
  --repo https://github.com/augurlabs/augur

Required flags:

  • --released-tag TAG — a git tag in the local clone (e.g., 0.22.6). The harness materializes the tag’s source via git worktree add and builds it.

  • --repo URL — the test repo to collect. augurlabs/augur is the canonical choice: enough issues, PRs, and commits to exercise every collection path without being so large that the cycle takes hours.

Optional flags:

  • --keep-dbs — retain aveloxis_released and aveloxis_new after the run. Default is to drop them. Pass this when you want to inspect a FAILing table directly via psql after the report is written.

  • --work-dir PATH — where to put binaries, logs, and the report. Default is a fresh /tmp/aveloxis-data-test-<UTC-timestamp>.

Interpreting the report

The report at <work-dir>/report.md opens with a summary:

# aveloxis data-test report

- Released tag: 0.22.6 (in aveloxis_released)
- Local version: working tree (in aveloxis_new)
- Test repo: https://github.com/augurlabs/augur
- Schemas: aveloxis_data, aveloxis_ops
- Generated: 2026-05-17T14:30:00Z

## Summary

- FAIL (released has more rows = data loss): 0
- FLAG (new has more rows = likely new coverage): 2
- PASS (equal counts): 71

Followed by three tables — FAIL, FLAG, PASS — each showing per-table row counts and the delta.

When all rows are PASS

Ship the change. No data loss possible from the schema delta.

When some rows are FLAG

Review each flagged table individually. A FLAG means the new schema captures rows the released schema missed. Two scenarios:

  1. Genuinely new coverage — e.g., a new column was added and the collector now populates it, so a previously-empty child table now has rows. This is a feature, not a bug. Ship.

  2. Insertion-order or constraint timing change — e.g., a constraint was relaxed and rows that previously failed FK checks now pass. Investigate whether the previously-failed rows were actually correct. Usually yes.

If a FLAG is unexpected, run with --keep-dbs, then:

psql -d aveloxis_new -c "
  SELECT * FROM aveloxis_data.<flagged_table>
  WHERE <pk> NOT IN (SELECT <pk> FROM aveloxis_released.aveloxis_data.<flagged_table>);
"

(That cross-database query requires dblink or a foreign-data wrapper — most operators just \c aveloxis_released and run separate queries.)

When any rows are FAIL

Do not ship. Investigate the root cause:

  1. Open the failing table in both DBs via psql (use --keep-dbs if not already).

  2. Find a row present in aveloxis_released but not aveloxis_new.

  3. Read the collector code that writes that table and the migration delta in the new version.

  4. The mismatch is usually one of:

    • New NOT NULL column with no DEFAULT, and the collector writes NULLs for that column. Fix: add a DEFAULT in schema.sql or update the collector to write a value.

    • New CHECK constraint that the existing data violates. Fix: loosen the constraint or backfill before constraining.

    • New FK that rejects existing references because the parent row doesn’t exist at INSERT time. Fix: make the FK DEFERRABLE INITIALLY DEFERRED (the v0.22.7 pattern).

    • Renamed column that the collector still writes by the old name. Fix: update the collector.

Once you’ve fixed the regression, re-run data-test to confirm. The harness drops and recreates the scratch DBs idempotently, so reruns are safe.

Phase-by-phase walkthrough

Useful to know if a phase fails partway through.

  1. Resolve binaries. git worktree add --detach <work-dir>/released-src <tag> (fast — reuses local clone’s git objects, no remote fetch), then go build in the worktree. Builds the local working-tree binary the same way. Failures here usually mean the tag is missing locally — fix with git fetch --tags.

  2. Provision scratch DBs. Connects to the configured PostgreSQL host’s postgres system database, then DROPs+CREATEs aveloxis_released and aveloxis_new. Failures here usually mean missing CREATEDB privilege — fix at the database server.

  3. Generate scratch configs. JSON-edit the operator’s aveloxis.json to swap database.dbname, write to work-dir. This phase doesn’t typically fail.

  4. Per-side collection (sequential — both sides share the API key pool):

    • migrate --skip-views to bring the scratch DB to the binary’s schema.

    • copyAPIKeys direct SQL from the operator’s primary DB into the scratch DB.

    • add-repo to queue the test repo.

    • collect to run a one-shot collection. This is the ~30 min phase. Subprocess output streams live to the parent so progress is visible.

  5. Row-count diff. Connects to both scratch DBs and runs db.RowCountDiff over aveloxis_data + aveloxis_ops. Fast (a few seconds).

  6. Write report + cleanup. Markdown report at <work-dir>/report.md. Unless --keep-dbs, both scratch DBs are dropped. Cleanup failure is non-fatal — the report is already written.

Why full collection (--full)

The harness always invokes aveloxis collect URL --full, forcing since=zero (every parent issue/PR is fetched, not just recently- modified ones). Without --full, the collector uses the default incremental since-filter (typically 21 days), which produces a specific noise pattern that hides real regressions:

  • The events API returns events for issues/PRs that may have been last modified outside the since window.

  • The collector tries to INSERT those events; the FK constraint rejects them because the parent issue/PR isn’t in the local DB.

  • Hundreds of issue_events_issue_id_fkey and pull_request_events_pull_request_id_fkey violations get logged as WARN.

  • The events are silently dropped on BOTH sides.

  • The diff sees issue_events: released=0, new=0 — a false PASS.

By forcing --full, every parent is present before its children are fetched. FK constraints are exercised against fully-populated parent tables. Any real regression introduced by a schema change will surface, instead of being masked by the partial-collection noise. Empirically confirmed in v0.22.10 after a 2026-05-17 diagnostic showed this exact pattern obscuring v0.22.7’s FK behavior change.

The trade-off is wall-clock time: a full collection takes ~30–45 minutes on augurlabs/augur vs ~20–25 minutes incremental. The signal quality is worth it; the whole point of the harness is to detect FK / data-loss regressions, and partial collection fundamentally cannot detect them.

Design decisions worth knowing

  • Sequential collection, not parallel. Both sides share the operator’s API key pool. Parallel runs would burn 2× the rate budget and produce non-reproducible timings. ~1.5 hours total (with –full) is the cost of reliable results.

  • git worktree, not git clone. Reuses the local clone’s git objects. Works offline. Faster.

  • Self-discovering tables, not a per-table fixture. Adding a new table to schema.sql in a future release is automatically covered by the diff. No fixture to maintain.

  • Drop scratch DBs by default. Avoids multi-GB scratch DBs accumulating across runs. --keep-dbs is the opt-in.

  • Row counts, not row content. Catches data loss; doesn’t catch per-column drift. For column-level diff use aveloxis shadow-diff (which covers ~17 PR/issue tables with deep semantic comparison).

  • CREATEDB privilege required. The harness creates and drops scratch DBs. Standard for development-mode deployments. Production fleets running the harness need the same privilege on whichever database server they target.

CI integration

- name: Validate schema change
  run: |
    aveloxis data-test \
      --released-tag ${{ env.LAST_RELEASED_TAG }} \
      --repo https://github.com/augurlabs/augur
  # Exit code 1 on any FAIL → fails the build
  # Exit code 0 on PASS or FLAG-only → proceeds

For PR-level gating, the maintainer can require this job pass before merging any PR labeled schema-change.

Limitations

  • Single test repo per invocation. Multi-repo tests are operator-scripted shell loops for now.

  • No auto-fetch of missing tags. If the operator specifies a tag not yet in the local clone, the worktree step fails. Fix: git fetch --tags then re-run.

  • No materialized view comparison. Matviews are derived data; a row-count mismatch on a matview would just reflect upstream base-table differences. The harness focuses on base tables only.

  • Row counts only, not content. Use aveloxis shadow-diff for per-column verification when you need that depth.

  • Doesn’t catch silent column drift. If a column’s value changes meaning but the row count stays the same (e.g., a unit changed from milliseconds to seconds), data-test won’t flag it. This is a content-level concern handled by shadow-diff (for the tables it covers) or by domain-specific test suites.