Contributing to Aveloxis

This is the contributor handbook. The root CONTRIBUTING.md is a short orientation pointer; the chapters here are where the real material lives.

What Aveloxis is

Aveloxis is a Go reimplementation of Augur, the community-health data collection pipeline used by CHAOSS metrics consumers like 8Knot. It collects from the GitHub and GitLab APIs, parses git logs, and stores everything in PostgreSQL with full Augur schema parity (108+ tables across two schemas). It also adds features Augur lacks: staged collection for 400K+ repo fleets, deterministic contributor IDs, dead-repo sidelining, SBOM generation, OpenSSF Scorecard integration, vulnerability scanning, interactive visualizations, and a web GUI with OAuth.

If you’re new to the codebase, read these in order before touching code:

  1. docs/architecture/overview.md — the architecture in one page

  2. docs/architecture/staged-pipeline.md — how a single repo’s data flows from API to database

  3. docs/architecture/platform-layer.md — the GitHub/GitLab abstraction

  4. CLAUDE.md — every architectural decision, ordered newest-first. Dense, but it’s the project’s canonical memory.

How the codebase is laid out

aveloxis/
├── cmd/aveloxis/         # CLI entry point (cobra). All subcommands live here.
├── internal/
│   ├── api/              # REST API (port :8383). Charts and external consumers.
│   ├── collector/        # Per-repo collection pipeline (staged, facade, analysis, scancode, etc.)
│   ├── config/           # aveloxis.json parsing
│   ├── db/               # PostgreSQL store. Schema, migrations, upserts, queries.
│   │   └── schema.sql    # Source of truth for table definitions.
│   ├── importers/        # Apache/CNCF foundation imports.
│   ├── mailer/           # Gmail SMTP for transactional emails.
│   ├── model/            # Platform-agnostic data types shared between GitHub/GitLab.
│   ├── monitor/          # Monitoring dashboard (port :5555).
│   ├── platform/         # GitHub + GitLab API abstraction.
│   │   ├── github/       # GitHub-specific implementation of platform.Client.
│   │   ├── gitlab/       # GitLab-specific implementation.
│   │   └── platform.go   # The Client interface and shared types.
│   ├── scheduler/        # Worker pool, queue processing, periodic tasks.
│   └── web/              # Web GUI (port :8080). OAuth, groups, visualizations.
├── docs/                 # ReadTheDocs source (this file lives here).
└── scripts/              # SPDX header backfill, etc.

Every Go file carries an SPDX header (enforced by a tripwire test in scripts/). See code-conventions.md.
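
As a rough illustration of what such a tripwire might check, the sketch below looks for an SPDX-License-Identifier comment near the top of a Go source string. The function name hasSPDXHeader, the five-line window, and the MIT identifier are assumptions made for this example; the project's actual test and license may differ, so treat code-conventions.md as the authority.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasSPDXHeader reports whether a "//" comment containing
// "SPDX-License-Identifier:" appears within the first maxLines lines of src.
// Hypothetical helper: the real tripwire test may use different rules.
func hasSPDXHeader(src string, maxLines int) bool {
	sc := bufio.NewScanner(strings.NewReader(src))
	for i := 0; i < maxLines && sc.Scan(); i++ {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "//") && strings.Contains(line, "SPDX-License-Identifier:") {
			return true
		}
	}
	return false
}

func main() {
	// "MIT" is a placeholder identifier, not necessarily the project's license.
	withHeader := "// SPDX-License-Identifier: MIT\npackage db\n"
	withoutHeader := "package db\n"

	fmt.Println(hasSPDXHeader(withHeader, 5))    // true
	fmt.Println(hasSPDXHeader(withoutHeader, 5)) // false
}
```

A real tripwire would walk the repository's .go files and fail the test run on the first file where a check like this returns false.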

Chapters

Foundation (read before doing anything)

  • development-setup.md — get a local PostgreSQL, build the binary, run the test suite, set up aveloxis.runlocal.json

  • code-conventions.md — SPDX headers, file/package layout, error handling, slog, version bumping, commit style

  • testing.md — TDD discipline, source-contract pattern, integration tier via AVELOXIS_TEST_DB, the data-test harness for cross-version verification
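
The integration tier mentioned above is gated on the AVELOXIS_TEST_DB environment variable. One common way to implement that kind of gate is sketched here; it assumes the variable holds a PostgreSQL connection string and that an unset or empty value means "skip". The helper name integrationDSN is hypothetical, and testing.md describes the project's real behavior.

```go
package main

import (
	"fmt"
	"os"
)

// integrationDSN returns the value of AVELOXIS_TEST_DB and whether the
// integration tier should run. The lookup function is injected so the gate
// can be exercised without touching the real environment.
// Assumption: an unset or empty variable disables the tier.
func integrationDSN(lookup func(string) (string, bool)) (string, bool) {
	dsn, ok := lookup("AVELOXIS_TEST_DB")
	if !ok || dsn == "" {
		return "", false
	}
	return dsn, true
}

func main() {
	if dsn, ok := integrationDSN(os.LookupEnv); ok {
		fmt.Println("integration tier enabled; DSN length:", len(dsn))
	} else {
		fmt.Println("AVELOXIS_TEST_DB not set; skipping integration tier")
	}
}
```

Inside a Go test, the same check would typically call t.Skip when the second return value is false, so that unit tests still pass on machines without a database.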

Extending Aveloxis

What to read for a given task

You want to…                                       Start here

Fix a bug                                          testing.md, then the relevant package’s existing tests
Add a column to a table                            schema-migrations.md
Add a new endpoint or chart                        The relevant chapter above
Add a whole new data source                        adding-a-platform.md
Understand why something was done a certain way    CLAUDE.md (search for the relevant version)
Run aveloxis locally                               development-setup.md

The single most important rule

Bump internal/db/version.go on every code change: feature, bugfix, or refactor. The version is the only way operators (and the data-test harness) can tell two versions of aveloxis apart. CHANGELOG entries (in CLAUDE.md’s ## Current Status section) reference the version, stored rows are tagged with tool_version = db.ToolVersion, and SBOMs embed it. If you forget the bump, none of those consumers can distinguish your build from the previous one.

See code-conventions.md for the version-bump checklist.
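
To make the reasoning concrete, here is a small sketch of why an unbumped version is a problem. ToolVersion stands in for db.ToolVersion, and the version strings, the taggedRow type, and the helper names are all hypothetical; the real definitions live in internal/db.

```go
package main

import "fmt"

// taggedRow is a hypothetical stand-in for any stored row that carries
// a tool_version column.
type taggedRow struct {
	RepoID      int64
	ToolVersion string
}

// tag stamps a row with the version of the build that wrote it.
func tag(repoID int64, toolVersion string) taggedRow {
	return taggedRow{RepoID: repoID, ToolVersion: toolVersion}
}

// sameBuild is the only comparison a downstream consumer can make:
// it sees the version string, not the code that produced the row.
func sameBuild(a, b taggedRow) bool {
	return a.ToolVersion == b.ToolVersion
}

func main() {
	// Placeholder version strings, not real Aveloxis releases.
	before := tag(1, "0.0.1-example")
	afterNoBump := tag(1, "0.0.1-example") // code changed, version did not
	afterBump := tag(1, "0.0.2-example")   // code changed, version bumped

	fmt.Println(sameBuild(before, afterNoBump)) // true: change is invisible
	fmt.Println(sameBuild(before, afterBump))   // false: change is detectable
}
```

Without the bump, rows written before and after a code change are indistinguishable to anything that reads tool_version, including a cross-version comparison harness.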