Contributing to Aveloxis
This is the contributor handbook. The root CONTRIBUTING.md is a short orientation pointer; the chapters here are where the real material lives.
What Aveloxis is
A Go reimplementation of Augur, the community-health data collection pipeline used by CHAOSS metrics consumers like 8Knot. Aveloxis collects from GitHub and GitLab APIs, parses git logs, and stores everything in PostgreSQL with full Augur schema parity (108+ tables across two schemas). It adds features Augur lacks: staged collection for 400K+ repo fleets, deterministic contributor IDs, dead-repo sidelining, SBOM generation, OpenSSF Scorecard integration, vulnerability scanning, interactive visualizations, and a web GUI with OAuth.
If you’re new to the codebase, read these in order before touching code:
- docs/architecture/overview.md: the architecture in one page
- docs/architecture/staged-pipeline.md: how a single repo’s data flows from API to database
- docs/architecture/platform-layer.md: the GitHub/GitLab abstraction
- CLAUDE.md: every architectural decision, ordered newest-first. Dense, but it’s the project’s canonical memory.
How the codebase is laid out
aveloxis/
├── cmd/aveloxis/ # CLI entry point (cobra). All subcommands live here.
├── internal/
│ ├── api/ # REST API (port :8383). Charts and external consumers.
│ ├── collector/ # Per-repo collection pipeline (staged, facade, analysis, scancode, etc.)
│ ├── config/ # aveloxis.json parsing
│ ├── db/ # PostgreSQL store. Schema, migrations, upserts, queries.
│ │ └── schema.sql # Source of truth for table definitions.
│ ├── importers/ # Apache/CNCF foundation imports.
│ ├── mailer/ # Gmail SMTP for transactional emails.
│ ├── model/ # Platform-agnostic data types shared between GitHub/GitLab.
│ ├── monitor/ # Monitoring dashboard (port :5555).
│ ├── platform/ # GitHub + GitLab API abstraction.
│ │ ├── github/ # GitHub-specific implementation of platform.Client.
│ │ ├── gitlab/ # GitLab-specific implementation.
│ │ └── platform.go # The Client interface and shared types.
│ ├── scheduler/ # Worker pool, queue processing, periodic tasks.
│ └── web/ # Web GUI (port :8080). OAuth, groups, visualizations.
├── docs/ # ReadTheDocs source (this file lives here).
└── scripts/ # SPDX header backfill, etc.
Every Go file carries an SPDX header (enforced by a tripwire test in scripts/). See code-conventions.md.
Chapters
Foundation (read before doing anything)
- development-setup.md: get a local PostgreSQL, build the binary, run the test suite, set up aveloxis.runlocal.json
- code-conventions.md: SPDX headers, file/package layout, error handling, slog, version bumping, commit style
- testing.md: TDD discipline, source-contract pattern, integration tier via AVELOXIS_TEST_DB, the data-test harness for cross-version verification
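The integration tier is keyed on the `AVELOXIS_TEST_DB` environment variable. One common way to gate tests on such a variable is sketched below. It assumes the variable holds a PostgreSQL connection string and that an unset or empty value means "skip"; testing.md is the authority on the actual behavior. `integrationDSN` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"os"
)

// integrationDSN returns the connection string for the integration tier and
// whether that tier is enabled. The lookup function is injected so the logic
// can be exercised without modifying the real environment.
func integrationDSN(lookup func(string) (string, bool)) (string, bool) {
	dsn, ok := lookup("AVELOXIS_TEST_DB")
	if !ok || dsn == "" {
		return "", false
	}
	return dsn, true
}

func main() {
	if dsn, ok := integrationDSN(os.LookupEnv); ok {
		fmt.Println("integration tier enabled; DSN length:", len(dsn))
	} else {
		fmt.Println("AVELOXIS_TEST_DB unset; integration tests would be skipped")
	}
}
```

Inside a `_test.go` file, the same check would typically end in `t.Skip("AVELOXIS_TEST_DB not set")`, so the unit tier still runs on machines without a database.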
Extending Aveloxis
- schema-migrations.md: adding columns, indexes, backfills; fail-closed contract; integration-test recipe
- adding-a-platform.md: the big one. Add Bugzilla / Gitea / Forgejo / SourceHut / whatever else. Concrete walkthrough with code skeletons.
- adding-a-rest-endpoint.md: expose new data through the REST API
- adding-a-collection-phase.md: plug a new phase into the staged pipeline (analysis, SBOM, vulnerability scan are all phases)
- adding-a-visualization.md: new Chart.js panel on the repo detail or comparison page
What to read for a given task
| You want to… | Start here |
|---|---|
| Fix a bug | |
| Add a column to a table | |
| Add a new endpoint or chart | The relevant chapter above |
| Add a whole new data source | |
| Understand why something was done a certain way | |
| Run aveloxis locally | |
The single most important rule
Bump internal/db/version.go on every code change. Feature, bugfix, refactor: every change. The version is the only way operators (and the data-test harness) can tell two versions of aveloxis apart. CHANGELOG entries (in CLAUDE.md’s ## Current Status section) reference the version, collected rows are tagged with tool_version = db.ToolVersion, and SBOMs embed it. Skip the bump and two different builds become indistinguishable in all three places.
See code-conventions.md for the version-bump checklist.
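Why the bump matters can be shown with a small sketch of version stamping. The `ToolVersion` constant here merely stands in for `db.ToolVersion` with a placeholder value, and the `row` type, `stamp`, and `sameBuild` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// ToolVersion stands in for db.ToolVersion. The value is a placeholder, not
// the project's real version string.
const ToolVersion = "0.0.0-example"

// row is a hypothetical collected record. Only the ToolVersion field mirrors
// the tool_version column mentioned above.
type row struct {
	RepoID      int64
	ToolVersion string
}

// stamp tags a row with the version of the binary that produced it.
func stamp(r row) row {
	r.ToolVersion = ToolVersion
	return r
}

// sameBuild reports whether two rows carry the same version tag. If a code
// change ships without a bump, this returns true for rows written by
// different code, which is exactly the ambiguity the rule prevents.
func sameBuild(a, b row) bool {
	return a.ToolVersion == b.ToolVersion
}

func main() {
	a := stamp(row{RepoID: 1})
	b := stamp(row{RepoID: 2})
	older := row{RepoID: 3, ToolVersion: "0.0.0-previous"}
	fmt.Println(sameBuild(a, b), sameBuild(a, older))
}
```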