Aveloxis Documentation
Aveloxis is a high-performance, open-source platform for collecting community health data, written in Go. It collects data from GitHub and GitLab with equal completeness, storing it in a shared PostgreSQL schema for cross-platform analysis. It is designed as a companion to (and eventual replacement for) the Augur collection pipeline.
Key Features
Full GitHub + GitLab parity — same data types collected from both platforms, including MR discussion review comments
Staged collection pipeline — JSONB staging decouples API speed from DB write contention at 400K+ repos
Postgres-backed queue — no Redis, RabbitMQ, or Celery. Multiple instances share the same queue via SKIP LOCKED
Git commit analysis — bare clones + git log --numstat for per-file commit data, parent tracking, and Facade aggregates
Contributor resolution — resolves git commit emails to GitHub users via noreply parsing, Commits API, and Search API
Dependency & complexity analysis — scans 15 ecosystems, calculates libyear across 12 package registries, runs scc for code complexity
Vulnerability scanning — OSV.dev batch API for CVE/GHSA lookup across all dependencies
SBOM generation — CycloneDX 1.5 + SPDX 2.3 with license capture from 12 registries
Interactive visualizations — weekly time-series charts, cross-project comparison with Z-score normalization, dependency license analysis
REST API — JSON endpoints for stats, time series, licenses, SBOM download, and repo search
19 materialized views — 8Knot-compatible analytics views, rebuilt weekly
Dead repo sidelining — permanently archives 404’d repos while preserving data
Deterministic contributor IDs — Augur-compatible GithubUUID scheme
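As an illustration of the noreply-parsing step mentioned under contributor resolution, the sketch below extracts a GitHub login from a commit email. It assumes GitHub's two public noreply formats (ID+login@users.noreply.github.com and the older login@users.noreply.github.com). The function name parseNoreplyLogin is hypothetical and is not taken from the Aveloxis codebase, which may handle this differently.

```go
package main

import (
	"fmt"
	"strings"
)

// noreplySuffix is the domain GitHub uses for privacy-preserving commit emails.
const noreplySuffix = "@users.noreply.github.com"

// parseNoreplyLogin is a hypothetical helper (not Aveloxis code). It returns the
// GitHub login embedded in a noreply commit email, or false if the address is
// not a noreply address.
func parseNoreplyLogin(email string) (string, bool) {
	email = strings.TrimSpace(email)
	if len(email) <= len(noreplySuffix) {
		return "", false
	}
	split := len(email) - len(noreplySuffix)
	// Compare the domain case-insensitively; leave the login's case untouched.
	if !strings.EqualFold(email[split:], noreplySuffix) {
		return "", false
	}
	local := email[:split]
	// Newer addresses prefix the login with the numeric user ID and a plus sign.
	if i := strings.IndexByte(local, '+'); i >= 0 {
		local = local[i+1:]
	}
	if local == "" {
		return "", false
	}
	return local, true
}

func main() {
	for _, e := range []string{
		"583231+octocat@users.noreply.github.com",
		"octocat@users.noreply.github.com",
		"dev@example.com",
	} {
		login, ok := parseNoreplyLogin(e)
		fmt.Printf("%q -> login=%q ok=%v\n", e, login, ok)
	}
}
```

Emails that do not match either format would fall through to the other resolution paths listed above (Commits API, then Search API).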
Getting Started
User Guide
- Commands Reference
- Web GUI
- Prerequisites
- Configuration
- Starting the Web GUI
- Login Flow
- Creating Groups
- Adding Individual Repos to a Group
- Adding an Entire GitHub Org or GitLab Group
- Navigation and Breadcrumbs
- Comparing Repositories
- Searching and Pagination
- How Org Tracking Works
- How Repos Get Queued for Collection
- Session Management
- Running Alongside aveloxis serve
- Security Considerations
- REST API
- Visualizations
- Collection Pipeline
- Monitoring
- CI/CD Pipelines
- Scaling
- Troubleshooting
- Monitor dashboard renders slowly on a large fleet
- Search keystrokes freeze the dashboard at 100K repos
- /api/queue endpoint slow or returns huge JSON
- Token invalidation (401 vs 403)
- FK constraint violations
- “No API keys configured” / Startup failure
- “Commit resolution FAILED”
- “No data collected”
- unsupported Unicode escape sequence (SQLSTATE 22P05)
- “Pull requests / contributors / events: not found” or “forbidden”
- Gap-filled historical issues/PRs have no comments
- Metadata shows issues/PRs but gathered count stays at 0
- Repeated “unexpected status 301” retries on moved/renamed repos
- HTTP 410 Gone on individual issues / PRs
- GitLab repo_info.commit_count is 0 but facade found commits
- Release collection “not found” errors
- Git clone exit status 128
- Garbage timestamps (year 0001 BC)
- Schema version mismatch warning
- Null byte errors in text fields
- Restart procedure
- Checking queue status
- Changed days_until_recollect is being ignored
- Checking collection status
- Re-running a failed repo
- GraphQL PR batch errors on large repos
- Force-recollect a single repository
- Dead repo sidelining and un-sidelining
- Gateway errors (502/503/504)
- Deadlock errors
- prepared statement "stmtcache_..." does not exist (SQLSTATE 26000)
- Restart appears to take days before collection resumes
- All API tokens exhausted within minutes of restart
- Repeated duplicate key value violates unique constraint "contributors_pkey" in Postgres logs
- Orphaned postgres backend after aveloxis stop serve
- Next steps