Commands Reference
Complete reference for every Aveloxis CLI command.
aveloxis serve
Starts the long-running scheduler, which continuously collects repos from the priority queue, along with a web monitoring dashboard.
aveloxis serve [flags]
Flags
| Flag | Type | Default | Description |
|---|---|---|---|
| --monitor | string | :5555 | Address for the monitoring dashboard and REST API. Set to |
| --workers | integer | 1 | Number of concurrent collection workers. Each worker claims one repo at a time from the queue. |
| --augur-keys | boolean | | Also load API keys from Augur’s |
Behavior
Uses the staged collection pipeline (API -> staging -> processing -> facade -> commit resolution -> analysis).
The queue is Postgres-backed (aveloxis_ops.collection_queue) and uses SELECT ... FOR UPDATE SKIP LOCKED for atomic job claiming.
Safe to stop and restart at any time. On shutdown (Ctrl-C/SIGTERM), active workers finish their current API call, queue locks are released, and staging data is preserved.
On startup, automatically processes any leftover staged data from a previous interrupted run.
Stale locks from crashed instances are recovered after 1 hour.
Multiple instances can share the same queue for horizontal scaling.
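The atomic claim described above can be sketched as a single statement. This is only an illustration of the SKIP LOCKED pattern: the table name comes from the text, but the column names (repo_id, priority, due_at, status, locked_at) and status values are hypothetical, and the real schema may differ.

```shell
# Print a sketch of a SKIP LOCKED claim query.
# Column names and status values are hypothetical; only the table
# name aveloxis_ops.collection_queue comes from the documentation.
claim_sql() {
  cat <<'SQL'
UPDATE aveloxis_ops.collection_queue
SET status = 'collecting', locked_at = now()
WHERE repo_id = (
  SELECT repo_id
  FROM aveloxis_ops.collection_queue
  WHERE status = 'queued' AND due_at <= now()
  ORDER BY priority, due_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
RETURNING repo_id;
SQL
}

claim_sql
```

Because rows locked by another transaction are skipped rather than waited on, two workers running this statement at the same time would each receive a different row, which is what makes sharing one queue across instances safe.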
Periodic tasks
The scheduler also runs these background tasks:
| Task | Interval | Description |
|---|---|---|
| Org refresh | Every 4 hours | Re-fetches org membership lists |
| Contributor breadth | Every 6 hours | Discovers cross-repo contributor activity via GitHub Events API |
| Materialized view rebuild | Weekly (Saturday) | Pauses collection, refreshes all 19 matviews, resumes |
| Stale lock recovery | Every 5 minutes | Re-queues jobs locked for more than 1 hour |
Scancode worker tuning
aveloxis serve also runs a decoupled ScancodeWorker pool (v0.21.0+) for per-file license + copyright scanning. The pool’s behavior is tuned via the collection block of aveloxis.json — no CLI flag. The most operationally relevant knobs:
| | Default | What to change it for |
|---|---|---|
| | | More concurrent scancode invocations (cap by CPU cores). |
| | | Raise on RAM-rich hosts (e.g. |
| | | Raise on fleets skewed toward big repos so the per-job timeout starts higher (default is adaptive: every wall-clock timeout doubles the next attempt’s budget up to |
| | | Lower for testing or when source-license churn is unusually high. |
See configuration.md -> scancode worker for the full per-field reference and the v0.21.x → v0.25.x context.
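The adaptive timeout mentioned above (each wall-clock timeout doubles the next attempt's budget, up to a cap) amounts to a capped doubling rule. The sketch below is not the worker's actual code; the function name and the numbers in the calls are hypothetical, chosen only to show the arithmetic.

```shell
# next_timeout CURRENT CAP: print the budget (in seconds) for the next
# attempt after a timeout. Name and values are illustrative assumptions.
next_timeout() (
  current=$1
  cap=$2
  doubled=$((current * 2))
  if [ "$doubled" -gt "$cap" ]; then
    echo "$cap"
  else
    echo "$doubled"
  fi
)

next_timeout 600 3600    # prints 1200
next_timeout 2400 3600   # prints 3600 (doubling would exceed the cap)
```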
Examples
# Start with defaults (1 worker, dashboard on :5555)
aveloxis serve
# Start with 4 workers and a custom dashboard port
aveloxis serve --workers 4 --monitor :8082
# Start using Augur's API keys
aveloxis serve --workers 4 --augur-keys
aveloxis web
Starts the web GUI for OAuth-based group management. Users log in via GitHub or GitLab, create groups, and add repositories or entire organizations for collection.
aveloxis web
Configuration
The web command has no CLI flags. All settings come from the web section of aveloxis.json:
| Config field | Type | Default | Description |
|---|---|---|---|
| | string | | Listen address for the web server. |
| | string | | External URL used to construct OAuth callback URLs. |
| | string | (required) | Secret key for signing session cookies. |
| | string | | GitHub OAuth app client ID. |
| | string | | GitHub OAuth app client secret. |
| | string | | GitLab OAuth app client ID (Application ID). |
| | string | | GitLab OAuth app client secret. |
| | string | | GitLab instance URL for self-hosted instances. |
OAuth app setup
GitHub: Create an OAuth app at https://github.com/settings/developers. Set the callback URL to {web.base_url}/auth/github/callback.
GitLab: Create an OAuth app at https://gitlab.com/-/profile/applications. Set the redirect URI to {web.base_url}/auth/gitlab/callback. Check the read_user scope.
Behavior
Serves the web GUI on the configured listen address.
Users authenticate via OAuth, then create groups and add repos or orgs through the browser.
Breadcrumb navigation shows Home / {Group Name} for easy navigation.
Group detail pages display repos 25 per page with pagination controls. A case-insensitive search box filters by repo name, owner, or URL.
Repos added through the web GUI are inserted into the same collection queue used by aveloxis add-repo.
When a user adds a GitHub org or GitLab group, a user_org_requests row is created. A background task scans tracked orgs every 4 hours to discover and queue new repos.
Sessions are stored in-memory with a 24-hour expiry. Restarting the process clears all sessions.
Runs as a separate process from aveloxis serve. Both share the database.
Examples
# Start the web GUI (uses settings from aveloxis.json)
aveloxis web
# Then open http://localhost:8082 in your browser
See the Web GUI guide for detailed setup instructions.
aveloxis collect
One-shot collection of specific repos without the scheduler. Uses the direct collection pipeline (bypasses staging, writes directly to relational tables). Best for testing or collecting a small number of repos.
aveloxis collect [flags] <url> [<url> ...]
Flags
| Flag | Type | Default | Description |
|---|---|---|---|
| --full | boolean | | Full historical collection. Ignores the |
| --augur-keys | boolean | | Also load API keys from Augur’s |
Examples
# Incremental collection (only new data since last run)
aveloxis collect https://github.com/chaoss/augur
# Full historical collection
aveloxis collect --full https://github.com/chaoss/augur
# Multiple repos, mixed platforms
aveloxis collect \
https://github.com/torvalds/linux \
https://gitlab.com/fdroid/fdroidclient
aveloxis add-repo
Adds repositories to the collection queue. Platform is auto-detected from the URL.
aveloxis add-repo [flags] <url> [<url> ...]
Flags
| Flag | Type | Default | Description |
|---|---|---|---|
| --priority | integer | | Queue priority. Lower numbers are collected first. |
| --from-augur | boolean | | Import all repos from |
URL formats
# Single GitHub repo
aveloxis add-repo https://github.com/chaoss/augur
# Single GitLab repo (including nested subgroups)
aveloxis add-repo https://gitlab.com/group/subgroup/project
# GitHub organization (adds all repos in the org)
aveloxis add-repo https://github.com/chaoss
# Multiple repos at once
aveloxis add-repo \
https://github.com/torvalds/linux \
https://github.com/chaoss/grimoirelab
# High priority
aveloxis add-repo --priority 10 https://github.com/kubernetes/kubernetes
# Import from Augur
aveloxis add-repo --from-augur
Platform detection
URLs containing github.com are treated as GitHub
URLs containing gitlab in the hostname are treated as GitLab
Hostnames listed in gitlab.gitlab_hosts in the config are treated as GitLab
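The three rules above can be expressed as a small shell function. This is a sketch of the rules as stated, not the tool's implementation: the function name is hypothetical, the order in which the rules are checked is an assumption, and the extra arguments stand in for the gitlab.gitlab_hosts config list. The host code.example.org is a made-up self-hosted instance.

```shell
# detect_platform URL [EXTRA_GITLAB_HOST ...]
# Prints "github" or "gitlab"; prints "unknown" and fails otherwise.
# Extra arguments stand in for the gitlab.gitlab_hosts config list.
detect_platform() (
  url=$1
  shift
  # Strip the scheme, then everything from the first "/" onward,
  # leaving only the hostname.
  host=${url#*://}
  host=${host%%/*}
  case "$url" in
    *github.com*) echo github; exit 0 ;;
  esac
  case "$host" in
    *gitlab*) echo gitlab; exit 0 ;;
  esac
  for extra in "$@"; do
    if [ "$host" = "$extra" ]; then
      echo gitlab
      exit 0
    fi
  done
  echo unknown
  exit 1
)

detect_platform https://github.com/chaoss/augur                         # prints github
detect_platform https://gitlab.com/fdroid/fdroidclient                  # prints gitlab
detect_platform https://code.example.org/team/project code.example.org  # prints gitlab
```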
aveloxis add-key
Stores API keys in the database for use during collection.
aveloxis add-key [flags] [<token>]
Flags
| Flag | Type | Default | Description |
|---|---|---|---|
| --platform | string | (required unless | Platform for the key: |
| --from-augur | boolean | | Bulk import all keys from |
Examples
# Store a GitHub token
aveloxis add-key ghp_your_github_token --platform github
# Store a GitLab token
aveloxis add-key glpat-your_gitlab_token --platform gitlab
# Bulk import from Augur
aveloxis add-key --from-augur
aveloxis prioritize
Pushes a repository to the front of the collection queue.
aveloxis prioritize <url>
Sets the repo’s priority to 0 and due time to now. The scheduler will collect it next.
aveloxis prioritize https://github.com/chaoss/augur
Also available via the REST API:
curl -X POST http://localhost:5555/api/prioritize/42
Where 42 is the repo’s repo_id.
aveloxis recollect
Flags one or more repositories for a full (since=zero) re-collection on their next scheduler cycle.
aveloxis recollect <url>...
Sets the force_full_collect flag on each named repo’s aveloxis_ops.collection_queue row. When the scheduler dequeues a flagged repo, determineSince() returns zero time, triggering a full re-collection regardless of last_collected. The flag clears itself on the next successful completion.
aveloxis recollect https://github.com/chaoss/augur
aveloxis recollect https://github.com/a/b https://github.com/c/d # multiple
Use this command:
After a bugfix or schema change that invalidates previously-collected data for a specific repo.
To manually force a refresh when you suspect the incremental-since window missed something.
In combination with aveloxis prioritize <url> to start the full re-collection immediately rather than at the repo’s normal due time.
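The last use case, pairing recollect with prioritize, can be wrapped in a small helper. The function name recollect_now is hypothetical; it only chains the two documented commands and assumes aveloxis is on the PATH and run from a directory containing aveloxis.json.

```shell
# recollect_now URL...: flag each repo for a full re-collection, then
# move it to the front of the queue. Stops at the first failing command.
# The helper name is hypothetical; the two subcommands are documented above.
recollect_now() {
  for url in "$@"; do
    aveloxis recollect "$url" || return 1
    aveloxis prioritize "$url" || return 1
  done
}
```

For example, recollect_now https://github.com/chaoss/augur would flag that repo and make it the next one the scheduler collects.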
Automatic triggering (v0.18.24+)
The scheduler also sets this flag automatically when a collection ends with an error that indicates incomplete PR child data — specifically the GraphQL PR batch error classes (stream CANCEL mid-body, “Timeout on validation of query”, or retry exhaustion). The next cycle then backfills whatever the failed batch missed. Operators see a force_full_recollect set WARN log line when auto-flagging fires.
See the troubleshooting guide for details on the error classes that trigger auto-flagging.
aveloxis migrate
Creates or updates the database schema.
aveloxis migrate
Creates 108 tables and 19 materialized views across two PostgreSQL schemas:
aveloxis_data (84 tables + 19 materialized views) – all collected data
aveloxis_ops (24 tables) – operational state
Also performs a data cleanup pass that nullifies garbage timestamps (year < 1970) across all tables, preventing BC-era dates from poisoning queries.
Safe to run repeatedly. All DDL uses CREATE ... IF NOT EXISTS and inserts use ON CONFLICT DO NOTHING. Does not touch Augur schemas if sharing a database.
aveloxis refresh-views
Manually refreshes all 19 materialized views.
aveloxis refresh-views
Uses REFRESH MATERIALIZED VIEW CONCURRENTLY where unique indexes exist, so reads are not blocked during the refresh. Views are also rebuilt automatically every Saturday by aveloxis serve.
aveloxis install-tools
Installs optional analysis tools.
aveloxis install-tools
Currently installs scc (Sloc Cloc and Code) for per-file code complexity analysis. Requires Go to be installed.
If scc is not installed, the code complexity phase is silently skipped during collection.
aveloxis start
Launches aveloxis components as detached background processes with log output redirected to files in ~/.aveloxis/.
aveloxis start serve # scheduler + monitor → ~/.aveloxis/aveloxis.log
aveloxis start web # web GUI → ~/.aveloxis/web.log
aveloxis start api # REST API → ~/.aveloxis/api.log
aveloxis start all # all three at once
PID files are written to ~/.aveloxis/aveloxis-{serve,web,api}.pid. If a component is already running, the command reports it and skips the launch.
Log files are opened in append mode — existing content is preserved across restarts.
aveloxis stop
Gracefully stops background aveloxis processes.
aveloxis stop serve # stop only the scheduler
aveloxis stop web # stop only the web GUI
aveloxis stop api # stop only the REST API
aveloxis stop all # stop all three
aveloxis stop # (no args) same as 'all'
Sends SIGTERM to the specified component(s) using PID files in ~/.aveloxis/. Active workers finish their current API call, queue locks are released, and staging data is preserved. PID files are cleaned up automatically. Stale PID files (process no longer running) are detected and removed.
Note
aveloxis stop also works for processes started in the foreground (e.g., aveloxis serve), because all foreground processes write PID files on startup.
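The running versus stale distinction for PID files can be checked by hand with kill -0, which tests for a process without sending a signal. This is a sketch of that general technique, not the logic aveloxis stop itself uses; the function name pid_status is hypothetical, and the file paths follow the naming given for aveloxis start.

```shell
# pid_status PIDFILE: print "running", "stale", or "stopped".
pid_status() (
  pidfile=$1
  if [ ! -f "$pidfile" ]; then
    echo stopped
    exit 0
  fi
  pid=$(cat "$pidfile")
  # kill -0 delivers no signal; it succeeds only if the process exists
  # and the current user is allowed to signal it.
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo running
  else
    echo stale
  fi
)

for component in serve web api; do
  echo "$component: $(pid_status "$HOME/.aveloxis/aveloxis-$component.pid")"
done
```

Note that a process owned by a different user is reported as stale here, because kill -0 fails on a permission error as well as on a missing process.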
aveloxis data-test
Operator-driven shadow-database verification harness for schema
changes. Builds binaries from a tagged release and the local working
tree, provisions two scratch databases (aveloxis_released and
aveloxis_new), collects the same repo into each, and reports
row-count differences. Catches data-loss regressions before they
ship. Shipped in v0.22.8.
See the full guide: Schema-change verification.
Flags
--released-tag TAG: git tag of the released aveloxis version to compare against (e.g., 0.22.6). The tag must exist in the local clone; git fetch --tags if missing. Required.
--repo URL: git URL of the test repo to collect into both scratch DBs. augurlabs/augur is the canonical choice. Required.
--keep-dbs: retain the scratch DBs after the run. Default is to drop them. Pass when you want to inspect a failing table via psql after the report is written.
--work-dir PATH: where to put binaries, logs, and the report. Default is /tmp/aveloxis-data-test-<UTC-timestamp>.
Behavior
Builds the released binary via git worktree add (reuses local clone’s objects; no remote fetch).
Builds the local binary from the current working tree.
Connects to the configured PostgreSQL host using the operator’s aveloxis.json credentials. The user must have CREATEDB privilege because the harness creates and drops scratch databases.
Copies API keys from the operator’s primary aveloxis_ops.api_keys table into both scratch DBs, so the operator doesn’t re-paste tokens.
Collections run sequentially (~30 min each), not in parallel, because both sides share the API key pool.
Exit code 0 on PASS or FLAG-only; exit code 1 on any FAIL (row loss detected), which makes it suitable for CI gating.
Examples
# Validate the current working tree against v0.22.6 using augur
aveloxis data-test --released-tag 0.22.6 \
--repo https://github.com/augurlabs/augur
# Keep scratch DBs for ad-hoc inspection after a FAIL
aveloxis data-test --released-tag 0.22.6 \
--repo https://github.com/augurlabs/augur \
--keep-dbs
# Custom work directory (useful for CI artifact retention)
aveloxis data-test --released-tag 0.22.6 \
--repo https://github.com/augurlabs/augur \
--work-dir /var/cache/aveloxis-data-test
The full report is written to <work-dir>/report.md. See
Schema-change verification for guidance on
interpreting PASS / FLAG / FAIL results.
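Since the exit code distinguishes FAIL from PASS or FLAG-only, a CI step can gate on it directly. The wrapper below is a sketch under that assumption: the function name gate_on_data_test is hypothetical, and it uses only the flags and the report path documented above.

```shell
# gate_on_data_test TAG REPO_URL WORK_DIR
# Runs the harness and propagates its exit code; on failure, points the
# operator at the report. The wrapper name is a hypothetical example.
gate_on_data_test() {
  if aveloxis data-test --released-tag "$1" --repo "$2" --work-dir "$3"; then
    echo "data-test passed (PASS or FLAG-only)"
  else
    status=$?
    echo "data-test failed with exit code $status; see $3/report.md" >&2
    return "$status"
  fi
}
```

In a pipeline, a call such as gate_on_data_test 0.22.6 https://github.com/augurlabs/augur /var/cache/aveloxis-data-test would fail the job on any FAIL while leaving the report in a known location for artifact retention.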
Mailing-list commands
These commands register and verify the mailing-list ingestion subsystem
(off by default; set collection.mailing_list_enabled = true to collect).
See Mailing-list ingestion for how the
subsystem works.
aveloxis load-foundation-core-repos
Loads one core/primary repository per project across the tracked open-source
foundations (Apache TLPs + podlings). Idempotently registers each project’s
flagship repo into a foundation group. Renamed from import-foundations
(kept as a hidden alias so existing scripts keep working).
aveloxis load-foundation-core-repos
aveloxis load-foundation-core-repos --dry-run
aveloxis load-foundation-orgs
Registers each foundation’s GitHub org(s) as tracked orgs under your
user, so the periodic org-refresh ticker continuously discovers new repos.
Tracking the apache org pulls all ~3,000 apache/* repos (a large
collection-budget commitment — surfaced in the output) so sibling repos
like arrow-rs get collected and mailing-list repo signals resolve.
aveloxis load-foundation-orgs --dry-run
aveloxis load-foundation-orgs --yes
| Flag | Description |
|---|---|
| | owning user (default 1) |
| --dry-run | print planned registrations without writing |
| --yes | proceed without the interactive confirmation |
aveloxis load-apache-lists
For each Apache PMC, ensures a per-PMC repo_group, links the PMC’s primary
repo, and registers the dev@ and users@ lists for collection. Run
load-foundation-core-repos first so the primary repos exist; PMCs whose
repo isn’t in the catalog are skipped.
aveloxis load-apache-lists --dry-run
aveloxis load-apache-lists
aveloxis register-mailing-list
Registers a single mailing list for collection under any archive system —
used for curated, non-catalog lists like the kernel’s lore public-inbox.
The list is attached to the repo’s repo_group (named after the repo, so
multiple lists for one repo share a group).
aveloxis register-mailing-list \
--system lore_public_inbox \
--list linux-pci@vger.kernel.org \
--repo https://github.com/torvalds/linux
| Flag | Description |
|---|---|
| --system | archive system ( |
| --list | list address |
| --repo | repo URL to attach the list’s discussion to (must already exist in the catalog) |
aveloxis backfill-issue-external-keys
Populates issues.external_key from bracketed [KEY-N] title prefixes
(Apache Jira → GitHub issue imports). This is what lets issue_event
mailing-list mail bridge to the imported issue by its Jira key.
aveloxis backfill-issue-external-keys
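To make the bracketed-prefix convention concrete, the sketch below pulls a KEY-N value out of an issue title. It illustrates the idea only: the exact pattern the backfill matches is not specified here, so the regular expression (uppercase project key, hyphen, digits, at the very start of the title) is an assumption, and the function name and sample titles are hypothetical.

```shell
# extract_external_key TITLE: print the KEY-N from a leading "[KEY-N]"
# prefix. Prints nothing and returns non-zero when no prefix is present.
# The pattern is an assumption about what a Jira-style key looks like.
extract_external_key() {
  printf '%s\n' "$1" \
    | sed -n 's/^\[\([A-Z][A-Z0-9]*-[0-9][0-9]*\)].*/\1/p' \
    | grep .
}

extract_external_key '[ARROW-123] Fix reader overflow'   # prints ARROW-123
```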
aveloxis mailing-list-stats
Read-only coverage rollup: registered lists, email_message counts, mirror
rate, signaled-repo resolution, sender-identity resolution, and the
per-class distribution. Safe to run alongside an active serve.
aveloxis mailing-list-stats
The same data is available over HTTP at GET /api/v1/mailing-list/stats
(see REST API).
aveloxis verify-mailing-list
The Phase 4 branch-coverage harness. Reads the collected data and prints a
PASS / EMPTY / DEFER table for every logic branch (each msg_class, both
backends, each routing outcome, threading, signaled/sender resolution,
external_key backfill), plus the contributor-resolution assessment.
aveloxis verify-mailing-list # report only (exit 0)
aveloxis verify-mailing-list --strict # exit non-zero if a required branch is empty
| Flag | Description |
|---|---|
| --strict | exit non-zero when a required (mailing-list-native) branch produced zero rows |
--strict gates only the mailing-list-native branches. Cross-subsystem
branches (bridged-to-issue/PR, mirror-linked, sender-resolved,
external_key) report as DEFER and never gate — they fill in
steady-state operation once the linked repos’ GitHub data is collected and
the periodic backfills run. See
Mailing-list ingestion §12 for the
collection-ordering caveat.
aveloxis version
Prints the Aveloxis version.
aveloxis version
Global behavior
Config file
All commands look for aveloxis.json in the current working directory. The config file must exist and contain valid database connection parameters.
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | General error (invalid arguments, config not found, database connection failed) |
Signal handling
aveloxis serve handles the following signals:
SIGTERM/SIGINT (Ctrl-C) – graceful shutdown. Workers finish current API calls, locks are released, staging data is preserved.
SIGTERM sent by aveloxis stop – same graceful shutdown.