REST API
Aveloxis includes a REST API server for programmatic access to collected data, repository statistics, time-series metrics, SBOM downloads, and vulnerability information. Start it with:
aveloxis api --addr :8383
The API runs as a separate process alongside aveloxis serve (collection) and aveloxis web (GUI). All three share the same PostgreSQL database.
Endpoints
Health Check
GET /api/v1/health
Returns the server status and version.
{"status": "ok", "version": "0.9.0"}
Repository Statistics
GET /api/v1/repos/{repoID}/stats
Returns gathered counts (actual row counts) versus metadata counts (API-reported totals) for a single repo.
{
"repo_id": 42,
"gathered_prs": 1500,
"gathered_issues": 800,
"gathered_commits": 5000,
"metadata_prs": 1520,
"metadata_issues": 810,
"metadata_commits": 5100,
"vulnerabilities": 12,
"critical_vulns": 2
}
Gathered counts come from actual rows in the data tables.
Metadata counts come from the most recent repo_info snapshot (GitHub GraphQL / GitLab API totals).
Vulnerabilities come from OSV.dev vulnerability scanning.
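To get a quick read on how complete collection is, a client can divide each gathered count by its metadata counterpart. The sketch below does this for a parsed /stats response. The function name is an illustrative assumption and not part of the API, as is the choice to return None when the API-reported total is zero.

```python
def completeness(stats: dict) -> dict:
    """Gathered/metadata ratio per kind for one /api/v1/repos/{repoID}/stats response.

    A ratio is None when the API-reported total is 0, since there is
    nothing to compare against.
    """
    ratios = {}
    for kind in ("prs", "issues", "commits"):
        gathered = stats[f"gathered_{kind}"]
        metadata = stats[f"metadata_{kind}"]
        ratios[kind] = gathered / metadata if metadata else None
    return ratios


# Values from the example response above.
example = {
    "repo_id": 42,
    "gathered_prs": 1500, "gathered_issues": 800, "gathered_commits": 5000,
    "metadata_prs": 1520, "metadata_issues": 810, "metadata_commits": 5100,
}
ratios = completeness(example)
```

The metadata totals come from a snapshot taken at a different moment than the row counts, so treat the ratio as an approximation rather than an exact gap.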
Batch Statistics
GET /api/v1/repos/stats?ids=1,2,3,42
Returns stats for multiple repos in one call. Response is a map keyed by repo ID.
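JSON object keys are always strings, so a client that wants integer repo IDs must convert them after decoding. Below is a minimal standard-library sketch. The base URL http://localhost:8383 is the default from this page; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def batch_stats_url(base: str, repo_ids: list) -> str:
    """Build GET /api/v1/repos/stats?ids=1,2,3 for the given repo IDs."""
    # safe="," keeps the separators literal instead of encoding them as %2C.
    query = urlencode({"ids": ",".join(str(i) for i in repo_ids)}, safe=",")
    return f"{base.rstrip('/')}/api/v1/repos/stats?{query}"


def parse_batch_stats(body: str) -> dict:
    """Decode the batch response and re-key it by integer repo ID."""
    return {int(repo_id): stats for repo_id, stats in json.loads(body).items()}
```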
Time Series
GET /api/v1/repos/{repoID}/timeseries
GET /api/v1/repos/{repoID}/timeseries?since=2024-01-01
Returns weekly aggregated counts for commits, PRs opened, PRs merged, and issues.
| Parameter | Type | Default | Description |
|---|---|---|---|
| since | date (YYYY-MM-DD) | 2 years ago | Start date for time series |
{
"repo_id": 42,
"repo_name": "augur",
"repo_owner": "aveloxis",
"commits": [
{"week_start": "2024-01-01T00:00:00Z", "count": 15},
{"week_start": "2024-01-08T00:00:00Z", "count": 22}
],
"prs_opened": [...],
"prs_merged": [...],
"issues": [...]
}
Weeks are Monday-aligned via PostgreSQL date_trunc('week', timestamp). Queries use indexed timestamp columns for fast responses even on large databases.
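Python's date.weekday() numbers Monday as 0, so a client can reproduce the same Monday alignment. The sketch below uses this to place dates into the server's weeks and to expand a series into one entry per week. This page doesn't say whether the server omits weeks with zero activity, so filling gaps with 0 is an assumption. The helper names are hypothetical.

```python
from datetime import date, datetime, timedelta


def week_start(d: date) -> date:
    """Monday of the week containing d, like PostgreSQL date_trunc('week', ...)."""
    return d - timedelta(days=d.weekday())  # weekday(): Monday == 0, Sunday == 6


def dense_weekly(points: list, since: date, until: date) -> list:
    """Expand [{"week_start": ..., "count": n}, ...] to one (week, count) per week.

    Weeks absent from `points` get a count of 0 (assumption: the server
    may omit weeks with no activity).
    """
    counts = {}
    for p in points:
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        ts = datetime.fromisoformat(p["week_start"].replace("Z", "+00:00"))
        counts[ts.date()] = p["count"]
    series, week = [], week_start(since)
    while week <= until:
        series.append((week, counts.get(week, 0)))
        week += timedelta(weeks=1)
    return series
```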
Dependency Licenses
GET /api/v1/repos/{repoID}/licenses
Returns a summary of dependency licenses with counts and OSI compliance status.
[
{"license": "MIT", "count": 45, "is_osi": true},
{"license": "Apache-2.0", "count": 12, "is_osi": true},
{"license": "Unknown", "count": 3, "is_osi": false}
]
OSI compliance is checked against a built-in list of 30+ known OSI-approved SPDX identifiers.
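Each row carries both a count and an is_osi flag, so a compliance percentage needs only a single pass over the array. The sketch below does this; the function name is hypothetical. An "Unknown" license counts as non-OSI here, matching its is_osi: false in the example.

```python
def osi_share(licenses: list) -> float:
    """Fraction of dependencies whose license is OSI-approved, from /licenses."""
    total = sum(row["count"] for row in licenses)
    approved = sum(row["count"] for row in licenses if row["is_osi"])
    return approved / total if total else 0.0


summary = [
    {"license": "MIT", "count": 45, "is_osi": True},
    {"license": "Apache-2.0", "count": 12, "is_osi": True},
    {"license": "Unknown", "count": 3, "is_osi": False},
]
share = osi_share(summary)  # 57 of 60 dependencies
```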
Repository Search
GET /api/v1/repos/search?q=augur
Case-insensitive search across repo name, owner, and URL. Returns up to 20 matches. Used by the comparison page’s autocomplete search.
[
{"id": 2, "owner": "aveloxis", "name": "augur"},
{"id": 31, "owner": "chaoss", "name": "augur-license"}
]
SBOM Download
GET /api/v1/repos/{repoID}/sbom?format=cyclonedx
GET /api/v1/repos/{repoID}/sbom?format=spdx
Generates and downloads a Software Bill of Materials in CycloneDX 1.5 or SPDX 2.3 JSON format. The SBOM is generated on-the-fly from collected dependency data.
| Parameter | Values | Default | Description |
|---|---|---|---|
| format | cyclonedx, spdx | | SBOM format |
Returns JSON with a Content-Disposition: attachment header, so browsers save the response as a file.
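Because the response carries a Content-Disposition: attachment header, a script can reuse the server's suggested filename. Below is a minimal standard-library sketch. This page doesn't document the exact filename the server suggests, and the fallback filename pattern and helper names are hypothetical.

```python
import os
import urllib.request
from email.message import Message

SBOM_FORMATS = ("cyclonedx", "spdx")


def sbom_url(base: str, repo_id: int, fmt: str) -> str:
    """Build the SBOM download URL, rejecting formats the endpoint doesn't list."""
    if fmt not in SBOM_FORMATS:
        raise ValueError(f"unsupported SBOM format: {fmt!r}")
    return f"{base.rstrip('/')}/api/v1/repos/{repo_id}/sbom?format={fmt}"


def attachment_filename(content_disposition: str, fallback: str) -> str:
    """Pull filename=... out of a Content-Disposition header, if present."""
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    name = msg.get_filename()
    # basename() stops a server-supplied name from escaping the current directory.
    safe_name = os.path.basename(name) if name else ""
    return safe_name or fallback


def download_sbom(base: str, repo_id: int, fmt: str = "cyclonedx") -> str:
    """Fetch the SBOM and write it to disk; returns the path written."""
    with urllib.request.urlopen(sbom_url(base, repo_id, fmt)) as resp:
        header = resp.headers.get("Content-Disposition", "")
        path = attachment_filename(header, f"repo-{repo_id}-{fmt}.json")
        data = resp.read()
    with open(path, "wb") as out:
        out.write(data)
    return path
```

For example, download_sbom("http://localhost:8383", 42, "spdx") would save repo 42's SPDX document in the current directory.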
Contributor identities in a window
GET /api/v1/repos/{repoID}/contributions/identities
GET /api/v1/repos/{repoID}/contributions/identities?since=2024-01-01
GET /api/v1/repos/{repoID}/contributions/identities?since=2024-01-01&until=2024-12-31
Returns every distinct contributor who made any kind of contribution to the repo in the requested window. The result is one row per person, suitable for rendering a roster or building an affiliation chart against a derived per-person grouping.
| Parameter | Type | Default | Description |
|---|---|---|---|
| since | date (YYYY-MM-DD) | 2 years ago | Window lower bound (inclusive) |
| until | date (YYYY-MM-DD) | unbounded (now) | Window upper bound (inclusive; see note below) |
The until date is treated as inclusive of the entire calendar day — the server shifts it by +1 day before comparing against the half-open < upper SQL filter, so passing until=2024-12-31 captures everything through 2024-12-31T23:59:59.999Z.
Malformed dates fall back to the defaults rather than returning 400, matching the existing /timeseries endpoint behavior so charts and dashboards keep rendering. The one validation error that does surface as 400 is since >= until, which is almost certainly an operator typo.
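A client can mirror these window rules, for example to label a chart with the exact range a response covers. This is a sketch under stated assumptions, not the server's code. The assumptions are: "2 years ago" approximated as 730 days, parsed dates treated as UTC midnight, and since compared with until before the one-day shift. The names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


class WindowError(ValueError):
    """The one case the server reports as HTTP 400: since >= until."""


def _parse_day(raw):
    """Parse YYYY-MM-DD as midnight UTC; None when missing or malformed."""
    if not raw:
        return None
    try:
        return datetime.strptime(raw, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    except ValueError:
        return None  # malformed input falls back to the default, not an error


def resolve_window(since_raw, until_raw, now):
    """Return (lower, upper) for the half-open filter lower <= ts < upper.

    upper is None when until is omitted or malformed (unbounded).
    """
    since = _parse_day(since_raw)
    if since is None:
        # "2 years ago", approximated as 730 days (assumed server arithmetic).
        since = now - timedelta(days=730)
    until = _parse_day(until_raw)
    if until is None:
        return since, None
    if since >= until:
        raise WindowError("since must be earlier than until")
    # Shift by one day so the whole calendar day named by `until` is included.
    return since, until + timedelta(days=1)
```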
Response shape:
[
{
"cntrb_id": "01000001-0000-4000-8000-000000000000",
"login": "alice",
"full_name": "Alice Anderson",
"email": "alice@example.com",
"profile_company": "Acme Corp",
"location": "Berlin"
},
...
]
All string fields are normalized server-side: "" represents “no value recorded” (no per-field null handling needed on the client). Ordering is by login NULLS LAST, full_name NULLS LAST so unidentifiable contributors sort to the bottom of the roster.
What counts as a contribution (all in one window via the unified messages table and the standard work-tracking tables):
| Kind | Source table | Time column |
|---|---|---|
| Commit authorship | | |
| Issue opened | | |
| Issue closed | | |
| Issue event (label / assignment / reference) | | |
| PR opened | | |
| PR review submitted | | |
| PR event | | |
| Any message (issue comment, PR conversation comment, inline review comment body) | messages | |
Per the “Unified message architecture” contract, all three text-contribution kinds live in messages with cntrb_id as the author — one filter covers them all.
What’s intentionally not counted:
- Assignees and reviewers who never actually did anything: being asked to review isn’t a contribution. They show up in *_assignees/*_reviewers tables but aren’t surfaced here.
- Commits whose cmt_ght_author_id is NULL. These are commits aveloxis hasn’t been able to resolve to a contributor row (private email, or the search-resolve background ticker hasn’t reached them yet), so they don’t have a cntrb_id to return. The number of such commits in a given window is queryable via the metric endpoints (/code-changes), and the v0.19.2 search-resolve work closes the gap over time.
- contributor_repo rows from the breadth worker. Those represent “this person was active anywhere on the repo at some point,” but their time semantics differ (collection-cycle timestamp, not when the contribution happened), so they don’t belong in a contribution-window query.
- Soft-deleted contributors (cntrb_deleted != 0). The v0.20.2 logical-merge path marks loser rows when a rename is detected. Filtering them out is the contract; merged identities surface only under the winning cntrb_id.
Affiliation breakdown for the same window
GET /api/v1/repos/{repoID}/contributions/affiliations
GET /api/v1/repos/{repoID}/contributions/affiliations?since=2024-01-01&until=2024-12-31
Returns the count of distinct contributors per affiliation, using the same window and the same contribution-kind definition as /contributions/identities. The two endpoints share a single SQL CTE on the server so the two responses can never disagree on which people are in scope — a sum across this endpoint’s contributor_count values equals the row count of /contributions/identities.
Same since / until parameters as the identities endpoint; same behavior on malformed input and since >= until.
Response shape:
[
{"affiliation": "Acme Corp", "contributor_count": 47},
{"affiliation": "RedHat", "contributor_count": 12},
{"affiliation": "(unknown)", "contributor_count": 31}
]
Ordered by contributor_count DESC then affiliation ASC. The (unknown) bucket is included rather than hidden so callers can decide whether to surface unaffiliated contributors (often the right call on community projects) or omit them.
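Both endpoints are built from the same CTE, so a client can cheaply check that it fetched them with matching windows. Below is a minimal sketch; the function name is hypothetical.

```python
def reconciles(identities: list, affiliations: list) -> bool:
    """True when the two responses cover the same cohort of people.

    Only meaningful when both requests used identical since/until values.
    """
    return sum(row["contributor_count"] for row in affiliations) == len(identities)
```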
Affiliation derivation priority (applied per-contributor):
1. contributor_affiliations[domain_of(cntrb_canonical)]: the curated email-domain → org map maintained by aveloxis’s PopulateAffiliations background task. This is the most reliable signal because it covers people whose GitHub/GitLab profile is blank but whose verified email domain is well known (e.g. @redhat.com → “RedHat”).
2. cntrb_company: what the user typed into their GitHub or GitLab profile. Freeform text; often blank, sometimes “@org” (the GitHub @-mention reference style; aveloxis strips the leading @ before using it as an affiliation label).
3. (unknown): fallback bucket for contributors with neither a domain-mapped canonical email nor a profile company string.
The derivation priority is deliberate: the curated domain map is updated by a background task that watches observed contributor data, while the profile field is freeform and easily stale (“Self-employed”, “Earth”, typos of well-known company names, etc.). When both are present the domain-mapped value wins because it’s more likely to be canonical.
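The priority order above can be sketched as a small function. This is an illustration under assumptions, not the server's implementation. A plain dict stands in for the contributor_affiliations table. Lower-casing the domain and trimming whitespace are assumptions, and this page doesn't document how domain_of() treats subdomains. The function name is hypothetical.

```python
def derive_affiliation(canonical_email, company, domain_map: dict) -> str:
    """Domain-mapped org first, then the profile company string, then (unknown).

    domain_map is a stand-in for contributor_affiliations (email domain -> org).
    """
    # 1. Curated email-domain map, keyed on the canonical email's domain.
    if canonical_email and "@" in canonical_email:
        domain = canonical_email.rsplit("@", 1)[1].lower()
        org = domain_map.get(domain)
        if org:
            return org
    # 2. Freeform profile company, with a leading GitHub-style "@" stripped.
    if company:
        label = company.strip()
        if label.startswith("@"):
            label = label[1:]
        if label:
            return label
    # 3. Fallback bucket.
    return "(unknown)"
```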
Tweaks you can make on the client side
- Narrow to creative work only: the identities endpoint includes every contribution kind listed above. To exclude event-only activity (labels, references), post-process the identities list against another endpoint that’s restricted to commits/PRs/issues, or filter on the client.
- Group of repos: the two endpoints are per-repo. For an org-wide rollup, call them for each repo in the group and merge the responses (the cntrb_id column is stable across repos, so deduplication is trivial).
- Hide the (unknown) bucket: filter on the client. The server returns it so the math reconciles with /contributions/identities.
- Different windows: ?since=YYYY-MM-DD and ?until=YYYY-MM-DD are accepted independently. Omit until to get everything from since onward.
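The multi-repo merge for identities can be sketched as a dedup keyed on cntrb_id. The function name is hypothetical. The re-sort only approximates the server's ordering, because Python compares strings by code point and PostgreSQL's collation may order some names differently.

```python
def merge_identities(per_repo: list) -> list:
    """Union several /contributions/identities responses into one roster.

    cntrb_id is stable across repos, so it is the dedup key; the first row
    seen for a person is kept.
    """
    seen = {}
    for rows in per_repo:
        for row in rows:
            seen.setdefault(row["cntrb_id"], row)

    def sort_key(row):
        # Approximates "login NULLS LAST, full_name NULLS LAST": empty strings
        # (the normalized "no value") sort after real values.
        return (row["login"] == "", row["login"], row["full_name"] == "", row["full_name"])

    return sorted(seen.values(), key=sort_key)
```

This shortcut doesn't carry over to /contributions/affiliations. Adding contributor_count values across repos counts a person once for every repo they contributed to, so a summed org-wide total overstates the number of people.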
Knowing whether your coverage is complete
GET /api/v1/repos/{repoID}/contributions/coverage
GET /api/v1/repos/{repoID}/contributions/coverage?since=2024-01-01&until=2024-12-31
Returns the enrichment-state snapshot for the same cohort as /contributions/identities and /contributions/affiliations. Operators call this before drawing conclusions from the affiliation breakdown to tell whether an (unknown) bucket represents truly unaffiliated contributors or just people the v0.18.29 enrichment ticker hasn’t reached yet.
Same since / until parameters as the other two endpoints; same behavior on malformed input and since >= until.
Response shape:
{
"window_since": "2024-05-21T00:00:00Z",
"window_until": "2026-05-21T00:00:00Z",
"total_contributors": 412,
"enriched": 389,
"canonical_email": 356,
"gh_user_id_resolved": 401,
"search_resolve_attempted": 47,
"breadth_attempted": 378,
"affiliation_resolved": 318,
"affiliation_unknown": 94,
"enrichment_oldest_pending": "2026-05-12T18:31:04Z",
"enrichment_stalest": "2024-08-15T03:22:11Z"
}
The two timestamp fields are omitted entirely when no one in the cohort is in the relevant state. The server leaves the value unset, so the field is absent from the JSON instead of appearing as a zero time, which would confuse operators.
Reading the response. A response with total_contributors=412, enriched=389, affiliation_resolved=318, affiliation_unknown=94 reads as:
412 people contributed in the window. 389 of them have been successfully enriched via /users/{login}; the enrichment ticker is still working through the other 23. 318 have a resolvable affiliation (either via the curated email-domain map or via their profile company field). 94 are bucketed as (unknown), but up to 23 of those may be unenriched contributors who will pick up an affiliation once the ticker reaches them. So the true unaffiliated count for this window is somewhere between 71 (if all 23 unenriched contributors are in the (unknown) bucket and all of them turn out to have an affiliation) and 94 (if none of them do).
Operators surface this floor-and-ceiling on dashboards rather than reporting (unknown) alone — the latter conflates “no affiliation” with “we haven’t asked yet.”
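Here is the floor-and-ceiling arithmetic from the example as a function over a parsed /coverage response. Like the prose above, this sketch assumes every unenriched contributor currently sits in the (unknown) bucket, which makes the floor a conservative lower bound. The function name is hypothetical.

```python
def unaffiliated_bounds(cov: dict) -> tuple:
    """(floor, ceiling) for the truly unaffiliated count in a /coverage response."""
    unenriched = cov["total_contributors"] - cov["enriched"]
    ceiling = cov["affiliation_unknown"]  # no unenriched person gains an affiliation
    floor = max(0, ceiling - unenriched)  # every unenriched person gains one
    return floor, ceiling
```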
Field-by-field reference:
| Field | Source signal | What it tells you |
|---|---|---|
| total_contributors | The cohort | Denominator for everything else |
| enriched | | Successfully enriched via /users/{login} |
| canonical_email | | Verified email known; drives domain → affiliation lookup |
| gh_user_id_resolved | | Person matched to numeric GitHub user (stable identity across renames) |
| search_resolve_attempted | | v0.19.2 search-resolve ticker has tried to look this person up by email (60-min cooldown, 30-day re-attempt) |
| breadth_attempted | | v0.20.17 breadth worker has tried |
| affiliation_resolved | Domain-mapped via | Will show up under a non- |
| affiliation_unknown | | The |
| enrichment_oldest_pending | | How long the most-delayed unenriched contributor has been waiting; compare against your configured enrich_interval_minutes |
| enrichment_stalest | | Oldest “last refreshed” timestamp; surfaces the long tail of “enriched 18 months ago and never refreshed” |
Spotting a stuck enrichment ticker. If enrichment_oldest_pending is more than ~2× your configured enrich_interval_minutes behind NOW(), the ticker may be stuck. Investigation:
# What does the enrich interval look like?
grep enrich_interval_minutes ~/.aveloxis/aveloxis.json
# Has the enrichment ticker been ticking?
grep -E "EnrichThinContributors|enrichment" ~/.aveloxis/aveloxis.log | tail -20
# Are we burning API budget?
grep -E "all API keys rate-limited|rate limit" ~/.aveloxis/aveloxis.log | tail -10
If the ticker is running but enrichment is still falling behind, it’s almost always API-key budget exhaustion (the v0.18.29 EnrichBatchSize = 14000 per tick is sized for a 73-key fleet; smaller key pools can’t keep up).
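A script can also apply the ~2× rule of thumb, for example in a monitoring job that polls /coverage. In this sketch the caller must supply enrich_interval_minutes (read it from aveloxis.json as shown above). The function name is illustrative, and so is the factor=2.0 default, which is taken from the "~2×" guidance.

```python
from datetime import datetime, timedelta, timezone


def ticker_looks_stuck(cov: dict, enrich_interval_minutes: float, now=None, factor: float = 2.0) -> bool:
    """Apply the ~2x enrich_interval_minutes rule of thumb to a /coverage response."""
    raw = cov.get("enrichment_oldest_pending")
    if raw is None:
        return False  # field omitted: nobody in the cohort is waiting on enrichment
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    oldest = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return now - oldest > timedelta(minutes=factor * enrich_interval_minutes)
```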
What this endpoint doesn’t tell you:
- Per-affiliation coverage drill-down: the response is global to the cohort. If you specifically need “what % of Acme Corp contributors have canonical emails,” that’s a derived query: call /contributions/identities and group client-side, or open an issue for a per-affiliation coverage endpoint.
- Whether PopulateAffiliations is current: the domain-mapped affiliations come from the contributor_affiliations table, which the v0.19.7 ticker rebuilds hourly. The table state at any given moment reflects the most recent successful rebuild, not a continuous live view. If you’ve just added new contributors with novel company strings, give PopulateAffiliations an hour to surface them in the map.
- Fleet-wide coverage: this endpoint is per-repo. For a fleet-wide rollup, call it per repo and aggregate (or, if there’s operator demand, request a global /api/v1/contributions/coverage endpoint as a follow-up).
When to use which endpoint
| Need | Use |
|---|---|
| “Who contributed to this repo in the last two years?” | /contributions/identities |
| “How many people from each company contributed?” | /contributions/affiliations |
| “Is the affiliation data trustworthy yet, or is enrichment still catching up?” | /contributions/coverage |
| “How many new contributors per month did this repo gain?” (Augur metric) | /contributors-new |
| “Total contributor count, no window” | |
| “How many commits per week?” | /timeseries |
The Augur-compatible endpoints (/contributors, /contributors-new, etc.) follow Augur’s swagger spec with begin_date / end_date / period query params and return aggregated counts. The /contributions/* endpoints follow the aveloxis convention (since / until) and return per-contributor identity rows, an aggregated affiliation roll-up, and a coverage snapshot respectively. The two groups serve different questions and don’t overlap.
Mailing-list collection coverage
GET /api/v1/mailing-list/stats
Fleet-wide rollup of the mailing-list ingestion subsystem: the same data reported by aveloxis mailing-list-stats. No parameters. Returns 500 if the query fails.
Response (note: keys are PascalCase — the rollup struct carries no JSON tags):
{
"Lists": 16,
"ScanComplete": 14,
"EmailMessages": 68514,
"Mirrors": 41841,
"SignaledCaptured": 40044,
"SignaledResolved": 25591,
"SenderTotal": 26673,
"SenderResolved": 17012,
"ByClass": {
"github_mirror": 40044,
"issue_event": 5251,
"patch_submission": 4568,
"discuss": 1161,
"review": 953
}
}
| Field | Meaning |
|---|---|
| Lists / ScanComplete | registered lists, and how many have finished their current scan |
| EmailMessages | total |
| Mirrors | rows classified as mirror mail ( |
| SignaledCaptured / SignaledResolved | messages that named a repo (Axis B) / those resolved to a repo we hold. The ratio is catalog coverage, not quality; unresolved signals point at sibling repos not yet tracked. |
| SenderTotal / SenderResolved | mailing-list message bodies / those whose sender resolved to a contributor (improves over time via the hourly backfill) |
| ByClass | per- |
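Three of the field pairs above are natural numerator/denominator pairs. The sketch below turns them into ratios for a dashboard; the output key names are hypothetical. The signal ratio measures catalog coverage, not quality.

```python
def mailing_list_ratios(stats: dict) -> dict:
    """Progress ratios from GET /api/v1/mailing-list/stats.

    Keys are PascalCase because the rollup struct has no JSON tags.
    A ratio is None when its denominator is 0.
    """
    def ratio(num: str, den: str):
        return stats[num] / stats[den] if stats[den] else None

    return {
        "scan_complete": ratio("ScanComplete", "Lists"),
        # Catalog coverage, not quality: unresolved signals name untracked repos.
        "signal_resolution": ratio("SignaledResolved", "SignaledCaptured"),
        "sender_resolution": ratio("SenderResolved", "SenderTotal"),
    }
```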
CORS
All API endpoints return Access-Control-Allow-Origin: * to allow cross-origin requests from the web GUI (which runs on a different port).
Deployment
The API server is stateless — it reads directly from PostgreSQL. You can run multiple instances behind a load balancer for high availability.
# Typical 3-process deployment
(nohup aveloxis serve --workers 40 --monitor :5555 >> aveloxis.log &)
(nohup aveloxis web >> web.log &)
(nohup aveloxis api --addr :8383 >> api.log &)
The web GUI’s Chart.js visualizations fetch data from the API server. The API URL is configured as http://localhost:8383 by default. If the API runs on a different host or port, update the API base URL in the web templates.