Platform Abstraction Layer
The internal/platform package is the HTTP client, rate limiting, and API abstraction layer that enables Aveloxis to collect from GitHub and GitLab with equal completeness through a single interface.
Interface hierarchy
platform.Client
|-- Platform() -> model.Platform
|-- ParseRepoURL() -> owner, repo
|-- RepoCollector
| |-- FetchRepoInfo
| |-- FetchCloneStats
|-- IssueCollector
| |-- ListIssues
| |-- ListIssueLabels
| |-- ListIssueAssignees
|-- PullRequestCollector
| |-- ListPullRequests
| |-- ListPRLabels, ListPRAssignees, ListPRReviewers
| |-- ListPRReviews, ListPRCommits, ListPRFiles
| |-- FetchPRMeta
|-- EventCollector
| |-- ListIssueEvents
| |-- ListPREvents
|-- MessageCollector
| |-- ListIssueComments
| |-- ListPRComments
| |-- ListReviewComments
|-- ReleaseCollector
| |-- ListReleases
|-- ContributorCollector
    |-- ListContributors
    |-- EnrichContributor
All list methods return iter.Seq2[T, error] (Go 1.23 iterators) for memory-efficient streaming pagination. Callers consume results with for item, err := range client.ListIssues(...).
HTTP client (HTTPClient)
Shared by both GitHub and GitLab implementations. Features:
Platform-aware authentication: The AuthStyle parameter controls the auth header format. GitHub uses Authorization: token <key> (PATs). GitLab uses PRIVATE-TOKEN: <key>. Set at construction via NewHTTPClient(..., AuthGitHub) or NewHTTPClient(..., AuthGitLab).
Connection pooling: HTTP/2 enabled, 20 idle connections per host for high-throughput collection.
Automatic retries: Up to 10 retries with exponential backoff for transient errors (502/503/504).
Rate limit awareness: Reads X-RateLimit-* (GitHub) and RateLimit-* (GitLab) headers, and waits for the reset when the quota is exhausted.
Secondary rate limit handling: Respects Retry-After headers from GitHub’s secondary rate limits.
Conditional requests (ETags): Caches ETags from responses and sends If-None-Match on subsequent requests. GitHub does not count 304 responses against the rate limit, saving quota on unchanged data during incremental collection.
Bad credential detection: 401 responses permanently invalidate the API key.
Explicit redirect handling (v0.16.10+): Go’s default redirect follower is disabled (CheckRedirect returns http.ErrUseLastResponse). The status-code switch handles 301, 302, 307, and 308 directly by reading the Location header and re-issuing the request against the new URL, capped at maxRedirectHops = 5 per call. Each hop logs following redirect from=... to=... status=... hop=N. Centralizing the logic means there is only one place to reason about auth-header preservation, hop caps, and cross-host edge cases.
ErrGone sentinel (v0.16.10+): Distinct from ErrNotFound. Returned for (a) 410 Gone responses, (b) 3xx responses with an empty/missing Location header (observed when GitHub cannot determine the redirect target, body {"url":""}), and (c) redirect chains exceeding maxRedirectHops. Callers use errors.Is(err, ErrGone) to treat these as “skip this resource” without failing the job. The staged collector’s isOptionalEndpointSkip checks ErrNotFound | ErrForbidden | ErrGone together.
Per-item comment endpoints (v0.16.12+): MessageCollector has three per-item methods alongside the repo-wide since-filtered listings: ListCommentsForIssue(owner, repo, issueNumber), ListCommentsForPR(owner, repo, prNumber), and ListReviewCommentsForPR(owner, repo, prNumber). GitHub implementations target /repos/{o}/{r}/issues/{n}/comments (tagged as IssueRef or PRRef by the caller’s context) and /repos/{o}/{r}/pulls/{n}/comments. GitLab implementations target /projects/:id/issues/:iid/notes, /projects/:id/merge_requests/:iid/notes, and /projects/:id/merge_requests/:iid/discussions (filtered to notes carrying a position). These power gap fill and open-item refresh, which need comments on historical or prior-cycle-missed items that would otherwise fall outside any repo-wide since window.
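The redirect and ErrGone rules above can be sketched roughly as follows. This is a minimal illustration, not the package's code: getFollowing, newClient, roundTripFunc, and fakeForge are hypothetical names, the ErrGone and maxRedirectHops definitions are local stand-ins, and a fake transport replaces real network calls. Auth-header preservation, logging, and cross-host checks are left out.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// ErrGone is a local stand-in for the sentinel described above.
var ErrGone = errors.New("resource gone")

const maxRedirectHops = 5

// roundTripFunc adapts a function to http.RoundTripper (hypothetical helper).
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// newClient disables Go's automatic redirect following, so 3xx responses are
// returned to the caller instead of being followed.
func newClient(rt http.RoundTripper) *http.Client {
	return &http.Client{
		Transport: rt,
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
}

// getFollowing issues a GET and follows 301/302/307/308 by hand. It returns
// ErrGone for 410, for a redirect without a usable Location header, and for
// chains longer than maxRedirectHops.
func getFollowing(c *http.Client, rawURL string) (*http.Response, error) {
	for hop := 0; ; hop++ {
		resp, err := c.Get(rawURL)
		if err != nil {
			return nil, err
		}
		switch resp.StatusCode {
		case http.StatusGone:
			resp.Body.Close()
			return nil, ErrGone
		case http.StatusMovedPermanently, http.StatusFound,
			http.StatusTemporaryRedirect, http.StatusPermanentRedirect:
			resp.Body.Close()
			loc, err := resp.Location() // fails when Location is empty or missing
			if err != nil || hop >= maxRedirectHops {
				return nil, ErrGone
			}
			rawURL = loc.String()
		default:
			return resp, nil
		}
	}
}

// fakeForge is a canned transport standing in for a real API server.
func fakeForge(req *http.Request) (*http.Response, error) {
	resp := &http.Response{
		StatusCode: http.StatusOK,
		Header:     http.Header{},
		Body:       io.NopCloser(strings.NewReader("")),
		Request:    req,
	}
	switch req.URL.Path {
	case "/old":
		resp.StatusCode = http.StatusMovedPermanently
		resp.Header.Set("Location", "/new")
	case "/lost":
		resp.StatusCode = http.StatusFound // redirect with no Location header
	case "/loop":
		resp.StatusCode = http.StatusTemporaryRedirect
		resp.Header.Set("Location", "/loop")
	case "/gone":
		resp.StatusCode = http.StatusGone
	}
	return resp, nil
}

func main() {
	c := newClient(roundTripFunc(fakeForge))
	for _, p := range []string{"/old", "/lost", "/loop", "/gone"} {
		resp, err := getFollowing(c, "http://forge.test"+p)
		if err != nil {
			fmt.Println(p, "skip:", errors.Is(err, ErrGone))
			continue
		}
		fmt.Println(p, "status:", resp.StatusCode)
		resp.Body.Close()
	}
}
```

Mapping all three dead-end cases to one sentinel lets a caller decide "skip this resource" with a single errors.Is check.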
Key pool (KeyPool)
Manages multiple API tokens with round-robin rotation for maximum throughput.
Round-robin rotation: Every key’s rate limit is fully utilized before the pool waits.
Configurable buffer: Stops using a key when remaining drops to buffer (default 15), preventing 403s from concurrent workers that checked out a key before the count was updated.
Automatic refill: Keys are refilled to 5000 when the rate-limit window resets.
Resource-aware: Only core API responses update the key’s rate-limit counter. Search and GraphQL responses (which have separate buckets) are ignored to prevent premature key rotation.
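The four behaviours above might fit together along these lines. This is a simplified sketch under stated assumptions: the pool and key types, checkout, update, and demo are hypothetical names, the "core" resource label is assumed to come from the response's rate-limit headers, and the real KeyPool presumably waits for a reset rather than returning an error when every key is exhausted.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// key tracks one token's remaining core-API quota (hypothetical type).
type key struct {
	token     string
	remaining int
	reset     time.Time // zero until the first rate-limit update
}

type pool struct {
	mu     sync.Mutex
	keys   []*key
	next   int // index where the next round-robin scan starts
	buffer int // a key is skipped once remaining <= buffer
	limit  int // value a key is refilled to after its window resets
}

var errAllExhausted = errors.New("all keys at or below buffer")

func newPool(buffer int, tokens ...string) *pool {
	p := &pool{buffer: buffer, limit: 5000}
	for _, t := range tokens {
		p.keys = append(p.keys, &key{token: t, remaining: p.limit})
	}
	return p
}

// checkout returns the next usable key in round-robin order, refilling any
// key whose rate-limit window has already reset.
func (p *pool) checkout(now time.Time) (*key, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for i := 0; i < len(p.keys); i++ {
		k := p.keys[(p.next+i)%len(p.keys)]
		if !k.reset.IsZero() && !now.Before(k.reset) {
			k.remaining = p.limit
			k.reset = time.Time{}
		}
		if k.remaining > p.buffer {
			p.next = (p.next + i + 1) % len(p.keys)
			return k, nil
		}
	}
	return nil, errAllExhausted
}

// update records quota reported by a response. Only the "core" resource is
// tracked; search and GraphQL buckets are ignored.
func (p *pool) update(k *key, resource string, remaining int, reset time.Time) {
	if resource != "core" {
		return
	}
	p.mu.Lock()
	defer p.mu.Unlock()
	k.remaining = remaining
	k.reset = reset
}

// demo walks through rotation, the buffer cutoff, and a refill.
func demo() []string {
	now := time.Now()
	p := newPool(15, "tokA", "tokB")
	a, _ := p.checkout(now)
	b, _ := p.checkout(now)
	p.update(a, "core", 15, now.Add(time.Hour))  // tokA reaches the buffer
	p.update(b, "search", 0, now.Add(time.Hour)) // ignored: not the core bucket
	c, _ := p.checkout(now)                      // tokA is skipped
	d, _ := p.checkout(now.Add(2 * time.Hour))   // tokA's window has reset
	return []string{a.token, b.token, c.token, d.token}
}

func main() {
	fmt.Println(demo()) // prints "[tokA tokB tokB tokA]"
}
```

Passing the current time into checkout keeps the sketch deterministic and easy to test.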
Pagination
Both GitHub and GitLab use 100-item pages. The pagination engine is shared, with platform-specific next-page resolution:
| Platform | Primary method | Fallback |
|---|---|---|
| GitHub | | – |
| GitLab | | |
The pagination functions (PaginateGitHub, PaginateGitLab) are generic and work with any JSON-decodable type.
URL parsing (RepoURL)
Parses repository URLs and identifies the platform:
https://github.com/owner/repo -> GitHub, owner="owner", repo="repo"
https://gitlab.com/group/subgroup/project -> GitLab, owner="group/subgroup", repo="project"
Self-hosted instances detected by hostname hints or "gitlab" substring in hostname.
The APIURL() method returns the correct API base URL, including GitHub Enterprise (/api/v3) and GitLab (/api/v4).
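The parsing rules above can be approximated with the sketch below. The repoURL struct, parseRepoURL, and apiURL are hypothetical simplifications of RepoURL and APIURL(): only the "gitlab" substring check is modeled, every other host is assumed to be GitHub or GitHub Enterprise, and extra path segments such as /issues are not stripped.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// repoURL is a hypothetical, trimmed-down counterpart of RepoURL.
type repoURL struct {
	Platform string // "github" or "gitlab"
	Host     string
	Owner    string // may contain slashes for nested GitLab groups
	Repo     string
}

// parseRepoURL splits the path at its last slash: everything before it is the
// owner (or group path) and the final segment is the repository name.
func parseRepoURL(raw string) (repoURL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return repoURL{}, err
	}
	path := strings.TrimSuffix(strings.Trim(u.Path, "/"), ".git")
	i := strings.LastIndex(path, "/")
	if u.Host == "" || i < 1 {
		return repoURL{}, fmt.Errorf("not a repository URL: %q", raw)
	}
	r := repoURL{Platform: "github", Host: u.Hostname(), Owner: path[:i], Repo: path[i+1:]}
	// Assumption: a "gitlab" substring marks a GitLab host; all others
	// default to GitHub in this sketch.
	if strings.Contains(strings.ToLower(r.Host), "gitlab") {
		r.Platform = "gitlab"
	}
	return r, nil
}

// apiURL returns the API base URL for the parsed host.
func (r repoURL) apiURL() string {
	switch {
	case r.Platform == "gitlab":
		return "https://" + r.Host + "/api/v4"
	case r.Host == "github.com":
		return "https://api.github.com"
	default: // treated as GitHub Enterprise
		return "https://" + r.Host + "/api/v3"
	}
}

func main() {
	for _, raw := range []string{
		"https://github.com/owner/repo",
		"https://gitlab.com/group/subgroup/project",
	} {
		r, err := parseRepoURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s owner=%q repo=%q api=%s\n", r.Platform, r.Owner, r.Repo, r.apiURL())
	}
}
```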
Adding a new platform
To add support for a new forge (e.g., Gitea):
Create internal/platform/gitea/ with types.go (raw API types) and client.go.
Implement platform.Client: all 7 sub-interfaces.
Add the platform to model.Platform constants.
Add URL detection in repourl.go’s detectPlatform().
Wire into cmd/aveloxis/main.go client creation.
The HTTPClient, KeyPool, and pagination engine are reusable across all platforms.
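When implementing the interface for a new forge, a compile-time assertion is a common Go idiom for catching missing methods early. The sketch below uses a trimmed, hypothetical Client interface with two methods; the real platform.Client has seven sub-interfaces and its exact method signatures are not reproduced here. The giteaClient type and newGiteaClient are likewise illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// Client is a trimmed, hypothetical stand-in for platform.Client.
type Client interface {
	Platform() string
	ParseRepoURL(raw string) (owner, repo string, err error)
}

// giteaClient is an illustrative implementation for a self-hosted instance.
type giteaClient struct{ baseURL string }

// Compile-time check: the build fails if giteaClient stops satisfying Client.
var _ Client = (*giteaClient)(nil)

func newGiteaClient(baseURL string) Client {
	return &giteaClient{baseURL: strings.TrimSuffix(baseURL, "/")}
}

func (c *giteaClient) Platform() string { return "gitea" }

// ParseRepoURL accepts URLs of the form <baseURL>/<owner>/<repo>.
func (c *giteaClient) ParseRepoURL(raw string) (string, string, error) {
	rest, ok := strings.CutPrefix(raw, c.baseURL+"/")
	if !ok {
		return "", "", fmt.Errorf("%q is not under %s", raw, c.baseURL)
	}
	owner, repo, ok := strings.Cut(strings.Trim(rest, "/"), "/")
	if !ok || owner == "" || repo == "" {
		return "", "", fmt.Errorf("%q lacks an owner/repo path", raw)
	}
	return owner, repo, nil
}

func main() {
	c := newGiteaClient("https://gitea.example.com")
	owner, repo, err := c.ParseRepoURL("https://gitea.example.com/alice/tool")
	fmt.Println(c.Platform(), owner, repo, err)
}
```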
Design notes
GitLab API differences: GitLab lacks bulk endpoints for notes (comments) and requires iterating parent entities. The GitLab client iterates issues/MRs and fetches their notes individually. This is slower but unavoidable given the API design.
GitHub events endpoint: GitHub’s /repos/{owner}/{repo}/issues/events returns events for both issues and PRs. The GitHub client fetches this once via a shared helper and filters by type for ListIssueEvents and ListPREvents.
GitLab review comments: GitLab uses “discussions” with positioned notes instead of GitHub’s explicit review comments. The ListReviewComments method maps positioned discussion notes to the ReviewComment model.
GitHub vs GitLab data gaps
All platform.Client interface methods are implemented for both platforms. The following data discrepancies exist due to GitLab API limitations:
| Data | GitHub | GitLab | Impact |
|---|---|---|---|
| Community profile files | GraphQL file detection (CHANGELOG, CONTRIBUTING, CODE_OF_CONDUCT, SECURITY) | Not yet implemented (closable via | |
| Watcher count | | No public API | |
| Clone stats | | Admin-only API | |
| GraphQL node IDs | Available on all entities | Not applicable (uses numeric IDs) | |
| Contributor identity URLs | 10+ per-user URL fields (followers, gists, starred, etc.) | Not available | |
| Contributor type | | Not distinguished | |
| Contributor breadth | | No equivalent | |