Database Package (internal/db)
The db package is the PostgreSQL data access layer for Aveloxis. It provides upsert-based persistence with deadlock retry, batch operations for throughput, and helper subsystems for text sanitization, contributor resolution, and queue management.
Package layout
File |
Responsibility |
|---|---|
|
|
|
Core |
|
Postgres-backed priority queue ( |
|
JSONB staging writer and batch processor |
|
|
|
Commit-to-contributor resolution queries |
|
Email domain to organization affiliation resolver |
|
Facade post-processing (annual/monthly/weekly aggregates) |
|
Dependency, libyear, code complexity, SBOM storage |
|
Contributor breadth (cross-repo activity) storage |
|
OAuth users, user groups, org tracking |
|
Materialized view creation and refresh |
|
Schema migrations, timestamp cleanup, tool_version backfill, schema version tracking |
|
Text sanitization (null bytes, invalid UTF-8, control chars) |
|
Deterministic UUID generation from platform user IDs |
|
API key loading with Augur fallback |
|
Augur repo import helper |
|
|
Deadlock retry
All write operations that can contend use withRetry(), which catches PostgreSQL deadlock errors (SQLSTATE 40P01) and retries with exponential backoff plus jitter, up to 10 attempts. This mirrors Augur’s DatabaseSession retry logic.
func (s *PostgresStore) withRetry(ctx context.Context, fn func(ctx context.Context) error) error
Methods that use withRetry: UpsertRepo, UpsertIssue, UpsertPullRequest, UpsertContributorBatch, UpsertMessageBatch, UpsertReviewCommentBatch, EnqueueRepo, CompleteJob, and all label/assignee/reviewer/event/commit upserts.
Batch operations
Several operations support batching via pgx.Batch to reduce network round-trips:
Method |
Use case |
|---|---|
|
Deduplicates contributors in-memory, then upserts in a single transaction with savepoints for race safety |
|
Upserts messages + issue/PR refs in one transaction |
|
Upserts review comments + messages in one transaction |
|
Batch label upsert via |
|
Batch label upsert via |
|
Batch via |
|
Batch via |
|
Batch dependency insert via |
|
Batch libyear insert via |
|
Batch code complexity insert via |
|
Batch breadth events via |
|
Buffers staging inserts, flushes every 500 rows |
The single-row variants (InsertRepoDependency, InsertRepoLibyear, InsertRepoLabor, InsertContributorRepo) remain available for callers that process items one at a time.
Text sanitization
SanitizeText() cleans strings for PostgreSQL TEXT columns by removing null bytes, replacing invalid UTF-8 with the Unicode replacement character, and stripping control characters (except \n, \r, \t). It includes a fast-path check that returns the original string when no cleaning is needed.
NullTime() converts Go’s zero time.Time to nil to prevent year-0001 garbage timestamps in PostgreSQL.
Both are called automatically by the upsert methods on text fields and timestamp fields.
Queue system
The collection queue (aveloxis_ops.collection_queue) is a PostgreSQL-backed priority queue using FOR UPDATE SKIP LOCKED for atomic job claiming:
EnqueueRepo– adds/updates a repo in the queueDequeueNext– atomically claims the highest-priority due jobCompleteJob– marks done and re-queues with a future due timePrioritizeRepo– pushes a repo to priority 0 (immediate)RecoverStaleLocks– resets jobs locked longer than a timeout (crash recovery)ListQueue/QueueStats– observability
Group ownership verification
Web store operations that modify user groups use verifyGroupOwnership() / verifyGroupOwned() to check that a group belongs to the requesting user before proceeding. This is a single-query check that returns an error if the group is not found or not owned by the user.