Database Package (internal/db)

The db package is the PostgreSQL data access layer for Aveloxis. It provides upsert-based persistence with deadlock retry, batch operations for throughput, and helper subsystems for text sanitization, contributor resolution, and queue management.


Package layout

File

Responsibility

store.go

Store interface and CollectionState type

postgres.go

Core PostgresStore implementation: repos, issues, PRs, events, messages, releases, commits, collection status

queue.go

Postgres-backed priority queue (SKIP LOCKED)

staging.go

JSONB staging writer and batch processor

contributors.go

ContributorResolver with in-memory cache

commit_resolver_store.go

Commit-to-contributor resolution queries

affiliations.go

Email domain to organization affiliation resolver

aggregates.go

Facade post-processing (annual/monthly/weekly aggregates)

analysis_store.go

Dependency, libyear, code complexity, SBOM storage

breadth_store.go

Contributor breadth (cross-repo activity) storage

web_store.go

OAuth users, user groups, org tracking

matviews.go

Materialized view creation and refresh

migrate.go

Schema migrations, timestamp cleanup, tool_version backfill, schema version tracking

sanitize.go

Text sanitization (null bytes, invalid UTF-8, control chars)

github_uuid.go

Deterministic UUID generation from platform user IDs

keys.go

API key loading with Augur fallback

import.go

Augur repo import helper

version.go

ToolVersion constant


Deadlock retry

All write operations that can contend use withRetry(), which catches PostgreSQL deadlock errors (SQLSTATE 40P01) and retries with exponential backoff plus jitter, up to 10 attempts. This mirrors Augur’s DatabaseSession retry logic.

func (s *PostgresStore) withRetry(ctx context.Context, fn func(ctx context.Context) error) error

Methods that use withRetry: UpsertRepo, UpsertIssue, UpsertPullRequest, UpsertContributorBatch, UpsertMessageBatch, UpsertReviewCommentBatch, EnqueueRepo, CompleteJob, and all label/assignee/reviewer/event/commit upserts.


Batch operations

Several operations support batching via pgx.Batch to reduce network round-trips:

Method

Use case

UpsertContributorBatch

Deduplicates contributors in-memory, then upserts in a single transaction with savepoints for race safety

UpsertMessageBatch

Upserts messages + issue/PR refs in one transaction

UpsertReviewCommentBatch

Upserts review comments + messages in one transaction

UpsertIssueLabels

Batch label upsert via pgx.Batch

UpsertPRLabels

Batch label upsert via pgx.Batch

UpsertPRAssignees

Batch via pgx.Batch

UpsertPRReviewers

Batch via pgx.Batch

InsertRepoDependencyBatch

Batch dependency insert via pgx.Batch

InsertRepoLibyearBatch

Batch libyear insert via pgx.Batch

InsertRepoLaborBatch

Batch code complexity insert via pgx.Batch

InsertContributorRepoBatch

Batch breadth events via pgx.Batch

StagingWriter.Stage/Flush

Buffers staging inserts, flushes every 500 rows

The single-row variants (InsertRepoDependency, InsertRepoLibyear, InsertRepoLabor, InsertContributorRepo) remain available for callers that process items one at a time.


Text sanitization

SanitizeText() cleans strings for PostgreSQL TEXT columns by removing null bytes, replacing invalid UTF-8 with the Unicode replacement character, and stripping control characters (except \n, \r, \t). It includes a fast-path check that returns the original string when no cleaning is needed.

NullTime() converts Go’s zero time.Time to nil to prevent year-0001 garbage timestamps in PostgreSQL.

Both are called automatically by the upsert methods on text fields and timestamp fields.


Queue system

The collection queue (aveloxis_ops.collection_queue) is a PostgreSQL-backed priority queue using FOR UPDATE SKIP LOCKED for atomic job claiming:

  • EnqueueRepo – adds/updates a repo in the queue

  • DequeueNext – atomically claims the highest-priority due job

  • CompleteJob – marks done and re-queues with a future due time

  • PrioritizeRepo – pushes a repo to priority 0 (immediate)

  • RecoverStaleLocks – resets jobs locked longer than a timeout (crash recovery)

  • ListQueue / QueueStats – observability


Group ownership verification

Web store operations that modify user groups use verifyGroupOwnership() / verifyGroupOwned() to check that a group belongs to the requesting user before proceeding. This is a single-query check that returns an error if the group is not found or not owned by the user.