# Aveloxis Schema Documentation *Reference documentation for the Aveloxis database schema -- the Go-based open source community health metrics collection pipeline.* --- ## Overview Aveloxis uses two PostgreSQL schemas to separate collected data from operational state: - **`aveloxis_data`** -- Collected open source community health data. Contains tables for repositories, contributors, issues, pull requests, commits, releases, messages, dependency information, and aggregated data mart views. These tables hold the output of collection workers that talk to GitHub, GitLab, and local git clones. - **`aveloxis_ops`** -- Operational and orchestration tables. Contains the collection queue, staging area, API credentials, user accounts, worker state, and configuration. These tables drive the collection pipeline itself. Both schemas maintain full parity with Augur's `augur_data` and `augur_operations` schemas. All `CREATE TABLE` statements use `IF NOT EXISTS` and inserts use `ON CONFLICT DO NOTHING` for idempotent migrations. --- ## Metadata Columns Most tables in `aveloxis_data` include four provenance columns. They are documented once here and referenced as **"Standard metadata columns"** in the per-table documentation below. | Column | Type | Default | Description | |--------|------|---------|-------------| | `tool_source` | TEXT | `'aveloxis'` | Identifies which component created the row. Values: `'aveloxis'` (API collection), `'aveloxis-facade'` (git log parsing), `'aveloxis-commit-resolver'` (commit author resolution). Used for data provenance and debugging. | | `tool_version` | TEXT | `''` | Version of the tool that created the row. Currently empty; reserved for future use. | | `data_source` | TEXT | `''` | Where the raw data came from. Values: `'GitHub API'`, `'GitLab API'`, `'git'`. Distinguishes API-sourced from git-sourced data. | | `data_collection_date` | TIMESTAMPTZ | `NOW()` | Timestamp of when this row was inserted or last updated. Auto-set to `NOW()` on insert. Used to track data freshness and identify stale records. | --- ## aveloxis_data Schema ### Core Tables #### platforms Lookup table for supported forge platforms. Seeded on schema creation with GitHub (1), GitLab (2), and Mailing List (6 — the transport platform for mailing-list-sourced rows; see [Mailing-list ingestion](architecture/mailing-list.md)). | Column | Type | Source | Description | |--------|------|--------|-------------| | `platform_id` | SMALLINT (PK) | Seeded | Primary key. `1` = GitHub, `2` = GitLab, `6` = Mailing List. | | `platform_name` | TEXT NOT NULL UNIQUE | Seeded | Human-readable platform name. | --- #### repo_groups Logical groupings of repositories, typically representing a project, organization, or research cohort. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_group_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `rg_name` | TEXT NOT NULL | User input | Name of the repo group (e.g., "CHAOSS", "Linux Foundation"). | | `rg_description` | TEXT | User input | Free-text description of the group. | | `rg_website` | TEXT | User input | URL for the group's website. | | `rg_recache` | SMALLINT | User input | Flag indicating whether the group should be re-cached. `1` = yes. | | `rg_last_modified` | TIMESTAMPTZ | Auto-generated | Timestamp of last modification. Defaults to `NOW()`. | | `rg_type` | TEXT | User input | Classification of the group type. | | | | | *Standard metadata columns* | --- #### repos Central repository table. Every collected repository has exactly one row here. All entity tables (issues, PRs, commits, etc.) reference `repo_id`. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_id` | BIGSERIAL (PK) | Auto-generated | Primary key, referenced by nearly every other table. | | `repo_group_id` | BIGINT (FK -> repo_groups) | User input | The group this repo belongs to. | | `platform_id` | SMALLINT NOT NULL (FK -> platforms) | User input | `1` for GitHub, `2` for GitLab. | | `repo_git` | TEXT NOT NULL UNIQUE | User input | Git clone URL. Serves as the natural unique key. | | `repo_name` | TEXT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | Short repository name (e.g., `"augur"`). | | `repo_owner` | TEXT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | Owner or namespace (e.g., `"chaoss"`). | | `repo_path` | TEXT | Computed | Local filesystem path to the cloned repo (used by facade). | | `repo_description` | TEXT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}?statistics=true` | Repository description from the forge. | | `primary_language` | TEXT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | Primary programming language. | | `forked_from` | TEXT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | URL of the parent repo if this is a fork. | | `repo_archived` | BOOLEAN | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | Whether the repo is archived on the forge. | | `platform_repo_id` | TEXT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | The forge's numeric ID for this repo, stored as text for cross-platform compatibility. | | `created_at` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | When the repo was created on the forge. | | `updated_at` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | When the repo was last updated on the forge. | | | | | *Standard metadata columns* | **Unique constraint:** `(repo_git)` --- #### repo_groups_list_serve Mailing list registry + per-list collection state. Each row is a list the MailingListWorker collects; the `mlls_*` columns (v0.25.7) carry the claim/checkpoint/lock state. See [Mailing-list ingestion](architecture/mailing-list.md). | Column | Type | Source | Description | |--------|------|--------|-------------| | `rgls_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_group_id` | BIGINT NOT NULL (FK -> repo_groups) | User input | The repo group this mailing list belongs to. | | `rgls_name` | TEXT | User input | Mailing list name. | | `rgls_description` | TEXT | User input | Description of the mailing list. | | `rgls_sponsor` | TEXT | User input | Organization sponsoring the list. | | `rgls_email` | TEXT | User input | List address (e.g. `dev@kafka.apache.org`). | | `mlls_system` | TEXT | Registration | Archive system definition that applies (`apache_ponymail`, `lore_public_inbox`). Non-empty = collectable by the worker. | | `mlls_last_month` | TEXT | Worker | `yyyy-mm` backfill checkpoint (resume point). | | `mlls_scan_complete` | BOOLEAN | Worker | Partial-scan flag; FALSE → re-eligible immediately. | | `mlls_failed_attempts` | INTEGER | Worker | Consecutive failure counter (quadratic backoff; sideline at 10). | | `mlls_last_failed_at` | TIMESTAMPTZ | Worker | Backoff gate input. | | `mlls_last_run` | TIMESTAMPTZ | Worker | Last successful tail-refresh. | | `mlls_locked_at` / `mlls_locked_pid` / `mlls_locked_boot_id` | TIMESTAMPTZ / INTEGER / TEXT | Worker | `(pid, boot_id)` crash-recovery lock; a lock older than 2h is presumed dead. | | | | | *Standard metadata columns* | **Unique constraint:** `(repo_group_id, rgls_email)` --- #### email_message A mailing-list email as a first-class entity (v0.25.7) — peer to `issues` / `pull_requests` / `pull_request_reviews`. The body lives in `messages` (linked via `email_message_ref`); this table carries the classification (Axis A) and repo association (Axis B). Declared after `issues`/`pull_requests`/`messages` in `schema.sql` because it FK-references all three. See [Mailing-list ingestion](architecture/mailing-list.md). | Column | Type | Source | Description | |--------|------|--------|-------------| | `email_message_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` / `repo_group_id` / `rgls_id` | BIGINT (FKs) | Worker | The repo / group / list this message came from. | | `platform_id` | SMALLINT NOT NULL (FK -> platforms) | Worker | Always `6` (Mailing List). | | `ml_system` | TEXT | Worker | Archive system (`apache_ponymail`, `lore_public_inbox`). | | `message_id_header` | TEXT NOT NULL UNIQUE | Email | RFC-822 Message-ID (idempotency key). | | `list_address` / `list_id_header` | TEXT | Email | List the message came from. | | `subject` / `sender_email` / `sent_at` | TEXT / TEXT / TIMESTAMPTZ | Email | Parsed header fields. | | `in_reply_to` / `references_chain` / `thread_root_id` | TEXT | Email | Threading: In-Reply-To, References, resolved thread root. | | `has_patch` | BOOLEAN | Classifier | Body contains a patch. | | `msg_class` | TEXT | Classifier | Axis A: `issue_event`, `patch_submission`, `review`, `github_mirror`, `commit_notify`, `vote`, `announce`, `result`, `discuss`, `support`, `unclassified`. | | `classification_source` | TEXT | Classifier | Which rule fired (subject_regex / body_url / sender / list_id / list_address). | | `is_mirror` / `mirrors_url` | BOOLEAN / TEXT | Classifier | Mirror-list mail that echoes GitHub activity. | | `signaled_repo_url` | TEXT | Classifier | Axis B: canonical repo URL the message signals (captured even if not in catalog). | | `signaled_repo_id` | BIGINT (FK -> repos, ON DELETE SET NULL) | Resolver | Resolved repo FK (NULL until the URL matches a loaded repo). | | `linked_issue_id` / `linked_pull_request_id` | BIGINT (FKs) | Router | Routed target when the message is an issue/PR event. | | `linked_external_key` / `linked_commit_hash` | TEXT | Router | Jira/Bugzilla key / commit the message references. | | | | | *Standard metadata columns* (`tool_source` defaults to `Aveloxis Mailing List Collector`) | --- #### email_message_ref Bridge from `email_message` to the shared `messages` table (the body), mirroring `issue_message_ref` / `pull_request_message_ref`. | Column | Type | Source | Description | |--------|------|--------|-------------| | `email_msg_ref_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `email_message_id` | BIGINT NOT NULL (FK -> email_message) | Worker | The email entity. | | `msg_id` | BIGINT NOT NULL (FK -> messages) | Worker | The body row in `messages`. | | `repo_group_id` | BIGINT (FK -> repo_groups) | Worker | Group context. | | | | | *Standard metadata columns* | **Unique constraint:** `(email_message_id, msg_id)` --- ### Contributors #### contributors Platform-agnostic contributor identity. Each unique person across GitHub and GitLab maps to one row. Contains both canonical fields and platform-specific profile data. | Column | Type | Source | Description | |--------|------|--------|-------------| | `cntrb_id` | UUID (PK) | Auto-generated (`gen_random_uuid()`) | Primary key. Referenced by issues, PRs, commits, etc. | | `cntrb_login` | TEXT NOT NULL | GitHub REST: `/users/{login}`, GitLab: `/projects/{id}/members/all` | Platform login/username. | | `cntrb_email` | TEXT | GitHub REST: `/users/{login}`, GitLab: `/projects/{id}/members/all` | Email address (may be empty if private). | | `cntrb_full_name` | TEXT | GitHub REST: `/users/{login}`, GitLab: `/projects/{id}/members/all` | Full display name. | | `cntrb_company` | TEXT | GitHub REST: `/users/{login}` | Company affiliation from profile. | | `cntrb_location` | TEXT | GitHub REST: `/users/{login}` | Location string from profile. | | `cntrb_canonical` | TEXT | Computed / `aveloxis-commit-resolver` | Canonical email used for identity merging. | | `cntrb_type` | TEXT | GitHub REST: `/users/{login}` | Account type (e.g., `"User"`, `"Bot"`, `"Organization"`). | | `cntrb_fake` | SMALLINT | Computed | Flag for synthetic/placeholder contributors. `1` = fake. | | `cntrb_deleted` | SMALLINT | Computed | Soft-delete flag. `1` = deleted. | | `cntrb_long` | NUMERIC(11,8) | Computed | Longitude from geocoded location. | | `cntrb_lat` | NUMERIC(10,8) | Computed | Latitude from geocoded location. | | `cntrb_country_code` | CHAR(3) | Computed | ISO 3166 country code from geocoded location. | | `cntrb_state` | TEXT | Computed | State/province from geocoded location. | | `cntrb_city` | TEXT | Computed | City from geocoded location. | | `cntrb_last_used` | TIMESTAMPTZ | Computed | Timestamp of most recent activity by this contributor. | | `gh_user_id` | BIGINT | GitHub REST: `/users/{login}` | GitHub's numeric user ID. | | `gh_login` | TEXT | GitHub REST: `/users/{login}` | GitHub login name. | | `gh_url` | TEXT | GitHub REST: `/users/{login}` | GitHub API URL for this user. | | `gh_html_url` | TEXT | GitHub REST: `/users/{login}` | GitHub profile URL. | | `gh_node_id` | TEXT | GitHub REST: `/users/{login}` | GitHub GraphQL node ID. | | `gh_avatar_url` | TEXT | GitHub REST: `/users/{login}` | GitHub avatar image URL. | | `gh_gravatar_id` | TEXT | GitHub REST: `/users/{login}` | Gravatar ID (legacy). | | `gh_followers_url` | TEXT | GitHub REST: `/users/{login}` | API URL for followers list. | | `gh_following_url` | TEXT | GitHub REST: `/users/{login}` | API URL for following list. | | `gh_gists_url` | TEXT | GitHub REST: `/users/{login}` | API URL for gists. | | `gh_starred_url` | TEXT | GitHub REST: `/users/{login}` | API URL for starred repos. | | `gh_subscriptions_url` | TEXT | GitHub REST: `/users/{login}` | API URL for subscriptions. | | `gh_organizations_url` | TEXT | GitHub REST: `/users/{login}` | API URL for orgs membership. | | `gh_repos_url` | TEXT | GitHub REST: `/users/{login}` | API URL for user's repos. | | `gh_events_url` | TEXT | GitHub REST: `/users/{login}` | API URL for user's events. | | `gh_received_events_url` | TEXT | GitHub REST: `/users/{login}` | API URL for received events. | | `gh_type` | TEXT | GitHub REST: `/users/{login}` | GitHub account type string. | | `gh_site_admin` | TEXT | GitHub REST: `/users/{login}` | Whether user is a GitHub site admin. | | `gl_web_url` | TEXT | GitLab API v4: `/projects/{id}/members/all` | GitLab profile web URL. | | `gl_avatar_url` | TEXT | GitLab API v4: `/projects/{id}/members/all` | GitLab avatar URL. | | `gl_state` | TEXT | GitLab API v4: `/projects/{id}/members/all` | GitLab account state (e.g., `"active"`). | | `gl_username` | TEXT | GitLab API v4: `/projects/{id}/members/all` | GitLab username. | | `gl_full_name` | TEXT | GitLab API v4: `/projects/{id}/members/all` | GitLab display name. | | `gl_id` | BIGINT | GitLab API v4: `/projects/{id}/members/all` | GitLab numeric user ID. | | `cntrb_created_at` | TIMESTAMPTZ | GitHub REST: `/users/{login}`, GitLab API v4 | When the account was created on the forge. | | | | | *Standard metadata columns* | **Unique index:** `(cntrb_login) WHERE cntrb_login != ''` --- #### contributor_identities Maps a contributor to per-platform identities. One `cntrb_id` may have identities on both GitHub and GitLab. | Column | Type | Source | Description | |--------|------|--------|-------------| | `identity_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `cntrb_id` | UUID NOT NULL (FK -> contributors) | Computed | The canonical contributor this identity belongs to. | | `platform_id` | SMALLINT NOT NULL (FK -> platforms) | Computed | Which platform this identity is on. | | `platform_user_id` | BIGINT NOT NULL | GitHub REST: `/users/{login}`, GitLab API v4: `/projects/{id}/members/all` | Numeric user ID on the platform. | | `login` | TEXT NOT NULL | GitHub REST: `/users/{login}`, GitLab API v4 | Username on the platform. | | `name` | TEXT | GitHub REST: `/users/{login}`, GitLab API v4 | Display name. | | `email` | TEXT | GitHub REST: `/users/{login}`, GitLab API v4 | Email address. | | `avatar_url` | TEXT | GitHub REST: `/users/{login}`, GitLab API v4 | Avatar image URL. | | `profile_url` | TEXT | GitHub REST: `/users/{login}`, GitLab API v4 | Profile page URL. | | `node_id` | TEXT | GitHub REST: `/users/{login}` | GitHub GraphQL node ID (empty for GitLab). | | `user_type` | TEXT | GitHub REST: `/users/{login}`, GitLab API v4 | Account type. Default `'User'`. | | `is_admin` | BOOLEAN | GitHub REST: `/users/{login}`, GitLab API v4 | Whether the user is a site/instance admin. | **Unique constraint:** `(platform_id, platform_user_id)` --- #### contributors_aliases Maps alternate email addresses to a contributor's canonical email. Used by the commit resolver to unify git commit emails with API identities. | Column | Type | Source | Description | |--------|------|--------|-------------| | `cntrb_alias_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `cntrb_id` | UUID NOT NULL (FK -> contributors) | `aveloxis-commit-resolver` | The contributor this alias belongs to. | | `canonical_email` | TEXT NOT NULL | `aveloxis-commit-resolver` | The contributor's canonical email. | | `alias_email` | TEXT NOT NULL UNIQUE | `aveloxis-commit-resolver` | An alternate email that maps to this contributor. | | `cntrb_active` | SMALLINT NOT NULL | `aveloxis-commit-resolver` | Whether this alias is active. `1` = active. | | `cntrb_last_modified` | TIMESTAMPTZ | Auto-generated | Last modification timestamp. | | | | | *Standard metadata columns* | --- #### contributor_affiliations Maps email domains to organizational affiliations. Used to attribute commits and activity to companies/organizations. | Column | Type | Source | Description | |--------|------|--------|-------------| | `ca_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `ca_domain` | TEXT NOT NULL UNIQUE | User input / Augur import | Email domain (e.g., `"redhat.com"`). | | `ca_start_date` | DATE | User input | Date the affiliation began. Default `'1970-01-01'`. | | `ca_last_used` | TIMESTAMPTZ | Computed | When this affiliation was last matched. | | `ca_affiliation` | TEXT | User input | Organization name (e.g., `"Red Hat"`). | | `ca_active` | SMALLINT | User input | Whether this mapping is active. `1` = active. | | | | | *Standard metadata columns* | --- #### contributor_repo Records contributor activity events tied to specific repositories. Tracks which contributors interact with which repos and how. | Column | Type | Source | Description | |--------|------|--------|-------------| | `cntrb_repo_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `cntrb_id` | UUID NOT NULL (FK -> contributors) | GitHub REST: `/issues/events`, `/repos/{o}/{r}/contributors` | The contributor involved. | | `repo_git` | TEXT NOT NULL | GitHub REST: `/repos/{o}/{r}` | Git URL of the repo. | | `repo_name` | TEXT NOT NULL | GitHub REST: `/repos/{o}/{r}` | Repository name. | | `gh_repo_id` | BIGINT NOT NULL | GitHub REST: `/repos/{o}/{r}` | GitHub's numeric repository ID. | | `cntrb_category` | TEXT | Computed | Category of the contribution (e.g., event type). | | `event_id` | BIGINT | GitHub REST: `/issues/events` | Platform event ID that triggered this record. | | `created_at` | TIMESTAMPTZ | GitHub REST: `/issues/events` | Timestamp of the event. | | | | | *Standard metadata columns* | **Unique constraint:** `(event_id, tool_version)` --- #### contributors_old Legacy backup table for contributor data. Holds a snapshot of contributor records before a migration or schema change. Structure mirrors the `contributors` table. | Column | Type | Source | Description | |--------|------|--------|-------------| | `cntrb_id` | UUID (PK) | Augur import | Primary key (not auto-generated; copied from contributors). | | `cntrb_login` | TEXT | Augur import | Platform login. | | `cntrb_email` | TEXT | Augur import | Email address. | | `cntrb_full_name` | TEXT | Augur import | Full display name. | | `cntrb_company` | TEXT | Augur import | Company affiliation. | | `cntrb_created_at` | TIMESTAMPTZ | Augur import | Account creation timestamp. | | `cntrb_type` | TEXT | Augur import | Account type. | | `cntrb_fake` | SMALLINT | Augur import | Fake flag. | | `cntrb_deleted` | SMALLINT | Augur import | Soft-delete flag. | | `cntrb_long` | NUMERIC(11,8) | Augur import | Longitude. | | `cntrb_lat` | NUMERIC(10,8) | Augur import | Latitude. | | `cntrb_country_code` | CHAR(3) | Augur import | ISO country code. | | `cntrb_state` | TEXT | Augur import | State/province. | | `cntrb_city` | TEXT | Augur import | City. | | `cntrb_location` | TEXT | Augur import | Location string. | | `cntrb_canonical` | TEXT | Augur import | Canonical email. | | `cntrb_last_used` | TIMESTAMPTZ | Augur import | Last activity timestamp. | | `gh_user_id` | BIGINT | Augur import | GitHub user ID. | | `gh_login` | TEXT | Augur import | GitHub login. | | `gh_url` | TEXT | Augur import | GitHub API URL. | | `gh_html_url` | TEXT | Augur import | GitHub profile URL. | | `gh_node_id` | TEXT | Augur import | GitHub node ID. | | `gh_avatar_url` | TEXT | Augur import | GitHub avatar URL. | | `gh_gravatar_id` | TEXT | Augur import | Gravatar ID. | | `gh_followers_url` | TEXT | Augur import | Followers API URL. | | `gh_following_url` | TEXT | Augur import | Following API URL. | | `gh_gists_url` | TEXT | Augur import | Gists API URL. | | `gh_starred_url` | TEXT | Augur import | Starred API URL. | | `gh_subscriptions_url` | TEXT | Augur import | Subscriptions API URL. | | `gh_organizations_url` | TEXT | Augur import | Organizations API URL. | | `gh_repos_url` | TEXT | Augur import | Repos API URL. | | `gh_events_url` | TEXT | Augur import | Events API URL. | | `gh_received_events_url` | TEXT | Augur import | Received events API URL. | | `gh_type` | TEXT | Augur import | GitHub account type. | | `gh_site_admin` | TEXT | Augur import | Site admin flag. | | `gl_web_url` | TEXT | Augur import | GitLab web URL. | | `gl_avatar_url` | TEXT | Augur import | GitLab avatar URL. | | `gl_state` | TEXT | Augur import | GitLab account state. | | `gl_username` | TEXT | Augur import | GitLab username. | | `gl_full_name` | TEXT | Augur import | GitLab full name. | | `gl_id` | BIGINT | Augur import | GitLab user ID. | | | | | *Standard metadata columns* | --- #### unresolved_commit_emails Holds email addresses found in git commits that could not be resolved to a known contributor. The commit resolver processes these to attempt matching. | Column | Type | Source | Description | |--------|------|--------|-------------| | `email_unresolved_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `email` | TEXT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | The unresolved email from a commit. | | `name` | TEXT | Git: `git log --numstat` (default branch only, per v0.16.4) | The name associated with the email in the commit. | | | | | *Standard metadata columns* | --- ### Issues #### issues Issue tracker records from GitHub Issues or GitLab Issues. Each row represents one issue in one repository. | Column | Type | Source | Description | |--------|------|--------|-------------| | `issue_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository this issue belongs to. | | `platform_issue_id` | BIGINT NOT NULL | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | The platform's numeric ID for this issue. | | `issue_number` | INT NOT NULL | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | Human-readable issue number (e.g., `#42`). | | `node_id` | TEXT | GitHub REST: `/repos/{o}/{r}/issues` | GitHub GraphQL node ID (empty for GitLab). | | `issue_title` | TEXT | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | Issue title. | | `issue_body` | TEXT | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | Issue body/description text. | | `issue_state` | TEXT | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | Current state: `'open'`, `'closed'`. | | `issue_url` | TEXT | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | API URL for the issue. | | `html_url` | TEXT | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | Web URL for the issue. | | `reporter_id` | UUID (FK -> contributors) | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | Contributor who opened the issue. | | `closed_by_id` | UUID (FK -> contributors) | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | Contributor who closed the issue (null if open). | | `pull_request` | BIGINT | GitHub REST: `/repos/{o}/{r}/issues` | Non-null if this issue is actually a PR (GitHub conflates the two). | | `pull_request_id` | BIGINT | Computed | Foreign key to `pull_requests` if linked. | | `created_at` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | When the issue was created. | | `updated_at` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | When the issue was last updated. | | `closed_at` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | When the issue was closed (null if open). | | `due_on` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | Due date from milestone. | | `comment_count` | INT | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | Number of comments on the issue. | | `external_key` | TEXT | `backfill-issue-external-keys` | Bracketed `[KEY-N]` Jira/Bugzilla key from the title (Apache Jira → GitHub imports). Lets mailing-list `issue_event` mail bridge to the imported issue. Partial unique `(repo_id, external_key) WHERE external_key <> ''`. v0.25.7. | | | | | *Standard metadata columns* | **Unique constraint:** `(repo_id, platform_issue_id)` --- #### issue_labels Labels attached to issues. | Column | Type | Source | Description | |--------|------|--------|-------------| | `issue_label_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `issue_id` | BIGINT NOT NULL (FK -> issues) | Computed | The issue this label is on. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository for denormalized querying. | | `platform_label_id` | BIGINT | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | The platform's numeric ID for this label. | | `node_id` | TEXT | GitHub REST: `/repos/{o}/{r}/issues` | GitHub GraphQL node ID. | | `label_text` | TEXT NOT NULL | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | Label name text (e.g., `"bug"`, `"enhancement"`). | | `label_description` | TEXT | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | Label description. | | `label_color` | TEXT | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | Hex color code (e.g., `"fc2929"`). | | | | | *Standard metadata columns* | **Unique constraint:** `(issue_id, label_text)` --- #### issue_assignees Users assigned to an issue. | Column | Type | Source | Description | |--------|------|--------|-------------| | `issue_assignee_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `issue_id` | BIGINT NOT NULL (FK -> issues) | Computed | The issue. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository for denormalized querying. | | `cntrb_id` | UUID (FK -> contributors) | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | The assigned contributor. | | `platform_assignee_id` | BIGINT | GitHub REST: `/repos/{o}/{r}/issues`, GitLab: `/projects/{id}/issues` | Platform's numeric ID for the assignee. | | `platform_node_id` | TEXT | GitHub REST: `/repos/{o}/{r}/issues` | GitHub GraphQL node ID. | | | | | *Standard metadata columns* | **Unique constraint:** `(issue_id, platform_assignee_id)` --- #### issue_events Timeline events on issues (labeled, closed, assigned, referenced, etc.). | Column | Type | Source | Description | |--------|------|--------|-------------| | `issue_event_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `issue_id` | BIGINT NOT NULL (FK -> issues) | Computed | The issue this event occurred on. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository for denormalized querying. | | `cntrb_id` | UUID (FK -> contributors) | GitHub REST: `/issues/events`, GitLab: `/projects/{id}/issues` | The contributor who triggered the event. | | `platform_id` | SMALLINT NOT NULL (FK -> platforms) | Computed | Platform where the event occurred. | | `platform_event_id` | BIGINT NOT NULL | GitHub REST: `/issues/events`, GitLab: `/projects/{id}/issues` | Platform's numeric event ID. | | `node_id` | TEXT | GitHub REST: `/issues/events` | GitHub GraphQL node ID. | | `action` | TEXT NOT NULL | GitHub REST: `/issues/events`, GitLab: `/projects/{id}/issues` | Event type (e.g., `"closed"`, `"labeled"`, `"assigned"`). | | `action_commit_hash` | TEXT | GitHub REST: `/issues/events` | Commit SHA if the event references a commit. | | `created_at` | TIMESTAMPTZ | GitHub REST: `/issues/events`, GitLab: `/projects/{id}/issues` | When the event occurred. | | | | | *Standard metadata columns* | **Unique constraint:** `(repo_id, platform_event_id)` --- #### issue_message_ref Join table linking issues to messages (comments). Each row maps one comment to one issue. | Column | Type | Source | Description | |--------|------|--------|-------------| | `issue_msg_ref_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `issue_id` | BIGINT NOT NULL (FK -> issues) | Computed | The issue. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository for denormalized querying. | | `msg_id` | BIGINT NOT NULL (FK -> messages) | Computed | The message/comment. | | `platform_src_id` | BIGINT | GitHub (REST `/issues/comments` in rest mode; inline via GraphQL `Issue.comments` in graphql mode — see `messages` table for the per-mode source breakdown) ; GitLab `/projects/{id}/issues/{iid}/notes` | Platform's comment ID. | | `platform_node_id` | TEXT | GitHub (REST `/issues/comments` or GraphQL inline) | GitHub GraphQL node ID. | | | | | *Standard metadata columns* | **Unique constraint:** `(issue_id, msg_id)` --- ### Pull Requests #### pull_requests Pull requests (GitHub) or merge requests (GitLab). Each row represents one PR/MR in one repository. | Column | Type | Source | Description | |--------|------|--------|-------------| | `pull_request_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository this PR belongs to. | | `platform_pr_id` | BIGINT NOT NULL | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Platform's numeric PR/MR ID. | | `node_id` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | GitHub GraphQL node ID. | | `pr_number` | INT NOT NULL | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Human-readable PR number. | | `pr_url` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | API URL. | | `pr_html_url` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Web URL. | | `pr_diff_url` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | URL to the diff. | | `pr_title` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | PR title. | | `pr_body` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | PR body/description. | | `pr_state` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | State: `'open'`, `'closed'`, `'merged'`. | | `pr_locked` | BOOLEAN | GitHub REST: `/repos/{o}/{r}/pulls` | Whether the PR conversation is locked. | | `created_at` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Creation timestamp. | | `updated_at` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Last update timestamp. | | `closed_at` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Close timestamp (null if open). | | `merged_at` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Merge timestamp (null if not merged). | | `merge_commit_sha` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | SHA of the merge commit. | | `author_id` | UUID (FK -> contributors) | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | PR author. | | `author_association` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | Author's association to the repo (e.g., `"MEMBER"`, `"CONTRIBUTOR"`). | | `meta_head_id` | BIGINT | Computed | FK to `pull_request_meta` for the head branch. | | `meta_base_id` | BIGINT | Computed | FK to `pull_request_meta` for the base branch. | | | | | *Standard metadata columns* | **Unique constraint:** `(repo_id, platform_pr_id)` --- #### pull_request_labels Labels attached to pull requests. | Column | Type | Source | Description | |--------|------|--------|-------------| | `pr_label_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pull_request_id` | BIGINT NOT NULL (FK -> pull_requests) | Computed | The PR this label is on. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository for denormalized querying. | | `platform_label_id` | BIGINT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Platform's label ID. | | `node_id` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | GitHub GraphQL node ID. | | `label_name` | TEXT NOT NULL | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Label name text. | | `label_description` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Label description. | | `label_color` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Hex color code. | | `is_default` | BOOLEAN | GitHub REST: `/repos/{o}/{r}/pulls` | Whether this is a default label. | | | | | *Standard metadata columns* | **Unique constraint:** `(pull_request_id, label_name)` --- #### pull_request_assignees Users assigned to a pull request. | Column | Type | Source | Description | |--------|------|--------|-------------| | `pr_assignee_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pull_request_id` | BIGINT NOT NULL (FK -> pull_requests) | Computed | The PR. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `cntrb_id` | UUID (FK -> contributors) | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | The assigned contributor. | | `platform_assignee_id` | BIGINT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Platform's assignee ID. | | | | | *Standard metadata columns* | **Unique constraint:** `(pull_request_id, platform_assignee_id)` --- #### pull_request_reviewers Requested reviewers on a pull request. | Column | Type | Source | Description | |--------|------|--------|-------------| | `pr_reviewer_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pull_request_id` | BIGINT NOT NULL (FK -> pull_requests) | Computed | The PR. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `cntrb_id` | UUID (FK -> contributors) | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/merge_requests/{n}/approvals` | The reviewer. | | `platform_reviewer_id` | BIGINT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/merge_requests/{n}/approvals` | Platform's reviewer ID. | | | | | *Standard metadata columns* | **Unique constraint:** `(pull_request_id, platform_reviewer_id)` --- #### pull_request_reviews Submitted reviews on a pull request (approved, changes requested, commented). | Column | Type | Source | Description | |--------|------|--------|-------------| | `pr_review_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pull_request_id` | BIGINT NOT NULL (FK -> pull_requests) | Computed | The PR being reviewed. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `cntrb_id` | UUID (FK -> contributors) | GitHub REST: `/pulls/{n}/reviews`, GitLab: `/merge_requests/{n}/notes` | The reviewer. | | `platform_id` | SMALLINT NOT NULL (FK -> platforms) | Computed | Platform. | | `platform_review_id` | BIGINT NOT NULL | GitHub REST: `/pulls/{n}/reviews`, GitLab: `/merge_requests/{n}/notes` | Platform's review ID. | | `node_id` | TEXT | GitHub REST: `/pulls/{n}/reviews` | GitHub GraphQL node ID. | | `review_state` | TEXT | GitHub REST: `/pulls/{n}/reviews`, GitLab: `/merge_requests/{n}/notes` | State: `"APPROVED"`, `"CHANGES_REQUESTED"`, `"COMMENTED"`, `"DISMISSED"`. | | `review_body` | TEXT | GitHub REST: `/pulls/{n}/reviews`, GitLab: `/merge_requests/{n}/notes` | Review body text. | | `submitted_at` | TIMESTAMPTZ | GitHub REST: `/pulls/{n}/reviews`, GitLab: `/merge_requests/{n}/notes` | When the review was submitted. | | `author_association` | TEXT | GitHub REST: `/pulls/{n}/reviews` | Reviewer's association to the repo. | | `commit_id` | TEXT | GitHub REST: `/pulls/{n}/reviews` | SHA of the commit the review was made against. | | `html_url` | TEXT | GitHub REST: `/pulls/{n}/reviews` | Web URL of the review. | | | | | *Standard metadata columns* | **Unique constraint:** `(pull_request_id, platform_review_id)` --- #### pull_request_meta Branch metadata for the head and base of a pull request. | Column | Type | Source | Description | |--------|------|--------|-------------| | `pr_meta_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pull_request_id` | BIGINT NOT NULL (FK -> pull_requests) | Computed | The PR. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `cntrb_id` | UUID (FK -> contributors) | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Owner of the head/base repo. | | `head_or_base` | TEXT NOT NULL | Computed | `'head'` or `'base'`. | | `meta_label` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Label string (e.g., `"owner:branch"`). | | `meta_ref` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Branch name. | | `meta_sha` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls`, GitLab: `/projects/{id}/merge_requests` | Commit SHA at the tip of the branch. | | | | | *Standard metadata columns* | **Unique constraint:** `(pull_request_id, head_or_base)` --- #### pull_request_commits Commits included in a pull request. | Column | Type | Source | Description | |--------|------|--------|-------------| | `pr_commit_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pull_request_id` | BIGINT NOT NULL (FK -> pull_requests) | Computed | The PR. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `author_cntrb_id` | UUID (FK -> contributors) | GitHub REST: `/pulls/{n}/commits`, GitLab: `/merge_requests/{n}/commits` | Commit author. | | `pr_cmt_sha` | TEXT NOT NULL | GitHub REST: `/pulls/{n}/commits`, GitLab: `/merge_requests/{n}/commits` | Commit SHA. | | `pr_cmt_node_id` | TEXT | GitHub REST: `/pulls/{n}/commits` | GitHub GraphQL node ID. | | `pr_cmt_message` | TEXT | GitHub REST: `/pulls/{n}/commits`, GitLab: `/merge_requests/{n}/commits` | Commit message. | | `pr_cmt_author_email` | TEXT | GitHub REST: `/pulls/{n}/commits`, GitLab: `/merge_requests/{n}/commits` | Author email from the commit. | | `pr_cmt_timestamp` | TIMESTAMPTZ | GitHub REST: `/pulls/{n}/commits`, GitLab: `/merge_requests/{n}/commits` | Commit timestamp. | | | | | *Standard metadata columns* | **Unique constraint:** `(pull_request_id, pr_cmt_sha)` --- #### pull_request_files Files changed in a pull request. | Column | Type | Source | Description | |--------|------|--------|-------------| | `pr_file_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pull_request_id` | BIGINT NOT NULL (FK -> pull_requests) | Computed | The PR. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `pr_file_path` | TEXT NOT NULL | GitHub REST: `/pulls/{n}/files`, GitLab: `/merge_requests/{n}/diffs` | Path of the changed file. | | `pr_file_additions` | INT | GitHub REST: `/pulls/{n}/files`, GitLab: `/merge_requests/{n}/diffs` | Lines added. | | `pr_file_deletions` | INT | GitHub REST: `/pulls/{n}/files`, GitLab: `/merge_requests/{n}/diffs` | Lines removed. | | | | | *Standard metadata columns* | **Unique constraint:** `(pull_request_id, pr_file_path)` --- #### pull_request_events Timeline events on pull requests (labeled, closed, merged, review_requested, etc.). | Column | Type | Source | Description | |--------|------|--------|-------------| | `pr_event_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pull_request_id` | BIGINT NOT NULL (FK -> pull_requests) | Computed | The PR. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `cntrb_id` | UUID (FK -> contributors) | GitHub REST: `/issues/events`, GitLab: `/projects/{id}/merge_requests` | Contributor who triggered the event. | | `platform_id` | SMALLINT NOT NULL (FK -> platforms) | Computed | Platform. | | `platform_event_id` | BIGINT NOT NULL | GitHub REST: `/issues/events`, GitLab: `/projects/{id}/merge_requests` | Platform's event ID. | | `node_id` | TEXT | GitHub REST: `/issues/events` | GitHub GraphQL node ID. | | `action` | TEXT NOT NULL | GitHub REST: `/issues/events`, GitLab: `/projects/{id}/merge_requests` | Event type (e.g., `"merged"`, `"closed"`, `"review_requested"`). | | `action_commit_hash` | TEXT | GitHub REST: `/issues/events` | Commit SHA if the event references a commit. | | `created_at` | TIMESTAMPTZ | GitHub REST: `/issues/events`, GitLab: `/projects/{id}/merge_requests` | When the event occurred. | | | | | *Standard metadata columns* | **Unique constraint:** `(repo_id, platform_event_id)` --- #### pull_request_repo Fork repository metadata referenced in pull requests. Records source repos for PR head/base branches that point to forks. | Column | Type | Source | Description | |--------|------|--------|-------------| | `pr_repo_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pr_repo_meta_id` | BIGINT | Computed | FK to `pull_request_meta`. | | `pr_repo_head_or_base` | TEXT | Computed | `'head'` or `'base'`. | | `pr_src_repo_id` | BIGINT | GitHub REST: `/repos/{o}/{r}/pulls` | Platform's numeric ID for the source repo. | | `pr_src_node_id` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | GitHub GraphQL node ID of the source repo. | | `pr_repo_name` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | Short name of the fork repo. | | `pr_repo_full_name` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | Full name (e.g., `"user/repo"`). | | `pr_repo_private_bool` | BOOLEAN | GitHub REST: `/repos/{o}/{r}/pulls` | Whether the fork is private. | | `pr_cntrb_id` | UUID (FK -> contributors) | GitHub REST: `/repos/{o}/{r}/pulls` | Owner of the fork repo. | | | | | *Standard metadata columns* | --- #### pull_request_review_message_ref Bridge table linking a pull request **review** to its review-body message in the `messages` table. One row per review whose author submitted body text alongside their Approve / Request Changes / Comment action. Reviews submitted without body text (the common "Approved" with no comment case) do NOT create a row here. This is NOT the inline-diff-comment table — those live in `review_comments` (below) and have their own messages-table row each. The schema retains a number of diff-position columns (`pr_review_msg_diff_hunk`, `pr_review_msg_path`, `pr_review_msg_position`, etc.) from the Augur-era schema, but Aveloxis writes ONLY `pr_review_id`, `repo_id`, `msg_id`, `pr_review_src_id`, and `pr_review_msg_node_id`. The diff-position columns are kept for schema parity / 8Knot compatibility but are not populated. | Column | Type | Source | Description | |--------|------|--------|-------------| | `pr_review_msg_ref_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pr_review_id` | BIGINT NOT NULL (FK -> pull_request_reviews) | Computed | The review whose body is linked. | | `repo_id` | BIGINT (FK -> repos) | Computed | Repository. | | `msg_id` | BIGINT NOT NULL | Computed | FK to messages table; carries the review body text. | | `pr_review_src_id` | BIGINT | GitHub REST: `/pulls/{n}/reviews` (or GraphQL `reviews` connection on PullRequest in graphql mode) | Platform's review ID. | | `pr_review_msg_node_id` | TEXT | GitHub REST: `/pulls/{n}/reviews` | GitHub GraphQL node ID for the review. | | `pr_review_msg_url`, `pr_review_msg_src_id`, `pr_review_msg_diff_hunk`, `pr_review_msg_path`, `pr_review_msg_position`, `pr_review_msg_original_position`, `pr_review_msg_commit_id`, `pr_review_msg_original_commit_id`, `pr_review_msg_updated_at`, `pr_review_msg_html_url`, `pr_url`, `pr_review_msg_author_association`, `pr_review_msg_start_line`, `pr_review_msg_original_start_line`, `pr_review_msg_start_side`, `pr_review_msg_line`, `pr_review_msg_original_line`, `pr_review_msg_side` | (various) | Not populated by Aveloxis | Vestigial Augur-era columns. Use `review_comments` (below) for inline-diff comment data. | | | | | *Standard metadata columns* | --- #### pull_request_teams Teams requested to review a pull request. | Column | Type | Source | Description | |--------|------|--------|-------------| | `pr_team_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pull_request_id` | BIGINT (FK -> pull_requests) | Computed | The PR. | | `pr_src_team_id` | BIGINT | GitHub REST: `/repos/{o}/{r}/pulls` | Platform's team ID. | | `pr_src_team_node` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | GitHub GraphQL node ID. | | `pr_src_team_url` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | API URL for the team. | | `pr_team_name` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | Team name. | | `pr_team_slug` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | Team slug (URL-safe name). | | `pr_team_description` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | Team description. | | `pr_team_privacy` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | Privacy level (e.g., `"closed"`, `"secret"`). | | `pr_team_permission` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | Permission level (e.g., `"push"`, `"admin"`). | | `pr_team_src_members_url` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | API URL for team members. | | `pr_team_src_repositories_url` | TEXT | GitHub REST: `/repos/{o}/{r}/pulls` | API URL for team repos. | | `pr_team_parent_id` | BIGINT | GitHub REST: `/repos/{o}/{r}/pulls` | Parent team ID (for nested teams). | | | | | *Standard metadata columns* | --- #### pull_request_analysis ML-based merge prediction results for pull requests. Schema parity table; not yet populated by Aveloxis workers. | Column | Type | Source | Description | |--------|------|--------|-------------| | `pull_request_analysis_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pull_request_id` | BIGINT (FK -> pull_requests) | Not yet populated | The PR analyzed. | | `merge_probability` | NUMERIC(256,250) | Not yet populated | Predicted probability of merge. | | `mechanism` | TEXT | Not yet populated | ML model or heuristic used. | | | | | *Standard metadata columns* | --- ### Messages and Comments #### messages Unified comment/message table shared by issues and pull requests. Each row is one piece of text. The four bridge tables (`issue_message_ref`, `pull_request_message_ref`, `pull_request_review_message_ref`, `review_comments`) link messages to their semantic origin, enabling cross-type text analysis. The four kinds of text stored here: - **Issue conversation comments** — comments under an issue. Linked via `issue_message_ref`. - **PR conversation comments** — comments under a PR (the IssueComment-shaped nodes GitHub serves under `/issues/{n}/comments` on REST, even for PRs). Linked via `pull_request_message_ref`. - **PR review bodies** — text submitted alongside an Approve / Request Changes / Comment review action, when non-empty. Linked via `pull_request_review_message_ref`. - **Inline PR review comments** — diff-anchored comments attached to a specific file/line. Linked via `review_comments` (carries the diff position metadata). **Source endpoints by mode (GitHub):** - `pr_child_mode=rest` AND `listing_mode=rest`: comments fetched via `/issues/comments` (covers both issue and PR conversation comments), `/pulls/comments` (inline review comments), `/pulls/{n}/reviews` (review bodies). - `pr_child_mode=graphql` AND `listing_mode=graphql` (v0.18.5+ Phase 4): issue and PR conversation comments arrive **inline** via the GraphQL listing + PR-batch queries. `/issues/comments` is skipped entirely by the staged collector's `fullGraphQLMode()` gate. Review bodies are inline via `reviews` connection on PullRequest. The `/pulls/comments` REST iterator continues to run regardless of mode because GraphQL's PullRequestReviewComment type does not expose `side` / `startSide`. GitLab: all four kinds come from `/projects/{id}/merge_requests/{n}/notes` and `/projects/{id}/merge_requests/{n}/discussions` (REST composition; no GraphQL path on GitLab). | Column | Type | Source | Description | |--------|------|--------|-------------| | `msg_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `rgls_id` | BIGINT | Computed | Optional FK to `repo_groups_list_serve` for mailing list messages. | | `platform_msg_id` | BIGINT NOT NULL | GitHub (REST or GraphQL inline, see modes above) ; GitLab `/merge_requests/{n}/notes`, `/merge_requests/{n}/discussions` | Platform's comment / review ID. For review bodies this is the platform review ID. | | `platform_id` | SMALLINT NOT NULL (FK -> platforms) | Computed | Platform. | | `node_id` | TEXT | GitHub (REST or GraphQL inline) | GitHub GraphQL node ID. | | `msg_text` | TEXT | GitHub (REST or GraphQL inline) ; GitLab `/merge_requests/{n}/notes` | Comment / review body text. | | `msg_timestamp` | TIMESTAMPTZ | GitHub (REST or GraphQL inline) ; GitLab `/merge_requests/{n}/notes` | When the message was posted (or review submitted). | | `msg_sender_email` | TEXT | Computed | Email of the comment author (resolved from contributor). | | `msg_header` | TEXT | Not yet populated | Message header (for mailing list messages). | | `cntrb_id` | UUID (FK -> contributors) | Computed (resolved from platform user data) | Comment / review author. | | | | | *Standard metadata columns* | **Unique constraint:** `(platform_msg_id, platform_id)` --- #### pull_request_message_ref Join table linking pull requests to their **conversation** comments in the messages table. These are the IssueComment-shaped nodes GitHub serves under `/issues/{n}/comments` on REST (even for PRs); they are NOT diff-anchored inline review comments — those live in `review_comments`. | Column | Type | Source | Description | |--------|------|--------|-------------| | `pr_msg_ref_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pull_request_id` | BIGINT NOT NULL (FK -> pull_requests) | Computed | The PR. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `msg_id` | BIGINT NOT NULL (FK -> messages) | Computed | The message/comment. | | `platform_src_id` | BIGINT | GitHub (REST `/issues/comments` in rest mode; inline via GraphQL `PullRequest.comments` in graphql mode) ; GitLab `/merge_requests/{n}/notes` | Platform's comment ID. | | `platform_node_id` | TEXT | GitHub (REST or GraphQL inline) | GitHub GraphQL node ID. | | | | | *Standard metadata columns* | **Unique constraint:** `(pull_request_id, msg_id)` --- #### review_comments Inline code review comments with full diff positioning. One row per diff-anchored review comment. Each row links to both a `pull_request_reviews` row (the parent review) and a `messages` row (the comment body text). **Always populated via REST `/repos/{o}/{r}/pulls/comments` regardless of `pr_child_mode` / `listing_mode` configuration.** GitHub's GraphQL `PullRequestReviewComment` type does not expose the `side` / `startSide` fields aveloxis requires for diff-side fidelity, so the v0.18.5 Phase 4 GraphQL-inline-comments work explicitly carved this endpoint out and kept the REST iterator running. See CLAUDE.md's "Inline REVIEW comments NOT fetched via GraphQL — deliberate" subsection for the schema-validation history. | Column | Type | Source | Description | |--------|------|--------|-------------| | `review_comment_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `pr_review_id` | BIGINT (FK -> pull_request_reviews) | Computed | The parent review. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `msg_id` | BIGINT NOT NULL (FK -> messages) | Computed | The message body. | | `platform_src_id` | BIGINT | GitHub REST: `/pulls/comments` | Platform's comment ID. | | `node_id` | TEXT | GitHub REST: `/pulls/comments` | GitHub GraphQL node ID. | | `diff_hunk` | TEXT | GitHub REST: `/pulls/comments` | Diff hunk context. | | `file_path` | TEXT | GitHub REST: `/pulls/comments` | File path the comment is on. | | `position` | INT | GitHub REST: `/pulls/comments` | Line position in the diff. | | `original_position` | INT | GitHub REST: `/pulls/comments` | Original line position. | | `commit_id` | TEXT | GitHub REST: `/pulls/comments` | Commit SHA the comment is on. | | `original_commit_id` | TEXT | GitHub REST: `/pulls/comments` | Original commit SHA. | | `line` | INT | GitHub REST: `/pulls/comments` | End line. | | `original_line` | INT | GitHub REST: `/pulls/comments` | Original end line. | | `side` | TEXT | GitHub REST: `/pulls/comments` | Diff side (`"LEFT"` or `"RIGHT"`). | | `start_line` | INT | GitHub REST: `/pulls/comments` | Multi-line start. | | `original_start_line` | INT | GitHub REST: `/pulls/comments` | Original multi-line start. | | `start_side` | TEXT | GitHub REST: `/pulls/comments` | Diff side for start line. | | `author_association` | TEXT | GitHub REST: `/pulls/comments` | Author's association to the repo. | | `html_url` | TEXT | GitHub REST: `/pulls/comments` | Web URL. | | `updated_at` | TIMESTAMPTZ | GitHub REST: `/pulls/comments` | Last update timestamp. | **Unique constraint:** `(repo_id, platform_src_id)` --- ### Commits (Git/Facade) #### commits Commit data from `git log --numstat` on the repo's **default branch only** (resolved via `git symbolic-ref HEAD`). One row per file per commit (the same commit hash appears multiple times, once for each file touched). Populated by the facade worker. > **Note (v0.16.4+):** earlier versions of aveloxis walked `git log --all --numstat` across every ref. That over-counted commits relative to GitHub's metadata `commit_count`, which reflects only the default branch. The facade was changed in v0.16.4 to use the default branch exclusively so the gathered count matches the metadata. Commits on side branches that never merged to default are NOT in this table by design. | Column | Type | Source | Description | |--------|------|--------|-------------| | `cmt_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `cmt_commit_hash` | TEXT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | Full SHA-1 hash. | | `cmt_author_name` | TEXT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | Author name from the commit. | | `cmt_author_raw_email` | TEXT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | Author email exactly as it appears in the commit. | | `cmt_author_email` | TEXT NOT NULL | `aveloxis-commit-resolver` | Resolved/canonical author email. | | `cmt_author_date` | TEXT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | Author date string. | | `cmt_author_affiliation` | TEXT | `aveloxis-commit-resolver` | Resolved organizational affiliation of the author. | | `cmt_committer_name` | TEXT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | Committer name. | | `cmt_committer_raw_email` | TEXT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | Committer email exactly as in the commit. | | `cmt_committer_email` | TEXT NOT NULL | `aveloxis-commit-resolver` | Resolved/canonical committer email. | | `cmt_committer_date` | TEXT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | Committer date string. | | `cmt_committer_affiliation` | TEXT | `aveloxis-commit-resolver` | Resolved organizational affiliation of the committer. | | `cmt_added` | INT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | Lines added in this file. | | `cmt_removed` | INT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | Lines removed in this file. | | `cmt_whitespace` | INT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | Whitespace-only changes in this file. | | `cmt_filename` | TEXT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | Path of the file changed. | | `cmt_date_attempted` | TIMESTAMPTZ NOT NULL | Auto-generated | When this row was first processed. | | `cmt_ght_committer_id` | INT | Augur import | Legacy GHTorrent committer ID. | | `cmt_ght_committed_at` | TIMESTAMPTZ | Augur import | Legacy GHTorrent commit timestamp. | | `cmt_committer_timestamp` | TIMESTAMPTZ | Git: `git log --numstat` (default branch only, per v0.16.4) | Parsed committer timestamp. | | `cmt_author_timestamp` | TIMESTAMPTZ | Git: `git log --numstat` (default branch only, per v0.16.4) | Parsed author timestamp. | | `cmt_author_platform_username` | TEXT | `aveloxis-commit-resolver` | Platform username resolved from the commit email. | | `cmt_ght_author_id` | UUID | `aveloxis-commit-resolver` | FK to contributors (resolved author). | | | | | *Standard metadata columns* | **Indexes:** `(repo_id, cmt_commit_hash)`, `(cmt_author_email)`, `(cmt_author_raw_email)`, `(cmt_committer_raw_email)`, `(cmt_author_affiliation)` --- #### commit_parents Parent-child relationships between commits. Used to reconstruct commit DAGs and identify merge commits. | Column | Type | Source | Description | |--------|------|--------|-------------| | `cmt_id` | BIGINT NOT NULL (PK part 1) | Git: `git log --numstat` (default branch only, per v0.16.4) | The child commit ID. | | `parent_id` | BIGSERIAL NOT NULL (PK part 2) | Git: `git log --numstat` (default branch only, per v0.16.4) | Auto-incrementing parent ordinal. | | | | | *Standard metadata columns* | **Primary key:** `(cmt_id, parent_id)` --- #### commit_messages Stores the full commit message text, deduplicated per repo and commit hash. | Column | Type | Source | Description | |--------|------|--------|-------------| | `cmt_msg_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `cmt_msg` | TEXT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | Full commit message text. | | `cmt_hash` | TEXT NOT NULL | Git: `git log --numstat` (default branch only, per v0.16.4) | Commit SHA. | | | | | *Standard metadata columns* | **Unique constraint:** `(repo_id, cmt_hash)` --- #### commit_comment_ref Comments made on specific commits (not PR review comments, but commit-level comments from the forge). | Column | Type | Source | Description | |--------|------|--------|-------------| | `cmt_comment_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `cmt_id` | BIGINT NOT NULL | Computed | The commit this comment is on. | | `repo_id` | BIGINT (FK -> repos) | Computed | Repository. | | `msg_id` | BIGINT NOT NULL | Computed | FK to messages table. | | `user_id` | BIGINT NOT NULL | GitHub REST: `/repos/{o}/{r}/commits/{sha}` | Platform user ID of the commenter. | | `body` | TEXT | GitHub REST: `/repos/{o}/{r}/commits/{sha}` | Comment body text. | | `line` | BIGINT | GitHub REST: `/repos/{o}/{r}/commits/{sha}` | Line number in the file. | | `position` | BIGINT | GitHub REST: `/repos/{o}/{r}/commits/{sha}` | Position in the diff. | | `commit_comment_src_node_id` | TEXT | GitHub REST: `/repos/{o}/{r}/commits/{sha}` | GitHub GraphQL node ID. | | `cmt_comment_src_id` | BIGINT NOT NULL UNIQUE | GitHub REST: `/repos/{o}/{r}/commits/{sha}` | Platform's comment ID. | | `created_at` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}/commits/{sha}` | When the comment was created. | | | | | *Standard metadata columns* | --- ### Releases #### releases Software releases and tags from the forge. | Column | Type | Source | Description | |--------|------|--------|-------------| | `release_id` | TEXT NOT NULL (PK part 1) | GitHub REST: `/repos/{o}/{r}/releases`, GitLab: `/projects/{id}/releases` | Platform's release identifier. | | `repo_id` | BIGINT NOT NULL (PK part 2, FK -> repos) | Computed | Repository. | | `release_name` | TEXT | GitHub REST: `/repos/{o}/{r}/releases`, GitLab: `/projects/{id}/releases` | Release title. | | `release_description` | TEXT | GitHub REST: `/repos/{o}/{r}/releases`, GitLab: `/projects/{id}/releases` | Release notes/body. | | `release_author` | TEXT | GitHub REST: `/repos/{o}/{r}/releases`, GitLab: `/projects/{id}/releases` | Author login. | | `release_tag_name` | TEXT | GitHub REST: `/repos/{o}/{r}/releases`, GitLab: `/projects/{id}/releases` | Git tag name (e.g., `"v1.0.0"`). | | `release_url` | TEXT | GitHub REST: `/repos/{o}/{r}/releases`, GitLab: `/projects/{id}/releases` | Web URL. | | `created_at` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}/releases`, GitLab: `/projects/{id}/releases` | Creation timestamp. | | `published_at` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}/releases`, GitLab: `/projects/{id}/releases` | Publication timestamp. | | `updated_at` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}/releases`, GitLab: `/projects/{id}/releases` | Last update timestamp. | | `is_draft` | BOOLEAN | GitHub REST: `/repos/{o}/{r}/releases`, GitLab: `/projects/{id}/releases` | Whether the release is a draft. | | `is_prerelease` | BOOLEAN | GitHub REST: `/repos/{o}/{r}/releases`, GitLab: `/projects/{id}/releases` | Whether the release is a pre-release. | | `tag_only` | BOOLEAN | Computed | `TRUE` if this is a lightweight tag with no release body. | | | | | *Standard metadata columns* | **Primary key:** `(repo_id, release_id)` --- ### Repository Metadata #### repo_info Point-in-time snapshots of repository metadata and statistics. A new row is inserted on each collection run, creating a time series. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_info_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `last_updated` | TIMESTAMPTZ | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}?statistics=true` | When the repo was last pushed to. | | `issues_enabled` | TEXT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | Whether issues are enabled. | | `prs_enabled` | TEXT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | Whether PRs/MRs are enabled. | | `wiki_enabled` | TEXT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | Whether the wiki is enabled. | | `pages_enabled` | TEXT | GitHub REST: `/repos/{o}/{r}` | Whether GitHub Pages is enabled. | | `fork_count` | INT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}?statistics=true` | Number of forks. | | `star_count` | INT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}?statistics=true` | Number of stars. | | `watcher_count` | INT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}?statistics=true` | Number of watchers. | | `open_issues` | INT | GitHub REST: `/repos/{o}/{r}` | Number of open issues (GitHub includes PRs). | | `committer_count` | INT | Computed | Distinct committers. | | `commit_count` | BIGINT | GitLab: `/projects/{id}?statistics=true` / Computed | Total commits. | | `issues_count` | BIGINT | Computed | Total issues. | | `issues_closed` | BIGINT | Computed | Closed issues. | | `pr_count` | BIGINT | Computed | Total PRs. | | `prs_open` | BIGINT | Computed | Open PRs. | | `prs_closed` | BIGINT | Computed | Closed (not merged) PRs. | | `prs_merged` | BIGINT | Computed | Merged PRs. | | `default_branch` | TEXT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | Default branch name (e.g., `"main"`). | | `license` | TEXT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | SPDX license identifier. | | `issue_contributors_count` | TEXT | Computed | Distinct issue contributors. | | `changelog_file` | TEXT | Computed | Whether a CHANGELOG file exists. | | `contributing_file` | TEXT | Computed | Whether a CONTRIBUTING file exists. | | `license_file` | TEXT | Computed | Whether a LICENSE file exists. | | `code_of_conduct_file` | TEXT | Computed | Whether a CODE_OF_CONDUCT file exists. | | `security_issue_file` | TEXT | Computed | Whether a SECURITY file exists. | | `security_audit_file` | TEXT | Computed | Whether a security audit file exists. | | `status` | TEXT | Computed | Repository health status. | | `keywords` | TEXT | GitHub REST: `/repos/{o}/{r}`, GitLab: `/projects/{id}` | Repository topics/tags. | | | | | *Standard metadata columns* | --- #### repo_clones Clone traffic data. Only available for repos where you have push access (GitHub-only feature). | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_clone_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `clone_timestamp` | TIMESTAMPTZ NOT NULL | GitHub REST: `/repos/{o}/{r}/traffic/clones` | Date of the clone data point. | | `total_clones` | INT | GitHub REST: `/repos/{o}/{r}/traffic/clones` | Total clones. | | `unique_clones` | INT | GitHub REST: `/repos/{o}/{r}/traffic/clones` | Unique cloners. | | | | | *Standard metadata columns* | **Unique constraint:** `(repo_id, clone_timestamp)` --- #### repo_badging CII Best Practices badge data stored as raw JSON. Each row is a snapshot from the CII badge API. | Column | Type | Source | Description | |--------|------|--------|-------------| | `badge_collection_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Computed | Repository. | | `created_at` | TIMESTAMPTZ | Auto-generated | When this snapshot was taken. | | `data` | JSONB | Not yet populated | Raw badge data JSON. | | | | | *Standard metadata columns* | --- #### dei_badging DEI (Diversity, Equity, and Inclusion) badging levels for repositories. | Column | Type | Source | Description | |--------|------|--------|-------------| | `id` | SERIAL NOT NULL (PK part 1) | Auto-generated | Primary key part 1. | | `badging_id` | INT NOT NULL | Not yet populated | External badging system ID. | | `level` | TEXT NOT NULL | Not yet populated | Badge level achieved. | | `repo_id` | BIGINT NOT NULL (PK part 2, FK -> repos) | Computed | Repository. | **Primary key:** `(id, repo_id)` --- #### repo_insights Computed insight records for repositories. Each row is one metric observation at one point in time. | Column | Type | Source | Description | |--------|------|--------|-------------| | `ri_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Computed | Repository. | | `ri_metric` | TEXT | Computed | Metric name (e.g., `"commit-count"`). | | `ri_value` | TEXT | Computed | Metric value. | | `ri_date` | TIMESTAMPTZ | Computed | Date of the observation. | | `ri_fresh` | BOOLEAN | Computed | Whether this insight is current. | | `ri_score` | NUMERIC | Computed | Numeric score for ranking. | | `ri_field` | TEXT | Computed | Field name within the metric. | | `ri_detection_method` | TEXT | Computed | How the insight was detected (e.g., anomaly detection). | | | | | *Standard metadata columns* | --- #### repo_insights_records Archived insight records. Same structure as `repo_insights` but for historical data. | Column | Type | Source | Description | |--------|------|--------|-------------| | `ri_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Computed | Repository. | | `ri_metric` | TEXT | Computed | Metric name. | | `ri_field` | TEXT | Computed | Field name. | | `ri_value` | TEXT | Computed | Metric value. | | `ri_date` | TIMESTAMPTZ | Computed | Observation date. | | `ri_score` | FLOAT | Computed | Numeric score. | | `ri_detection_method` | TEXT | Computed | Detection method. | | | | | *Standard metadata columns* | --- #### repo_group_insights Aggregated insights at the repo-group level. | Column | Type | Source | Description | |--------|------|--------|-------------| | `rgi_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_group_id` | BIGINT (FK -> repo_groups) | Computed | Repo group. | | `rgi_metric` | TEXT | Computed | Metric name. | | `rgi_value` | TEXT | Computed | Metric value. | | `cms_id` | BIGINT | Computed | FK to `chaoss_metric_status`. | | `rgi_fresh` | BOOLEAN | Computed | Whether this insight is current. | | | | | *Standard metadata columns* | --- ### Dependencies and SBOM #### repo_dependencies High-level dependency counts per language for a repository. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_dependencies_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Computed | Repository. | | `dep_name` | TEXT | Not yet populated | Dependency name. | | `dep_count` | INT | Not yet populated | Number of times this dependency appears. | | `dep_language` | TEXT | Not yet populated | Language of the dependency. | | | | | *Standard metadata columns* | --- #### repo_deps_libyear Libyear analysis results. Measures how out-of-date each dependency is by comparing current vs. latest versions. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_deps_libyear_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Computed | Repository. | | `name` | TEXT | Not yet populated | Dependency name. | | `requirement` | TEXT | Not yet populated | Version requirement string. | | `type` | TEXT | Not yet populated | Dependency type (e.g., `"runtime"`, `"development"`). | | `package_manager` | TEXT | Not yet populated | Package manager (e.g., `"npm"`, `"pip"`). | | `current_version` | TEXT | Not yet populated | Currently used version. | | `latest_version` | TEXT | Not yet populated | Latest available version. | | `current_release_date` | TEXT | Not yet populated | Release date of current version. | | `latest_release_date` | TEXT | Not yet populated | Release date of latest version. | | `libyear` | FLOAT | Not yet populated | Libyear score (years between current and latest). | | | | | *Standard metadata columns* | --- #### repo_deps_scorecard OpenSSF Scorecard check results for a repository. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_deps_scorecard_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Computed | Repository. | | `name` | TEXT | Not yet populated | Scorecard check name (e.g., `"Code-Review"`, `"Branch-Protection"`). | | `score` | TEXT | Not yet populated | Check score. | | `scorecard_check_details` | JSONB | Not yet populated | Detailed check results as JSON. | | | | | *Standard metadata columns* | --- #### repo_sbom_scans Raw SBOM (Software Bill of Materials) scan results. | Column | Type | Source | Description | |--------|------|--------|-------------| | `rsb_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Computed | Repository. | | `sbom_scan` | JSON | Not yet populated | Raw SBOM scan output as JSON. | --- ### Libraries #### libraries Metadata about software libraries/packages associated with a repository. | Column | Type | Source | Description | |--------|------|--------|-------------| | `library_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Computed | Repository. | | `platform` | TEXT | Not yet populated | Package registry platform (e.g., `"npm"`, `"pypi"`). | | `name` | TEXT | Not yet populated | Package name. | | `created_timestamp` | TIMESTAMPTZ | Not yet populated | When the package was first published. | | `updated_timestamp` | TIMESTAMPTZ | Not yet populated | Last update timestamp. | | `library_description` | TEXT | Not yet populated | Package description. | | `keywords` | TEXT | Not yet populated | Package keywords. | | `library_homepage` | TEXT | Not yet populated | Homepage URL. | | `license` | TEXT | Not yet populated | License identifier. | | `version_count` | INT | Not yet populated | Total number of versions published. | | `latest_release_timestamp` | TEXT | Not yet populated | When the latest version was released. | | `latest_release_number` | TEXT | Not yet populated | Latest version number. | | `package_manager_id` | TEXT | Not yet populated | ID on the package manager. | | `dependency_count` | INT | Not yet populated | Number of dependencies this library has. | | `dependent_library_count` | INT | Not yet populated | Number of libraries that depend on this one. | | `primary_language` | TEXT | Not yet populated | Primary language. | | | | | *Standard metadata columns* | --- #### library_dependencies Manifest-level dependency declarations for libraries. | Column | Type | Source | Description | |--------|------|--------|-------------| | `lib_dependency_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `library_id` | BIGINT (FK -> libraries) | Computed | The library. | | `manifest_platform` | TEXT | Not yet populated | Platform of the manifest. | | `manifest_filepath` | TEXT | Not yet populated | Path to the manifest file. | | `manifest_kind` | TEXT | Not yet populated | Kind of manifest (e.g., `"lockfile"`, `"manifest"`). | | `repo_id_branch` | TEXT NOT NULL | Not yet populated | Repository ID and branch identifier. | | | | | *Standard metadata columns* | --- #### library_version Published versions of a library. | Column | Type | Source | Description | |--------|------|--------|-------------| | `library_version_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `library_id` | BIGINT (FK -> libraries) | Computed | The library. | | `library_platform` | TEXT | Not yet populated | Package registry platform. | | `version_number` | TEXT | Not yet populated | Version string (e.g., `"2.1.0"`). | | `version_release_date` | TIMESTAMPTZ | Not yet populated | When this version was published. | | | | | *Standard metadata columns* | --- ### Facade Aggregates (Data Mart) These tables contain pre-aggregated commit statistics computed from the `commits` table. They exist at annual, monthly, and weekly granularity for both per-repo and per-repo-group levels. All are populated by SQL aggregation over the `commits` table. #### dm_repo_annual Annual commit statistics per contributor per repository. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_id` | BIGINT NOT NULL | Computed: SQL aggregation | Repository. | | `email` | TEXT NOT NULL | Computed: SQL aggregation | Contributor email. | | `affiliation` | TEXT | Computed: SQL aggregation | Organizational affiliation. | | `year` | SMALLINT NOT NULL | Computed: SQL aggregation | Calendar year. | | `added` | BIGINT NOT NULL | Computed: SQL aggregation | Total lines added. | | `removed` | BIGINT NOT NULL | Computed: SQL aggregation | Total lines removed. | | `whitespace` | BIGINT NOT NULL | Computed: SQL aggregation | Total whitespace changes. | | `files` | BIGINT NOT NULL | Computed: SQL aggregation | Distinct files changed. | | `patches` | BIGINT NOT NULL | Computed: SQL aggregation | Number of commits/patches. | | | | | *Standard metadata columns* | **Index:** `(repo_id, affiliation)` --- #### dm_repo_monthly Monthly commit statistics per contributor per repository. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_id` | BIGINT NOT NULL | Computed: SQL aggregation | Repository. | | `email` | TEXT NOT NULL | Computed: SQL aggregation | Contributor email. | | `affiliation` | TEXT | Computed: SQL aggregation | Organizational affiliation. | | `month` | SMALLINT NOT NULL | Computed: SQL aggregation | Calendar month (1-12). | | `year` | SMALLINT NOT NULL | Computed: SQL aggregation | Calendar year. | | `added` | BIGINT NOT NULL | Computed: SQL aggregation | Total lines added. | | `removed` | BIGINT NOT NULL | Computed: SQL aggregation | Total lines removed. | | `whitespace` | BIGINT NOT NULL | Computed: SQL aggregation | Total whitespace changes. | | `files` | BIGINT NOT NULL | Computed: SQL aggregation | Distinct files changed. | | `patches` | BIGINT NOT NULL | Computed: SQL aggregation | Number of commits/patches. | | | | | *Standard metadata columns* | --- #### dm_repo_weekly Weekly commit statistics per contributor per repository. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_id` | BIGINT NOT NULL | Computed: SQL aggregation | Repository. | | `email` | TEXT NOT NULL | Computed: SQL aggregation | Contributor email. | | `affiliation` | TEXT | Computed: SQL aggregation | Organizational affiliation. | | `week` | SMALLINT NOT NULL | Computed: SQL aggregation | ISO week number (1-53). | | `year` | SMALLINT NOT NULL | Computed: SQL aggregation | Calendar year. | | `added` | BIGINT NOT NULL | Computed: SQL aggregation | Total lines added. | | `removed` | BIGINT NOT NULL | Computed: SQL aggregation | Total lines removed. | | `whitespace` | BIGINT NOT NULL | Computed: SQL aggregation | Total whitespace changes. | | `files` | BIGINT NOT NULL | Computed: SQL aggregation | Distinct files changed. | | `patches` | BIGINT NOT NULL | Computed: SQL aggregation | Number of commits/patches. | | | | | *Standard metadata columns* | --- #### dm_repo_group_annual Annual commit statistics per contributor per repo group. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_group_id` | BIGINT NOT NULL | Computed: SQL aggregation | Repo group. | | `email` | TEXT NOT NULL | Computed: SQL aggregation | Contributor email. | | `affiliation` | TEXT | Computed: SQL aggregation | Organizational affiliation. | | `year` | SMALLINT NOT NULL | Computed: SQL aggregation | Calendar year. | | `added` | BIGINT NOT NULL | Computed: SQL aggregation | Total lines added. | | `removed` | BIGINT NOT NULL | Computed: SQL aggregation | Total lines removed. | | `whitespace` | BIGINT NOT NULL | Computed: SQL aggregation | Total whitespace changes. | | `files` | BIGINT NOT NULL | Computed: SQL aggregation | Distinct files changed. | | `patches` | BIGINT NOT NULL | Computed: SQL aggregation | Number of commits/patches. | | | | | *Standard metadata columns* | --- #### dm_repo_group_monthly Monthly commit statistics per contributor per repo group. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_group_id` | BIGINT NOT NULL | Computed: SQL aggregation | Repo group. | | `email` | TEXT NOT NULL | Computed: SQL aggregation | Contributor email. | | `affiliation` | TEXT | Computed: SQL aggregation | Organizational affiliation. | | `month` | SMALLINT NOT NULL | Computed: SQL aggregation | Calendar month (1-12). | | `year` | SMALLINT NOT NULL | Computed: SQL aggregation | Calendar year. | | `added` | BIGINT NOT NULL | Computed: SQL aggregation | Total lines added. | | `removed` | BIGINT NOT NULL | Computed: SQL aggregation | Total lines removed. | | `whitespace` | BIGINT NOT NULL | Computed: SQL aggregation | Total whitespace changes. | | `files` | BIGINT NOT NULL | Computed: SQL aggregation | Distinct files changed. | | `patches` | BIGINT NOT NULL | Computed: SQL aggregation | Number of commits/patches. | | | | | *Standard metadata columns* | --- #### dm_repo_group_weekly Weekly commit statistics per contributor per repo group. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_group_id` | BIGINT NOT NULL | Computed: SQL aggregation | Repo group. | | `email` | TEXT NOT NULL | Computed: SQL aggregation | Contributor email. | | `affiliation` | TEXT | Computed: SQL aggregation | Organizational affiliation. | | `week` | SMALLINT NOT NULL | Computed: SQL aggregation | ISO week number (1-53). | | `year` | SMALLINT NOT NULL | Computed: SQL aggregation | Calendar year. | | `added` | BIGINT NOT NULL | Computed: SQL aggregation | Total lines added. | | `removed` | BIGINT NOT NULL | Computed: SQL aggregation | Total lines removed. | | `whitespace` | BIGINT NOT NULL | Computed: SQL aggregation | Total whitespace changes. | | `files` | BIGINT NOT NULL | Computed: SQL aggregation | Distinct files changed. | | `patches` | BIGINT NOT NULL | Computed: SQL aggregation | Number of commits/patches. | | | | | *Standard metadata columns* | --- ### Code Complexity #### repo_labor Per-file code complexity and line count metrics, typically from `scc` (Sloc Cloc and Code) analysis of cloned repositories. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_labor_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Computed | Repository. | | `repo_clone_date` | TIMESTAMPTZ | Computed | When the repo was cloned for analysis. | | `rl_analysis_date` | TIMESTAMPTZ | Computed | When the analysis was run. | | `programming_language` | TEXT | Computed | Language of the file. | | `file_path` | TEXT | Computed | Path within the repo. | | `file_name` | TEXT | Computed | File name. | | `total_lines` | INT | Computed | Total lines in the file. | | `code_lines` | INT | Computed | Lines of code (excluding comments and blanks). | | `comment_lines` | INT | Computed | Lines of comments. | | `blank_lines` | INT | Computed | Blank lines. | | `code_complexity` | INT | Computed | Cyclomatic complexity score. | | `repo_url` | TEXT | Computed | Git URL of the repo. | | | | | *Standard metadata columns* | --- #### repo_meta Key-value metadata store for arbitrary repository attributes. | Column | Type | Source | Description | |--------|------|--------|-------------| | `rmeta_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `rmeta_name` | TEXT | Computed | Metadata key name. | | `rmeta_value` | TEXT | Computed | Metadata value. Default `'0'`. | | | | | *Standard metadata columns* | --- #### repo_stats Numeric statistics for repositories, stored as key-value pairs. | Column | Type | Source | Description | |--------|------|--------|-------------| | `rstat_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository. | | `rstat_name` | TEXT | Computed | Statistic name. | | `rstat_value` | BIGINT | Computed | Statistic value. | | | | | *Standard metadata columns* | --- #### repo_test_coverage Per-file test coverage data. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_id` | BIGSERIAL (PK) | Auto-generated | Primary key (also serves as repo reference). | | `repo_clone_date` | TIMESTAMPTZ | Not yet populated | When the repo was cloned. | | `rtc_analysis_date` | TIMESTAMPTZ | Not yet populated | When coverage analysis was run. | | `programming_language` | TEXT | Not yet populated | Language of the file. | | `file_path` | TEXT | Not yet populated | File path. | | `file_name` | TEXT | Not yet populated | File name. | | `testing_tool` | TEXT | Not yet populated | Test framework used. | | `file_statement_count` | BIGINT | Not yet populated | Total statements in the file. | | `file_subroutine_count` | BIGINT | Not yet populated | Total subroutines. | | `file_statements_tested` | BIGINT | Not yet populated | Statements covered by tests. | | `file_subroutines_tested` | BIGINT | Not yet populated | Subroutines covered by tests. | | | | | *Standard metadata columns* | --- ### CHAOSS Metrics #### chaoss_metric_status Registry of CHAOSS metrics and their implementation status in Aveloxis. | Column | Type | Source | Description | |--------|------|--------|-------------| | `cms_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `cm_group` | TEXT | User input / Augur import | Metric group. | | `cm_source` | TEXT | User input / Augur import | Data source for the metric. | | `cm_type` | TEXT | User input / Augur import | Metric type. | | `cm_backend_status` | TEXT | User input / Augur import | Backend implementation status. | | `cm_frontend_status` | TEXT | User input / Augur import | Frontend implementation status. | | `cm_defined` | BOOLEAN | User input / Augur import | Whether the metric is formally defined by CHAOSS. | | `cm_api_endpoint_repo` | TEXT | User input / Augur import | API endpoint for per-repo queries. | | `cm_api_endpoint_rg` | TEXT | User input / Augur import | API endpoint for per-repo-group queries. | | `cm_name` | TEXT | User input / Augur import | Metric name. | | `cm_working_group` | TEXT | User input / Augur import | CHAOSS working group. | | `cm_info` | JSON | User input / Augur import | Additional metric information as JSON. | | `cm_working_group_focus_area` | TEXT | User input / Augur import | Focus area within the working group. | | | | | *Standard metadata columns* | --- #### chaoss_user User accounts for the CHAOSS metrics interface. | Column | Type | Source | Description | |--------|------|--------|-------------| | `chaoss_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `chaoss_login_name` | TEXT | User input | Login name. | | `chaoss_login_hashword` | TEXT | User input | Hashed password. | | `chaoss_email` | TEXT UNIQUE | User input | Email address. | | `chaoss_text_phone` | TEXT | User input | Phone number for text notifications. | | `chaoss_first_name` | TEXT | User input | First name. | | `chaoss_last_name` | TEXT | User input | Last name. | | | | | *Standard metadata columns* | --- ### Analysis and ML (schema parity, not yet populated) #### message_analysis Per-message sentiment analysis results from ML workers. | Column | Type | Source | Description | |--------|------|--------|-------------| | `msg_analysis_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `msg_id` | BIGINT | Not yet populated | FK to messages table. | | `worker_run_id` | BIGINT | Not yet populated | ID of the worker run that produced this result. | | `sentiment_score` | FLOAT | Not yet populated | Sentiment score (-1.0 to 1.0). | | `reconstruction_error` | FLOAT | Not yet populated | Autoencoder reconstruction error (anomaly score). | | `novelty_flag` | BOOLEAN | Not yet populated | Whether the message was flagged as novel. | | `feedback_flag` | BOOLEAN | Not yet populated | Whether the message was flagged as feedback. | | | | | *Standard metadata columns* | --- #### message_analysis_summary Aggregated sentiment analysis summaries per repository per time period. | Column | Type | Source | Description | |--------|------|--------|-------------| | `msg_summary_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Not yet populated | Repository. | | `worker_run_id` | BIGINT | Not yet populated | Worker run ID. | | `positive_ratio` | FLOAT | Not yet populated | Ratio of positive messages. | | `negative_ratio` | FLOAT | Not yet populated | Ratio of negative messages. | | `novel_count` | BIGINT | Not yet populated | Count of novel messages. | | `period` | TIMESTAMPTZ | Not yet populated | Time period for the summary. | | | | | *Standard metadata columns* | --- #### message_sentiment Duplicate of `message_analysis` for schema parity with Augur. Not yet populated. | Column | Type | Source | Description | |--------|------|--------|-------------| | `msg_analysis_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `msg_id` | BIGINT | Not yet populated | FK to messages. | | `worker_run_id` | BIGINT | Not yet populated | Worker run ID. | | `sentiment_score` | FLOAT | Not yet populated | Sentiment score. | | `reconstruction_error` | FLOAT | Not yet populated | Autoencoder reconstruction error. | | `novelty_flag` | BOOLEAN | Not yet populated | Novelty flag. | | `feedback_flag` | BOOLEAN | Not yet populated | Feedback flag. | | | | | *Standard metadata columns* | --- #### message_sentiment_summary Aggregated sentiment summaries. Duplicate of `message_analysis_summary` for schema parity. | Column | Type | Source | Description | |--------|------|--------|-------------| | `msg_summary_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Not yet populated | Repository. | | `worker_run_id` | BIGINT | Not yet populated | Worker run ID. | | `positive_ratio` | FLOAT | Not yet populated | Positive message ratio. | | `negative_ratio` | FLOAT | Not yet populated | Negative message ratio. | | `novel_count` | BIGINT | Not yet populated | Novel message count. | | `period` | TIMESTAMPTZ | Not yet populated | Summary time period. | | | | | *Standard metadata columns* | --- #### discourse_insights Discourse act classification for messages. Categorizes messages by their communicative function. | Column | Type | Source | Description | |--------|------|--------|-------------| | `msg_discourse_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `msg_id` | BIGINT | Not yet populated | FK to messages. | | `discourse_act` | TEXT | Not yet populated | Discourse act label (e.g., `"question"`, `"answer"`, `"agreement"`). | | | | | *Standard metadata columns* | --- #### lstm_anomaly_models LSTM neural network model definitions for anomaly detection. | Column | Type | Source | Description | |--------|------|--------|-------------| | `model_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `model_name` | TEXT | Not yet populated | Model name. | | `model_description` | TEXT | Not yet populated | Model description. | | `look_back_days` | BIGINT | Not yet populated | Number of days of history the model considers. | | `training_days` | BIGINT | Not yet populated | Number of days of data used for training. | | `batch_size` | BIGINT | Not yet populated | Training batch size. | | `metric` | TEXT | Not yet populated | Which metric this model targets. | | | | | *Standard metadata columns* | --- #### lstm_anomaly_results Results from LSTM anomaly detection runs. | Column | Type | Source | Description | |--------|------|--------|-------------| | `result_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Not yet populated | Repository. | | `repo_category` | TEXT | Not yet populated | Category of the repository. | | `model_id` | BIGINT (FK -> lstm_anomaly_models) | Not yet populated | Model used. | | `metric` | TEXT | Not yet populated | Metric analyzed. | | `contamination_factor` | FLOAT | Not yet populated | Anomaly detection contamination factor. | | `mean_absolute_error` | FLOAT | Not yet populated | MAE of the model predictions. | | `remarks` | TEXT | Not yet populated | Human-readable remarks. | | `metric_field` | TEXT | Not yet populated | Specific field within the metric. | | `mean_absolute_actual_value` | FLOAT | Not yet populated | Mean of actual values. | | `mean_absolute_prediction_value` | FLOAT | Not yet populated | Mean of predicted values. | | | | | *Standard metadata columns* | --- ### Topic Modeling (schema parity, not yet populated) #### topic_model_meta Metadata for trained topic models. Stores model configuration, quality metrics, and file paths. | Column | Type | Source | Description | |--------|------|--------|-------------| | `model_id` | UUID (PK) | Auto-generated (`gen_random_uuid()`) | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Not yet populated | Repository the model was trained on. | | `model_method` | TEXT NOT NULL | Not yet populated | Topic modeling method (e.g., `"LDA"`, `"NMF"`). | | `num_topics` | INT NOT NULL | Not yet populated | Number of topics. | | `num_words_per_topic` | INT NOT NULL | Not yet populated | Words per topic. | | `training_parameters` | JSON NOT NULL | Not yet populated | Full training parameters as JSON. | | `model_file_paths` | JSON NOT NULL | Not yet populated | Paths to saved model files. | | `parameters_hash` | TEXT NOT NULL | Not yet populated | Hash of parameters for deduplication. | | `coherence_score` | FLOAT NOT NULL | Not yet populated | Topic coherence score. | | `perplexity_score` | FLOAT NOT NULL | Not yet populated | Perplexity score. | | `topic_diversity` | FLOAT NOT NULL | Not yet populated | Topic diversity score. | | `quality` | JSON NOT NULL | Not yet populated | Quality metrics as JSON. | | `training_message_count` | BIGINT NOT NULL | Not yet populated | Number of messages used for training. | | `data_fingerprint` | JSON NOT NULL | Not yet populated | Fingerprint of the training data. | | `visualization_data` | JSON | Not yet populated | Pre-computed visualization data. | | `training_start_time` | TIMESTAMPTZ NOT NULL | Not yet populated | When training started. | | `training_end_time` | TIMESTAMPTZ NOT NULL | Not yet populated | When training finished. | | | | | *Standard metadata columns* | --- #### topic_model_event Event log for topic model training and inference runs. | Column | Type | Source | Description | |--------|------|--------|-------------| | `event_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `ts` | TIMESTAMPTZ NOT NULL | Auto-generated | Event timestamp. Default `NOW()`. | | `repo_id` | INT | Not yet populated | Repository. | | `model_id` | UUID | Not yet populated | FK to topic_model_meta. | | `event` | TEXT NOT NULL | Not yet populated | Event description. | | `level` | TEXT NOT NULL | Not yet populated | Log level (e.g., `'INFO'`, `'ERROR'`). | | `payload` | JSONB NOT NULL | Not yet populated | Event payload as JSON. | --- #### topic_words Words and their probabilities within discovered topics. | Column | Type | Source | Description | |--------|------|--------|-------------| | `topic_words_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `topic_id` | BIGINT | Not yet populated | Topic number. | | `word` | TEXT | Not yet populated | Word. | | `word_prob` | FLOAT | Not yet populated | Probability of this word in the topic. | | | | | *Standard metadata columns* | --- #### repo_cluster_messages Cluster assignments for repository messages (content and mechanism clusters). | Column | Type | Source | Description | |--------|------|--------|-------------| | `msg_cluster_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Not yet populated | Repository. | | `cluster_content` | INT | Not yet populated | Content-based cluster assignment. | | `cluster_mechanism` | INT | Not yet populated | Mechanism-based cluster assignment. | | | | | *Standard metadata columns* | --- #### repo_topic Topic distribution for repositories (which topics are associated with a repo). | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_topic_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT (FK -> repos) | Not yet populated | Repository. | | `topic_id` | INT | Not yet populated | Topic number. | | `topic_prob` | FLOAT | Not yet populated | Probability of this topic for the repo. | | | | | *Standard metadata columns* | --- ### Network Analysis (schema parity, not yet populated) #### network_beyond_augur Cross-repository contributor activity network data. Records how contributors act across multiple repos. | Column | Type | Source | Description | |--------|------|--------|-------------| | `cntrb_id` | UUID | Not yet populated | Contributor. | | `repo_git` | TEXT | Not yet populated | Git URL of the repo. | | `repo_name` | TEXT | Not yet populated | Repository name. | | `action` | TEXT | Not yet populated | Action type (e.g., `"commit"`, `"issue"`, `"pr"`). | | `action_year` | FLOAT | Not yet populated | Year of the action. | | `action_quarter` | NUMERIC | Not yet populated | Quarter of the action (1-4). | | `counter` | BIGINT | Not yet populated | Count of actions. | *No primary key defined.* --- #### network_beyond_augur_dependencies Cross-repository dependency network data. Tracks how repos depend on each other via contributor overlap. | Column | Type | Source | Description | |--------|------|--------|-------------| | `cntrb_id` | UUID | Not yet populated | Contributor. | | `repo_git` | TEXT | Not yet populated | Git URL. | | `repo_name` | TEXT | Not yet populated | Repository name. | | `action` | TEXT | Not yet populated | Action type. | | `action_year` | FLOAT | Not yet populated | Year. | | `action_quarter` | NUMERIC | Not yet populated | Quarter (1-4). | | `counter` | BIGINT | Not yet populated | Count. | *No primary key defined.* --- ### Miscellaneous #### exclude Exclusion rules for filtering out certain email addresses or domains from analysis. | Column | Type | Source | Description | |--------|------|--------|-------------| | `id` | INT (PK) | User input | Primary key. | | `projects_id` | INT NOT NULL | User input | Project/repo group to apply the exclusion to. | | `email` | TEXT | User input | Email address to exclude. | | `domain` | TEXT | User input | Email domain to exclude. | --- #### historical_repo_urls Tracks URL changes for repositories over time (e.g., when a repo is renamed or transferred). | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_id` | BIGINT NOT NULL (PK part 1) | Computed | Repository. | | `git_url` | TEXT NOT NULL (PK part 2) | Computed | A historical git URL for this repo. | | `date_collected` | TIMESTAMPTZ | Auto-generated | When this URL was recorded. | **Primary key:** `(repo_id, git_url)` --- #### repos_fetch_log Log of repository fetch attempts and their outcomes. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repos_id` | INT NOT NULL | Computed | Repository ID. | | `status` | TEXT NOT NULL | Computed | Fetch status (e.g., `"success"`, `"error"`). | | `date` | TIMESTAMPTZ NOT NULL | Auto-generated | When the fetch was attempted. | *No primary key defined.* --- #### settings Application settings for the data schema (separate from ops config). | Column | Type | Source | Description | |--------|------|--------|-------------| | `id` | INT (PK) | User input | Primary key. | | `setting` | TEXT NOT NULL | User input | Setting name. | | `value` | TEXT NOT NULL | User input | Setting value. | | `last_modified` | TIMESTAMPTZ NOT NULL | Auto-generated | Last modification timestamp. | --- #### unknown_cache Cache for contributor emails that could not be resolved to an affiliation. Prevents repeated lookups. | Column | Type | Source | Description | |--------|------|--------|-------------| | `type` | TEXT NOT NULL | Computed | Type of unknown entry. | | `repo_group_id` | INT NOT NULL | Computed | Repo group context. | | `email` | TEXT NOT NULL | Computed | Unresolved email. | | `domain` | TEXT | Computed | Email domain. | | `added` | BIGINT NOT NULL | Computed | Lines added by this email. | | | | | *Standard metadata columns* | *No primary key defined.* --- #### utility_log General-purpose utility log for miscellaneous operations. | Column | Type | Source | Description | |--------|------|--------|-------------| | `id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `level` | TEXT NOT NULL | Computed | Log level. | | `status` | TEXT NOT NULL | Computed | Status message. | | `attempted` | TIMESTAMPTZ NOT NULL | Auto-generated | When the operation was attempted. | --- #### working_commits Tracks the current working commit for facade processing per repository. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repos_id` | INT NOT NULL | Computed | Repository ID. | | `working_commit` | TEXT | Git: `git log --numstat` (default branch only, per v0.16.4) | The commit hash currently being processed. | *No primary key defined.* --- ## aveloxis_ops Schema ### Collection Pipeline #### staging Raw API responses land here before being parsed and inserted into `aveloxis_data` tables. Acts as a durable inbox for the ETL pipeline. | Column | Type | Source | Description | |--------|------|--------|-------------| | `staging_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT NOT NULL (FK -> repos) | Computed | Repository this payload is for. | | `platform_id` | SMALLINT NOT NULL (FK -> platforms) | Computed | Platform the data came from. | | `entity_type` | TEXT NOT NULL | Computed | Type of entity (e.g., `"issues"`, `"pull_requests"`, `"releases"`). | | `payload` | JSONB NOT NULL | GitHub REST API / GitLab API v4 | Raw API response body. | | `created_at` | TIMESTAMPTZ | Auto-generated | When the payload was staged. | | `processed` | BOOLEAN | Computed | Whether the payload has been parsed into data tables. Default `FALSE`. | **Index:** `(repo_id, entity_type) WHERE NOT processed` --- #### collection_queue Postgres-backed priority queue that drives the collection pipeline. Each repo has at most one row. Workers claim repos by locking rows. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_id` | BIGINT (PK, FK -> repos) | Computed | Repository. | | `priority` | INT NOT NULL | Computed | Priority (lower = higher priority). Default `100`. | | `status` | TEXT NOT NULL | Computed | Queue status: `'queued'`, `'running'`, `'done'`, `'error'`. | | `due_at` | TIMESTAMPTZ NOT NULL | Computed | When this repo is next due for collection. | | `locked_by` | TEXT | Computed | Worker instance ID that holds the lock. | | `locked_at` | TIMESTAMPTZ | Computed | When the lock was acquired. | | `last_collected` | TIMESTAMPTZ | Computed | When collection last completed. | | `last_error` | TEXT | Computed | Error message from last failed run. | | `last_issues` | INT | Computed | Issues collected in the last run. | | `last_prs` | INT | Computed | PRs collected in the last run. | | `last_messages` | INT | Computed | Messages collected in the last run. | | `last_events` | INT | Computed | Events collected in the last run. | | `last_releases` | INT | Computed | Releases collected in the last run. | | `last_contributors` | INT | Computed | Contributors collected in the last run. | | `last_duration_ms` | BIGINT | Computed | Duration of the last collection run in milliseconds. | | `updated_at` | TIMESTAMPTZ | Auto-generated | Last update timestamp. | **Index:** `(priority, due_at) WHERE status = 'queued'` --- #### collection_status Tracks the overall collection status for each repo across multiple pipeline stages (core, secondary, facade, ML). | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_id` | BIGINT (PK, FK -> repos) | Computed | Repository. | | `core_status` | TEXT | Computed | Status of core data collection (issues, PRs). Default `'Pending'`. | | `core_task_id` | TEXT | Computed | Task ID for the core collection worker. | | `core_data_last_collected` | TIMESTAMPTZ | Computed | When core data was last collected. | | `core_weight` | BIGINT | Computed | Weight/size of core data workload. | | `secondary_status` | TEXT | Computed | Status of secondary data collection (reviews, files). Default `'Pending'`. | | `secondary_task_id` | TEXT | Computed | Task ID for the secondary worker. | | `secondary_data_last_collected` | TIMESTAMPTZ | Computed | When secondary data was last collected. | | `secondary_weight` | BIGINT | Computed | Weight of secondary data workload. | | `facade_status` | TEXT | Computed | Status of facade/git log collection. Default `'Pending'`. | | `facade_task_id` | TEXT | Computed | Task ID for the facade worker. | | `facade_data_last_collected` | TIMESTAMPTZ | Computed | When facade data was last collected. | | `facade_weight` | BIGINT | Computed | Weight of facade workload. | | `event_last_collected` | TIMESTAMPTZ | Computed | When events were last collected. | | `issue_pr_sum` | BIGINT | Computed | Total issues + PRs (used for workload estimation). | | `commit_sum` | BIGINT | Computed | Total commits (used for workload estimation). | | `ml_status` | TEXT | Computed | Status of ML pipeline. Default `'Pending'`. | | `ml_task_id` | TEXT | Computed | Task ID for the ML worker. | | `ml_data_last_collected` | TIMESTAMPTZ | Computed | When ML data was last produced. | | `ml_weight` | BIGINT | Computed | Weight of ML workload. | | `updated_at` | TIMESTAMPTZ | Auto-generated | Last update timestamp. | --- ### API Credentials #### worker_oauth OAuth tokens and API keys used by collection workers to authenticate with GitHub and GitLab APIs. | Column | Type | Source | Description | |--------|------|--------|-------------| | `oauth_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `name` | TEXT NOT NULL | User input | Descriptive name for this credential. | | `consumer_key` | TEXT NOT NULL | User input | OAuth consumer key (for OAuth 1.0 flows). | | `consumer_secret` | TEXT NOT NULL | User input | OAuth consumer secret. | | `access_token` | TEXT NOT NULL | User input | Access token (PAT or OAuth token). | | `access_token_secret` | TEXT NOT NULL | User input | OAuth access token secret. | | `repo_directory` | TEXT | User input | Local directory for git operations using this credential. | | `platform` | TEXT NOT NULL | User input | Platform name: `'github'` or `'gitlab'`. Default `'github'`. | | `rate_limit` | INT | User input | Rate limit for this token. Default `5000`. | | `created_at` | TIMESTAMPTZ | Auto-generated | When this credential was added. | **Unique constraint:** `(access_token, platform)` --- ### Users and Auth #### users User accounts for the Aveloxis web interface and API. | Column | Type | Source | Description | |--------|------|--------|-------------| | `user_id` | SERIAL (PK) | Auto-generated | Primary key. | | `login_name` | TEXT NOT NULL UNIQUE | User input | Username. | | `login_hashword` | TEXT NOT NULL | User input | Hashed password. | | `email` | TEXT NOT NULL UNIQUE | User input | Email address. | | `text_phone` | TEXT UNIQUE | User input | Phone number for notifications. | | `first_name` | TEXT NOT NULL | User input | First name. | | `last_name` | TEXT NOT NULL | User input | Last name. | | `admin` | BOOLEAN NOT NULL | User input | Whether this user is an admin. Default `FALSE`. | | `email_verified` | BOOLEAN NOT NULL | Computed | Whether the email has been verified. Default `FALSE`. | | | | | *Standard metadata columns* | --- #### user_groups Named groups of repositories curated by users. Users can organize repos into groups for easier management. | Column | Type | Source | Description | |--------|------|--------|-------------| | `group_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `user_id` | INT NOT NULL (FK -> users) | Computed | Owning user. | | `name` | TEXT NOT NULL | User input | Group name. | | `favorited` | BOOLEAN NOT NULL | User input | Whether this group is favorited. Default `FALSE`. | **Unique constraint:** `(user_id, name)` --- #### user_repos Join table linking repos to user groups. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_id` | BIGINT NOT NULL (PK part 1) | User input | Repository. | | `group_id` | BIGINT NOT NULL (PK part 2, FK -> user_groups) | User input | User group. | **Primary key:** `(group_id, repo_id)` --- #### client_applications Registered API client applications (for OAuth flows). | Column | Type | Source | Description | |--------|------|--------|-------------| | `id` | TEXT (PK) | Auto-generated | Application ID. | | `api_key` | TEXT NOT NULL | Auto-generated | API key for the application. | | `user_id` | INT NOT NULL (FK -> users) | Computed | User who owns this application. | | `name` | TEXT NOT NULL | User input | Application name. | | `redirect_url` | TEXT NOT NULL | User input | OAuth redirect URL. | --- #### user_session_tokens Active session tokens for authenticated users. | Column | Type | Source | Description | |--------|------|--------|-------------| | `token` | TEXT (PK) | Auto-generated | Session token string. | | `user_id` | INT NOT NULL (FK -> users) | Computed | User this session belongs to. | | `created_at` | BIGINT | Auto-generated | Creation timestamp (Unix epoch). | | `expiration` | BIGINT | Computed | Expiration timestamp (Unix epoch). | | `application_id` | TEXT (FK -> client_applications) | Computed | Client application that initiated this session. | --- #### refresh_tokens Refresh tokens for renewing expired session tokens. | Column | Type | Source | Description | |--------|------|--------|-------------| | `id` | TEXT (PK) | Auto-generated | Refresh token string. | | `user_session_token` | TEXT NOT NULL UNIQUE (FK -> user_session_tokens) | Computed | The session token this refresh token can renew. | --- #### subscription_types Types of event subscriptions available (for webhook/notification systems). | Column | Type | Source | Description | |--------|------|--------|-------------| | `id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `name` | TEXT NOT NULL UNIQUE | User input | Subscription type name. | --- #### subscriptions Links client applications to the event types they are subscribed to. | Column | Type | Source | Description | |--------|------|--------|-------------| | `application_id` | TEXT NOT NULL (PK part 1, FK -> client_applications) | Computed | Client application. | | `type_id` | BIGINT NOT NULL (PK part 2, FK -> subscription_types) | Computed | Subscription type. | **Primary key:** `(application_id, type_id)` --- ### Configuration #### augur_settings Legacy settings store, maintained for compatibility with Augur tooling. | Column | Type | Source | Description | |--------|------|--------|-------------| | `id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `setting` | TEXT | User input / Augur import | Setting name. | | `value` | TEXT | User input / Augur import | Setting value. | | `last_modified` | TIMESTAMPTZ | Auto-generated | Last modification timestamp. | --- #### config Structured configuration store with section/setting hierarchy. | Column | Type | Source | Description | |--------|------|--------|-------------| | `id` | SMALLSERIAL (PK) | Auto-generated | Primary key. | | `section_name` | TEXT NOT NULL | User input | Configuration section (e.g., `"Database"`, `"Server"`). | | `setting_name` | TEXT NOT NULL | User input | Setting name within the section. | | `value` | TEXT | User input | Setting value. | | `type` | TEXT | User input | Value type hint (e.g., `"str"`, `"int"`, `"bool"`). | **Unique constraint:** `(section_name, setting_name)` --- ### GitHub Users (Affiliation Data) #### github_users Cached GitHub user data used for affiliation resolution. Populated by scanning commit authors and looking up their GitHub profiles. | Column | Type | Source | Description | |--------|------|--------|-------------| | `login` | TEXT | GitHub REST: `/search/users`, `/users/{login}` | GitHub login. | | `email` | TEXT | GitHub REST: `/users/{login}` | Email address. | | `affiliation` | TEXT | `aveloxis-commit-resolver` | Resolved organizational affiliation. | | `source` | TEXT | `aveloxis-commit-resolver` | How the affiliation was determined. | | `commits` | TEXT | Computed | Commit count. | | `location` | TEXT | GitHub REST: `/users/{login}` | Profile location. | | `country_id` | TEXT | Computed | Resolved country code. | *No primary key defined.* --- ### Network Weighted Tables Pre-computed weighted contributor-action matrices used for network analysis. #### network_weighted_commits Weighted commit activity per contributor per repository. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_id` | BIGINT | Computed | Repository. | | `cntrb_id` | UUID | Computed | Contributor. | | `weight` | FLOAT | Computed | Weight/score of the activity. | | `action_type` | TEXT | Computed | Type of commit action. | | `user_collection` | TEXT | Computed | Collection method used. | | `data_collection_date` | TIMESTAMPTZ | Auto-generated | When this record was computed. | *No primary key defined.* --- #### network_weighted_issues Weighted issue activity per contributor per repository. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_id` | BIGINT | Computed | Repository. | | `cntrb_id` | UUID | Computed | Contributor. | | `weight` | FLOAT | Computed | Weight/score of the activity. | | `action_type` | TEXT | Computed | Type of issue action. | | `user_collection` | TEXT | Computed | Collection method used. | | `data_collection_date` | TIMESTAMPTZ | Auto-generated | When this record was computed. | *No primary key defined.* --- #### network_weighted_pr_reviews Weighted PR review activity per contributor per repository. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_id` | BIGINT | Computed | Repository. | | `cntrb_id` | UUID | Computed | Contributor. | | `weight` | FLOAT | Computed | Weight/score of the activity. | | `action_type` | TEXT | Computed | Type of review action. | | `user_collection` | TEXT | Computed | Collection method used. | | `data_collection_date` | TIMESTAMPTZ | Auto-generated | When this record was computed. | *No primary key defined.* --- #### network_weighted_prs Weighted pull request activity per contributor per repository. | Column | Type | Source | Description | |--------|------|--------|-------------| | `repo_id` | BIGINT | Computed | Repository. | | `cntrb_id` | UUID | Computed | Contributor. | | `weight` | FLOAT | Computed | Weight/score of the activity. | | `action_type` | TEXT | Computed | Type of PR action. | | `user_collection` | TEXT | Computed | Collection method used. | | `data_collection_date` | TIMESTAMPTZ | Auto-generated | When this record was computed. | *No primary key defined.* --- ### Worker Management #### worker_history Audit log of worker task executions. One row per worker run. | Column | Type | Source | Description | |--------|------|--------|-------------| | `history_id` | BIGSERIAL (PK) | Auto-generated | Primary key. | | `repo_id` | BIGINT | Computed | Repository that was processed. | | `worker` | TEXT NOT NULL | Computed | Worker name/identifier. | | `job_model` | TEXT NOT NULL | Computed | Job model name. | | `oauth_id` | INT | Computed | OAuth credential used. | | `timestamp` | TIMESTAMPTZ NOT NULL | Auto-generated | When the worker ran. | | `status` | TEXT NOT NULL | Computed | Outcome status (e.g., `"Success"`, `"Error"`). | | `total_results` | INT | Computed | Number of records processed. | --- #### worker_job Persistent state for each worker job model. Tracks progress across restarts. | Column | Type | Source | Description | |--------|------|--------|-------------| | `job_model` | TEXT (PK) | Computed | Job model name (e.g., `"issues"`, `"pull_requests"`). | | `state` | INT NOT NULL | Computed | Current state code. Default `0`. | | `zombie_head` | INT | Computed | For zombie detection: expected head position. | | `since_id_str` | TEXT NOT NULL | Computed | Cursor/since-ID for incremental collection. Default `'0'`. | | `description` | TEXT | Computed | Human-readable description. | | `last_count` | INT | Computed | Records processed in the last run. | | `last_run` | TIMESTAMPTZ | Computed | When the job last ran. | | `analysis_state` | INT | Computed | State of any post-collection analysis. | | `oauth_id` | INT NOT NULL | Computed | Default OAuth credential for this job. | --- #### worker_settings_facade Configuration settings specific to the facade (git log) worker. | Column | Type | Source | Description | |--------|------|--------|-------------| | `id` | INT (PK) | User input | Primary key. | | `setting` | TEXT NOT NULL | User input | Setting name. | | `value` | TEXT NOT NULL | User input | Setting value. | | `last_modified` | TIMESTAMPTZ NOT NULL | Auto-generated | Last modification timestamp. | --- ### Fetch Log and Working Commits (ops) #### repos_fetch_log (aveloxis_ops) Operational fetch log (mirrors `aveloxis_data.repos_fetch_log` for the ops schema). | Column | Type | Source | Description | |--------|------|--------|-------------| | `repos_id` | INT NOT NULL | Computed | Repository ID. | | `status` | TEXT NOT NULL | Computed | Fetch status. | | `date` | TIMESTAMPTZ NOT NULL | Auto-generated | Fetch attempt timestamp. | *No primary key defined.* --- #### working_commits (aveloxis_ops) Operational working commit tracker (mirrors `aveloxis_data.working_commits` for the ops schema). | Column | Type | Source | Description | |--------|------|--------|-------------| | `repos_id` | INT NOT NULL | Computed | Repository ID. | | `working_commit` | TEXT | Git: `git log --numstat` (default branch only, per v0.16.4) | Current working commit hash. | *No primary key defined.* --- ## Indexes The schema defines the following additional indexes for query performance: ### aveloxis_data indexes | Index | Table | Columns | Condition | |-------|-------|---------|-----------| | `idx_contributors_login` | contributors | `(cntrb_login)` | `WHERE cntrb_login != ''` | | `idx_commits_repo_hash` | commits | `(repo_id, cmt_commit_hash)` | | | `idx_commits_author_email` | commits | `(cmt_author_email)` | | | `idx_commits_author_raw_email` | commits | `(cmt_author_raw_email)` | | | `idx_commits_committer_raw_email` | commits | `(cmt_committer_raw_email)` | | | `idx_commits_author_affiliation` | commits | `(cmt_author_affiliation)` | | | `idx_dm_repo_annual_repo_aff` | dm_repo_annual | `(repo_id, affiliation)` | | | `idx_issues_repo_id` | issues | `(repo_id)` | | | `idx_issues_updated_at` | issues | `(updated_at)` | | | `idx_pull_requests_repo_id` | pull_requests | `(repo_id)` | | | `idx_pull_requests_updated_at` | pull_requests | `(updated_at)` | | | `idx_messages_repo_id` | messages | `(repo_id)` | | | `idx_issue_events_repo_id` | issue_events | `(repo_id)` | | | `idx_pr_events_repo_id` | pull_request_events | `(repo_id)` | | | `idx_releases_repo_id` | releases | `(repo_id)` | | | `idx_repo_info_repo_id` | repo_info | `(repo_id)` | | | `idx_contributor_identities_cntrb` | contributor_identities | `(cntrb_id)` | | | `idx_commit_parents_cmt` | commit_parents | `(cmt_id)` | | | `idx_repo_labor_repo_id` | repo_labor | `(repo_id)` | | | `idx_repo_deps_libyear_repo_id` | repo_deps_libyear | `(repo_id)` | | | `idx_repo_deps_scorecard_repo_id` | repo_deps_scorecard | `(repo_id)` | | | `idx_repo_dependencies_repo_id` | repo_dependencies | `(repo_id)` | | ### aveloxis_ops indexes | Index | Table | Columns | Condition | |-------|-------|---------|-----------| | `idx_staging_unprocessed` | staging | `(repo_id, entity_type)` | `WHERE NOT processed` | | `idx_queue_due` | collection_queue | `(priority, due_at)` | `WHERE status = 'queued'` |