# Materialized Views Aveloxis creates **20 materialized views + 2 alias views** under the `aveloxis_data` schema. v0.25.5 first reduced the count from 22 by dropping the byte-for-byte duplicate `augur_new_contributors` matview and converting `explorer_libyear_all` to an alias VIEW. v0.25.6 then restored `augur_new_contributors` as a plain VIEW alias (operators query it to identify new contributors), so today's count is 20 matviews + 2 alias views. The matviews pre-compute analytical queries that are too expensive to run live every time an analyst (or a tool like [8Knot](https://github.com/oss-aspen/8Knot)) opens a dashboard. They are rebuilt on a weekly cadence (default Saturday — configurable via `collection.matview_rebuild_day`) and on demand via `aveloxis refresh-views`. Alias views read live from their underlying matview and need no separate refresh. This page explains, for each view, **what a row means**, **what the complete table tells you**, and **how an open source health and sustainability analyst would actually use it**. The audience is operators and analysts, not SQL authors — the goal is to make the catalog useful without requiring a read of the underlying query. --- ## Refresh schedule The full set rebuilds weekly. Operators tune the day via `collection.matview_rebuild_day` (default `saturday`) and can force an out-of-band rebuild with `aveloxis refresh-views`. During a rebuild the scheduler pauses collection workers, refreshes each view sequentially (CONCURRENTLY where possible), then resumes collection. If the underlying data has changed only modestly since the last rebuild, the most-recent view contents continue to be query-able with stale-but-consistent data; consumers don't see partial state. --- ## Quick reference | Category | Views | |---|---| | **Bulk counters** | `api_get_all_repos_commits`, `api_get_all_repos_issues`, `api_get_all_repo_prs` | | **Catalog / navigation** | `explorer_entry_list`, `explorer_user_repos` | | **Contributor activity** | `explorer_contributor_actions`, `explorer_contributor_recent_actions`, `explorer_new_contributors`, `explorer_cntrb_per_file`, `issue_reporter_created_at` | | **Commits over time** | `explorer_commits_and_committers_daily_count` | | **Pull requests** | `explorer_pr_response`, `explorer_pr_response_times`, `explorer_pr_assignments`, `explorer_pr_files` | | **Issues** | `explorer_issue_assignments` | | **Files / code** | `explorer_repo_files`, `explorer_repo_languages` | | **Dependencies** | `explorer_libyear_all`, `explorer_libyear_summary`, `explorer_libyear_detail` | --- ## Bulk counters These three views answer "how big is this repo, by activity volume?" in constant time. Each is a simple GROUP BY count over its source table. Useful as a lightweight first signal before drilling deeper. ### `api_get_all_repos_commits` **One row per repo.** Columns: `repo_id`, `commits_all_time` (distinct commit hashes). **What the table tells you**: total distinct commits ever recorded by Aveloxis for each repo. Distinct on commit hash, so a commit that touches 30 files is counted once (the raw `commits` table stores one row per file per commit). **Health and sustainability use**: - Repo-size baseline. A repo with 50 commits is in a very different lifecycle stage than one with 500,000. - Distinguishing active development from drive-by mirrors. A "repo" that has only 3 commits ever is almost certainly an aborted fork or a documentation-only placeholder. - First-cut comparison across repos in a foundation cohort. Sort the foundation's repos by commit count and you see the "core" projects vs the experimental ones at a glance. ### `api_get_all_repos_issues` **One row per repo.** Columns: `repo_id`, `issues_all_time`. Excludes pull requests via `WHERE pull_request IS NULL` (GitHub's REST API counts PRs as a special kind of issue; this view filters them back out). **Health and sustainability use**: - Community engagement signal. A repo with 5,000 issues has a different user base than one with 5. - Combined with the PR counter below, gives you the issue-to-PR ratio — a project where reported issues vastly outnumber PRs may be under-resourced for code work; the reverse can indicate a tight maintainer team that triages internally rather than via the issue queue. ### `api_get_all_repo_prs` **One row per repo.** Columns: `repo_id`, `pull_requests_all_time`. **Health and sustainability use**: - Throughput proxy. The total PR count over the life of the repo measures how much code-change activity it has absorbed. - Distinguishing repos by collaboration model. A repo with thousands of PRs from external contributors has a different sustainability profile than one with five PRs all from the same employee. --- ## Catalog and navigation ### `explorer_entry_list` **One row per repo.** Columns: `repo_git`, `repo_id`, `repo_name`, `rg_name` (the Augur-era repo group name from `aveloxis_data.repo_groups`). **What the table tells you**: the complete list of tracked repos with their organizational grouping. Effectively the "what's in this Aveloxis instance" picklist that 8Knot's dropdown uses to populate its repo-selector UI. **Health and sustainability use**: - Inventory question: "what's in scope for our analysis?" Run this view, get a 52K-row CSV, hand it to a stakeholder before they start asking questions. - Foundation-level breakdowns by `rg_name` — if you grouped at ingest time by foundation/affiliation/cohort, this view gives you the membership of each group. ### `explorer_user_repos` **One row per (web-UI user, repo) pair.** Columns: `login_name`, `user_id`, `group_id`, `repo_id`. This view is about the **Aveloxis web UI's per-user dashboards**, not about who contributes to repos. It maps which logged-in user has which repos visible on their dashboard via the v0.19.0+ group-approval workflow. **Health and sustainability use**: - Operations / governance: who's allowed to see what in the Aveloxis instance? - Not a contributor analysis view despite the name. Don't reach for this when answering "who contributed to project X." --- ## Contributor activity This group is the largest and most analytically valuable. All five "contributor action" views share the same conceptual model: every event a contributor performs on a repo (commit, issue open, PR open, review, comment, etc.) is captured as a row tagged with what kind of action it was. They differ in **what subset of actions they keep** and **whether they rank**. ### `explorer_contributor_actions` (the parent view) **One row per (contributor, repo, action, timestamp) tuple, with a per-(contributor, repo) rank.** Columns: `cntrb_id`, `created_at`, `repo_id`, `action`, `repo_name`, `login`, `rank`. The `action` column takes one of these values: - `commit` — author of a git commit (resolved via `cmt_ght_author_id`) - `issue_opened` — opened a non-PR issue - `issue_closed` — closed a non-PR issue (from issue_events) - `issue_comment` — wrote a comment on an issue - `pull_request_open` — opened a PR - `pull_request_closed` — closed a PR without merging - `pull_request_merged` — closed a PR by merging it - `pull_request_comment` — wrote a comment on a PR - `pull_request_review_` — submitted a PR review (state = `approved` / `changes_requested` / `commented`) The `rank` column numbers the actions for each (contributor, repo) pair from most-recent-first: rank=1 is the contributor's most recent action on that repo, rank=2 the next most recent, and so on. **What the table tells you**: the complete behavioral history of every contributor on every repo, with implicit per-pair recency ordering. This is the canonical "who did what when" timeline view. **Health and sustainability use**: - **Contributor onboarding analysis**. Filter to `rank` near the maximum per contributor to see each contributor's earliest activity on a repo — when did they show up? With what action type? Did they comment first then commit, or jump straight to code? - **Engagement depth**. Count distinct action types per contributor per repo. A contributor who only commits is a different profile than one who commits, reviews, comments, and opens issues — the latter is part of the social fabric. - **Activity decay**. For a given contributor's repo, the spread between rank=1 and the rank at their first action is their tenure; how far back is the most-recent action tells you whether they're still active. - **Cross-repo profiles**. Aggregate by `cntrb_id` across `repo_id` to see contributors who span multiple projects in your scope. ### `augur_new_contributors` (VIEW alias for `explorer_contributor_actions` as of v0.25.6) Pre-v0.25.5 this was a MATERIALIZED VIEW with SQL byte-for-byte identical to `explorer_contributor_actions`. v0.25.5 dropped it outright; v0.25.6 restored it as a plain VIEW alias (`CREATE OR REPLACE VIEW aveloxis_data.augur_new_contributors AS SELECT * FROM aveloxis_data.explorer_contributor_actions`) because operators query it to identify new contributors. The alias adds zero refresh cost — every query against `augur_new_contributors` transparently reads from the already-materialized `explorer_contributor_actions`. Same shape as the `explorer_libyear_all` alias for `explorer_libyear_summary`. ### `explorer_contributor_recent_actions` **Same shape as `explorer_contributor_actions`, but time-bounded to actions from the last 13 months.** **What the table tells you**: the rolling "what's happening *now*" view. The 13-month window catches a full year of activity plus the previous month for comparison, useful for "year-over-year same month" patterns. **Health and sustainability use**: - **Active contributor count**. Distinct `cntrb_id` over the last year, segmented by action type. A repo with 100 commit-authors but only 5 reviewers has a review-capacity problem. - **Recent newcomers**. Anyone whose earliest row in this view (`rank` near max) is within the last few months is a potential new contributor — useful for outreach lists. - **Activity falloff alerts**. Project that had 50 active contributors a year ago and 5 now is in trouble; this view lets you compute that delta directly. ### `explorer_new_contributors` **Same shape as `explorer_contributor_actions`, filtered to `rank IN (1..7)`** — keeps only the 7 most recent actions per (contributor, repo). The name is misleading inherited Augur terminology — "new contributors" suggests first-time-ever, but the actual filter is "most recent 7 actions," which is information about engagement breadth rather than newness. Use this view when you want a per-(contributor, repo) snapshot capped at a reasonable size for dashboard display. **Health and sustainability use**: - **Per-contributor activity snippet for repo pages**. "What has this person done here recently?" without dumping their entire history into a UI. - **Bus factor / depth-of-engagement**. A contributor with 7 different action types in their last 7 actions on a repo is more deeply engaged than one whose last 7 are all commits — the former is harder to lose because their contribution shape is more varied. ### `explorer_cntrb_per_file` **One row per (repo, file path)** with two CSV-string columns aggregating who's touched the file. Columns: `repo_id`, `file_path`, `cntrb_ids` (CSV of PR-author cntrb_ids who modified the file), `reviewer_ids` (CSV of cntrb_ids who reviewed PRs modifying the file). **What the table tells you**: for every file in every tracked repo, who's authored a change to it via a PR and who's reviewed those PRs. Roughly an "ownership map" at the file level. **Health and sustainability use**: - **Bus factor at file granularity**. A file whose `cntrb_ids` has only one entry across all of history is single-maintainer; if that person leaves, the file is orphaned. - **Review coverage**. A file with 100 PR-authors but only 2 reviewers is under-reviewed — code is being merged without diverse eyes on it. - **Refactoring / migration planning**. To rewrite `src/auth/oauth.py`, who do you need to consult? This view answers it directly. ### `issue_reporter_created_at` **One row per issue**, with the reporter's cntrb_id, the issue's created_at, and the repo_id. A bare projection of the `issues` table. **What the table tells you**: an issue-reporter time series at the repo level. The "legacy" tag in the original Augur schema; preserved for 8Knot compatibility. **Health and sustainability use**: - **Reporter cadence**. Count of distinct reporters per repo per month — how diverse is the issue queue's source? - **Mostly superseded** by `explorer_contributor_actions` filtered to `action = 'issue_opened'`, which carries the same data plus contributor login and other context. --- ## Commits over time ### `explorer_commits_and_committers_daily_count` **One row per (repo, committer_date) pair.** Columns: `repo_id`, `repo_name`, `cmt_committer_date`, `num_of_commits` (count of commit-file rows that day), `num_of_unique_committers` (distinct raw email count that day). Note: `num_of_commits` here counts `cmt_id` (one per file per commit), not distinct commit hashes — so a day with one big commit touching 50 files reports `num_of_commits = 50`. The neighboring `api_get_all_repos_commits` counts distinct hashes; this view trades semantic precision for an indicator that's sensitive to commit *size*. **What the table tells you**: daily activity heartbeat per repo. Plot it on a calendar and you see weekend lulls, holiday gaps, sprint cadences, and "the lead developer went on vacation" troughs. **Health and sustainability use**: - **Bus factor over time**. `num_of_unique_committers` per day, per week, per month — if it's consistently 1, the project relies on a single person. - **Burst-vs-steady patterns**. A project with concentrated bursts followed by months of silence has a different operational profile than one with daily commits; both can be healthy but they fail in different ways. - **Holiday / weekend signal**. The healthy projects with paid maintainers show clear weekend troughs and holiday flatlines; volunteer projects often show the opposite. Tells you something about the funding model without reading the funding documents. --- ## Pull requests ### `explorer_pr_response_times` **One row per PR** with **38 columns** — far and away the widest matview. Includes hours/days to first response, hours/days to close, hours/days to last response, total lines added/removed, file count, commit count, PR meta (base branch label, head/base distinction), and per-event-type counts (assigned, review_requested, labeled, etc.). **What the table tells you**: a comprehensive PR-by-PR analytical record. The single most useful view for understanding PR-driven collaboration health. **Health and sustainability use**: - **Responsiveness metrics**. Median / P90 hours to first response across all PRs in a repo, time-bucketed quarterly, shows whether the maintainers are keeping up with submissions over time. - **Time-to-merge distributions**. Sustainable projects converge to a stable distribution; declining projects show the distribution stretching out as maintenance attention fades. - **External-contributor experience**. Filter to PRs where the `pr_src_author_association` is not `MEMBER`/`OWNER` — those are the people whose first interaction defines whether they'll come back. P90 of their time-to-first-response is the most actionable single metric for community health. - **Review activity**. The `comment_count`, `review_requested_count`, and `merged_count` columns together describe how reviewed the PR was before merge. ### `explorer_pr_response` **One row per PR per response message.** Columns: `pull_request_id`, `id` (repo_id), `cntrb_id` (the PR author), `msg_timestamp`, `msg_cntrb_id` (who responded), `pr_created_at`, `pr_closed_at`. The underlying query unions PR conversation comments with inline review-message comments, so a row is "PR P got a message from user M at time T." **What the table tells you**: the message-level response stream for every PR. Where `explorer_pr_response_times` aggregates to per-PR totals, this view keeps individual response events queryable. **Health and sustainability use**: - **Who responds to whose PRs?** Pivot by (PR author, responder) to map the social graph of collaboration. - **Time-between-responses fine-grained analysis**. Per-PR aggregates hide that one PR can have a 2-day initial response then 6 months of silence; this view lets you reconstruct the actual cadence. ### `explorer_pr_assignments` **One row per (PR, assignment event) pair**, including PRs with no assignment events (LEFT JOIN). Columns: `pull_request_id`, `id` (repo_id), `created` (PR created_at), `closed` (PR closed_at), `assign_date`, `assignment_action` (`assigned` or `unassigned`), `assignee` (cntrb_id), `node_id`. **What the table tells you**: the full assignment history of every PR. A PR can have multiple assign/unassign cycles; this view captures all of them. **Health and sustainability use**: - **Maintainer workload distribution**. Count `assigned` events per `assignee` over time — is one person consistently the catcher? - **Triage churn**. PRs with multiple assign/unassign cycles likely fell through the cracks repeatedly. - **Time-from-open-to-assignment**. `assign_date - created` per PR shows triage responsiveness. ### `explorer_pr_files` **One row per (PR, file path) pair.** Columns: `file_path`, `pull_request_id`, `repo_id`. **What the table tells you**: the file-level fingerprint of every PR. For PR #X, which files did it touch? **Health and sustainability use**: - **Co-edit analysis**. Files that frequently appear together in PRs likely have a hidden coupling — useful for refactoring planning. - **File touch frequency**. Count PRs per file. A file that's modified in 80% of PRs is a hotspot (probably central / fragile); a file modified by 1% of PRs is on the periphery (probably stable or dead code). --- ## Issues ### `explorer_issue_assignments` **Same shape as `explorer_pr_assignments`** but for issues. One row per (issue, assignment event) including issues with no assignments. **Health and sustainability use**: - **Issue triage coverage**. Percentage of issues that ever get an `assigned` event tells you how much of the issue queue receives explicit attention vs being left to drift. - **Assignee load balance**. The cntrb_id with the most `assigned` events is the project's de-facto triage lead — useful for understanding informal governance structure. --- ## Files and code ### `explorer_repo_files` **One row per (repo, file path) for the latest scan only.** Columns: `id` (repo_id), `repo_name`, `repo_path`, `rl_analysis_date`, `file_path`, `file_name`. The view uses `DISTINCT ON (repo_id) ORDER BY rl_analysis_date DESC` to keep only the most-recent scan per repo from the cumulative `repo_labor` table (one row per file per scan per repo). **What the table tells you**: the current file structure of every tracked repo as of its last scancode/scc analysis. Effectively "what's in this repo right now." **Health and sustainability use**: - **Repo size distribution**. Count files per repo. A repo with 5 files and one with 50,000 are at completely different scales. - **Naming pattern surveys**. Pattern-match `file_name` across the corpus to see how common particular conventions are (`Makefile`, `requirements.txt`, `pyproject.toml`, etc.). ### `explorer_repo_languages` **One row per (repo, programming_language) pair from the latest scan.** Columns: `repo_id`, `repo_git`, `repo_name`, `programming_language`, `code_lines` (sum), `files` (count). The "latest scan" rule is computed by finding `MAX(data_collection_date)` per repo and keeping rows within 5 minutes of that max (the "scan-batch" window). **What the table tells you**: language breakdown for every tracked repo. A row per language per repo, plus aggregated line counts and file counts. **Health and sustainability use**: - **Polyglot complexity**. Count distinct languages per repo. A 1-language repo has different maintenance needs than a 10-language one. - **Ecosystem inventory**. "How much Python code is in our foundation's repos?" Sum `code_lines` filtered by `programming_language = 'Python'` across repos in scope. - **Language transition tracking**. By comparing two snapshots, identify repos shifting from old to new languages (Python 2 → 3 migrations were visible this way in 2018-2020). --- ## Dependencies These three views aggregate the `repo_deps_libyear` table (one row per (repo, dependency, scan_date) with the libyear value — how many years out of date the pinned version is compared to the latest available). **Pre-v0.25.5 bug, now fixed**: all three views previously used comma-join syntax without a join condition between `repos` and `repo_deps_libyear` — a CROSS JOIN producing 33-billion-row intermediates at fleet scale AND semantically wrong output (every repo's "average libyear" was the fleet-wide average, not the repo's own). v0.25.5 added the missing `USING (repo_id)` join condition and switched the date column from `repos.data_collection_date` (when the catalog row was last touched) to `repo_deps_libyear.data_collection_date` (when libyear data was gathered). ### `explorer_libyear_summary` **One row per (repo, month, year).** Columns: `repo_id`, `repo_name`, `avg_libyear`, `month`, `year`. **What the table tells you**: track dependency staleness over time per repo. A repo whose `avg_libyear` is climbing month-over-month is letting its deps drift; a repo whose `avg_libyear` is dropping is actively updating. **Health and sustainability use**: - **Dependency-update discipline**. Trend per repo over time. Healthy projects keep this number bounded; declining or abandoned projects let it grow indefinitely. - **Supply-chain risk**. Repos with high `avg_libyear` are running old versions of their dependencies and may be missing security patches — useful as a triage filter for security review. - **Foundation-level comparisons**. Sort all repos in a foundation by current `avg_libyear` to identify the worst-maintained dependency state. ### `explorer_libyear_all` (alias view, v0.25.5+) **Regular VIEW** (not materialized) that selects from `explorer_libyear_summary`. Pre-v0.25.5 was a separate MATERIALIZED VIEW with byte-for-byte identical SQL to `_summary`; v0.25.5 converted it to a view alias so the name remains queryable for downstream tooling while the storage + rebuild cost drops to zero. Queries against `explorer_libyear_all` and `explorer_libyear_summary` return identical results. ### `explorer_libyear_detail` **One row per (repo, dependency, version, requirement).** Columns: `repo_id`, `repo_name`, `name`, `requirement`, `current_version`, `latest_version`, `current_release_date`, `libyear`, `max` (latest collection date). **Intended use**: drill-down view. Where `_summary` tells you "this repo's average libyear is 4.2," `_detail` tells you "specifically these 12 dependencies contribute, with these versions and these libyear values." **Health and sustainability use**: - **Pinpoint the laggard**. The single dep with the highest `libyear` is usually the one to upgrade first. - **Lifecycle classification**. A dep with `libyear = 0` is current; `libyear > 5` is multiple major releases behind. Classify the repo's dep portfolio by these buckets. - **Migration planning**. "We want to drop Python 3.6 support." Filter `_detail` to repos with deps whose `latest_version` requires newer Python and you have the upgrade-path matrix. --- ## Monitoring refresh progress The weekly rebuild logs progress at INFO level: ``` INFO starting weekly materialized view refresh INFO pausing collection workers INFO refreshing api_get_all_repo_prs (CONCURRENTLY) INFO refreshing api_get_all_repos_commits (CONCURRENTLY) ... INFO refreshing explorer_contributor_recent_actions INFO materialized view refresh complete (took 4m23s) INFO resuming collection workers ``` Check freshness via the PostgreSQL catalog: ```sql SELECT schemaname, matviewname, pg_size_pretty(pg_total_relation_size(schemaname || '.' || matviewname)) AS size FROM pg_matviews WHERE schemaname = 'aveloxis_data' ORDER BY matviewname; ``` --- ## Column stability across releases Tooling that queries these matviews — most prominently [8Knot](https://github.com/oss-aspen/8Knot), but also operator dashboards and ad-hoc analyst scripts — needs to know which columns are stable across Aveloxis releases. Internal join-key choices change as we improve resolution coverage; the *output column shape* should not change without operator notice. ### The contract For each matview the **output column list** (names, types, order) is treated as a stable contract within a Aveloxis minor-version line. Breaking changes (renames, removals, type changes, semantic redefinition of a column) trigger a major version bump and are called out explicitly in the CLAUDE.md changelog. Internal-only details — how a column gets *computed*, which underlying table is joined to derive it, what join key links contributors to events — are NOT part of the contract. They change as data quality improves. Operators querying the matviews don't depend on these details and shouldn't notice the changes except via better coverage. ### Example: v0.25.6 commit-branch rewrite of `explorer_new_contributors` The columns (`cntrb_id`, `created_at`, `month`, `year`, `repo_id`, `repo_name`, `full_name`, `login`, `rank`) are byte-for-byte identical to v0.25.5. What changed internally: - Pre-v0.25.6: commit branch joined `contributors.cntrb_canonical = commits.cmt_author_email` (email-canonical chain, ~27% coverage on the production fleet). - v0.25.6: commit branch joins `contributors.cntrb_id = commits.cmt_ght_author_id` (deterministic UUID stamped by the commit resolver, ~92% coverage). The matview's `cntrb_id` column now contains 3-4× more rows of commit activity per (contributor, repo), but the column itself is still a contributor UUID with the same semantic ("the contributor who performed the action") and the same type (UUID). Downstream queries don't need to change. ### Example: v0.25.6 `augur_new_contributors` restoration v0.25.5 dropped the matview outright. v0.25.6 restored it as a plain VIEW alias — same columns, same content (it's now a transparent SELECT from `explorer_contributor_actions`), no refresh cost. Queries against `augur_new_contributors` work the same as before; the only operator-visible difference is that the alias is queryable live (no rebuild window) and adds zero storage. ### Verifying column stability To audit a matview's column list: ```sql SELECT attname, format_type(atttypid, atttypmod) AS coltype, attnum FROM pg_attribute WHERE attrelid = 'aveloxis_data.explorer_new_contributors'::regclass AND attnum > 0 AND NOT attisdropped ORDER BY attnum; ``` Run this once per Aveloxis upgrade and diff against the previous run. Any non-additive change should match a changelog entry; if it doesn't, that's a release-process bug worth filing. --- ## Troubleshooting ### "could not refresh materialized view concurrently" The view does not have a unique index. Aveloxis's built-in views all carry one; custom views you create must include `CREATE UNIQUE INDEX ON ...` before using `CONCURRENTLY`. ### Slow refresh Some views can take hours at fleet scale (especially `explorer_new_contributors`, `explorer_contributor_recent_actions`, `explorer_cntrb_per_file`, and `explorer_pr_response_times` on a 100K-repo / 474M-commit / 117M-PR-file deployment). When a refresh is taking too long: - Check `work_mem` and `maintenance_work_mem`. Increase to at least 256 MB and 1 GB respectively. - Verify the indexes documented in [the audit](../../summary/matview-audit-2026-05-26.md) are present. - A view's underlying tables (especially `commits` and `messages`) growing dramatically since the last rebuild can extend refresh time super-linearly; consider whether the view's defining query has the anti-patterns documented in that audit. ### Views out of date If analytics show stale data: ```bash # Manual refresh aveloxis refresh-views ``` Or wait for the next weekly automatic rebuild. --- ## Cross-references - [Audit findings, 2026-05-26](../../summary/matview-audit-2026-05-26.md) — performance issues + remediation plan for the matviews with scale problems. - [Analysis](analysis.md) — how `repo_labor` (drives `explorer_repo_files` / `explorer_repo_languages`) and `repo_deps_libyear` are populated. - [Scaling](../guide/scaling.md) — database tuning for the rebuild window. - [Overview](overview.md) — system architecture.