# Quick Start

Get Aveloxis collecting open source community health data in five steps.

---

## Prerequisites

Before starting, ensure you have:

- Aveloxis installed (see [Installation](installation.md))
- A running PostgreSQL 14+ instance
- At least one GitHub or GitLab personal access token

---

## Step 1: Create a config file

```bash
cp aveloxis.example.json aveloxis.json
```

Edit `aveloxis.json` with your database credentials:

```json
{
  "database": {
    "host": "localhost",
    "port": 5432,
    "user": "aveloxis",
    "password": "your-password",
    "dbname": "aveloxis",
    "sslmode": "prefer"
  }
}
```
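
The six `database` keys use the same names as standard libpq connection parameters. That means a connection problem can often be reproduced outside Aveloxis. The Go sketch below is an illustration only, not Aveloxis's own config loader. It assumes the file layout shown above and uses hypothetical names (`DatabaseConfig`, `ParseConfig`, `DSN`). It checks the required keys and renders them as a keyword/value connection string of the kind `psql` accepts.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// DatabaseConfig mirrors the "database" section of aveloxis.json.
// The JSON keys match the example; the Go type names are hypothetical.
type DatabaseConfig struct {
	Host     string `json:"host"`
	Port     int    `json:"port"`
	User     string `json:"user"`
	Password string `json:"password"`
	DBName   string `json:"dbname"`
	SSLMode  string `json:"sslmode"`
}

// Config holds only the section this sketch needs; encoding/json
// ignores any other top-level keys in the file.
type Config struct {
	Database DatabaseConfig `json:"database"`
}

// quoteValue wraps a value in single quotes and backslash-escapes
// backslashes and single quotes, as libpq's keyword/value format requires.
func quoteValue(v string) string {
	v = strings.ReplaceAll(v, `\`, `\\`)
	v = strings.ReplaceAll(v, `'`, `\'`)
	return "'" + v + "'"
}

// ParseConfig decodes aveloxis.json and checks that required keys are set.
func ParseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return cfg, fmt.Errorf("invalid JSON: %w", err)
	}
	d := cfg.Database
	if d.Host == "" || d.User == "" || d.DBName == "" || d.Port == 0 {
		return cfg, fmt.Errorf("database section needs host, port, user and dbname")
	}
	return cfg, nil
}

// DSN renders the section as a libpq keyword/value connection string.
func (d DatabaseConfig) DSN() string {
	return fmt.Sprintf("host=%s port=%d user=%s password=%s dbname=%s sslmode=%s",
		quoteValue(d.Host), d.Port, quoteValue(d.User), quoteValue(d.Password),
		quoteValue(d.DBName), quoteValue(d.SSLMode))
}

func main() {
	data, err := os.ReadFile("aveloxis.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	cfg, err := ParseConfig(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	d := cfg.Database
	// Print the target without the password.
	fmt.Printf("would connect to %s:%d/%s as %s (sslmode=%s)\n",
		d.Host, d.Port, d.DBName, d.User, d.SSLMode)
}
```

Quoting every value keeps passwords that contain spaces or quotes intact when the string is passed to `psql`.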

```{important}
**Local development over HTTP:** If you plan to use the web GUI locally (without HTTPS), set `"dev_mode": true` in the `"web"` section of `aveloxis.json`. Without this, session cookies are marked `Secure` and browsers will not send them over plain HTTP, causing login to fail silently. Do not enable `dev_mode` in production.
```
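
The silent login failure comes from the cookie's `Secure` attribute. The following is a hypothetical illustration, not Aveloxis code; the `sessionCookie` helper and the cookie name are assumptions. It shows how a Go web server typically ties that attribute to a dev-mode flag.

```go
package main

import (
	"fmt"
	"net/http"
)

// sessionCookie is a hypothetical helper: in production the cookie is
// marked Secure; with devMode enabled it is not.
func sessionCookie(value string, devMode bool) *http.Cookie {
	return &http.Cookie{
		Name:     "session",
		Value:    value,
		Path:     "/",
		HttpOnly: true,
		// Browsers generally do not send Secure cookies over plain HTTP,
		// so without dev mode the session would never reach the server.
		Secure:   !devMode,
		SameSite: http.SameSiteLaxMode,
	}
}

func main() {
	fmt.Println(sessionCookie("abc123", false).String()) // production
	fmt.Println(sessionCookie("abc123", true).String())  // dev_mode
}
```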

If you do not have a database yet, create one:

```sql
-- Run in psql as a superuser
CREATE DATABASE aveloxis;
CREATE USER aveloxis WITH ENCRYPTED PASSWORD 'your-password';  -- must match aveloxis.json
GRANT ALL PRIVILEGES ON DATABASE aveloxis TO aveloxis;
ALTER DATABASE aveloxis OWNER TO aveloxis;
```

Or run PostgreSQL in Docker, using the same password as in `aveloxis.json`:

```bash
docker run -d --name aveloxis-db -p 5432:5432 \
  -e POSTGRES_DB=aveloxis \
  -e POSTGRES_USER=aveloxis \
  -e POSTGRES_PASSWORD=your-password \
  postgres:16
```

---

## Step 2: Create the database schema

```bash
aveloxis migrate
```

This creates 108 tables and 19 materialized views across two PostgreSQL schemas (`aveloxis_data` and `aveloxis_ops`). It is safe to run repeatedly -- all DDL uses `CREATE ... IF NOT EXISTS`.

---

## Step 3: Store your API keys

```bash
# GitHub token
aveloxis add-key ghp_your_github_token --platform github

# GitLab token (optional)
aveloxis add-key glpat-your_gitlab_token --platform gitlab
```

Keys are stored in `aveloxis_ops.worker_oauth` and loaded automatically on every run. Each key has its own API rate limit, so adding several keys per platform increases throughput; Aveloxis rotates through them round-robin.

---

## Step 4: Add repos to the collection queue

### Add a single repo

```bash
aveloxis add-repo https://github.com/chaoss/augur
```

### Add multiple repos

```bash
aveloxis add-repo \
  https://github.com/torvalds/linux \
  https://github.com/chaoss/grimoirelab \
  https://gitlab.com/fdroid/fdroidclient
```

### Add all repos from a GitHub organization

```bash
aveloxis add-repo https://github.com/chaoss
```

When you pass an organization URL (one without a repository name), Aveloxis queries the GitHub or GitLab API to discover every repository in that organization and adds them all to the queue.

Platform is auto-detected from the URL. GitLab nested subgroups are supported:

```
https://gitlab.com/group/subgroup/project
```

---

## Step 5: Start the scheduler

```bash
aveloxis serve --monitor :5555
```

This starts the long-running scheduler that:

- Continuously polls the queue for repos due for collection
- Runs the full staged pipeline (API collection, processing, facade, commit resolution, analysis)
- Serves the web monitoring dashboard on the address passed to `--monitor` (here, port 5555)

---

## Check the monitoring dashboard

Open your browser to:

```
http://localhost:5555
```

The dashboard shows:

- **Queue statistics** -- total repos, queued, currently collecting
- **Repo table** -- every repo with status, priority, due time, and last run results
- **Boost button** -- push any repo to the front of the queue

The page auto-refreshes every 10 seconds.

---

## Verify data in the database

After the first repo finishes collecting, connect with `psql -h localhost -U aveloxis -d aveloxis` and verify the data:

```sql
-- Check collected repos
SELECT repo_id, repo_owner, repo_name, primary_language
FROM aveloxis_data.repos;

-- Count issues
SELECT r.repo_name, COUNT(*) AS issue_count
FROM aveloxis_data.issues i
JOIN aveloxis_data.repos r ON r.repo_id = i.repo_id
GROUP BY r.repo_name;

-- Count pull requests
SELECT r.repo_name, COUNT(*) AS pr_count
FROM aveloxis_data.pull_requests pr
JOIN aveloxis_data.repos r ON r.repo_id = pr.repo_id
GROUP BY r.repo_name;

-- Count commits (the table has one row per file per commit, so count distinct hashes)
SELECT r.repo_name, COUNT(DISTINCT c.cmt_commit_hash) AS commit_count
FROM aveloxis_data.commits c
JOIN aveloxis_data.repos r ON r.repo_id = c.repo_id
GROUP BY r.repo_name;

-- Check contributors
SELECT COUNT(*) AS total_contributors
FROM aveloxis_data.contributors;

-- Check collection queue status
SELECT status, COUNT(*)
FROM aveloxis_ops.collection_queue
GROUP BY status;
```

---

## What happens next

Once `aveloxis serve` is running, it continuously:

1. Collects repos in priority order from the queue
2. Re-collects repos after `days_until_recollect` (default: 1 day)
3. Refreshes materialized views every Saturday
4. Runs contributor breadth discovery every 6 hours
5. Refreshes org membership every 4 hours

You can add more repos at any time without restarting:

```bash
aveloxis add-repo https://github.com/kubernetes/kubernetes
```

To push a specific repo to the front of the queue:

```bash
aveloxis prioritize https://github.com/kubernetes/kubernetes
```

---

## Next steps

- [Configuration](configuration.md) -- fine-tune workers, batch sizes, and clone directories
- [Augur Migration](augur-migration.md) -- import repos and keys from an existing Augur database
- [Commands Reference](../guide/commands.md) -- full CLI documentation
- [Collection Pipeline](../guide/collection-pipeline.md) -- understand what Aveloxis collects and how