# Scheduled Aggregate Runbook

AI OS scheduled aggregate is the manual compatibility path for refreshing local operator telemetry and extension projections through the scheduler runner. It preserves the direct `bun run aggregate` command and adds scheduler state, locks, timeout handling, warning summaries, and private logs around the same aggregate implementation.

This runbook covers the full aggregate compatibility job plus the scoped host/local and Trend Finder refresh jobs. Dream has its own scheduler job and runbook; the normal Dream path now consumes fresh generated data or refreshes host/local material through `agent-aggregate` without running Trend Finder.

## Current Status

| Area                     | Current behavior                                          |
| ------------------------ | --------------------------------------------------------- |
| Scheduler job            | `aggregate` is an available AI OS scheduler job           |
| Default cadence          | Every 4 hours when explicitly enabled                     |
| Enablement               | Manual compatibility job; not enabled by default          |
| Manual scheduler run     | `bun run scheduler:run`                                   |
| Status command           | `bun run scheduler:status`                                |
| Direct aggregate command | `bun run aggregate` remains the full writer               |
| Browser projection       | None; status is command-only                              |
| Dream dependency         | Dream uses fresh generated data or scoped agent aggregate |
| Local timer config       | `data/ai-os.scheduler.json`                               |

Scoped jobs are also available:

| Job             | Run command                          | Status command                          | Write scope                           |
| --------------- | ------------------------------------ | --------------------------------------- | ------------------------------------- |
| Agent aggregate | `bun run scheduler:agents:run`       | `bun run scheduler:agents:status`       | Host/local branches except extensions |
| Trend Finder    | `bun run scheduler:trend-finder:run` | `bun run scheduler:trend-finder:status` | `extension-item` for `trend-finder`   |

The direct aggregate command now delegates to the shared script-side orchestration module in `scripts/lib/aggregate-orchestration.ts`. The compatibility path still runs full aggregate behavior and writes the complete generated data artifact through the `full` write scope.

When Trend Finder Apify sources are configured, the full aggregate keeps the collector bounded with `FINDTREND_APIFY_COLLECTION_BUDGET_SECS` (default: 60 seconds). Sources that do not fit in that budget are emitted as degraded source rows with warnings instead of blocking the aggregate write.
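One way to picture the budget is as a single shared deadline across all sources. The sketch below assumes that model: `collectWithinBudget`, `budgetSeconds`, `SourceCollector`, and the degraded-row shape are hypothetical names, not the Trend Finder collector's real API. It only stops *waiting* for slow sources; it does not cancel them.

```typescript
// Hypothetical shapes for illustration; not the real Trend Finder collector types.
type SourceResult =
  | { source: string; status: "ok"; items: number }
  | { source: string; status: "degraded"; warning: string };

type SourceCollector = { source: string; collect: () => Promise<number> };

// Reads the budget from the environment, falling back to the documented 60-second default.
function budgetSeconds(env: Record<string, string | undefined> = process.env): number {
  const parsed = Number(env.FINDTREND_APIFY_COLLECTION_BUDGET_SECS);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 60;
}

// Runs every source concurrently against one shared deadline. Sources that fail
// or miss the deadline become degraded rows instead of failing the whole write.
async function collectWithinBudget(
  collectors: SourceCollector[],
  budgetMs: number,
): Promise<SourceResult[]> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), budgetMs);
  });
  try {
    return await Promise.all(
      collectors.map(async ({ source, collect }): Promise<SourceResult> => {
        try {
          const outcome = await Promise.race([collect(), deadline]);
          if (outcome === "timeout") {
            return { source, status: "degraded", warning: "collection budget exceeded" };
          }
          return { source, status: "ok", items: outcome };
        } catch {
          return { source, status: "degraded", warning: "collector failed" };
        }
      }),
    );
  } finally {
    // Stop the shared deadline timer once every source has settled or timed out.
    clearTimeout(timer);
  }
}
```

A caller would pass `budgetSeconds() * 1000` as `budgetMs`, so the aggregate write proceeds after at most one budget window regardless of how many sources are slow.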

The same module exposes reusable boundaries for scoped scheduler jobs:

| Boundary                 | Current behavior                                           |
| ------------------------ | ---------------------------------------------------------- |
| Host/local orchestration | Builds host/local live data and skips extension collectors |
| Extension orchestration  | Runs registered extension collectors independently         |

`agent-aggregate` is the default frequent local refresh job. `trend-finder` is the default once-daily extension refresh job. Timer install behavior and browser controls remain deferred.

## Implementation Map

| Piece                   | Role                                                                   | Primary files                              |
| ----------------------- | ---------------------------------------------------------------------- | ------------------------------------------ |
| Registry                | Reviewed job IDs, lifecycle state, cadence, timeouts, network policy   | `scripts/lib/scheduler/registry.ts`        |
| Handler ID contract     | Type-level list of scheduler handlers                                  | `scripts/lib/scheduler/types.ts`           |
| Handler registry        | Runtime map from handler ID to function                                | `scripts/lib/scheduler/handlers.ts`        |
| Runner                  | Locks, run IDs, timeouts, state writes, private logs, exit codes       | `scripts/scheduler-runner.ts`              |
| Status CLI              | Human and JSON-safe command status                                     | `scripts/scheduler-status.ts`              |
| Operator status model   | Copy, command hints, local config overlay, latest-run interpretation   | `scripts/lib/scheduler/operator-status.ts` |
| Generated-data writer   | Full, host/local, and Trend Finder merge scopes with shared write lock | `scripts/lib/aggregate-live-data-write.ts` |
| Aggregate orchestration | Full aggregate, host/local aggregate, and extension orchestration      | `scripts/lib/aggregate-orchestration.ts`   |
| Scheduler handlers      | Job-specific wrappers around aggregate, agent, Trend Finder, and Dream | `scripts/lib/scheduler/*-handler.ts`       |
| Dream execution         | Material-first Dream flow with scoped host/local refresh when required | `scripts/lib/dream/execution.ts`           |
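The split between a type-level handler ID list and a runtime handler map can be sketched as follows. The IDs reuse the job IDs named in this runbook, but the `HandlerResult` shape, `resolveHandler`, and the placeholder handler bodies are assumptions, not the contents of `scripts/lib/scheduler/types.ts` or `handlers.ts`.

```typescript
// Type-level contract: the compiler rejects any handler ID outside this list.
const HANDLER_IDS = ["aggregate", "agent-aggregate", "trend-finder", "dream"] as const;
type HandlerId = (typeof HANDLER_IDS)[number];

// Hypothetical result shape for illustration only.
type HandlerResult = { warnings: string[] };
type Handler = () => Promise<HandlerResult>;

// Runtime map: `Record<HandlerId, Handler>` forces exactly one entry per declared
// ID, so adding an ID without wiring a handler is a type error.
const handlers: Record<HandlerId, Handler> = {
  aggregate: async () => ({ warnings: [] }),
  "agent-aggregate": async () => ({ warnings: [] }),
  "trend-finder": async () => ({ warnings: [] }),
  dream: async () => ({ warnings: [] }),
};

// Validates untrusted input (for example a `--job` argument) before lookup, so
// inherited keys such as "toString" never resolve to a handler.
function resolveHandler(jobId: string): Handler | undefined {
  return (HANDLER_IDS as readonly string[]).includes(jobId)
    ? handlers[jobId as HandlerId]
    : undefined;
}
```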

## When To Use It

Use `scheduler:agents:run` for frequent local host, agent, dashboard, and Dream projection refreshes. This path does not run Trend Finder.

Use `scheduler:trend-finder:run` once early in the day when you want the Trend Finder extension refreshed.

Use scheduled aggregate only when you intentionally want the full compatibility writer, including extension collectors, refreshed by the scheduler runner while keeping all private diagnostics on the same machine.

Use direct aggregate when you only need an immediate refresh and do not need scheduler state, lock, or run-log records:

```bash
bun run aggregate
```

Use the scheduler path when you need scheduler state, lock, timeout, and log handling around aggregate:

```bash
bun run scheduler:run
```

Use the Dream scheduler path when you need Dream activation gates, material freshness checks, private output, and continuity:

```bash
bun run scheduler:dream:run
bun run scheduler:dream:status
```

See [AI OS Dream Runbook](/ai-os-and-trend-finder-docs/docs/runbooks/ai-os-dream.md) for Dream enablement, status, private output, compatibility, cleanup, and deferrals.

Use the scoped agent aggregate path when you want host/local data without running extension collectors:

```bash
bun run scheduler:agents:run
bun run scheduler:agents:status
```

Use the scoped Trend Finder path when you want to refresh only Trend Finder extension data. Trend Finder must be enabled locally, for example with `VITE_CLAUDE_OS_ENABLED_EXTENSIONS=trend-finder`.

```bash
bun run scheduler:trend-finder:run
bun run scheduler:trend-finder:status
```

## First Run

From the repository root:

```bash
bun install
bun run scheduler:status
bun run scheduler:run
bun run scheduler:status
```

Before the first scheduler run, status reports `Not run yet`. After a successful run, status reports the latest state, warning count, duration, exit code, trigger source, lock outcome, and generated-data freshness summary.

The status command intentionally does not print raw logs, command output, credentials, auth JSON, source dumps, prompts, transcripts, or private file paths.

## Local Scheduler Config

Copy `data/ai-os.scheduler.example.json` to `data/ai-os.scheduler.json` when you want to record repo-local timer intent and cadence choices. The private file is gitignored. It may only select reviewed scheduler job IDs, reviewed cadence IDs, booleans, and strict `HH:MM` Dream times.

Keep `.env.local` for secrets, provider runtime settings, auth paths, and process activation such as `AI_OS_DREAM_ENABLED=true`. Scheduler config is not a command surface: raw shell, private paths, and raw systemd calendar expressions are rejected. When the config is missing or invalid, `bun run scheduler:status` falls back to the registry default cadence and prints only sanitized warning counts/details.
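The allow-list rules above can be sketched as a validator that accepts only reviewed IDs, booleans, and strict `HH:MM` times, and reports sanitized warnings. This is a sketch under assumptions: the `jobs` object layout, the `cadence` and `dreamTime` field names, and the cadence ID strings are hypothetical; only `timerEnabled` is named elsewhere in this runbook.

```typescript
// Assumed reviewed IDs; the authoritative lists live in scripts/lib/scheduler/registry.ts.
const REVIEWED_JOBS = new Set(["aggregate", "agent-aggregate", "trend-finder", "dream"]);
const REVIEWED_CADENCES = new Set(["hourly", "every-2-hours", "every-4-hours"]);
// Strict 24-hour HH:MM: hour 00-23 and minute 00-59, both zero-padded.
const HH_MM = /^([01]\d|2[0-3]):[0-5]\d$/;

// Hypothetical per-job fields.
type JobIntent = { timerEnabled?: boolean; cadence?: string; dreamTime?: string };
type ValidatedConfig = { jobs: Record<string, JobIntent>; warnings: string[] };

function validateSchedulerConfig(raw: unknown): ValidatedConfig {
  const result: ValidatedConfig = { jobs: {}, warnings: [] };
  const jobs = (raw as { jobs?: unknown } | null)?.jobs;
  if (typeof jobs !== "object" || jobs === null) {
    // Missing or invalid config: the caller falls back to registry defaults.
    result.warnings.push("config missing or invalid; using registry defaults");
    return result;
  }
  for (const [jobId, entry] of Object.entries(jobs)) {
    // Warnings describe the rejected field class and never echo the rejected value.
    if (!REVIEWED_JOBS.has(jobId)) {
      result.warnings.push("unknown job ID rejected");
      continue;
    }
    const fields = (typeof entry === "object" && entry !== null ? entry : {}) as Record<string, unknown>;
    const intent: JobIntent = {};
    if (typeof fields.timerEnabled === "boolean") intent.timerEnabled = fields.timerEnabled;
    else if (fields.timerEnabled !== undefined) result.warnings.push("timerEnabled must be a boolean");
    // Raw systemd calendar expressions fail here because only reviewed IDs pass.
    if (typeof fields.cadence === "string" && REVIEWED_CADENCES.has(fields.cadence)) intent.cadence = fields.cadence;
    else if (fields.cadence !== undefined) result.warnings.push("unreviewed cadence rejected");
    if (typeof fields.dreamTime === "string" && HH_MM.test(fields.dreamTime)) intent.dreamTime = fields.dreamTime;
    else if (fields.dreamTime !== undefined) result.warnings.push("dreamTime must be strict HH:MM");
    result.jobs[jobId] = intent;
  }
  return result;
}
```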

## Live Data Write Ownership

The generated browser data artifact is still `src/data/live-data.json`. The scheduler split added merge boundaries that the scoped jobs use when they write to it:

| Scope                        | Owned branches                            | Preserved branches                               |
| ---------------------------- | ----------------------------------------- | ------------------------------------------------ |
| Host/local agent aggregate   | Root live-data fields except `extensions` | Existing extension runtime state                 |
| Extension item               | `extensions.items[extensionId]`           | Host/local root fields and other extension items |
| Full aggregate compatibility | Complete artifact through the write gate  | None                                             |

The current scoped Trend Finder job uses the generic extension-item writer with `extensionId` set to `trend-finder`.

All producer writes use the shared generated-data lock at the scheduler-root path `scheduler/locks/generated-live-data.lock`. That lock is separate from per-job scheduler run locks such as `agent-aggregate.lock` and `trend-finder.lock`, so run-state overlap and generated-file overlap can be reported independently.
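Independent run and generated-data locks can be sketched with exclusive file creation: Node's `openSync(path, "wx")` fails with `EEXIST` when the file already exists. The `lockPathFor` and `acquireLock` helpers, the stale-age threshold, the JSON lock body, and the returned labels are hypothetical; only the lock file names come from this runbook, and the root directory is a placeholder.

```typescript
import { closeSync, mkdirSync, openSync, rmSync, statSync, writeFileSync } from "node:fs";
import { dirname, join } from "node:path";

// Hypothetical layout: locks live under <schedulerRoot>/scheduler/locks/.
function lockPathFor(schedulerRoot: string, name: string): string {
  return join(schedulerRoot, "scheduler", "locks", `${name}.lock`);
}

type LockOutcome = "acquired" | "stale-recovered" | "blocked";

// Creates the lock atomically; "wx" fails with EEXIST if another process holds it.
function acquireLock(lockPath: string, staleAfterMs: number, now = Date.now()): LockOutcome {
  mkdirSync(dirname(lockPath), { recursive: true });
  try {
    const fd = openSync(lockPath, "wx");
    writeFileSync(fd, JSON.stringify({ pid: process.pid, acquiredAt: now }));
    closeSync(fd);
    return "acquired";
  } catch (error) {
    if ((error as NodeJS.ErrnoException).code !== "EEXIST") throw error;
  }
  // A lock older than the threshold is treated as abandoned and replaced once.
  // Caveat: two processes can both judge the same lock stale; a production lock
  // would need to guard that removal race.
  if (now - statSync(lockPath).mtimeMs > staleAfterMs) {
    rmSync(lockPath, { force: true });
    return acquireLock(lockPath, staleAfterMs, now) === "acquired" ? "stale-recovered" : "blocked";
  }
  return "blocked";
}

function releaseLock(lockPath: string): void {
  rmSync(lockPath, { force: true });
}
```

Because the run lock and the generated-data lock are separate files, a run can report `blocked` on its own job lock while another producer's brief generated-data write proceeds, and the reverse.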

If `src/data/live-data.json` is missing or invalid, scoped writes use `src/data/live-data.example.json` as the merge base before applying their owned branch. The merged artifact must still pass the aggregate live-data validation and privacy checks before it is written.
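The ownership table and the example-file fallback can be sketched as a pure merge step. `mergeForScope`, `chooseMergeBase`, `WriteScope`, and the simplified `LiveData` shape are hypothetical and are not the signature of `writeGeneratedLiveDataForScope()`; the lock, validation, and privacy checks that the real writer applies are omitted.

```typescript
// Hypothetical, simplified live-data shape.
type LiveData = {
  extensions?: { items?: Record<string, unknown> };
  [rootField: string]: unknown;
};

type WriteScope =
  | { kind: "full" }
  | { kind: "host-local" }
  | { kind: "extension-item"; extensionId: string };

function mergeForScope(base: LiveData, produced: LiveData, scope: WriteScope): LiveData {
  switch (scope.kind) {
    case "full":
      // Full aggregate owns the complete artifact; nothing is preserved.
      return produced;
    case "host-local": {
      // Take every produced root field except `extensions`; keep existing extension state.
      const { extensions: _ignored, ...rootFields } = produced;
      return { ...rootFields, extensions: base.extensions };
    }
    case "extension-item": {
      // Replace exactly one extension item; preserve root fields and sibling items.
      const item = produced.extensions?.items?.[scope.extensionId];
      return {
        ...base,
        extensions: {
          ...base.extensions,
          items: { ...base.extensions?.items, [scope.extensionId]: item },
        },
      };
    }
  }
}

// Uses the example artifact as the merge base when the current file is missing or invalid JSON.
function chooseMergeBase(currentText: string | undefined, exampleText: string): LiveData {
  if (currentText !== undefined) {
    try {
      return JSON.parse(currentText) as LiveData;
    } catch {
      // Invalid current file: fall through to the example base.
    }
  }
  return JSON.parse(exampleText) as LiveData;
}
```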

## Operating Constraints

Keep these scheduler constraints intact when changing this area:

* Preserve `bun run aggregate` and `bun run scheduler:run` as full aggregate compatibility writers unless a dedicated migration changes that contract.
* Keep `scripts/lib/scheduler/registry.ts` as the reviewed job and cadence contract.
* Route scoped writes through `writeGeneratedLiveDataForScope()` instead of writing `src/data/live-data.json` directly.
* Keep secrets, raw source payloads, prompts, auth paths, private paths, command bodies, and raw logs out of status output and browser data.
* Do not describe timer installation, unit rendering, or unit reconciliation as shipped behavior. Current timer setup is manual local systemd guidance.
* Keep Dream material-first: Dream may refresh stale or missing host/local material through `agent-aggregate`, but it does not run Trend Finder in the normal path.

## Cadence

The aggregate registry includes three reviewed cadence candidates:

| Candidate     | systemd calendar  |
| ------------- | ----------------- |
| Hourly        | `hourly`          |
| Every 2 hours | `*-*-* 0/2:00:00` |
| Every 4 hours | `*-*-* 0/4:00:00` |

The current agent aggregate registry entry selects every 4 hours as its default cadence for frequent local refreshes. Trend Finder is a separate job and defaults to one early daily refresh at 06:00.
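Selecting cadences by reviewed ID, rather than accepting calendar text, can be sketched as a fixed lookup. The calendar expressions are copied from the table above; the ID keys and the `calendarFor` and `stepHours` helpers are hypothetical. `stepHours` illustrates systemd's `A/B` hour syntax, where `0/4` means hour 0 and then every 4 hours.

```typescript
// Reviewed cadence candidates. Expressions match the table; the ID keys are assumptions.
const CADENCE_CALENDARS = {
  hourly: "hourly",
  "every-2-hours": "*-*-* 0/2:00:00",
  "every-4-hours": "*-*-* 0/4:00:00",
} as const;
type CadenceId = keyof typeof CADENCE_CALENDARS;

// Callers pick an ID; a raw calendar expression is never accepted as input.
function calendarFor(id: string): string {
  if (!Object.prototype.hasOwnProperty.call(CADENCE_CALENDARS, id)) {
    throw new Error("unreviewed cadence ID");
  }
  return CADENCE_CALENDARS[id as CadenceId];
}

// Hours of the day matched by `0/N` in the hour field of an OnCalendar expression.
function stepHours(step: number): number[] {
  if (!Number.isInteger(step) || step < 1) throw new Error("step must be a positive integer");
  const hours: number[] = [];
  for (let h = 0; h < 24; h += step) hours.push(h);
  return hours;
}
```

For the default agent aggregate cadence, `stepHours(4)` yields hours 0, 4, 8, 12, 16, and 20, that is, six runs per day.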

## Optional Timer Enablement

AI OS starter metadata marks agent aggregate, Trend Finder, and Dream as enabled by default. Full aggregate is a manual compatibility job and is not enabled by default. Setup and status commands do not auto-install or manage timers. If you want Linux systemd user timers, create them explicitly from your local checkout path.

Example service:

```ini
[Unit]
Description=AI OS scheduled agent aggregate

[Service]
Type=oneshot
WorkingDirectory=/path/to/ai-os
ExecStart=/usr/bin/env bun run scheduler:agents:run
Environment=AI_OS_SCHEDULER_TRIGGER_SOURCE=systemd
```

Example timer:

```ini
[Unit]
Description=Run AI OS agent aggregate every 4 hours

[Timer]
OnCalendar=*-*-* 0/4:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

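Save those files as `ai-os-agent-aggregate.service` and `ai-os-agent-aggregate.timer` under your user systemd unit directory (typically `~/.config/systemd/user/`), then run: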

```bash
systemctl --user daemon-reload
systemctl --user enable --now ai-os-agent-aggregate.timer
systemctl --user list-timers ai-os-agent-aggregate.timer
```

Use your actual checkout path in `WorkingDirectory`. Do not commit machine paths or generated timer files with local account names.

Future timer rendering or reconciliation remains deferred. If it is added, keep it Linux systemd-only at first, start with a dry-run renderer, keep generated unit files out of git, refuse unsafe working directories and unknown commands, show inspect/diff output before install, require explicit install/uninstall commands, report drift from active user timers, and keep the legacy Dream cron installer outside this path.
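A dry-run renderer that follows those constraints could look roughly like this. It is a hypothetical sketch of deferred behavior, not shipped code: `renderUnits`, `REVIEWED_TIMERS`, and the Trend Finder calendar string (which assumes the 06:00 default) are illustrations. It returns unit text for inspection and never writes or installs files.

```typescript
// Hypothetical reviewed map from job ID to run script and calendar.
const REVIEWED_TIMERS = new Map<string, { script: string; calendar: string }>([
  ["agent-aggregate", { script: "scheduler:agents:run", calendar: "*-*-* 0/4:00:00" }],
  ["trend-finder", { script: "scheduler:trend-finder:run", calendar: "*-*-* 06:00:00" }],
]);

type RenderedUnits = { serviceName: string; serviceText: string; timerName: string; timerText: string };

// Dry run only: returns unit text for inspect/diff and performs no file or systemctl calls.
function renderUnits(jobId: string, workingDirectory: string): RenderedUnits {
  // Map.get ignores prototype keys, so IDs like "toString" are refused as unknown.
  const reviewed = REVIEWED_TIMERS.get(jobId);
  if (reviewed === undefined) throw new Error("unknown job; refusing to render");
  // Require an absolute, non-root path without whitespace, quotes, shell
  // metacharacters, systemd `%` specifiers, or `..` segments.
  const unsafe =
    !workingDirectory.startsWith("/") ||
    workingDirectory === "/" ||
    /[\s"'`\\$;&|<>%]/.test(workingDirectory) ||
    workingDirectory.split("/").includes("..");
  if (unsafe) throw new Error("unsafe working directory; refusing to render");

  const base = `ai-os-${jobId}`;
  const serviceText = [
    "[Unit]",
    `Description=AI OS scheduled ${jobId}`,
    "",
    "[Service]",
    "Type=oneshot",
    `WorkingDirectory=${workingDirectory}`,
    `ExecStart=/usr/bin/env bun run ${reviewed.script}`,
    "Environment=AI_OS_SCHEDULER_TRIGGER_SOURCE=systemd",
  ].join("\n");
  const timerText = [
    "[Unit]",
    `Description=Run AI OS ${jobId} on its reviewed cadence`,
    "",
    "[Timer]",
    `OnCalendar=${reviewed.calendar}`,
    "Persistent=true",
    "",
    "[Install]",
    "WantedBy=timers.target",
  ].join("\n");
  return { serviceName: `${base}.service`, serviceText, timerName: `${base}.timer`, timerText };
}
```

Keeping the renderer pure leaves install, uninstall, and drift reporting to separate explicit commands, which matches the deferral constraints above.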

## Commands

### Inspect Status

```bash
bun run scheduler:status
```

This prints:

* lifecycle and schedule state
* selected cadence
* starter enablement stance
* latest run label and status code
* warning count
* timing, exit code, trigger source, and lock outcome when recorded
* generated-data freshness metadata when recorded
* safe command hints

Use JSON output for local automation:

```bash
bun run scripts/scheduler-status.ts --job aggregate --json
```

Status distinguishes starter defaults from local timer intent. `enabled by default` comes from the reviewed registry metadata. `timerEnabled` comes only from `data/ai-os.scheduler.json` and does not prove that a systemd timer is installed or active. For frequent local refreshes, inspect `scheduler:agents:status`; use `scheduler:status` only for the full aggregate compatibility path.

Dream has a separate status path:

```bash
bun run scheduler:dream:status
bun run scripts/scheduler-status.ts --job dream --json
```

### Run Through The Scheduler

```bash
bun run scheduler:run
```

This runs the aggregate handler through the scheduler runner. The runner:

* validates the job ID
* rejects unavailable jobs before execution
* acquires the aggregate scheduler lock
* applies the 35-minute aggregate timeout
* delegates to the existing aggregate implementation
* writes sanitized latest run state
* writes private scheduler logs
* releases the lock on completion
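The timeout and outcome-to-state steps listed above can be sketched as follows. `runWithTimeout`, `HandlerResult`, and the rule that a `degraded` flag outranks warnings are assumptions; lock handling, state writes, and private logging are omitted. As in any `Promise.race` timeout, the handler is not cancelled, so a real runner would also need to abort it or end the process.

```typescript
type RunState = "success" | "warning-only" | "degraded" | "failed" | "timeout";
// Hypothetical handler result shape.
type HandlerResult = { warnings: string[]; degraded?: boolean };

const AGGREGATE_TIMEOUT_MS = 35 * 60 * 1000; // documented 35-minute aggregate timeout

async function runWithTimeout(
  handler: () => Promise<HandlerResult>,
  timeoutMs: number = AGGREGATE_TIMEOUT_MS,
): Promise<{ state: RunState; warningCount: number; durationMs: number }> {
  const startedAt = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    const outcome = await Promise.race([handler(), timeout]);
    const durationMs = Date.now() - startedAt;
    if (outcome === "timeout") return { state: "timeout", warningCount: 0, durationMs };
    const warningCount = outcome.warnings.length;
    const state: RunState = outcome.degraded ? "degraded" : warningCount > 0 ? "warning-only" : "success";
    return { state, warningCount, durationMs };
  } catch {
    // Only the state label is recorded here; error details belong in the private log.
    return { state: "failed", warningCount: 0, durationMs: Date.now() - startedAt };
  } finally {
    clearTimeout(timer);
  }
}
```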

### Run Direct Aggregate

```bash
bun run aggregate
```

This remains the direct refresh path. It does not write scheduler run state or per-job scheduler run-lock records. It still writes through the generated-data gate, so it may briefly create the generated-data lock while the file is replaced.

## State Meanings

| State              | Meaning                                               | Usual response                             |
| ------------------ | ----------------------------------------------------- | ------------------------------------------ |
| `first-run`        | No private scheduler run state exists yet             | Run `bun run scheduler:run`                |
| `success`          | Latest scheduler run completed cleanly                | No action required                         |
| `warning-only`     | Run completed with sanitized warning summaries        | Inspect private logs locally if repeated   |
| `degraded`         | Run completed with reduced input or fallback coverage | Check configured sources and providers     |
| `failed`           | Handler failed before clean completion                | Check local config and rerun               |
| `timeout`          | Run exceeded the 35-minute timeout                    | Check slow providers or sources            |
| `blocked`          | Existing lock or active run prevented execution       | Wait or rerun to allow stale-lock recovery |
| `skipped`          | Scheduler boundary skipped execution                  | Verify registry and handler wiring         |
| `stale-lock`       | Run completed after stale lock recovery               | Watch for repeated recovery                |
| `unreadable-state` | Latest state could not be parsed safely               | Rerun to replace private state             |

These labels are intentionally stable and safe for operator output.
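How the status model might turn the private state file into one of these stable labels can be sketched as below. The `{"state": ...}` file layout and `interpretLatestState` are assumptions; the label set comes from the table above. A missing file maps to `first-run`, and anything that does not parse to a known label maps to `unreadable-state` rather than being echoed.

```typescript
const RECORDED_STATES = [
  "success",
  "warning-only",
  "degraded",
  "failed",
  "timeout",
  "blocked",
  "skipped",
  "stale-lock",
] as const;
type RecordedState = (typeof RECORDED_STATES)[number];
type DisplayState = RecordedState | "first-run" | "unreadable-state";

// Hypothetical: the private state file is assumed to hold JSON like {"state":"success"}.
function interpretLatestState(rawStateFile: string | undefined): DisplayState {
  if (rawStateFile === undefined) return "first-run";
  try {
    const parsed: unknown = JSON.parse(rawStateFile);
    const state = (parsed as { state?: unknown } | null)?.state;
    // Only known labels pass through; unknown strings are never printed.
    return typeof state === "string" && (RECORDED_STATES as readonly string[]).includes(state)
      ? (state as RecordedState)
      : "unreadable-state";
  } catch {
    return "unreadable-state";
  }
}
```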

## Private Files

Scheduled aggregate may create or refresh:

| Path class                    | Purpose                                  |
| ----------------------------- | ---------------------------------------- |
| `src/data/live-data.json`     | Generated private app data               |
| `logs/aggregate-*.jsonl`      | Direct aggregate diagnostic traces       |
| `logs/aggregate.latest.jsonl` | Latest direct aggregate diagnostic trace |
| AI OS scheduler run state     | Latest scheduler status metadata         |
| AI OS scheduler lock          | Non-overlap lock for aggregate runs      |
| AI OS generated-data lock     | Non-overlap lock for live-data writes    |
| AI OS scheduler log           | Private scheduler run log                |

The scheduler state, lock, and log paths are resolved under the local AI OS scheduler root. Status output exposes only labels for those private files.

## Privacy Boundary

The scheduled path must not expose:

* secret environment values
* `.env.local`
* account auth JSON
* raw prompts or runtime transcripts
* raw source dumps or Dataset payloads
* command bodies or full shell output
* unredacted local usernames, emails, configured names, or private paths

Use metadata-only checks before sharing a credentialed run:

```bash
bun run runtime:check-private
```

That command uses git metadata only. It does not read private artifact contents.

## Deletion And Cleanup

To remove generated local data, delete the generated data file and private runtime artifacts from your machine. Keep the committed example data file.

```bash
rm -f src/data/live-data.json
rm -f logs/aggregate.latest.jsonl
rm -f logs/aggregate-*.jsonl
```

If you enabled a user timer, disable it before removing scheduler state:

```bash
systemctl --user disable --now ai-os-aggregate.timer
systemctl --user disable --now ai-os-agent-aggregate.timer
systemctl --user disable --now ai-os-trend-finder.timer
```

Then remove your local scheduler state, lock, and log files from the AI OS scheduler root. Do not commit those files.

## No-Gateway Decision

The scheduler path does not add a browser gateway, live scheduler API, or mutable UI controls. The implemented surfaces are enough for the current local-first need:

* systemd user timers can trigger background execution
* Bun scripts run the job logic
* private scheduler state and logs provide local diagnostics
* `bun run scheduler:status` provides safe operator output

Any future browser write controls or remote scheduler service need a separate threat model and a new implementation spec.

## Dream Boundary

Dream depends on fresh browser-safe generated data and runs through its own AI OS scheduler job:

```bash
bun run scheduler:dream:run
bun run scheduler:dream:status
```

Scheduled aggregate itself still does not write Dream outputs, call a Dream runtime provider, or install a Dream timer. The Dream handler owns those boundaries after it verifies generated-data freshness and refreshes stale or missing host/local material through agent aggregate.

Dream is enabled in scheduler metadata, remains explicit at the activation gate, and stays private-output only. It can be triggered from the local dashboard or CLI, but it does not add a browser gateway, remote scheduler API, daemon, remote control plane, or Trend Finder runtime coupling. See [AI OS Dream Runbook](/ai-os-and-trend-finder-docs/docs/runbooks/ai-os-dream.md).

