> For the complete documentation index, see [llms.txt](https://ai-os-and-trend-finder.gitbook.io/ai-os-and-trend-finder-docs/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://ai-os-and-trend-finder.gitbook.io/ai-os-and-trend-finder-docs/docs/ongoing-projects/sqlite-observation-store-transition-plan.md).

# SQLite Observation Store Transition Plan

> **Status:** Planned **Captured:** 2026-05-18 **Supersedes:** `docs/ongoing-projects/local-observation-store.md` **Scope:** Move durable Trend Finder and broader dashboard observations from generated JSON/cache files into a local-first SQLite database. **Intent:** Define the migration plan clearly enough that the superseded brainstorming note can be deleted without losing architectural context.

***

## Core Decision

Trend Finder and the broader OS dashboard need historical observations. A system that only overwrites `src/data/live-data.json` can show the current moment, but it cannot reliably answer what is emerging, accelerating, recurring, cooling, predictive, or repeatedly useful.

The target model is:

```
collect signals
  -> normalize observations
  -> append durable records to SQLite
  -> compute rollups and predictions
  -> serve a compact latest dashboard projection
```

SQLite should become the product memory. Runtime projections should remain small, disposable read models.

***

## Current State

The app currently relies on generated runtime files and cache artifacts:

| Layer                                       | Current Role                                                  |
| ------------------------------------------- | ------------------------------------------------------------- |
| `src/data/live-data.json`                   | Gitignored generated runtime payload served by `/__live-data` |
| `src/data/live-data.example.json`           | Committed safe fallback/demo payload                          |
| `.cache/extensions/trend-finder/snapshots/` | Gitignored movement snapshots for previous-run comparison     |
| `logs/aggregate-*.jsonl`                    | Gitignored aggregate diagnostics                              |
| Apify Datasets                              | External raw source output and run provenance                 |

This is workable for an MVP dashboard, but not for a historical intelligence system. The JSON payload encourages whole-state rewrites, makes over-time queries awkward, mixes current UI shape with durable product memory, and pushes retention, dedupe, and pruning into ad hoc file logic.

***

## Target Split

| Layer                  | Target Purpose                                                      |
| ---------------------- | ------------------------------------------------------------------- |
| SQLite                 | Durable local observations, run history, dedupe keys, rollups       |
| `/__live-data`         | Compact latest dashboard projection, eventually read from SQLite    |
| `live-data.json`       | Temporary compatibility/export artifact during migration            |
| `.cache/`              | Rebuildable transient artifacts only                                |
| Apify Datasets         | External raw collection output and source-run provenance            |
| Existing private files | Credentials, account auth, env settings, and sensitive local config |

The browser should not query SQLite directly. It should continue to use a projection boundary, initially the existing `useLiveData()` and `/__live-data` shape. This keeps the UI stable while the storage layer changes underneath it.

***

## Local-First Scope

The first implementation should stay deliberately modest:

* Use one local SQLite database, for example `data/ai-os.sqlite`.
* Gitignore the database and SQLite sidecar files.
* Access SQLite from Bun scripts with `bun:sqlite`.
* Do not add a hosted service, auth system, sync layer, or multi-user permissions.
* Do not store raw private transcripts by default.
* Keep credentials in existing private files, not in SQLite.
* Make deletion simple: deleting the SQLite file should reset product memory.

This should feel like a local product memory layer, not a backend platform.

***

## Sensitivity Model

The migration should not use "non-sensitive" too casually. Some local telemetry is not a credential but is still private. The database remains gitignored local state.

| Class                        | Examples                                                | Default Storage Decision                    |
| ---------------------------- | ------------------------------------------------------- | ------------------------------------------- |
| Secrets and auth material    | API keys, OAuth tokens, account auth JSON, `.env.local` | Never store in SQLite                       |
| Raw private local content    | Claude/Codex transcripts, Obsidian note bodies          | Do not store by default                     |
| Private local observations   | Project counts, model usage, skill usage, cost totals   | Store structured metadata only              |
| Public Trend Finder evidence | URL, title, snippet, source, public metrics             | Store normalized facts and observations     |
| Raw public source payloads   | Full Apify dataset rows, scraped page bodies            | Keep external or short-retention debug only |
| Derived projections/rollups  | Daily trend/source/project/skill summaries              | Store as regenerable derived data           |

The default should be normalized facts, counts, timestamps, hashes, categories, and outcomes rather than raw source dumps.

***

## Trend Finder History

Trend Finder should be the first migration path because it has the clearest need for history and the lowest privacy risk when limited to normalized public evidence.

Historical observations unlock questions such as:

* Did a topic appear once or persist?
* Is mention velocity increasing?
* Is acceleration increasing?
* Did the topic spread from developer sources into creator, launch, research, news, or discussion sources?
* Which sources tend to surface signals early?
* Which scoring patterns predicted later mainstream attention?
* Which topics died despite high initial activity?

The first Trend Finder schema should support:

* source runs and source health over time
* normalized evidence items
* repeated observations of the same evidence URL
* source metrics at each observation time
* topic assignments and score history
* source spread across roles
* prediction records
* later outcome labels

Suggested initial tables:

```
trend_runs
trend_sources
trend_source_runs
trend_evidence
trend_evidence_observations
trend_topics
trend_topic_observations
trend_topic_evidence
trend_watchlist_items
trend_predictions
trend_prediction_outcomes
```

Normalized evidence records can include:

* public URL
* normalized URL hash
* title
* snippet
* source ID and source role
* published timestamp
* first observed timestamp
* last observed timestamp
* public metrics at observation time
* Apify actor/run/dataset provenance
* content hash or URL hash for dedupe

Raw Apify payloads should remain in Apify by default. If local raw payload capture is ever needed, it should require an explicit short-retention debug setting and a pruning command.

***

## Broader OS History

After the Trend Finder path proves the storage pattern, the broader dashboard can move sanitized local observations into SQLite. A latest scan answers "what is true right now." A historical observation store answers "what keeps happening" and "what changed."

Useful over-time OS patterns include:

* project momentum
* recurring blockers
* repeated task classes
* context switching patterns
* tool and skill adoption curves
* cost and token trends
* model usage changes
* automation health
* memory/index growth
* integration readiness drift
* recommendations accepted or ignored

Suggested OS tables:

```
os_runs
project_observations
session_observations
tool_usage_observations
skill_usage_observations
model_usage_observations
cost_observations
automation_observations
memory_store_observations
integration_observations
warning_events
recommendation_events
```

The OS side should store structured observations, counts, timestamps, categories, outcomes, and hashes where useful. It should not store full private transcripts or private source-file bodies by default.

***

## Rollups

SQLite should include derived rollups so the dashboard does not scan raw event history for every render.

Potential rollup tables:

```
daily_trend_rollups
daily_source_rollups
daily_project_rollups
daily_skill_rollups
daily_model_rollups
daily_cost_rollups
```

Rollups are derived data. They should be regenerable from observation tables. Retention rules can decide whether rollups survive pruning of lower-level observations.

***

## Projection Boundary

The dashboard should continue consuming a compact current projection. The projection can include:

* latest summary stats
* latest Trend Finder topics
* current source health
* current runtime readiness
* latest OS activity summary
* recent warnings
* capped recent lists

It should not include:

* full historical evidence
* all previous runs
* raw payloads
* raw private source content
* unbounded arrays
* credentials or token-derived values

During migration, `live-data.json` can remain a compatibility artifact. The final target is for `/__live-data` to build or read the compact projection from SQLite, with `live-data.json` retained only as an optional export/debug output if still useful.

***

## Migration Plan

### Phase 0: Guardrails And Decisions

1. Add gitignore coverage for `data/*.sqlite`, `data/*.sqlite-*`, and any chosen store path.
2. Extend the private runtime artifact check to include the SQLite database and sidecars.
3. Document the database path, deletion/reset behavior, and what must never be stored.
4. Decide whether the first database is `data/ai-os.sqlite` or a path under `.cache/`. Prefer `data/ai-os.sqlite` for explicit product memory.

### Phase 1: Store Foundation

1. Add `scripts/lib/observation-store/`.
2. Implement a small `bun:sqlite` connection helper with explicit path resolution and directory creation.
3. Add a migrations table and idempotent migration runner.
4. Add a script command for migration/status checks.
5. Add focused tests using temporary SQLite files.

Recommended module shape:

```
scripts/lib/observation-store/
|-- connection.ts
|-- migrations.ts
|-- schema.ts
|-- trend-writes.ts
|-- trend-queries.ts
|-- projection.ts
`-- __tests__/
```

### Phase 2: Trend Finder Dual-Write

1. Add initial Trend Finder tables for runs, sources, source runs, evidence, evidence observations, topics, topic observations, topic/evidence links, and generated watchlist rows.
2. After the Trend Finder collector validates its payload, insert normalized observations into SQLite.
3. Continue writing `src/data/live-data.json` unchanged.
4. Treat SQLite write failure as a warning during this phase unless it would corrupt output.
5. Add tests for idempotent inserts, URL/hash dedupe, repeated observations, and source metric changes.

### Phase 3: Replace Snapshot Files

1. Replace `.cache/extensions/trend-finder/snapshots/latest.json` reads with previous-run queries from SQLite.
2. Derive `previousScore`, `velocity`, `firstSeenAt`, `lastSeenAt`, and movement status from database history.
3. Keep the snapshot file writer temporarily as a compatibility fallback.
4. Add tests proving movement labels match the old snapshot behavior.
5. Remove snapshot file dependency once the database-backed path is stable.

### Phase 4: SQLite-Backed Projection

1. Add a projection query that returns the existing `LiveData` shape or a strongly compatible subset.
2. Teach `/__live-data` to prefer the SQLite-backed projection and fall back to `live-data.json`, then `live-data.example.json`.
3. Keep `useLiveData()` unchanged at first.
4. Add tests for projection fallback order and validator compatibility.
5. After confidence, stop writing `src/data/live-data.json` by default or make it an explicit export/debug command.

### Phase 5: Broader OS Observation Store

1. Add `os_runs` and the lowest-risk OS observation tables first.
2. Start with aggregate metrics that are already emitted in sanitized `LiveData`: daily usage, model usage, costs, skill usage, integration status, memory store counts, automation health, and warnings.
3. Do not store raw session transcript lines, raw note bodies, or raw local source files.
4. Add rollups for daily project, skill, model, and cost trends.
5. Keep Hermes bridge reads separate unless a later plan explicitly defines what Hermes observations are safe to store.

### Phase 6: Retention, Rollups, And Prediction

1. Add pruning commands by table group and sensitivity class.
2. Decide default retention windows for normalized public evidence, private OS observations, logs, and optional raw debug payloads.
3. Generate rollups during aggregate or through a dedicated rollup command.
4. Add simple transparent prediction features:
   * velocity
   * acceleration
   * source diversity
   * persistence
   * source quality weighting
   * recency decay
   * comparison against topic baselines
5. Add prediction outcome labels only when there is a clear manual or automated outcome source.

***

## Validation Strategy

Focused checks for early phases:

```bash
bun run typecheck:scripts
bun run test -- scripts/lib/observation-store
bun run test -- scripts/extensions/trend-finder/__tests__/collector.test.ts
bun run test -- scripts/lib/ai-runtime/__tests__/scoring.test.ts
```

Projection and UI phases should also run:

```bash
bun run typecheck
bun run test -- src/lib/__tests__/use-live-data.test.tsx
bun run test -- src/lib/__tests__/validate-live-data.test.ts
bun run test -- src/lib/__tests__/trend-finder-dashboard.test.tsx
```

When `/__live-data` behavior changes, add a targeted middleware/server test and run the Trend Finder E2E flow:

```bash
bun run test:e2e -- tests/e2e/trend-finder.spec.ts
```

Manual checks:

1. Delete the SQLite file and confirm the app returns to safe fallback/demo behavior.
2. Run `bun run aggregate` twice and confirm repeated evidence observations append without duplicating canonical evidence rows.
3. Confirm movement labels use database history.
4. Confirm `src/data/live-data.json` can be deleted without losing dashboard state once the SQLite projection phase is complete.
5. Confirm private artifact checks catch the database and sidecar files.

***

## Non-Goals

* No hosted database.
* No remote sync.
* No multi-user permission model.
* No browser-direct SQLite access.
* No raw private transcript hoarding.
* No credential storage in SQLite.
* No full raw Apify payload retention by default.
* No replacement of React Query or route rendering as part of the storage migration.

***

## Open Questions

* What should the default retention window be for normalized public Trend Finder evidence?
* Should private OS observations have separate retention windows by category?
* Should rollups survive pruning of lower-level observations?
* Should `live-data.json` remain as an explicit export command after migration?
* Do we want one database for all local observations or a separate public-trend database and private-OS database?
* Should optional raw debug payload capture be implemented at all, or should raw payloads stay external-only?

***

## Completion Criteria

The migration is complete when:

* Durable Trend Finder history lives in SQLite.
* Movement, previous scores, and first/last seen fields come from SQLite.
* `/__live-data` can serve the dashboard from a compact SQLite projection.
* `src/data/live-data.json` is no longer the system of record.
* Snapshot files are no longer required for Trend Finder movement.
* Private artifact checks cover the SQLite database and sidecars.
* Docs describe implemented behavior separately from planned OS-history work.
* Deleting the SQLite database resets local product memory without breaking a clean fallback dashboard.


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://ai-os-and-trend-finder.gitbook.io/ai-os-and-trend-finder-docs/docs/ongoing-projects/sqlite-observation-store-transition-plan.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
