# Trend Finder Runtime And Provenance

This guide explains optional AI analysis, deterministic fallbacks, output validation, runtime readiness, provenance states, and sanitized Engine Replay trace states and modes.

## AI Analysis And Fallback Topics

AI analysis is optional at runtime. Trend Finder always needs to produce a bounded, explainable payload, so it has deterministic fallbacks for every AI stage that matters.

When AI analysis runs, the trend analyst receives:

* run ID and generated timestamp
* the active Creator Lens
* saved public competitor names from the Creator Lens
* normalized analyst-ready evidence
* a previous snapshot summary when one exists
* an output schema hint

The analyst is instructed to cluster related evidence into emerging AI trend topics, cite only provided evidence IDs, reject weak or irrelevant evidence, and avoid invented sources, metrics, dates, or URLs. The validator then checks the output. It rejects unknown evidence IDs, source mismatches, invented URLs, invented dates, invented source names, invented metrics, malformed JSON, empty output, and schema mismatches. Some validation failures get one repair retry; output that is still invalid after the retry falls back to deterministic generation.
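
The validate, retry once, then fall back flow can be sketched as follows. This is a minimal illustration, not the shipped validator: the `AnalystTopic` shape, the function names, and the synchronous `callAnalyst` callback are assumptions, and the sketch retries on any failure because this guide does not say which failures qualify for the repair retry.

```typescript
// Hypothetical topic shape; the real analyst schema is not shown in this guide.
interface AnalystTopic {
  title: string;
  evidenceIds: string[];
}

type ValidationResult =
  | { ok: true; topics: AnalystTopic[] }
  | { ok: false; reason: string };

// Rejects empty output, malformed JSON, schema mismatches, and evidence IDs
// that were never provided to the analyst.
function validateAnalystOutput(raw: string, knownEvidenceIds: Set<string>): ValidationResult {
  if (raw.trim() === "") return { ok: false, reason: "empty output" };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "malformed JSON" };
  }
  if (!Array.isArray(parsed) || parsed.length === 0) {
    return { ok: false, reason: "schema mismatch" };
  }
  const topics: AnalystTopic[] = [];
  for (const item of parsed) {
    const candidate = (typeof item === "object" && item !== null ? item : {}) as Partial<
      Record<keyof AnalystTopic, unknown>
    >;
    const title = candidate.title;
    const evidenceIds = candidate.evidenceIds;
    if (
      typeof title !== "string" ||
      !Array.isArray(evidenceIds) ||
      !evidenceIds.every((id) => typeof id === "string")
    ) {
      return { ok: false, reason: "schema mismatch" };
    }
    const ids = evidenceIds as string[];
    const unknownId = ids.find((id) => !knownEvidenceIds.has(id));
    if (unknownId !== undefined) {
      return { ok: false, reason: `unknown evidence ID: ${unknownId}` };
    }
    topics.push({ title, evidenceIds: ids });
  }
  return { ok: true, topics };
}

// One repair retry at most, then deterministic fallback.
function analyzeWithFallback(
  callAnalyst: (repairHint?: string) => string,
  knownEvidenceIds: Set<string>,
  fallback: () => AnalystTopic[],
): { mode: "ai" | "fallback"; topics: AnalystTopic[] } {
  let result = validateAnalystOutput(callAnalyst(), knownEvidenceIds);
  if (!result.ok) {
    // Pass the failure reason back as a repair hint for the single retry.
    result = validateAnalystOutput(callAnalyst(result.reason), knownEvidenceIds);
  }
  return result.ok
    ? { mode: "ai", topics: result.topics }
    : { mode: "fallback", topics: fallback() };
}
```

Returning the mode alongside the topics keeps the provenance label attached to the data, so a caller cannot render fallback topics as analyst output by accident.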

Fallback topic generation is deliberately bounded. It first uses the reviewed AI keyword map when titles contain known AI terms. For unmatched evidence, it can cluster related titles with dependency-free feature-hash similarity instead of grouping by source ID. It names the topic from bounded keywords or evidence title text, then builds summary/why-now/hooks/questions from the evidence and Creator Lens. Fallback topics are useful for transparent local operation, but they are still deterministic fallback output, not validated analyst clustering.

The same reviewed metadata now includes canonical topic seeds for stable evergreen subjects such as MCP, RAG, GraphRAG, local LLMs, AI coding agents, and reasoning models. The identity resolver uses those seeds only when prior snapshot and historical matches are unavailable. Seed matches are exposed as `reviewed-seed`, carry bounded confidence, and remain deterministic metadata, not a ticker board or a scoring factor.

Fallback topic values include:

| Field              | Fallback Source                                                                          |
| ------------------ | ---------------------------------------------------------------------------------------- |
| `novelty`          | `0.62` for experimental risk, otherwise `0.48`, then adjusted by history during scoring. |
| `nicheFit`         | Base `0.44`, plus Creator Lens topic-focus matches, plus a risk-level bonus.             |
| `creatorPotential` | Niche fit, evidence relevance/engagement, and whether the lens has a content format.     |
| `confidence`       | Average evidence relevance multiplied by `0.75`.                                         |
| `watchlistReason`  | Empty unless produced by AI; other watchlist rules can still select the topic.           |
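
Two of the rows above are simple enough to sketch directly. The function names, the `RiskLevel` type, and the zero-confidence result for empty evidence are assumptions made for illustration; only the constants `0.62`, `0.48`, and `0.75` come from the table. The history adjustment applied during scoring is not shown.

```typescript
// "experimental" comes from the table; the other risk name is hypothetical.
type RiskLevel = "experimental" | "standard";

// Starting novelty before scoring adjusts it by history.
function fallbackNovelty(risk: RiskLevel): number {
  return risk === "experimental" ? 0.62 : 0.48;
}

// Average evidence relevance multiplied by 0.75.
function fallbackConfidence(relevanceScores: number[]): number {
  // Assumption: a topic with no evidence gets zero confidence.
  if (relevanceScores.length === 0) return 0;
  const mean = relevanceScores.reduce((sum, r) => sum + r, 0) / relevanceScores.length;
  return mean * 0.75;
}
```

For example, relevance scores of `0.5` and `1.0` average to `0.75`, which gives a fallback confidence of `0.5625`. The multiplier keeps fallback confidence below the raw evidence relevance, which matches the guide's position that fallback output is not validated analyst clustering.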

Phase 27 expanded analyst and fallback output with these bounded fields:

| Field                                                 | Provenance Boundary                                                                                                          |
| ----------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| `contrarianAngle`, `plainExplainer`, `suggestedTitle` | AI output is validator-bounded; fallback copy is deterministic and labeled through the same analysis provenance.             |
| `themeLabel`, `themeKeywords`                         | AI theme labels are validated against topic rows; fallback themes use local token similarity and Creator Lens focus terms.   |
| `demandBrief` and demand cluster copy                 | AI output is bounded and tied to observed question-shaped evidence; fallback uses deterministic question heuristics.         |
| `outlierIdea`                                         | AI or fallback copy is limited to top-N source-local outlier evidence and can reuse private safe enrichment cache entries.   |
| `targetDate`, `targetLifecycle`                       | Prediction writer validates UTC target dates and known lifecycle stages; retros do not grade dated rows before they are due. |

Derived labels also carry explicit provenance or unavailable states. Lifecycle, saturation, convergence, velocity dynamics, demand growth, theme grouping, download deltas, risk flags, and source-local outliers are deterministic projections over current evidence and private bounded history. They are not raw AI claims, and they are not allowed to expose private snapshot paths, prompts, provider responses, raw source rows, or cache roots.

## Phase 28 Provenance Closeout

Phase 28 adds more deterministic projections, but it does not add a new runtime trust boundary. Dedup, quality scoring, visibility bands, signal aging, saturation refinement, lifecycle score movement, research-only risk, action verdicts, keyword coverage QA, direct readiness, and static Brief QA all use bounded inputs that were already collected, reviewed, or generated inside the Trend Finder pipeline.

Derived labels must stay explainable at the nearest surface:

| Label Family                            | Provenance Rule                                                                                                                |
| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| Scoring and action labels               | Derived from validated topic, evidence, source, history, quality, and aging fields.                                            |
| Source readiness and fallback labels    | Derived from reviewed source declarations, direct adapter outcomes, compliance gates, and credential/config presence booleans. |
| Browser-local pin, note, and tag labels | Derived from operator localStorage only and never written to generated payloads.                                               |
| Static Brief QA labels                  | Derived from the projected report contract, rendered HTML, and manifest candidate before output promotion.                     |
| Reference mode labels                   | Derived from committed Markdown registry metadata only.                                                                        |

Validation-time leak scanning is part of the runtime boundary. Static Brief QA, private-artifact checks, payload/budget checks, and Reference doc phrase tests must fail or warn before release notes imply that raw prompts, provider responses, raw source dumps, raw logs, Actor inputs, Dataset rows, billing payloads, credential values, private cache paths, local triage notes, or local filesystem paths are browser output.

## Phase 29 Provenance Closeout

Phase 29 adds comparison-derived context without adding a new runtime trust boundary. Attention pattern, aggregate reception, corroboration, evidence rationales, run narratives, industry events, security relevance, One to Watch, and pre-run estimate rows are projections over existing payload, source, history, scheduler, and validation state.

Every optional AI-produced addition has a deterministic fallback or an explicit degraded/unavailable state. Analyst output is still validator-gated against known topic IDs, evidence IDs, source IDs, dates, metrics, and bounded copy. Invalid, unavailable, or ungrounded output degrades to deterministic summary, omission, or unavailable labels instead of exposing raw model output.

Engine Replay remains sanitized. Phase 29 narration rows can say that a stage validated, retried, degraded, or fell back, but they cannot include prompts, provider responses, request IDs, stack traces, raw normalized rows, transcript bodies, comment bodies, local paths, cache paths, credentials, or private diagnostics. Reference mode remains committed-docs only and does not read generated reports, private caches, scheduler files, source setup config, raw traces, or browser localStorage.

Seed review and podcast/audio posture are provenance-sensitive. Seed-candidate review artifacts stay private and feed only bounded reviewed metadata. Podcast themes remain deferred by the Session 16 compliance decision, so no transcript, media, podcast source declaration, podcast cache, or browser podcast payload is runtime output in Phase 29.

## Feature-Hash Text Vectors

Phase 28 ships feature-hash text vectors as the accepted no-model-dependency path from ADR 0002. These vectors are deterministic local math, not a local embedding model and not a vector database.

The shared feature layer normalizes bounded text, filters stopwords, expands a small reviewed synonym map, and emits weighted tokens, stems, bigrams, trigrams, and character 4-grams. Collector scripts hash those feature keys with Node SHA-256 into signed 384-dimension vectors and L2-normalize them for cosine similarity. Browser code mirrors the hash derivation with Web Crypto over already-loaded payload strings and falls back to lexical ranking when vector work is unavailable, stale, or timed out.
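
A minimal sketch of the collector-side hashing step follows, assuming the feature layer has already produced weighted feature keys. The paragraph above confirms SHA-256, signed values, 384 dimensions, and L2 normalization; the way the bucket index and sign are read from the digest (first four bytes and the low bit of the fifth byte) is an assumption, as are all names.

```typescript
import { createHash } from "node:crypto";

const DIMENSIONS = 384;

// Hypothetical shape for one emitted feature (token, stem, n-gram, or character 4-gram).
interface WeightedFeature {
  key: string;
  weight: number;
}

function featureHashVector(features: WeightedFeature[]): number[] {
  const vector = new Array<number>(DIMENSIONS).fill(0);
  for (const { key, weight } of features) {
    const digest = createHash("sha256").update(key, "utf8").digest();
    // Assumption: bucket from the first 4 bytes, sign from the low bit of byte 4.
    const index = digest.readUInt32BE(0) % DIMENSIONS;
    const sign = (digest.readUInt8(4) & 1) === 0 ? 1 : -1;
    vector[index] = (vector[index] ?? 0) + sign * weight;
  }
  // L2-normalize so cosine similarity reduces to a dot product.
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
  return norm === 0 ? vector : vector.map((v) => v / norm);
}

// Both inputs are expected to be unit length already.
function cosineSimilarity(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * (b[i] ?? 0), 0);
}
```

Because the vector depends only on the feature keys and weights, two runs over the same bounded text produce identical vectors, which is what makes this path deterministic and free of model dependencies. The signed buckets let colliding features partly cancel rather than always accumulate.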

Feature-hash similarity is used only for deterministic fallback behavior and UI ranking:

* theme-rollup fallback can merge unlabeled topics after valid analyst labels are applied and before keyword-overlap fallback
* deterministic fallback topic extraction can cluster similar unmatched evidence titles without grouping by source ID
* the search palette can rank local projected documents without a server search endpoint

Generated browser payloads must not contain 384-float vector arrays, raw source payloads, private cache paths, prompts, provider responses, credentials, or token-shaped strings. Vectors are transient implementation details on the collector or in the browser ranking pass.

## AI Runtime And Fallback Meaning

Trend Finder has explicit provenance labels because the same UI can render different analysis modes.

| Analysis State                  | Meaning                                                                                                 |
| ------------------------------- | ------------------------------------------------------------------------------------------------------- |
| AI-analyzed run                 | A configured non-mock AI runtime returned validated analyst output.                                     |
| Mock runtime analysis           | Mock runtime produced deterministic local output for testing or demos.                                  |
| AI runtime disabled             | Runtime was not ready and no analyst-ready evidence existed, so no AI analysis was requested.           |
| Deterministic fallback analysis | Topics were grouped and scored without validated AI analyst output, usually because AI was unavailable. |
| Unknown                         | Legacy payload without explicit analysis state.                                                         |

When AI output is used, validators reject invented evidence IDs, source IDs, URLs, dates, source names, and metrics. If the AI output is missing or invalid, Trend Finder falls back to deterministic output and surfaces warnings.

Runtime readiness is separate from analysis provenance:

| Runtime Field | Values Shown In Payload                                                                          |
| ------------- | ------------------------------------------------------------------------------------------------ |
| provider      | `disabled`, `mock`, `openai-api`, `codex-account`, or `unknown`                                  |
| status        | `ready`, `disabled`, `auth-required`, `expired-auth`, `invalid-auth`, `rate-limited`, or `error` |
| isReady       | Whether the collector may call the configured runtime for AI stages.                             |

For `codex-account`, expired stored account auth is refreshed automatically before readiness is reported. If automatic refresh succeeds, readiness is `ready`; if OpenAI rejects the refresh token or refresh otherwise fails, readiness remains `expired-auth` and includes recovery commands.

The collector can still finish when `isReady` is false. In that case the run uses deterministic fallback stages and carries warnings/next steps into the browser payload.
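
The readiness fields and the resulting stage decision could be modeled as below. The literal unions mirror the table above; the `RuntimeReadiness` and `StagePlan` names, the `planAnalysisStage` function, and the warning wording are hypothetical.

```typescript
type RuntimeProvider = "disabled" | "mock" | "openai-api" | "codex-account" | "unknown";

type RuntimeStatus =
  | "ready"
  | "disabled"
  | "auth-required"
  | "expired-auth"
  | "invalid-auth"
  | "rate-limited"
  | "error";

// Hypothetical container for the three runtime fields shown in the payload.
interface RuntimeReadiness {
  provider: RuntimeProvider;
  status: RuntimeStatus;
  isReady: boolean;
}

type StagePlan =
  | { mode: "ai" }
  | { mode: "deterministic-fallback"; warning: string };

// The run always gets a plan; a runtime that is not ready only changes the mode.
function planAnalysisStage(runtime: RuntimeReadiness): StagePlan {
  if (runtime.isReady) return { mode: "ai" };
  return {
    mode: "deterministic-fallback",
    warning: `AI runtime "${runtime.provider}" is ${runtime.status}; using deterministic fallback.`,
  };
}
```

Keeping readiness separate from the analysis mode means a `ready` runtime can still end in fallback analysis if validation later rejects the output.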

## Browser-Only Signal Workbench Triage

Signal Workbench triage is not part of Trend Finder runtime provenance. It is a browser-local operator annotation layer stored under `ai-os.trend-finder.signal-workbench.v1`. The only persisted values are topic IDs, local triage state, bounded notes, and update timestamps.

The local states are `acted-on`, `ignored`, `revisit`, and `watching`. No Workbench triage or watching state is written to generated `TrendFinderData`, generated Watchlist rows, source summaries, score fields, static exports, source caches, sanitized trace payloads, scheduler status, live progress, logs, prompts, provider responses, Actor inputs, Dataset rows, credentials, or local file paths.

Workbench storage is parsed defensively in the browser. Malformed JSON resets to an empty local config, unsupported versions are ignored, invalid states and empty topic IDs are dropped, duplicate topic rows are deduped, note length is capped, and total saved rows are bounded. These parse results can produce browser-visible warnings, but they do not change generated output or trigger source collection.
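
The defensive parse described above might look like this sketch. The four state names come from this section; the stored shape (`version`, `rows`, `topicId`, `state`, `note`, `updatedAt`), the two caps, and the choice that later duplicates win are assumptions.

```typescript
const TRIAGE_STATES = ["acted-on", "ignored", "revisit", "watching"] as const;
type TriageState = (typeof TRIAGE_STATES)[number];

// Hypothetical stored shape; field names are illustrative.
interface TriageRow {
  topicId: string;
  state: TriageState;
  note: string;
  updatedAt: string;
}

interface WorkbenchConfig {
  version: 1;
  rows: TriageRow[];
}

const MAX_NOTE_LENGTH = 280; // hypothetical cap
const MAX_ROWS = 200; // hypothetical cap

function parseWorkbench(raw: string | null): { config: WorkbenchConfig; warnings: string[] } {
  const empty: WorkbenchConfig = { version: 1, rows: [] };
  if (raw === null) return { config: empty, warnings: [] };
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { config: empty, warnings: ["malformed JSON; reset to empty config"] };
  }
  const data = (typeof parsed === "object" && parsed !== null ? parsed : {}) as {
    version?: unknown;
    rows?: unknown;
  };
  const rows = data.rows;
  if (data.version !== 1 || !Array.isArray(rows)) {
    return { config: empty, warnings: ["unsupported version or shape; ignored"] };
  }
  const warnings: string[] = [];
  const byTopic = new Map<string, TriageRow>();
  for (const item of rows) {
    const row = (typeof item === "object" && item !== null ? item : {}) as Partial<
      Record<keyof TriageRow, unknown>
    >;
    const topicId = typeof row.topicId === "string" ? row.topicId.trim() : "";
    const state = TRIAGE_STATES.find((s) => s === row.state);
    if (topicId === "" || state === undefined) {
      warnings.push("dropped row with empty topic ID or invalid state");
      continue;
    }
    const note = typeof row.note === "string" ? row.note.slice(0, MAX_NOTE_LENGTH) : "";
    const updatedAt = typeof row.updatedAt === "string" ? row.updatedAt : "";
    // Assumption: when a topic appears twice, the later row wins.
    byTopic.set(topicId, { topicId, state, note, updatedAt });
  }
  return { config: { version: 1, rows: [...byTopic.values()].slice(0, MAX_ROWS) }, warnings };
}
```

In the browser this would typically be called as `parseWorkbench(localStorage.getItem("ai-os.trend-finder.signal-workbench.v1"))`. The function only returns data and warnings, so nothing in it can alter generated output or start source collection.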

## Static Brief Export Provenance

The static Brief export is script-side and reads only the current validated Trend Finder payload from generated live data. It projects a smaller report contract before rendering HTML, then privacy-scans the projected report, rendered HTML, and manifest before final output is promoted.

The report can show the same safe provenance labels used by the dashboard: data origin, analysis mode, source state, and bounded provenance notes. It can also show source health, score/actionability labels, public evidence links, prediction and retro rows, spend labels, enrichment counts, and asset fallback labels. Phase 27 report projections can also include today's pick, movement groups, demand centers, angle-pack copy, bounded sparklines, theme labels, outlier ideas, and Story Log summaries when they are already present in the validated Trend Finder payload.

The report must not expose raw prompts, provider responses, raw source dumps, raw logs, Actor inputs, Dataset rows, billing payloads, account IDs, tokens, credentials, account auth, private cache paths, local triage notes, visibility state, or local filesystem paths. Generated reports live under `.cache/extensions/trend-finder/static-brief/` by default and are not committed or hosted by default.
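
A fail-closed privacy scan over the projected report, rendered HTML, and manifest could be sketched as below. The rule list is purely illustrative and far narrower than the exclusions listed above; the patterns, labels, and function names are assumptions, not the exporter's actual rules.

```typescript
interface LeakRule {
  label: string;
  pattern: RegExp;
}

// Hypothetical patterns; a real scan would cover every excluded category.
const LEAK_RULES: LeakRule[] = [
  { label: "local filesystem path", pattern: /(?:\/Users\/|\/home\/|[A-Za-z]:\\)/ },
  { label: "token-shaped string", pattern: /\bsk-[A-Za-z0-9]{20,}/ },
  { label: "authorization header", pattern: /\bBearer\s+[A-Za-z0-9._-]{16,}/ },
];

// Returns findings as "artifact: label" and never echoes the matched value.
function scanForLeaks(artifacts: Record<string, string>): string[] {
  const findings: string[] = [];
  for (const [name, text] of Object.entries(artifacts)) {
    for (const rule of LEAK_RULES) {
      if (rule.pattern.test(text)) findings.push(`${name}: ${rule.label}`);
    }
  }
  return findings;
}

// Output is promoted only when every artifact scans clean.
function canPromote(artifacts: Record<string, string>): boolean {
  return scanForLeaks(artifacts).length === 0;
}
```

Reporting only the artifact name and rule label keeps the scan's own warnings from repeating the sensitive value. The path pattern is deliberately conservative and would also flag a public URL containing `/home/`, which a real rule set would need to account for.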

## Engine Replay

Engine Replay is a dedicated route, not a creator workflow tab:

```
/extensions/trend-finder/engine
```

It has two modes. Replay explains the latest sanitized run stages: source collection, evidence normalization, runtime analysis, scoring, movement analysis, demand/theme/outlier enrichment, prediction writer, retro evaluator, and artifact writing. Reference renders the curated Trend Finder Markdown manuals in-app. Replay is for proof and debugging; Reference is explanatory documentation. Neither mode is the creator workflow for deciding what content to create.

Engine Replay renders browser-safe trace data from the generated payload. It does not read raw logs, prompts, provider responses, Actor/Dataset internals, or account auth files.

Source-death alarms appear in Engine Replay as source rail health metrics and per-source warning labels. These values come from the sanitized trace or the browser payload collection-health fields. They are proof that a previously live configured source returned zero accepted evidence in the current run, but they do not expose the private baseline path, baseline file name, prior accepted count, raw diagnostics, cache roots, tokens, or credentials.

Live run progress uses the same browser-safe boundary. While a local run is active, `/__run_trend_finder_status` may return a current Engine Replay stage, completed stage IDs, elapsed time, lifecycle state, cancellation/overlap state, and bounded per-source counts. It never returns raw stdout/stderr, process logs, private paths, prompts, provider payloads, Actor inputs, Dataset rows, credential values, or account auth. If the status endpoint cannot provide progress, the UI shows an unavailable-progress fallback and keeps the coarse run lifecycle visible.

Replay also shows source-local baseline counters when the generated payload has them. These counters are aggregate labels only:

| Counter       | Meaning                                                                 |
| ------------- | ----------------------------------------------------------------------- |
| Ready         | Evidence rows with an available source-local baseline.                  |
| Unavailable   | Evidence rows with explicit low-sample, missing, or unsupported states. |
| Entities      | Stable public source entities represented by the summary.               |
| Excluded      | Pinned, stickied, promoted, sponsored, or ad-like rows excluded.        |
| Down-weighted | Rows retained with reduced source-local weight.                         |

Unavailable and zero-count states are displayed directly so a run does not imply source-local support when no reviewed baseline exists. Excluded and down-weighted counters are the only browser-facing placement proof; Engine Replay does not render the raw pinned, stickied, promoted, sponsored, ad-like, or unknown source rows that caused those labels.

Replay also projects enrichment cache and spend summaries from the generated payload or sanitized trace. These are operational counters only:

| Label                 | Meaning                                                               |
| --------------------- | --------------------------------------------------------------------- |
| Cache saved           | Count of unchanged safe enrichment summaries reused from cache.       |
| Cache degraded        | Cache read, write, validation, or pruning had safe warning counts.    |
| Cache no eligible     | Current evidence did not have stable identity needed for cache reuse. |
| Run spend exact       | Paid source runners reported actual usage.                            |
| Run spend estimated   | Exact usage was unavailable and configured charge caps were used.     |
| Run spend mixed       | Exact usage and configured cap estimates both appear in the run.      |
| Run spend unavailable | Paid source spend could not be summarized safely.                     |
| Recurring exact       | Selected cadence can project exact per-run usage.                     |
| Recurring estimated   | Selected cadence projects configured charge caps.                     |
| Recurring mixed       | Selected cadence projects exact and estimated paid source values.     |
| Recurring unavailable | Spend exists but cannot be safely projected.                          |
| Cadence unavailable   | Recurring projection cannot run until a cadence is selected.          |

Engine Replay never renders raw billing payloads, Actor internals, Dataset rows, private cache paths, account IDs, tokens, prompts, or provider responses. Cache and spend warnings are summarized as counts and safe notes.
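
One way the four run spend labels could be derived from per-source results is sketched below. The label meanings come from the table above; the per-source `SourceSpendKind` type and the rule that unavailable sources are ignored when any usable value exists are assumptions.

```typescript
// Hypothetical per-source classification.
type SourceSpendKind = "exact" | "estimated" | "unavailable";
type RunSpendLabel = "exact" | "estimated" | "mixed" | "unavailable";

function runSpendLabel(kinds: SourceSpendKind[]): RunSpendLabel {
  const hasExact = kinds.includes("exact");
  const hasEstimated = kinds.includes("estimated");
  if (hasExact && hasEstimated) return "mixed";
  if (hasExact) return "exact";
  if (hasEstimated) return "estimated";
  // No usable value at all, including the empty case.
  return "unavailable";
}
```

Only the label leaves this function, so the browser never needs the underlying billing payloads to explain how spend was summarized.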

Scheduler status provenance is separate from analysis provenance. The browser may show whether the scoped scheduler appears configured, first-run, skipped, failed, blocked, unreadable, or unavailable. It may also show reviewed cadence labels, timer intent, latest-run status, warning counts, generated-data freshness, and command hints. It must not expose local scheduler config paths, raw calendar expressions, run-state files, logs, or any secret-bearing value.

Replay also projects evidence asset summaries. These are aggregate labels only:

| Label          | Meaning                                                                  |
| -------------- | ------------------------------------------------------------------------ |
| Available      | Reviewed local assets can be fetched through the protected bridge.       |
| Blocked        | Source fields exist but compliance does not approve browser exposure.    |
| Missing/failed | Manifest or file checks failed closed; no private paths are displayed.   |
| Pruned         | Stale private asset entries/files were removed from the active keep set. |

Asset summaries can come from the generated payload or sanitized trace. The browser sees counts, status labels, source IDs, compliance doc paths, and opaque bridge URLs only. It never sees manifest file paths, cache roots, raw source media URLs, transcripts, Actor inputs, Dataset rows, account auth, or logs.

Static Brief export uses the same asset safety posture but cannot depend on the loopback bridge after the report is opened outside AI OS. It copies only manifest-approved, path-contained assets into the generated report's local `assets/` directory. Unsupported, blocked, failed, missing, pruned, bridge-only, or manifest-missing assets render fallback labels. Path escapes or private asset references fail closed before output promotion.
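
The path-containment part of that check can be illustrated with Node's `path` module. This is a sketch under stated assumptions: the function name is hypothetical, manifest approval is a separate check not shown here, and the sketch does not resolve symlinks.

```typescript
import { resolve, sep } from "node:path";

// True only when the candidate resolves to a location strictly inside the root.
function isPathContained(root: string, candidate: string): boolean {
  const resolvedRoot = resolve(root);
  const resolvedCandidate = resolve(resolvedRoot, candidate);
  // Appending the separator stops "/report/assets-private" from passing
  // as a child of "/report/assets".
  return resolvedCandidate.startsWith(resolvedRoot + sep);
}
```

A caller would skip the copy and render a fallback label whenever this returns `false`, which matches the fail-closed behavior described above. Relative escapes such as `../secret.txt` and absolute paths outside the root are both rejected because `resolve` normalizes them before the prefix comparison.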

Reference mode reads only Markdown imported from the committed `docs/extensions/trend-finder-*.md` files. The Markdown files remain the source of truth for those manuals. Registered Trend Finder Markdown links switch to their Reference tabs with the requested heading hash when possible. Links to non-registered local docs remain code references, and external docs stay normal links. Reference mode does not discover files dynamically and does not read private caches, generated reports, raw traces, or local scheduler/source setup state.

Engine Replay trace states mean:

| State          | Meaning                                                                   |
| -------------- | ------------------------------------------------------------------------- |
| `missing`      | No sanitized trace is available; the UI derives only safe static context. |
| `sanitized`    | Browser-safe latest-run trace summaries are available.                    |
| `fixture-demo` | Committed or hand-authored demo trace data is active.                     |
| `stale`        | A trace exists but belongs to an older aggregate run.                     |
| `disabled`     | Trend Finder or the trace/runtime path is disabled.                       |
| `error`        | Trace generation failed and only safe fallback labels may render.         |

