
# Session Specification

* **Session ID**: `phase28-session03-calibration-version-and-confidence-dampener`
* **Phase**: 28 - Trend Finder Trends-Finderz Adoption
* **Status**: Not Started
* **Created**: 2026-06-14

***

## 1. Session Overview

This session adds scoring provenance and low-sample calibration to Trend Finder's history, prediction, and score-display paths. Sessions 01 and 02 made evidence identity, duplicate pressure, source coverage, and cross-source signal quality explicit. The next risk is that later scoring changes in this phase could make old and new scores look comparable when they were produced by different formulas.

The implementation introduces one exported `SCORING_VERSION` constant, stamps it into snapshots and prediction records, and makes movement and retro grading flag cross-version comparisons with an explicit provenance note. Those comparisons still run; the system does not hide movement or retro rows just because a version changed.
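A minimal sketch of the version constant and a flag-not-suppress comparison helper, assuming the version is a plain string. The value `"2026.06-s03"` and the names `UNKNOWN_SCORING_VERSION`, `VersionComparison`, and `compareScoringVersions` are hypothetical placeholders, not the shipped identifiers.

```typescript
// Hypothetical placeholder value; the real constant belongs in scripts/lib/ai-runtime/scoring.ts.
export const SCORING_VERSION = "2026.06-s03";
// Shared default for snapshots and predictions written before version stamping existed.
export const UNKNOWN_SCORING_VERSION = "unknown";

export interface VersionComparison {
  currentVersion: string;
  previousVersion: string;
  crossVersion: boolean; // true when the versions differ or either side is unknown
  note: string | null; // compact provenance note; null when the versions match
}

// Describes a comparison; it never decides whether the movement or retro row is shown.
export function compareScoringVersions(
  currentVersion: string,
  previousVersion: string | undefined,
): VersionComparison {
  const previous = previousVersion?.trim() || UNKNOWN_SCORING_VERSION;
  const hasUnknown =
    currentVersion === UNKNOWN_SCORING_VERSION || previous === UNKNOWN_SCORING_VERSION;
  const crossVersion = hasUnknown || currentVersion !== previous;
  let note: string | null = null;
  if (hasUnknown) {
    note = `Compared with a score from an unknown scoring version (current: ${currentVersion}).`;
  } else if (crossVersion) {
    note = `Compared across scoring versions ${previous} -> ${currentVersion}.`;
  }
  return { currentVersion, previousVersion: previous, crossVersion, note };
}
```

Callers would keep whatever movement or retro status they already computed and attach this result beside it; nothing in the helper filters rows.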

The session also adds a bounded small-sample confidence dampener. Trend Finder already has banded analyst confidence labels and source-local low-sample states; this work adds the numeric sample-confidence value that those surfaces can show and uses it to adjust only the overall opportunity score. The six-factor score breakdown remains intact, and the final named-contribution convention is deferred to Session 06.

***

## 2. Objectives

1. Export a single Trend Finder `SCORING_VERSION` constant and stamp it into new snapshots, topics, and prediction records.
2. Treat missing historical versions as `unknown` and flag, not suppress, snapshot movement and prediction retro comparisons across scoring versions.
3. Derive a bounded 0-100 sample-confidence value from evidence count, distinct sources, and scoring-evidence share.
4. Apply a bounded confidence dampener to the final opportunity score and show the confidence value in the score breakdown.
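One way to derive the 0-100 sample-confidence value, sketched under explicit assumptions: evidence rows expose a `sourceId` and a hypothetical `countsForScoring` flag, "scoring-evidence share" is read as the fraction of collected rows that count toward scoring, and the saturation points (8 rows, 3 sources) and weights (0.4, 0.35, 0.25) are placeholders to tune against real fixtures.

```typescript
// Hypothetical evidence row shape; real rows come from the Session 01/02 dedup output.
interface EvidenceRow {
  sourceId: string;
  countsForScoring: boolean; // false for rows excluded from scoring
}

// Hypothetical saturation points and weights (weights sum to 1).
const FULL_EVIDENCE_COUNT = 8;
const FULL_SOURCE_COUNT = 3;
const WEIGHTS = { count: 0.4, sources: 0.35, share: 0.25 } as const;

const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Returns an integer sample confidence in [0, 100].
function deriveSampleConfidence(rows: readonly EvidenceRow[]): number {
  const scoringRows = rows.filter((row) => row.countsForScoring);
  if (scoringRows.length === 0) return 0;
  const distinctSources = new Set(scoringRows.map((row) => row.sourceId)).size;
  const countPart = clamp(scoringRows.length / FULL_EVIDENCE_COUNT, 0, 1);
  const sourcePart = clamp(distinctSources / FULL_SOURCE_COUNT, 0, 1);
  const sharePart = scoringRows.length / rows.length; // in (0, 1] because scoringRows is non-empty
  const weighted =
    WEIGHTS.count * countPart + WEIGHTS.sources * sourcePart + WEIGHTS.share * sharePart;
  return clamp(Math.round(weighted * 100), 0, 100);
}
```

Under these assumptions a single row from one source yields 42, while ten scoring rows spread over four sources saturate at 100.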

***

## 3. Prerequisites

### Required Sessions

* [x] `phase28-session01-cross-source-signal-identity-and-dedup` - provides deterministic deduplication, syndication metadata, and bounded evidence counts consumed by confidence math.
* [x] `phase28-session02-signal-quality-score-and-collection-health` - provides shared cross-source evidence quality and run-level collection health that help separate thin evidence from strong evidence.
* [x] `phase27-session08-dated-predictions-and-story-log` - provides the dated prediction and retro archive surfaces that this session stamps and flags.

### Required Tools/Knowledge

* Bun 1.3.14, Vitest, and script typechecking.
* Existing Zod additive-default patterns in `src/extensions/trend-finder/schema.ts`.
* Existing scoring, snapshot, movement, prediction, and retro contracts in `scripts/lib/ai-runtime/`.
* Existing score breakdown and Engine Replay view-model patterns in `src/extensions/trend-finder/`.
* Trends-Finderz references under `EXAMPLES/trends-finderz/lib/scoring/`.

### Environment Requirements

* Work from the repo root.
* Do not add runtime dependencies.
* Do not introduce new source adapters, network collection paths, credential flows, hosted storage, or database migrations.
* Keep generated snapshots, prediction archives, raw source data, and private cache paths out of browser-visible output.

***

## 4. Scope

### In Scope (MVP)

* Trend Finder exports `SCORING_VERSION` from `scripts/lib/ai-runtime/scoring.ts` and uses one value across script-side scoring, snapshot, prediction, movement, and retro paths.
* Snapshots and browser topic payloads carry additive scoring-version and score-confidence fields with legacy defaults.
* Prediction records carry the scoring version at filing time; legacy predictions parse as `unknown`.
* Movement analysis receives current and previous scoring-version metadata and flags cross-version comparisons with an explicit note while preserving the normal movement status.
* Retro grading compares each prediction's filed scoring version with the current run version and flags cross-version outcomes without changing due-state rules.
* Sample confidence is derived from evidence count, distinct scoring sources, and scoring-evidence share, then clamped to 0-100.
* The final opportunity score is adjusted by a bounded dampener; raw weighted score, adjusted score, sample confidence, and dampener delta remain available for display and later Session 06 named contributions.
* Score breakdown UI shows sample confidence and dampener context without hiding the six-factor breakdown or inventing new score labels.
* Tests cover version stamping, legacy unknown handling, cross-version flags, dampener bounds, thin-evidence vs strong-evidence behavior, and UI labels.

### Out of Scope (Deferred)

* Rewriting the six-factor score weights - *Reason: this session wraps the final weighted score with confidence calibration, not a factor redesign.*
* Migrating historical snapshot or prediction files - *Reason: legacy records are intentionally parsed as `unknown` and flagged on comparison.*
* The named-contribution array `{label, delta, kind}` - *Reason: Session 06 owns the shared contribution convention for dampener, noise, and lifecycle adjustments.*
* Topic noise gates, visibility bands, aging half-lives, saturation refinement, lifecycle multipliers, research-only flags, cache retention, action verdicts, or keyword/source adapter work - *Reason: those are Sessions 04-14.*
* New AI calls, source calls, dependencies, durable storage, or database schema work - *Reason: this is deterministic calibration over already-loaded data.*

***

## 5. Technical Approach

### Architecture

Add `SCORING_VERSION` and a small set of confidence-calibration helpers in `scripts/lib/ai-runtime/scoring.ts`. `scoreTrendTopics()` should calculate the raw six-factor score exactly as it does today, derive sample confidence from the scoring evidence set, apply a bounded dampener to produce the final `score`, and attach the confidence metadata to each topic. The implementation should keep the raw weighted score and dampener delta available for downstream rendering and later named-contribution work.
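A minimal sketch of a dampener that can only shrink the raw weighted score, assuming scores on a 0-100 scale. `MAX_DAMPENING = 0.2` (at most a 20% reduction at zero confidence), the one-decimal rounding, and the field names `rawScore` and `dampenerDelta` are assumptions; the spec fixes only that the adjustment is bounded and that raw score, adjusted score, sample confidence, and delta stay available.

```typescript
// Hypothetical cap: zero sample confidence removes at most 20% of the raw score.
const MAX_DAMPENING = 0.2;

interface CalibratedScore {
  rawScore: number; // six-factor weighted score, unchanged
  score: number; // final opportunity score after dampening
  sampleConfidence: number; // 0-100
  dampenerDelta: number; // score - rawScore, never positive
}

const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

function applyConfidenceDampener(rawScore: number, sampleConfidence: number): CalibratedScore {
  const raw = clamp(rawScore, 0, 100);
  const confidence = clamp(sampleConfidence, 0, 100);
  // Factor ranges from 1 - MAX_DAMPENING (no confidence) to 1 (full confidence).
  const factor = 1 - MAX_DAMPENING * (1 - confidence / 100);
  // Round to one decimal, then cap at the raw score so rounding can never boost it.
  const score = Math.min(raw, Math.round(raw * factor * 10) / 10);
  const dampenerDelta = Math.round((score - raw) * 10) / 10;
  return { rawScore: raw, score, sampleConfidence: confidence, dampenerDelta };
}
```

For example, `applyConfidenceDampener(80, 42)` returns a `score` of 70.7 with a `dampenerDelta` of -9.3, while full confidence leaves 80 untouched. A multiplicative factor was chosen here because it cannot pull a low raw score upward toward a floor.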

Extend `src/extensions/trend-finder/schema.ts` with additive defaults for scoring version, raw score, sample confidence, and dampener delta on topics. Additive defaults must keep previously generated `live-data.example.json`, snapshot, prediction, and retro fixtures parseable. Use a shared unknown-version default instead of throwing when an archived record predates this session.
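The real schema would express these defaults with Zod; the dependency-free sketch below only illustrates the intended defaulting behavior so it runs standalone. The field names `rawScore` and `dampenerDelta`, and the choice of `null` (rather than a number) for an unrecorded legacy `sampleConfidence`, are assumptions.

```typescript
const UNKNOWN_SCORING_VERSION = "unknown";

interface CalibratedTopic {
  id: string;
  score: number;
  scoringVersion: string;
  rawScore: number;
  sampleConfidence: number | null; // null when the archive never recorded it
  dampenerDelta: number;
}

// Returns value when it is a finite number, otherwise the fallback.
function finiteOr<T>(value: unknown, fallback: T): number | T {
  return typeof value === "number" && Number.isFinite(value) ? value : fallback;
}

// Parses old and new topic payloads alike; missing calibration fields get defaults, not errors.
function normalizeTopic(input: Record<string, unknown>): CalibratedTopic {
  const id = input.id;
  const version = input.scoringVersion;
  const score = finiteOr(input.score, 0);
  return {
    id: typeof id === "string" ? id : "",
    score,
    scoringVersion:
      typeof version === "string" && version.trim() !== "" ? version : UNKNOWN_SCORING_VERSION,
    // Legacy scores were never dampened, so their raw score is the stored score.
    rawScore: finiteOr(input.rawScore, score),
    sampleConfidence: finiteOr(input.sampleConfidence, null),
    dampenerDelta: finiteOr(input.dampenerDelta, 0),
  };
}
```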

Stamp snapshots at write time and include `scoringVersion` in the parsed snapshot contract. Extend the lightweight previous-snapshot projection used by scoring and movement so previous topics retain the version that produced their scores. Movement analysis should receive current and previous versions, attach a comparison flag/note, and continue emitting the derived or AI movement status.
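A minimal sketch of annotating movement records with version provenance, assuming one previous snapshot supplies one `previousVersion`. The status vocabulary and the field names `currentScoringVersion`, `previousScoringVersion`, and `provenanceNote` are hypothetical; the point is that every input record comes back with its status unchanged.

```typescript
const UNKNOWN_SCORING_VERSION = "unknown";

// Hypothetical status vocabulary; the real one comes from the movement schema.
type MovementStatus = "rising" | "steady" | "cooling" | "new";

interface MovementRecord {
  topicId: string;
  status: MovementStatus;
}

interface AnnotatedMovement extends MovementRecord {
  currentScoringVersion: string;
  previousScoringVersion: string;
  crossVersion: boolean;
  provenanceNote: string | null;
}

// Annotates every record and drops none; status is copied through unchanged.
function annotateMovement(
  records: readonly MovementRecord[],
  currentVersion: string,
  previousVersion: string | undefined,
): AnnotatedMovement[] {
  const previous = previousVersion?.trim() || UNKNOWN_SCORING_VERSION;
  const crossVersion =
    previous === UNKNOWN_SCORING_VERSION ||
    currentVersion === UNKNOWN_SCORING_VERSION ||
    previous !== currentVersion;
  const provenanceNote = crossVersion
    ? `Previous scores used scoring version ${previous}; current run uses ${currentVersion}.`
    : null;
  return records.map((record) => ({
    ...record,
    currentScoringVersion: currentVersion,
    previousScoringVersion: previous,
    crossVersion,
    provenanceNote,
  }));
}
```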

Prediction records should store the scoring version at filing time in both the deterministic fallback and sanitized AI output paths. Retro grading should receive the current run version alongside the current topics, compare it with each prediction's filed version, and attach a cross-version flag and note to retro records. The retro outcome still follows the existing due-state and observed-direction rules.
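A minimal sketch of flagging retro records, assuming the outcome has already been graded upstream by the existing due-state rules and is passed in untouched. The outcome vocabulary and the retro field names are hypothetical.

```typescript
const UNKNOWN_SCORING_VERSION = "unknown";

interface PredictionRecord {
  id: string;
  scoringVersion?: string; // absent on predictions filed before this session
}

// Hypothetical outcome vocabulary; computed upstream by the existing due-state rules.
type RetroOutcome = "not-due" | "confirmed" | "missed";

interface RetroRecord {
  predictionId: string;
  outcome: RetroOutcome;
  filedScoringVersion: string;
  gradedScoringVersion: string;
  crossVersion: boolean;
  provenanceNote: string | null;
}

// Adds provenance to an already-graded outcome; it never re-grades or drops the prediction.
function flagRetro(
  prediction: PredictionRecord,
  outcome: RetroOutcome,
  currentVersion: string,
): RetroRecord {
  const filed = prediction.scoringVersion?.trim() || UNKNOWN_SCORING_VERSION;
  const crossVersion =
    filed === UNKNOWN_SCORING_VERSION ||
    currentVersion === UNKNOWN_SCORING_VERSION ||
    filed !== currentVersion;
  return {
    predictionId: prediction.id,
    outcome,
    filedScoringVersion: filed,
    gradedScoringVersion: currentVersion,
    crossVersion,
    provenanceNote: crossVersion
      ? `Filed under scoring version ${filed}; graded under ${currentVersion}.`
      : null,
  };
}
```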

### Design Patterns

* Pure helper first: confidence math and version-comparison helpers should be deterministic and tested before collector/UI wiring.
* Additive schema defaults: new fields parse old snapshots, predictions, retros, fixtures, and live-data payloads.
* Boundary stamping: script-side write paths stamp trusted local constants; AI output and browser payloads do not decide their own scoring version.
* Flag, do not suppress: movement and retro rows remain visible with a clear provenance note when versions differ.
* Browser-safe projection: expose version strings, booleans, labels, and bounded numeric deltas only; never paths or raw archived payloads.
* UI through existing score surfaces: extend the score breakdown and Engine Replay/view-model language instead of creating a new scoring panel.
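Boundary stamping can be sketched as "validate, then stamp last": the model's output is reduced to known fields and the trusted local version is attached afterwards, so any `scoringVersion` the model emits is simply never read. The prediction shape (`claim`, `direction`), the direction values, and the 280-character bound are hypothetical.

```typescript
// Hypothetical placeholder; the trusted value comes from the local scoring module.
const SCORING_VERSION = "2026.06-s03";

const DIRECTIONS = ["up", "down", "flat"] as const;
type Direction = (typeof DIRECTIONS)[number];

interface StampedPrediction {
  claim: string;
  direction: Direction;
  scoringVersion: string;
}

const MAX_CLAIM_LENGTH = 280; // hypothetical payload bound

const isDirection = (value: unknown): value is Direction =>
  typeof value === "string" && (DIRECTIONS as readonly string[]).includes(value);

// Validates untrusted AI output first, then stamps the local version last.
function sanitizeAndStamp(
  aiOutput: unknown,
  localVersion: string = SCORING_VERSION,
): StampedPrediction | null {
  if (typeof aiOutput !== "object" || aiOutput === null) return null;
  const { claim, direction } = aiOutput as Record<string, unknown>;
  if (typeof claim !== "string" || claim.trim() === "" || !isDirection(direction)) return null;
  // Any scoringVersion field the model emitted is never read; only localVersion is trusted.
  return { claim: claim.trim().slice(0, MAX_CLAIM_LENGTH), direction, scoringVersion: localVersion };
}
```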

### Technology Stack

* TypeScript on Bun.
* Zod for browser and archive payload parsing.
* React 19 and Tailwind CSS 4 for score-breakdown rendering.
* Vitest and Testing Library for script, schema, view-model, and component coverage.

***

## 6. Deliverables

### Files to Create

| File                                                           | Purpose                                                       | Est. Lines |
| -------------------------------------------------------------- | ------------------------------------------------------------- | ---------- |
| `scripts/lib/ai-runtime/__tests__/scoring-calibration.test.ts` | Focused tests for scoring version constants and dampener math | \~180      |

### Files to Modify

| File                                                                        | Changes                                                                          | Est. Lines |
| --------------------------------------------------------------------------- | -------------------------------------------------------------------------------- | ---------- |
| `scripts/lib/ai-runtime/scoring.ts`                                         | Add `SCORING_VERSION`, confidence derivation, dampener, and topic metadata       | \~170      |
| `scripts/lib/ai-runtime/snapshots.ts`                                       | Add snapshot scoring version defaults and comparison metadata                    | \~100      |
| `scripts/lib/ai-runtime/movement-analyst.ts`                                | Add scoring-version comparison fields to input, fallback, prompt, and sanitizer  | \~120      |
| `scripts/lib/ai-runtime/predictions.ts`                                     | Stamp prediction records with scoring version; default legacy archives on parse  | \~90       |
| `scripts/lib/ai-runtime/retros.ts`                                          | Flag prediction/current scoring-version mismatches in retro records              | \~110      |
| `scripts/extensions/trend-finder/collector.ts`                              | Stamp payload/snapshot/prediction/retro flows and pass comparison metadata       | \~160      |
| `src/extensions/trend-finder/schema.ts`                                     | Add additive topic, movement, prediction, retro, and summary version fields      | \~150      |
| `src/extensions/trend-finder/view-model.ts`                                 | Add score-confidence and scoring-version warning labels                          | \~90       |
| `src/extensions/trend-finder/components/score-breakdown.tsx`                | Render sample confidence and dampener context with accessible labels             | \~70       |
| `src/extensions/trend-finder/components/engine-score-panel.tsx`             | Render scoring-version mismatch or unknown-version provenance where available    | \~60       |
| `scripts/lib/ai-runtime/__tests__/snapshots.test.ts`                        | Cover snapshot version defaults and legacy unknown handling                      | \~70       |
| `scripts/lib/ai-runtime/__tests__/movement-analyst.test.ts`                 | Cover cross-version movement flags and prompt/fallback behavior                  | \~90       |
| `scripts/lib/ai-runtime/__tests__/predictions.test.ts`                      | Cover prediction version stamping and legacy parsing                             | \~70       |
| `scripts/lib/ai-runtime/__tests__/retros.test.ts`                           | Cover cross-version retro flags and due-state preservation                       | \~90       |
| `scripts/extensions/trend-finder/__tests__/collector.test.ts`               | Cover end-to-end stamping through payload, snapshot, prediction, and retro paths | \~120      |
| `src/extensions/trend-finder/__tests__/view-model.test.ts`                  | Cover confidence labels and version-warning labels                               | \~80       |
| `src/extensions/trend-finder/components/__tests__/score-breakdown.test.tsx` | Cover visible sample-confidence rendering and accessible labels                  | \~70       |

***

## 7. Success Criteria

### Functional Requirements

* [ ] Every new snapshot records the current scoring version.
* [ ] Every new prediction record records the current scoring version at filing time.
* [ ] Legacy snapshots and predictions parse as `unknown` version without crashing.
* [ ] Movement comparisons across different or unknown scoring versions are flagged with provenance text and still return normal movement statuses.
* [ ] Retro comparisons across different or unknown scoring versions are flagged with provenance text and still preserve due-state behavior.
* [ ] Thin-evidence topics receive a bounded confidence adjustment; strong-evidence topics are only minimally affected.
* [ ] Score breakdown UI shows the sample-confidence value and dampener context alongside the existing factor rows.
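A minimal view-model sketch for the confidence and version labels. All label wording here is placeholder text that should be aligned with the existing Trend Finder confidence vocabulary; the `ScoreCalibrationView` shape and function names are hypothetical.

```typescript
interface ScoreCalibrationView {
  sampleConfidence: number | null; // null for legacy topics that never recorded it
  dampenerDelta: number; // <= 0
  scoringVersion: string;
}

// Placeholder wording; keep the existing analyst-confidence labels separate from this one.
function sampleConfidenceLabel(view: ScoreCalibrationView): string {
  if (view.sampleConfidence === null) return "Sample confidence: not recorded";
  return `Sample confidence: ${Math.round(view.sampleConfidence)}/100`;
}

function dampenerLabel(view: ScoreCalibrationView): string | null {
  // No row when nothing was adjusted, so strong-evidence topics stay uncluttered.
  if (view.dampenerDelta === 0) return null;
  return `Low-sample adjustment: ${view.dampenerDelta.toFixed(1)} pts`;
}

function versionWarningLabel(view: ScoreCalibrationView): string | null {
  return view.scoringVersion === "unknown" ? "Scored before version tracking" : null;
}
```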

### Testing Requirements

* [ ] Unit tests written and passing for confidence derivation, dampener clamps, and thin-vs-strong evidence behavior.
* [ ] Snapshot tests cover version stamping, default `unknown`, and comparison metadata.
* [ ] Movement and retro tests cover cross-version flagging without suppression.
* [ ] Prediction tests cover deterministic and sanitized AI-output stamping.
* [ ] Schema, view-model, and component tests cover legacy defaults and visible confidence/version labels.

### Non-Functional Requirements

* [ ] No new runtime dependencies.
* [ ] Browser-visible payload additions are bounded, additive, and free of raw cache paths or source payloads.
* [ ] Existing Trend Finder confidence label vocabulary remains intact.
* [ ] Current security and GDPR posture is unchanged.
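One way to keep browser payload additions bounded is an explicit allowlist projection rather than spreading the internal topic and deleting fields. In this sketch, `snapshotPath`, `rawEvidence`, and the 32-character version bound are hypothetical stand-ins for private script-side data and limits.

```typescript
interface InternalScoredTopic {
  id: string;
  score: number;
  rawScore: number;
  sampleConfidence: number;
  dampenerDelta: number;
  scoringVersion: string;
  snapshotPath: string; // hypothetical private field; must never reach the browser
  rawEvidence: unknown[]; // hypothetical private field
}

type BrowserScoreMeta = Pick<
  InternalScoredTopic,
  "id" | "score" | "rawScore" | "sampleConfidence" | "dampenerDelta" | "scoringVersion"
>;

const MAX_VERSION_LENGTH = 32; // hypothetical bound

const clamp = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Explicit allowlist: new private fields stay out of the payload unless added here.
function toBrowserScoreMeta(topic: InternalScoredTopic): BrowserScoreMeta {
  return {
    id: topic.id,
    score: clamp(topic.score, 0, 100),
    rawScore: clamp(topic.rawScore, 0, 100),
    sampleConfidence: clamp(Math.round(topic.sampleConfidence), 0, 100),
    dampenerDelta: clamp(topic.dampenerDelta, -100, 0),
    scoringVersion: topic.scoringVersion.slice(0, MAX_VERSION_LENGTH),
  };
}
```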

### Quality Gates

* [ ] All files ASCII-encoded.
* [ ] Unix LF line endings.
* [ ] Code follows project conventions.
* [ ] Focused Vitest suites pass.
* [ ] `bun run typecheck:scripts` passes for script-side contract changes.

***

## 8. Implementation Notes

### Key Considerations

* The existing `confidence` field on `TrendTopic` is analyst confidence in the AI/deterministic topic output. Add a distinct sample-confidence field for scoring evidence, not a replacement for analyst confidence.
* The final named-contribution list is intentionally deferred. Record enough dampener metadata now so Session 06 can migrate it into `{label, delta, kind}` rows without recomputing archived scores.
* Missing versions should be treated as `unknown`, not invalid. This is required for every snapshot and prediction created before Session 03.
* AI prediction and movement outputs should not be trusted to stamp or override the local scoring version. Stamp trusted local metadata after validation.
* Cross-version movement and retro warnings should be clear but compact; they must not bury the ranked trend surface or story log.
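To illustrate why recording dampener metadata now is enough for Session 06, the sketch below builds `{label, delta, kind}` rows purely from stored fields, with no recomputation from evidence. Session 06 owns the real convention; the labels and the `kind` values `"base"` and `"adjustment"` are hypothetical.

```typescript
interface StoredDampenerMetadata {
  rawScore: number;
  score: number;
  sampleConfidence: number;
  dampenerDelta: number;
}

// Row shape named by the Session 06 plan; the kind values are hypothetical.
interface NamedContribution {
  label: string;
  delta: number;
  kind: "base" | "adjustment";
}

// Builds rows from archived fields only; nothing is recomputed from evidence.
function toNamedContributions(meta: StoredDampenerMetadata): NamedContribution[] {
  const rows: NamedContribution[] = [
    { label: "Six-factor weighted score", delta: meta.rawScore, kind: "base" },
  ];
  if (meta.dampenerDelta !== 0) {
    rows.push({
      label: `Sample confidence ${meta.sampleConfidence}/100`,
      delta: meta.dampenerDelta,
      kind: "adjustment",
    });
  }
  return rows;
}
```

A useful invariant for migration tests is that the row deltas sum back to the archived `score`.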

### Potential Challenges

* Field naming overlap with existing `confidence`: use explicit `sampleConfidence` or equivalent language and update labels carefully.
* The score dampener could accidentally boost low raw scores if it pulls them toward a floor: clamp and test the formula so low-confidence calibration cannot create misleadingly high scores.
* Movement and retro prompts already have strict validation: add version metadata as local input and output projection without expanding AI authority beyond existing schemas.
* Legacy fixture drift can hide parser regressions: include explicit old-payload fixture assertions in schema and archive tests.
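The "cannot boost" risk lends itself to a grid sweep over raw scores and confidences. This sketch assumes a multiplicative dampener capped at a hypothetical 20% reduction with one-decimal rounding, and checks three properties: no boost, no reduction beyond the cap (allowing one rounding step), and monotonicity in confidence.

```typescript
const MAX_DAMPENING = 0.2; // hypothetical cap
const ROUNDING_TOLERANCE = 0.05 + 1e-9; // one-decimal rounding can lower a value by < 0.05

// Hypothetical dampener under test: multiplicative, rounded, capped at the raw score.
function dampen(raw: number, confidence: number): number {
  const factor = 1 - MAX_DAMPENING * (1 - confidence / 100);
  return Math.min(raw, Math.round(raw * factor * 10) / 10);
}

// Sweeps a grid of raw scores and confidences; returns the first violation, or null.
function findDampenerViolation(): string | null {
  for (let raw = 0; raw <= 100; raw += 5) {
    for (let confidence = 0; confidence <= 100; confidence += 10) {
      const adjusted = dampen(raw, confidence);
      if (adjusted > raw) return `boosted raw=${raw} confidence=${confidence}`;
      if (adjusted < raw * (1 - MAX_DAMPENING) - ROUNDING_TOLERANCE) {
        return `over-dampened raw=${raw} confidence=${confidence}`;
      }
    }
    // Higher confidence must never produce a lower score for the same raw input.
    for (let confidence = 10; confidence <= 100; confidence += 10) {
      if (dampen(raw, confidence) < dampen(raw, confidence - 10)) {
        return `non-monotonic raw=${raw} confidence=${confidence}`;
      }
    }
  }
  return null;
}
```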

### Relevant Considerations

* \[P00] **Stack conventions**: Keep Bun, Vite, React, Zod, and Cloudflare Worker compatibility; no new runtime dependency is needed.
* \[P02] **Extension payloads and labels stay bounded**: Version and confidence fields must be bounded primitives and labels only.
* \[P24] **Browser-safe export and triage boundaries**: Do not expose raw snapshot or prediction archive paths in UI or Engine Replay labels.
* \[P27] **Trend Finder payload growth needs release checks**: Additive metadata should remain small and covered by later closeout payload checks.
* \[P01] **Extract pure functions, then test, then wire**: Apply this to confidence math and version-comparison helpers.
* \[P27] **Deterministic fallback before AI enrichment**: Confidence dampening and version flags should work with AI disabled.

### Behavioral Quality Focus

Checklist active: Yes

Top behavioral risks for this session:

* Legacy archives without version fields could crash movement, prediction, or retro flows.
* Low-sample dampening could silently change rankings without a visible trust cue.
* AI output could accidentally become the authority for local scoring metadata.

***

## 9. Testing Strategy

### Unit Tests

* `scoring.ts`: `SCORING_VERSION`, sample-confidence derivation, dampener bounds, raw-vs-adjusted score metadata, and thin-vs-strong evidence fixtures.
* `snapshots.ts`: current-version stamping, `unknown` defaults, and comparison metadata.
* `movement-analyst.ts`: cross-version fallback records, AI output sanitation, and unchanged movement statuses.
* `predictions.ts` and `retros.ts`: prediction version stamping, retro cross-version flags, due-state preservation, and legacy archive parsing.

### Integration Tests

* `collector.test.ts`: verify a collect flow stamps payload topics, snapshots, predictions, retros, movement inputs, and warnings consistently with no raw paths in browser data.
* Schema/view-model tests: verify old payloads parse, labels are compact, and confidence/version data projects consistently.

### Manual Testing

* Run a Trend Finder fixture path or focused collector test with a prior versionless snapshot and confirm movement still appears with an `unknown` version note.
* Inspect the score breakdown for a thin-evidence topic and a strong-evidence topic to confirm the confidence value is visible and factor rows remain clear.

### Edge Cases

* Zero scoring evidence.
* One evidence row from one source.
* Many evidence rows from one source.
* Many evidence rows from multiple sources.
* Missing or malformed previous snapshot.
* Prediction filed before version stamping and graded after stamping.
* AI movement or prediction output missing local version fields.
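The evidence-shape edge cases above map naturally onto a table-driven check. This sketch assumes a hypothetical formula (saturating row count at 8, source spread at 3, and scoring share, weighted 0.4/0.35/0.25) and asserts bounds plus ordering rather than relying only on exact values, so thin evidence can never outrank broader evidence.

```typescript
interface EvidenceRow {
  sourceId: string;
  countsForScoring: boolean;
}

// Hypothetical formula: saturating count (8 rows), source spread (3 sources), scoring share.
function sampleConfidence(rows: readonly EvidenceRow[]): number {
  const scoring = rows.filter((row) => row.countsForScoring);
  if (scoring.length === 0) return 0;
  const sources = new Set(scoring.map((row) => row.sourceId)).size;
  const weighted =
    0.4 * Math.min(scoring.length / 8, 1) +
    0.35 * Math.min(sources / 3, 1) +
    0.25 * (scoring.length / rows.length);
  return Math.min(100, Math.max(0, Math.round(weighted * 100)));
}

// Builds `count` scoring rows spread round-robin across `sourceCount` sources.
function makeRows(count: number, sourceCount: number): EvidenceRow[] {
  return Array.from({ length: count }, (_, i) => ({
    sourceId: `src-${i % sourceCount}`,
    countsForScoring: true,
  }));
}

// Ordered from thinnest to strongest evidence.
const EDGE_CASES: { name: string; rows: EvidenceRow[] }[] = [
  { name: "zero scoring evidence", rows: makeRows(0, 1) },
  { name: "one row, one source", rows: makeRows(1, 1) },
  { name: "many rows, one source", rows: makeRows(10, 1) },
  { name: "many rows, many sources", rows: makeRows(10, 4) },
];

// Returns failure messages; an empty array means every case passed.
function checkEdgeCases(): string[] {
  const failures: string[] = [];
  const results = EDGE_CASES.map((c) => ({ name: c.name, value: sampleConfidence(c.rows) }));
  for (const result of results) {
    if (result.value < 0 || result.value > 100) failures.push(`${result.name} out of bounds`);
  }
  if (results[0]?.value !== 0) failures.push("zero evidence must score 0");
  for (let i = 1; i < results.length; i++) {
    const prev = results[i - 1];
    const curr = results[i];
    if (prev && curr && curr.value < prev.value) failures.push(`${curr.name} ranked below ${prev.name}`);
  }
  return failures;
}
```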

***

## 10. Dependencies

### External Libraries

* None.

### Other Sessions

* **Depends on**: `phase28-session01-cross-source-signal-identity-and-dedup`, `phase28-session02-signal-quality-score-and-collection-health`, and Phase 27 prediction/story-log baseline.
* **Depended by**: `phase28-session05-signal-aging-half-lives-and-saturation-refinement`, `phase28-session06-lifecycle-multiplier-and-named-contributions`, `phase28-session07-research-only-calibration-and-cache-retention`, and `phase28-session13-keyword-packs-rotation-and-coverage`.

***

## Next Steps

Run the implement workflow step to begin AI-led implementation.


