# Session Specification

**Session ID**: `phase29-session16-podcast-compliance-package`

**Phase**: 29 - Trend Finder TrendingAI Comparison Adoption

**Status**: Not Started

**Created**: 2026-06-21

***

## 1. Session Overview

This session creates the compliance package for any future podcast or audio source work in Trend Finder. Its purpose is to decide explicitly whether podcast transcripts can be collected, cached, summarized, and used for cross-show themes under the project's local-first, browser-safe posture.

The session is intentionally a gate, not an implementation pass. Phase 29 has already shipped the deterministic comparison improvements through Session 15; Session 17 remains blocked until this package records an approve, defer, or reject decision for podcast media handling.

The work matters because podcast transcription crosses a higher-risk source and media boundary than the currently reviewed public API and feed adapters. The output must leave future implementers with clear allowed fields, blocked fields, retention rules, spend labeling, attribution requirements, and the exact Session 17 handoff state.

***

## 2. Objectives

1. Create a podcast/audio source compliance document that follows existing source-compliance conventions.
2. Explicitly classify transcript bodies, speaker names, comments, thumbnails, media URLs, raw audio/video, episode URLs, show names, and summary fields as allowed or blocked.
3. Record an approve, defer, or reject decision with rationale and concrete conditions for Session 17.
4. Update source, media, and phase planning docs so they describe the decision without presenting planned podcast behavior as implemented.
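
A minimal sketch of how Objective 2 could be checked mechanically, assuming the compliance doc's field table is loaded into a hypothetical `field -> status` mapping. The snake_case field names and the `classification_gaps` helper are illustrative, not existing project code, and the sketch decides nothing: it only reports fields that are missing or carry a status other than `allowed` or `blocked`.

```python
from typing import Dict, List

# Fields Objective 2 requires the compliance doc to classify.
# Names are hypothetical snake_case labels for illustration.
REQUIRED_FIELDS: List[str] = [
    "transcript_body",
    "speaker_names",
    "comments",
    "thumbnails",
    "media_urls",
    "raw_audio_video",
    "episode_urls",
    "show_names",
    "summary",
]

VALID_STATUSES = {"allowed", "blocked"}


def classification_gaps(classification: Dict[str, str]) -> List[str]:
    """Return problems in a hypothetical field -> status mapping.

    An empty list means every required field has a valid status.
    The real source of truth is the table in the compliance doc.
    """
    problems: List[str] = []
    for name in REQUIRED_FIELDS:
        status = classification.get(name)
        if status is None:
            problems.append(f"unclassified: {name}")
        elif status not in VALID_STATUSES:
            problems.append(f"invalid status for {name}: {status}")
    return problems
```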

***

## 3. Prerequisites

### Required Sessions

* [x] `phase29-session15-pre-run-estimate` - The preceding Phase 29 operator-polish session, which must be complete before podcast compliance begins.
* [x] `phase29-session04-corroboration-gate` - Provides the convergence model Session 17 must reuse if podcast themes are approved.
* [x] `phase28-session14-direct-first-party-source-adapters` - Provides the current direct-source compliance and fallback posture to preserve.

### Required Tools/Knowledge

* Existing source compliance document structure under `docs/sources/`.
* Current media policy and Trend Finder source documentation.
* Current Apify/source onboarding rules and browser-safe payload boundaries.

### Environment Requirements

* No source credential, media file, transcript body, cache path, token, or raw provider response should be created or committed during this session.
* Documentation must remain ASCII-only and use Unix LF line endings.

***

## 4. Scope

### In Scope (MVP)

* Maintainer can review the podcast/audio boundary - Create `docs/sources/source-compliance-podcasts.md` with terms, source candidates, data boundary, attribution, retention, deletion, private cache, and spend policy.
* Maintainer can make an explicit decision - Record approve, defer, or reject with rationale, unresolved questions, and conditions for changing the decision later.
* Future implementer can safely handle Session 17 - Update the Phase 29 session stubs so Session 17 is either unblocked with strict conditions or marked deferred/rejected.
* Operator-facing docs stay truthful - Update Trend Finder source/media docs only to describe the approved, deferred, or rejected boundary, not unimplemented collector behavior.

### Out of Scope (Deferred)

* Any podcast collector, adapter, transcription, clustering, or UI implementation - Reason: Session 16 is compliance/documentation only.
* Downloading, caching, or committing audio, video, transcript, thumbnail, or comment data - Reason: no podcast boundary is approved when the session starts.
* Adding dependencies such as `yt-dlp`, `ffmpeg`, Whisper/Groq clients, or new source runner code - Reason: implementation belongs to Session 17 only if the boundary is approved.
* Approving X, TikTok, Instagram, Bluesky, or other broader social collection - Reason: Phase 29 keeps those as documented non-goals under the current posture.

***

## 5. Technical Approach

### Architecture

Use the existing source-compliance documentation pattern as the architecture for the decision. The podcast compliance doc should separate source/provider review, allowed browser-safe fields, explicitly blocked fields, private cache rules, retention/deletion policy, attribution, rate/cost/spend labels, and implementation conditions.

If the decision approves any podcast path, the doc must define the minimal browser-safe payload Session 17 may publish: theme label, bounded summary, and per-episode safe attribution only. If the decision defers or rejects the path, the doc must state why and update Session 17 so implementation does not proceed by implication.
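
The handoff rule above can be sketched as a small mapping, assuming a hypothetical `DecisionRecord` type and illustrative state strings for the Session 17 stub. It encodes two constraints stated in this spec: a missing or unexplained decision keeps Session 17 blocked, and approval only unblocks it when concrete conditions are recorded.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Decision(Enum):
    APPROVE = "approve"
    DEFER = "defer"
    REJECT = "reject"


@dataclass
class DecisionRecord:
    # Hypothetical shape of the decision recorded in the compliance doc.
    decision: Decision
    rationale: str
    conditions: List[str] = field(default_factory=list)


def session17_state(record: Optional[DecisionRecord]) -> str:
    """Map a decision record to an illustrative Session 17 stub state.

    Silence never unblocks Session 17: no record, or a record without
    a rationale, leaves it blocked.
    """
    if record is None or not record.rationale.strip():
        return "blocked"
    if record.decision is Decision.APPROVE:
        # Approval only unblocks when concrete conditions are recorded.
        return "unblocked-with-conditions" if record.conditions else "blocked"
    if record.decision is Decision.DEFER:
        return "deferred"
    return "rejected"
```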

### Design Patterns

* Compliance-first source activation: Nothing is collected across a source/media boundary before its compliance doc exists.
* Browser-safe projection: Only bounded labels and safe attribution may be considered for browser or static Brief surfaces.
* Explicit blocked fields: Raw transcripts, media, comments, credentials, private cache paths, and unreviewed source payloads stay out of committed and browser-visible artifacts.
* Truthful documentation: Planned or rejected behavior must not be described as implemented.
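
A sketch of the browser-safe projection pattern, assuming a hypothetical private theme record with `theme_label`, `summary`, and `episodes` keys. `MAX_SUMMARY_CHARS` and the attribution allowlist are placeholders for whatever the compliance doc decides; the point is that the payload is built from an allowlist, so transcripts, media URLs, comments, and cache paths never reach it.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical limit; the compliance doc would set the real bound.
MAX_SUMMARY_CHARS = 280


def project_theme(
    private: Dict[str, Any],
    attribution_allowlist: Iterable[str],
) -> Dict[str, Any]:
    """Build a browser-safe theme payload from a hypothetical private record.

    Only the theme label, a bounded summary, and allowlisted per-episode
    attribution keys are copied; every other key is dropped by construction.
    """
    allowed = set(attribution_allowlist)
    summary = str(private.get("summary", ""))
    if len(summary) > MAX_SUMMARY_CHARS:
        # Truncate so the result, including the marker, fits the bound.
        summary = summary[: MAX_SUMMARY_CHARS - 3].rstrip() + "..."
    episodes: List[Dict[str, Any]] = [
        {k: v for k, v in ep.items() if k in allowed}
        for ep in private.get("episodes", [])
    ]
    return {
        "theme_label": str(private.get("theme_label", "")),
        "summary": summary,
        "attribution": episodes,
    }
```

Building the output from named keys, rather than deleting known-bad keys from a copy, means a private field added later stays out of the browser payload by default.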

### Technology Stack

* Markdown documentation under `docs/` and `.spec_system/PRD/`.
* Existing Bun/Prettier documentation formatting checks.
* Shell-based text scans for ASCII, source-code drift, and private-artifact claims.
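
The ASCII and LF scans could look like the following sketch. `encoding_problems` is a hypothetical helper, not an existing project script; Prettier formatting would still be checked separately through the existing Bun tooling.

```python
from pathlib import Path
from typing import List


def encoding_problems(path: Path) -> List[str]:
    """Report non-ASCII bytes and CRLF line endings in one file."""
    data = path.read_bytes()
    problems: List[str] = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        # Iterating over bytes yields ints; anything above 0x7F is non-ASCII.
        if any(b > 0x7F for b in line):
            problems.append(f"{path}:{lineno}: non-ASCII byte")
        # A trailing carriage return means the line ended in CRLF.
        if line.endswith(b"\r"):
            problems.append(f"{path}:{lineno}: CRLF line ending")
    return problems
```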

***

## 6. Deliverables

### Files to Create

| File                                         | Purpose                                                                                                                                    | Est. Lines |
| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ | ---------- |
| `docs/sources/source-compliance-podcasts.md` | Podcast/audio compliance package with explicit decision, allowed/blocked fields, retention, attribution, spend, and Session 17 conditions. | \~180      |

### Files to Modify

| File                                                                 | Changes                                                                                                                           | Est. Lines |
| -------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- | ---------- |
| `docs/extensions/trend-finder-sources.md`                            | Add podcast boundary status and explain whether Session 17 is approved, deferred, or rejected.                                    | \~35       |
| `docs/media-policy.md`                                               | Update only if the decision approves audio/transcript/media handling; otherwise keep a no-change rationale in the compliance doc. | \~25       |
| `docs/sources/apify-source-onboarding.md`                            | Update only if a provider/source path is approved; otherwise keep provider onboarding deferred.                                   | \~25       |
| `.spec_system/PRD/phase_29/session_16_podcast_compliance_package.md` | Record implementation status and final decision summary when shipped.                                                             | \~20       |
| `.spec_system/PRD/phase_29/session_17_podcast_themes_enrichment.md`  | Reflect the decision by keeping Session 17 blocked, unblocking it with conditions, or marking it deferred/rejected.               | \~20       |
| `.spec_system/PRD/phase_29/PRD_phase_29.md`                          | Update the phase tracker and folded comparison notes truthfully after the decision.                                               | \~30       |
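
The two conditional rows in the table reduce to a simple rule. In this sketch, `approves_media` and `approves_provider` are hypothetical flags read from the final decision; the four unconditional files come straight from the table.

```python
from typing import List

# Files this session always touches, per the Files to Modify table.
ALWAYS_UPDATED: List[str] = [
    "docs/extensions/trend-finder-sources.md",
    ".spec_system/PRD/phase_29/session_16_podcast_compliance_package.md",
    ".spec_system/PRD/phase_29/session_17_podcast_themes_enrichment.md",
    ".spec_system/PRD/phase_29/PRD_phase_29.md",
]


def files_to_modify(approves_media: bool, approves_provider: bool) -> List[str]:
    """Return the files to edit for a given (hypothetical) decision outcome."""
    files = list(ALWAYS_UPDATED)  # copy so the module constant is never mutated
    if approves_media:
        files.append("docs/media-policy.md")
    if approves_provider:
        files.append("docs/sources/apify-source-onboarding.md")
    return files
```

When `approves_media` is false, the compliance doc still owes the no-change rationale for `docs/media-policy.md`.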

***

## 7. Success Criteria

### Functional Requirements

* [ ] `docs/sources/source-compliance-podcasts.md` exists and records an explicit approve, defer, or reject decision.
* [ ] Transcript bodies, speaker names, comments, thumbnails, media URLs, raw audio/video, episode URLs, show names, and summary fields are each classified as allowed or blocked.
* [ ] Retention, deletion, private cache, attribution, and spend-label rules are documented.
* [ ] Session 17 is either unblocked with strict implementation conditions or marked deferred/rejected.
* [ ] No documentation claims podcast collection, transcription, or theme clustering is implemented unless Session 17 later implements it.

### Testing Requirements

* [ ] Documentation formatting check passes for touched Markdown files.
* [ ] ASCII validation passes for new and touched spec/docs files.
* [ ] Text scan confirms no source code, collector, dependency, transcript, media, or credential artifacts were added by this session.
* [ ] Manual review confirms the decision and Session 17 handoff are unambiguous.

### Non-Functional Requirements

* [ ] No raw transcript, prompt, provider response, token, private cache path, media file, thumbnail, comment body, or local private artifact is committed or described as browser-safe.
* [ ] The source boundary remains local-first and browser-safe.
* [ ] Broader social reach remains documented as a non-goal unless a future phase changes the compliance posture.

### Quality Gates

* [ ] All files ASCII-encoded.
* [ ] Unix LF line endings.
* [ ] Documentation follows project conventions.
* [ ] Source/media behavior remains docs-only in this session.

***

## 8. Implementation Notes

### Key Considerations

* The decision should be explicit even if it is "defer"; silence would accidentally unblock Session 17.
* Existing YouTube compliance already blocks transcripts, comments, audiovisual content, and cached thumbnails for browser-safe artifacts; podcast policy must either preserve that strict posture or explain any narrower approved exception.
* If a provider is named, the doc should distinguish provider feasibility from collection approval. A provider mention is not approval unless the decision says so.
* Media-policy changes are conditional. If the decision is defer or reject, the compliance doc should state why `docs/media-policy.md` did not need a change.

### Potential Challenges

* Podcast platform terms are source-specific: Prefer defer/reject over a broad approval if terms, retention, or attribution cannot be verified.
* Transcript summaries may still reveal sensitive content: Publish only bounded theme summaries and safe attribution if the path is approved.
* Provider spend can be unclear: Use estimated or paid labels, and avoid hard spend-cap claims unless a later runner enforces them.
* Session 17 handoff can drift: Update both the session stub and phase PRD so the workflow cannot skip the compliance decision.
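
Spend labeling could be modeled as below, assuming a hypothetical `SpendLabel` type and USD amounts. The label always states whether the figure is estimated or paid, and a cap is only rendered when `cap_enforced_by_runner` is true, matching the rule against unenforced hard-cap claims.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SpendKind(Enum):
    ESTIMATED = "estimated"
    PAID = "paid"


@dataclass(frozen=True)
class SpendLabel:
    kind: SpendKind
    amount_usd: float
    cap_enforced_by_runner: bool = False
    cap_usd: Optional[float] = None

    def render(self) -> str:
        text = f"{self.kind.value} ${self.amount_usd:.2f}"
        # Only mention a cap when a runner actually enforces it.
        if self.cap_enforced_by_runner and self.cap_usd is not None:
            text += f" (cap ${self.cap_usd:.2f})"
        return text
```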

### Relevant Considerations

* \[P02] **Extension payloads and labels stay bounded**: Any future podcast payload must keep live, fallback, degraded, blocked, and unavailable states explicit and bounded.
* \[P24] **Browser-safe export and triage boundaries**: Static Brief and browser surfaces must not expose raw evidence, private paths, transcripts, comments, media, or local triage data.
* \[P28] **Direct public source scope is narrow**: Do not approve a new source path by implication; source-specific compliance and parser tests come first.
* \[P28] **Deferred source candidates remain gated**: Podcast work stays blocked unless this session explicitly approves the boundary.
* \[P00] **Do not document planned features as implemented**: Source docs must describe the decision and current behavior accurately.

***

## 9. Testing Strategy

### Unit Tests

* None required unless implementation discovers existing doc-check helpers.

### Integration Tests

* Run focused documentation formatting for all touched Markdown files.
* Run text scans proving no source runner, collector, media artifact, transcript artifact, or new dependency was introduced.
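
The no-artifact scan could run over the list of changed paths, for example the output of `git diff --name-only` against the session's base commit. The allowlisted prefixes and blocked extensions below are hypothetical choices for illustration; any non-Markdown change, including `package.json` or a lockfile that would signal a new dependency, is flagged.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Hypothetical allowlist: this session should only touch Markdown docs.
ALLOWED_PREFIXES = ("docs/", ".spec_system/")

# Hypothetical blocklist of audio, video, caption, and image extensions.
BLOCKED_SUFFIXES = {
    ".mp3", ".m4a", ".wav", ".mp4", ".webm",
    ".vtt", ".srt", ".jpg", ".jpeg", ".png", ".webp",
}


def unexpected_changes(changed_paths: Iterable[str]) -> List[str]:
    """Flag changed paths that are not docs-only Markdown edits."""
    flagged: List[str] = []
    for raw in changed_paths:
        suffix = PurePosixPath(raw).suffix.lower()
        if suffix in BLOCKED_SUFFIXES:
            flagged.append(f"media/transcript artifact: {raw}")
        elif suffix != ".md" or not raw.startswith(ALLOWED_PREFIXES):
            # Source code, manifests, and lockfiles all land here.
            flagged.append(f"non-doc change: {raw}")
    return flagged
```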

### Manual Testing

* Read the compliance doc and verify the approve/defer/reject decision is easy to find.
* Read Session 17 and verify its blocked, unblocked, deferred, or rejected state matches the compliance decision.
* Read Trend Finder source docs and verify they describe current behavior, not planned podcast implementation.

### Edge Cases

* Decision is deferred because terms or provider boundaries are incomplete.
* Decision rejects transcription but allows podcast episode metadata only.
* Decision approves summary-only themes but blocks transcript bodies and media.
* Provider cost labels are estimated rather than exact.
* Media policy does not need an update because no media handling is approved.

***

## 10. Dependencies

### External Libraries

* None.

### Other Sessions

* **Depends on**: `phase29-session15-pre-run-estimate`, `phase29-session04-corroboration-gate`, `phase28-session14-direct-first-party-source-adapters`.
* **Depended by**: `phase29-session17-podcast-themes-enrichment`, `phase29-session18-documentation-validation-and-release`.

***

## Next Steps

Run the implement workflow step to begin AI-led implementation.


