# Source Compliance: Podcasts And Audio

> Reviewed 2026-06-21. Decision: defer podcast/audio collection and transcription implementation. This document is a compliance gate for Phase 29 Session 17; it does not approve a source declaration, collector, transcript cache, media download, transcription provider, browser payload, or UI surface.

***

## Decision Summary

| Field                           | Value                                                                                                                          |
| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| Decision                        | Defer                                                                                                                          |
| Scope decided                   | Podcast/audio source and transcript handling for Trend Finder                                                                  |
| Session 17 state                | Deferred; do not implement in Phase 29                                                                                         |
| Approved runtime source         | None                                                                                                                           |
| Approved transcription provider | None                                                                                                                           |
| Approved browser payload        | None for podcast themes in Phase 29                                                                                            |
| Reopen condition                | Source-specific terms, provider, cache, retention, attribution, spend, parser, and leak tests are reviewed in a future session |

The safest available decision is to defer. Podcast metadata sources can expose public show and episode metadata, but the Session 17 goal requires cross-show themes derived from audio or transcripts. That crosses a higher-risk boundary than Trend Finder's current reviewed public metadata adapters. None of the provider paths in this review cleanly approves audio download, transcript retention, transcript summarization, and browser-safe theme publication together.

Until a future source-specific review changes this decision, Trend Finder must not collect, cache, transcribe, summarize, cluster, or publish podcast/audio content.

## Source And Provider Candidate Review

| Candidate                             | Terms and access                                                                                                                                            | Data available                                                                                                  | Rate, quota, or spend                                                                                                                 | Decision                                                                                                                  |
| ------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| Apple iTunes Search API               | Public search and lookup API for Store catalog metadata; terms tie promotional assets to store promotion and badges.                                        | Podcast, podcast author, and podcast episode catalog metadata. No transcript body.                              | Apple documents an approximate 20 calls per minute Search API limit, subject to change.                                               | Metadata candidate only; not enough for Session 17 themes.                                                                |
| Apple Podcasts content policy         | Creator-side content guidelines require accurate metadata and prohibit misleading or unauthorized content.                                                  | Policy context for metadata accuracy, not a collection API.                                                     | N/A.                                                                                                                                  | Use as source-quality context only; does not approve scraping or transcription.                                           |
| Spotify Web API podcast endpoints     | OAuth catalog metadata endpoint; Spotify policy notes prohibit downloading Spotify content and require attribution for Spotify metadata and visual content. | Show and episode metadata. No approved audio/transcript download path for this project.                         | Development mode and extended quota mode are app-review gated; extended quota requests are organization-only as of the reviewed docs. | Deferred; do not use for Session 17.                                                                                      |
| Podcast Index API                     | Public podcast directory API with OpenAPI documentation.                                                                                                    | Feed, show, and episode metadata depending on endpoint. Transcript availability is not approved by this review. | Requires implementation-time auth, rate, and terms review.                                                                            | Deferred; needs source-specific terms and parser review.                                                                  |
| Public RSS or Atom feeds              | Feed metadata can include episode titles, descriptions, links, dates, and enclosure URLs.                                                                   | Public feed metadata; enclosure URLs are media pointers, not approved media inputs.                             | Per-feed terms and cadence vary; source list would need allowlist review.                                                             | Deferred; metadata-only candidate for a future session.                                                                   |
| Apify podcast Actors                  | Community Actors can expose podcast metadata and may charge per event.                                                                                      | Depends on Actor output; reviewed sample showed show and episode metadata, not a reviewed transcript path.      | Paid runner risk; Actor output and pricing can change.                                                                                | Deferred; no static declaration or runtime config approved.                                                               |
| Groq speech-to-text                   | Transcription API for caller-provided audio files.                                                                                                          | Text transcript generated from audio input.                                                                     | Paid API/provider path with API credential and audio upload boundary.                                                                 | Blocked until an audio source, consent/rights posture, retention policy, and provider data terms are separately approved. |
| Existing YouTube creator-video source | Reviewed Apify metadata path already exists.                                                                                                                | Public creator video metadata and aggregate metrics only.                                                       | Existing low-tier caps and Apify spend labels apply.                                                                                  | Not a podcast transcript path; transcripts, comments, thumbnails, audio, and video remain excluded.                       |

Provider feasibility is not approval. A provider may be technically able to return metadata or transcripts and still remain blocked for Trend Finder until the source, media, cache, retention, parser, attribution, and spend boundaries are approved together.

## Data Boundary

| Data element                                  | Current classification                                                 | Rationale                                                                                                                                                                                                              |
| --------------------------------------------- | ---------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Transcript bodies                             | Blocked                                                                | Raw transcript text is high-risk source content and must not be collected, cached, committed, summarized into a source payload, or exposed to browser surfaces under this decision.                                    |
| Speaker names from transcript or diarization  | Blocked                                                                | Diarized speaker identity can expose person-level data not present in safe metadata and is not required for current Trend Finder evidence.                                                                             |
| Comments, reviews, replies, and listener text | Blocked                                                                | Comment-body collection is outside the current privacy boundary and repeats the already rejected social-reach risk.                                                                                                    |
| Thumbnails, cover art, and artwork files      | Blocked                                                                | No podcast artwork is approved for download, caching, transformation, or committed asset use. External artwork URLs are not approved browser payload fields for this source.                                           |
| Media URLs and enclosure URLs                 | Blocked                                                                | Audio/video/enclosure URLs are media pointers and must not be emitted to browser payloads, Engine Replay, static Brief, or source setup.                                                                               |
| Raw audio or video files                      | Blocked                                                                | Downloading, uploading, caching, or committing media is not approved.                                                                                                                                                  |
| Provider responses and raw API rows           | Blocked                                                                | Raw rows can include unreviewed fields and must not be committed or exposed.                                                                                                                                           |
| Episode URLs                                  | Allowed only as future safe attribution after source-specific approval | A canonical public episode or show page URL can be browser-safe attribution, but no runtime collection is approved by this decision. Tracking, redirect, media, localhost, private, or provider-run URLs stay blocked. |
| Episode titles                                | Allowed only as future safe attribution after source-specific approval | Public episode titles may be safe metadata, subject to parser bounds and source terms. No Session 17 payload may add them in Phase 29.                                                                                 |
| Show names                                    | Allowed only as future safe attribution after source-specific approval | Public show names may be safe source labels, subject to parser bounds and source terms.                                                                                                                                |
| Publisher or feed names                       | Allowed only as future safe attribution after source-specific approval | Public publisher names may be safe source labels when the source terms permit metadata use.                                                                                                                            |
| Theme labels                                  | Blocked for Phase 29 runtime                                           | A future approval may allow bounded derived labels only after transcript/media and provider rules are approved.                                                                                                        |
| Theme summaries                               | Blocked for Phase 29 runtime                                           | A future approval may allow short derived summaries only if they do not quote transcripts, reveal speaker-level claims, or imply unreviewed transcript retention.                                                      |
| Per-episode angle                             | Blocked for Phase 29 runtime                                           | A future approval may allow a bounded angle tied to safe attribution, but Session 17 remains deferred for now.                                                                                                         |
| Spend labels                                  | Allowed as future operational metadata only                            | If a future provider is approved, spend must be labeled as exact, estimated, mixed, unavailable, or not applicable without raw billing payloads.                                                                       |
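
As an illustration of the episode URL rule in the table above, the following is a minimal sketch of a URL check, not an approved component. The function name `is_safe_attribution_url`, the media suffix list, and the tracking parameter prefix are all assumptions made for this example. A static check like this cannot detect redirect or provider-run URLs; those would need source-specific rules from a future review.

```python
import ipaddress
from urllib.parse import parse_qsl, urlsplit

# Hypothetical lists for illustration; a real review would define them per source.
MEDIA_SUFFIXES = (".mp3", ".m4a", ".mp4", ".wav", ".ogg")
TRACKING_PARAM_PREFIXES = ("utm_",)


def is_safe_attribution_url(url: str) -> bool:
    """Return True only for plain public https page URLs."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        return False
    if parts.username or parts.password:
        return False  # credential-bearing URL
    host = parts.hostname.lower()
    if host == "localhost" or host.endswith(".local"):
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # host is a name, not an IP literal
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        return False
    if parts.path.lower().endswith(MEDIA_SUFFIXES):
        return False  # looks like a media or enclosure pointer
    for key, _ in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower().startswith(TRACKING_PARAM_PREFIXES):
            return False
    return True
```

Under these assumptions, `https://example.com/shows/1/episodes/2` passes, while an `.mp3` enclosure link, a `localhost` URL, or a URL carrying `utm_` parameters is refused.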

## Browser-Safe Payload Boundary

No podcast payload is approved for Phase 29.

If a future session reopens the path and approves it, the maximum browser-safe projection must be limited to:

* theme label
* bounded theme summary
* per-episode safe attribution with show name, episode title, canonical public episode or show URL, and bounded angle
* source status, item counts, and spend labels

Even in a future approval, browser surfaces must not receive transcript bodies, quotes, speaker diarization, comments, thumbnails, artwork, audio/video URLs, raw provider rows, prompt text, model responses, credential-shaped strings, local cache paths, Actor IDs, Dataset IDs, run IDs, or private diagnostics.
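
The maximum projection described above is easiest to enforce as an allowlist rather than a blocklist: copy only named fields and drop everything else. The sketch below shows that idea under stated assumptions. The field names, the `project_theme` function, and the 280-character summary bound are hypothetical; this document defines concepts, not a schema.

```python
from typing import Any

# Hypothetical field names chosen for this example only.
THEME_FIELDS = ("theme_label", "theme_summary", "source_status", "item_count", "spend_label")
EPISODE_FIELDS = ("show_name", "episode_title", "public_url", "angle")
MAX_SUMMARY_CHARS = 280  # assumed bound, not a reviewed limit


def project_theme(record: dict[str, Any]) -> dict[str, Any]:
    """Copy only allowlisted fields; any other key is dropped."""
    out = {key: record[key] for key in THEME_FIELDS if key in record}
    if isinstance(out.get("theme_summary"), str):
        out["theme_summary"] = out["theme_summary"][:MAX_SUMMARY_CHARS]
    out["episodes"] = [
        {key: episode[key] for key in EPISODE_FIELDS if key in episode}
        for episode in record.get("episodes", [])
    ]
    return out
```

With an allowlist, a new unreviewed provider field is excluded by default instead of leaking until someone remembers to block it.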

## Retention, Deletion, And Private Cache Rules

Because this decision defers implementation, no podcast cache may be created in Phase 29 and no deletion flow is needed for a runtime artifact that must not exist.

Minimum conditions for a future approval:

| Area                 | Required condition before approval                                                                                                                                                                             |
| -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Transcript retention | Raw transcript text must be disabled by default. Any temporary local transcript cache needs explicit max age, local-only storage, gitignore coverage, deletion command, and tests proving no browser exposure. |
| Media retention      | Audio/video downloads remain blocked unless a separate media-rights review approves an ephemeral local processing path.                                                                                        |
| Metadata retention   | Generated browser data may contain only approved public metadata fields and must be overwritten by normal aggregate output rules.                                                                              |
| Private cache        | Cache paths must be local, gitignored, redacted from traces, and absent from browser payloads and static Brief exports.                                                                                        |
| Deletion path        | A future implementation must document how to delete generated data, private transcript cache, media temp files, source snapshots, and provider artifacts.                                                      |
| Termination path     | If a provider account, source license, or platform access is suspended or terminated, collection must stop and generated podcast-derived data must be deleted.                                                 |
| Backups              | No backups or durable stores are approved for transcript or media content.                                                                                                                                     |
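
To make the "explicit max age" and "deletion command" conditions concrete, here is a minimal sketch of a purge routine for a local cache directory. It is illustrative only: no cache is approved in Phase 29, and the function name `purge_expired` and its parameters are assumptions, not an existing Trend Finder API.

```python
import time
from pathlib import Path
from typing import List, Optional


def purge_expired(cache_dir: Path, max_age_seconds: float, now: Optional[float] = None) -> List[str]:
    """Delete regular files older than max_age_seconds and return their names."""
    now = time.time() if now is None else now
    deleted: List[str] = []
    if not cache_dir.is_dir():
        return deleted  # nothing to purge if the cache was never created
    for path in sorted(cache_dir.iterdir()):
        # Age is measured from the file's last modification time.
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            deleted.append(path.name)
    return deleted
```

A future implementation would still need the gitignore coverage, trace redaction, and browser-exposure tests listed in the table; this sketch covers only the age-based deletion step.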

## Attribution Rules

No runtime attribution is approved in Phase 29.

If a future source-specific review approves metadata collection, every browser-visible podcast attribution must include:

* source ID and source name
* public show name when allowed by source terms
* public episode title when allowed by source terms
* canonical public episode or show URL
* provider-specific attribution requirements, such as store badges or service logos, only when those requirements are approved for the UI surface
* wording that does not imply endorsement by the platform, publisher, show, host, or provider

Attribution must not include speaker diarization, full transcript quotes, private feed URLs, enclosure URLs, tracking URLs, credential-bearing URLs, provider run URLs, cache paths, thumbnails, artwork files, comments, ratings, or review text.

## Spend And Rate Policy

No spend-bearing podcast or transcription provider is approved in Phase 29.

A future approval must define:

* provider identity and source role
* exact or estimated spend label
* cap semantics and whether the cap is enforced by Trend Finder or by the provider
* timeout, retry, and stop-on-cap behavior
* safeguards that prevent retry storms and automatic paid fan-out across shows
* whether a zero-cost public metadata path can skip a paid fallback
* redaction rules for billing payloads, account IDs, run IDs, Dataset IDs, and provider logs

Spend summaries may expose only bounded labels, capped USD values, provider class, source ID, and reason. They must not expose billing records, account identifiers, raw usage payloads, tokens, private config paths, or provider run URLs.
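
One possible shape for such a summary is sketched below. The `SpendLabel` values mirror the labels named in the Data Boundary table; the `SpendSummary` type, the `summarize_spend` helper, and the reading of "capped USD value" as a value clamped to the configured cap are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SpendLabel(str, Enum):
    EXACT = "exact"
    ESTIMATED = "estimated"
    MIXED = "mixed"
    UNAVAILABLE = "unavailable"
    NOT_APPLICABLE = "not_applicable"


@dataclass(frozen=True)
class SpendSummary:
    """Only bounded, non-identifying fields; no billing rows or account IDs."""
    source_id: str
    provider_class: str
    label: SpendLabel
    capped_usd: Optional[float]
    reason: str


def summarize_spend(source_id, provider_class, label, usd, cap_usd, reason) -> SpendSummary:
    label = SpendLabel(label)  # raises ValueError for an unknown label
    # Clamp the reported value into [0, cap_usd]; None means no value is available.
    capped = None if usd is None else round(min(max(usd, 0.0), cap_usd), 2)
    return SpendSummary(source_id, provider_class, label, capped, reason)
```

For example, `summarize_spend("podcasts", "transcription", "estimated", 7.5, 5.0, "cap reached")` yields a summary whose `capped_usd` is `5.0`, with no raw usage payload attached.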

## Approval, Defer, And Reject Criteria

### Approve Criteria

A future session may approve a podcast path only when all of these are true:

1. The exact source is named and its current terms, API docs, attribution rules, rate limits, and retention rules are reviewed.
2. The path avoids raw audio/video download, or a separate media-rights review explicitly approves an ephemeral local media processing path.
3. Transcript text is either not collected or is handled only in a documented local private cache with deletion tooling, max age, and leak tests.
4. Parser fixtures prove only approved public metadata and bounded derived summaries reach browser-safe payloads.
5. Tests prove transcripts, comments, thumbnails, artwork, media URLs, provider responses, prompts, credentials, and private paths are excluded from browser payloads, Engine Replay, static Brief, and committed artifacts.
6. Spend labels and caps are explicit before any paid provider can run.
7. Session 17 or a replacement session updates docs and schemas without describing planned behavior as implemented.
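
The leak tests in criterion 5 could take the form of a recursive scan over any candidate payload. The sketch below is one hedged way to do that; the `BLOCKED_KEYS` names and the `find_leaks` helper are hypothetical and would need to match whatever schema a future session approves. It checks key names only, so a real test suite would also need value-level checks (for example, credential-shaped strings).

```python
# Hypothetical key names standing in for the blocked data elements.
BLOCKED_KEYS = {
    "transcript", "quotes", "speakers", "comments", "thumbnail", "artwork",
    "media_url", "enclosure_url", "provider_response", "prompt",
    "credentials", "cache_path", "actor_id", "dataset_id", "run_id",
}


def find_leaks(payload, path="$"):
    """Return the paths of blocked keys found anywhere in a nested payload."""
    leaks = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if str(key).lower() in BLOCKED_KEYS:
                leaks.append(child)
            leaks.extend(find_leaks(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            leaks.extend(find_leaks(item, f"{path}[{index}]"))
    return leaks
```

A test would then assert that `find_leaks(payload)` returns an empty list for every browser payload, Engine Replay export, and static Brief fixture.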

### Defer Criteria

Defer when the source may become usable later but one or more required reviews, provider choices, parser fixtures, cache rules, spend labels, or leak tests are missing. That is the current decision.

### Reject Criteria

Reject a provider or source path if it requires any of these:

* scraping or downloading protected platform media without reviewed permission
* storing raw audio/video or full transcript bodies durably
* collecting comments, reviews, replies, ratings text, private feeds, or authorized user data
* exposing media URLs, artwork, thumbnails, transcript quotes, speaker names, private paths, credentials, provider responses, Actor IDs, Dataset IDs, or run IDs to browser or static Brief surfaces
* adding a browser raw-input editor, unreviewed source setup target, or provider credential surface

## Session 17 Handoff

Session 17 is deferred by this decision.

The following work is not allowed in Phase 29:

* podcast source declaration
* direct podcast adapter
* Apify podcast Actor declaration or starter config
* podcast RSS allowlist
* audio or video downloader
* transcript hydration, diarization, or summarization code
* Groq, Whisper, or other speech-to-text provider integration
* transcript cache, media cache, or podcast snapshot file
* browser podcast theme surface
* static Brief podcast theme section
* source setup fields for podcast sources
* new podcast env keys or package dependencies

A future session may replace Session 17 with a narrower compliance-first plan, for example "metadata-only podcast directory review" or "bounded transcript cache proof," but that work must start with a new source-specific spec.

## No-Change Rationale For Related Docs

`docs/media-policy.md` does not need a Phase 29 update because no audio, transcript, thumbnail, artwork, media URL, media download, committed asset, or browser-visible media handling is approved.

`docs/sources/apify-source-onboarding.md` does not need a Phase 29 update because no podcast Actor, provider path, source declaration, setup target, fixture, normalizer, spend cap, or runtime config is approved.

## Compliance Checklist

* [x] Existing YouTube compliance boundary reviewed.
* [x] Existing Apify/source onboarding gate reviewed.
* [x] Current Trend Finder source docs reviewed.
* [x] Candidate podcast metadata and transcription paths reviewed at a compliance-gate level.
* [x] Final decision recorded as defer.
* [x] Transcript bodies classified.
* [x] Speaker names classified.
* [x] Comments classified.
* [x] Thumbnails and artwork classified.
* [x] Media URLs and raw audio/video classified.
* [x] Episode URLs, episode titles, show names, and summary fields classified.
* [x] Retention, deletion, private cache, attribution, and spend policy recorded.
* [x] Session 17 handoff recorded as deferred.
* [x] Media policy and Apify onboarding no-change rationale recorded.
* [ ] Future source-specific approval, parser fixtures, and leak tests completed before any podcast implementation.

***

*This document must be reviewed again before adding any podcast source, podcast metadata collector, transcript cache, audio/video processor, speech-to-text provider, podcast theme clustering, or podcast browser surface.*

