
# Source Compliance: RSS And News Feeds

> Implementation-time review completed 2026-05-17. Direct feed re-review completed 2026-06-14 for reviewed public feed URLs only. The Apify source declaration remains a reviewed fallback.

***

## Source Overview

| Field               | Value                                                                     |
| ------------------- | ------------------------------------------------------------------------- |
| Source Name         | RSS/news-style feeds                                                      |
| Format Reference    | <https://www.rssboard.org/rss-profile>                                    |
| Feed Terms          | Publisher-specific terms, robots guidance, and feed metadata              |
| Authentication      | None for public feeds unless publisher states otherwise                   |
| Data Format         | RSS/Atom XML from publisher; Apify Dataset JSON after Actor normalization |
| Trend Finder Status | Direct reviewed-feed adapter approved; Apify declaration remains fallback |
| Default Source ID   | `rss-ai-news`                                                             |

***

## Phase 06 Candidate Declaration

| Field                   | Value                                                                           |
| ----------------------- | ------------------------------------------------------------------------------- |
| Source ID               | `rss-ai-news`                                                                   |
| Primary Apify candidate | `xtech/feed-extractor`                                                          |
| Fallback candidate      | none recorded                                                                   |
| Validation status       | Fixture-backed and live tiny-validated on 2026-05-17; rerun is credential-gated |
| Direct adapter status   | Approved for reviewed public feed URLs with current-only item caps              |

***

## Terms and Use Boundary

RSS is a syndication format, not a universal license to republish all feed content. The RSS Best Practices Profile warns that missing copyright metadata does not make feed content public domain. Trend Finder may collect only compact metadata needed to link back to the publisher and rank a trend signal.

Approved generic RSS/news evidence:

* Feed item title.
* Canonical article URL.
* Short description or excerpt.
* Published timestamp.
* Feed/source name when available.

Not approved unless a per-feed review allows it:

* Full article bodies.
* Paywalled content.
* Author email addresses from feed metadata.
* Comment feeds or comment bodies.
* Media enclosures copied into local assets.
* Feeds that prohibit automated reuse, caching, or AI analysis.

***

## Rate, Fetch, and Cache Requirements

| Requirement  | Policy                                                         |
| ------------ | -------------------------------------------------------------- |
| Feed cadence | Respect feed `ttl`, `skipHours`, and `skipDays` when available |
| HTTP caching | Prefer ETag and Last-Modified validators in direct adapters    |
| Item cap     | Keep per-source item caps low and deterministic                |
| Retry policy | No aggressive retry loops on publisher errors                  |
| Attribution  | Link to the canonical publisher article URL                    |
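
The feed cadence row can be illustrated with a small gate that runs before any request is sent. This is a minimal sketch, not the adapter's actual code: the `FeedCadence` shape and the `mayFetchNow` name are hypothetical, and it assumes the RSS convention that `skipHours` and `skipDays` are expressed in GMT.

```typescript
// Hypothetical shape for cadence hints parsed from an RSS <channel>.
interface FeedCadence {
  ttlMinutes?: number; // <ttl>: minutes the feed may be cached
  skipHours?: number[]; // <skipHours>: hours 0-23, GMT
  skipDays?: string[]; // <skipDays>: English day names, GMT
}

const DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];

// Returns true only when no cadence hint asks readers to stay away.
function mayFetchNow(cadence: FeedCadence, lastFetchedAt: Date | null, now: Date): boolean {
  const hour = now.getUTCHours();
  const day = DAY_NAMES[now.getUTCDay()];
  if ((cadence.skipHours ?? []).some((h) => h === hour)) return false;
  if ((cadence.skipDays ?? []).some((d) => d === day)) return false;
  if (cadence.ttlMinutes !== undefined && lastFetchedAt !== null) {
    const elapsedMs = now.getTime() - lastFetchedAt.getTime();
    if (elapsedMs < cadence.ttlMinutes * 60_000) return false; // still inside the ttl window
  }
  return true;
}
```

A feed that publishes none of these hints always passes this gate, so the item cap and retry policy still have to bound the request rate on their own.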

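The current Session 08 implementation uses Apify Dataset normalization only. Any direct feed adapter must implement conditional requests (`If-None-Match` and `If-Modified-Since`, built from the stored `ETag` and `Last-Modified` validators) before it is enabled.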
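
Conditional request handling can be sketched without any network code. The example below is illustrative only: `CacheValidators`, `conditionalHeaders`, and `classifyStatus` are hypothetical names, and the validators are assumed to live in script memory for a single run.

```typescript
// Hypothetical in-memory record of validators from the previous response.
interface CacheValidators {
  etag?: string;
  lastModified?: string;
}

// Build the request headers for a conditional GET.
function conditionalHeaders(previous: CacheValidators): Record<string, string> {
  const headers: Record<string, string> = {};
  if (previous.etag) headers["If-None-Match"] = previous.etag;
  if (previous.lastModified) headers["If-Modified-Since"] = previous.lastModified;
  return headers;
}

type FetchOutcome = "unchanged" | "fetched" | "error";

// 304 means the publisher reports no change, so there is no body to parse.
function classifyStatus(status: number): FetchOutcome {
  if (status === 304) return "unchanged";
  if (status >= 200 && status < 300) return "fetched";
  return "error";
}
```

Treating every non-2xx, non-304 status as a plain `error` keeps the sketch aligned with the retry policy: the caller reports the failure instead of looping.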

## Reviewed Direct Feed Allowlist

Direct RSS collection is approved only for the reviewed allowlist compiled into the script-side adapter. Operators must not add arbitrary feed URLs through the browser.

| Feed                  | URL                             | Approval                                                                    |
| --------------------- | ------------------------------- | --------------------------------------------------------------------------- |
| HNRSS AI newest posts | `https://hnrss.org/newest?q=AI` | Approved for title, canonical link, description snippet, and item date only |

Additional feeds require a document update, source-specific terms check, publisher cadence review, item cap, and tests before they can be added to the allowlist.
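
A compiled-in allowlist check might look like the sketch below. The constant and function names are hypothetical. The only entry is the feed from the table above, and matching is assumed to be exact on the normalized URL so that a changed query string does not slip through.

```typescript
// Reviewed allowlist; the single entry comes from the table above.
const REVIEWED_FEEDS: readonly string[] = ["https://hnrss.org/newest?q=AI"];

// Exact match on the normalized URL: no prefix, wildcard, or host-only matching.
function isAllowlistedFeed(rawUrl: string): boolean {
  let href: string;
  try {
    href = new URL(rawUrl).href; // lowercases scheme and host
  } catch {
    return false; // not a parseable absolute URL
  }
  return REVIEWED_FEEDS.some((feed) => feed === href);
}
```

Under this assumption, `https://hnrss.org/newest?q=ML` is rejected even though it shares a host with the reviewed feed, which matches the per-feed review requirement.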

## Direct Adapter Approval

The 2026-06-14 re-review approves a direct adapter only under these conditions:

* Fetch only allowlisted public `http` or `https` feed URLs. Reject any URL that carries a username or password, points to a private or local host, or uses another scheme.
* Keep the stable source ID `rss-ai-news`, source role `news`, and quality tier `secondary`.
* Respect feed `ttl`, `skipHours`, `skipDays`, `ETag`, and `Last-Modified` metadata when available. Store only bounded conditional metadata in script memory for a run; do not persist raw feed bodies.
* Normalize only title, canonical link, bounded description or summary snippet, item date, and feed/source name.
* Exclude full article bodies, comment bodies, author email fields, media enclosures, private/authenticated feeds, raw XML, and publisher content that prohibits automated reuse.
* Return disabled or degraded readiness before collection if compliance status is not reviewed, a feed is not allowlisted, the URL is private/local, or a feed is malformed, timed out, empty, or rate limited.
* Preserve the Apify source declaration as the fallback if the direct adapter is blocked or produces no usable reviewed evidence.
* Emit a zero-cost public API spend label for direct rows.
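
The URL and readiness conditions above can be sketched as two pure functions. Everything named here is hypothetical, including the `FeedState` shape. The mapping of policy failures to `disabled` and fetch problems to `degraded` is an assumption, since the conditions do not say which failure yields which state. The URL check is syntactic only and does not resolve DNS.

```typescript
type Readiness = "ready" | "degraded" | "disabled";
type FetchProblem = "malformed" | "timeout" | "empty" | "rate-limited";

// Syntactic check only: a public name that resolves to a private
// address would still pass, so this is not a full SSRF defense.
function isPublicFeedUrl(rawUrl: string): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false;
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  if (url.username !== "" || url.password !== "") return false;
  const host = url.hostname.toLowerCase();
  if (/(^|\.)(localhost|local)$/.test(host)) return false;
  if (host.charAt(0) === "[") return false; // conservative: reject all IPv6 literals
  const v4 = /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/.exec(host);
  if (v4) {
    const a = Number(v4[1]);
    const b = Number(v4[2]);
    if (a === 0 || a === 10 || a === 127) return false; // this-host, private, loopback
    if (a === 169 && b === 254) return false; // link-local
    if (a === 172 && b >= 16 && b <= 31) return false; // private
    if (a === 192 && b === 168) return false; // private
  }
  return true;
}

// Hypothetical summary of what is known about a feed before collection.
interface FeedState {
  complianceReviewed: boolean;
  allowlisted: boolean;
  url: string;
  problem?: FetchProblem;
}

// Assumed mapping: policy failures disable the source, fetch problems degrade it.
function readinessFor(state: FeedState): Readiness {
  if (!state.complianceReviewed || !state.allowlisted || !isPublicFeedUrl(state.url)) {
    return "disabled";
  }
  return state.problem ? "degraded" : "ready";
}
```

In either non-ready state the caller would skip collection and fall back to the Apify source declaration, as the conditions require.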

***

## Data Collection Boundary

| Data Element        | Stored As           | PII Risk | Status                   |
| ------------------- | ------------------- | -------- | ------------------------ |
| Item title          | Evidence title      | Low      | Approved                 |
| Canonical link      | Evidence URL        | None     | Approved                 |
| Description/excerpt | Evidence snippet    | Low      | Approved with length cap |
| Published date      | `publishedAt`       | None     | Approved                 |
| Feed title          | Source name/context | Low      | Approved                 |

**Not approved**: full post body, comment content, author email, managing editor email, media downloads, profile URLs, private feeds, or publisher content that requires authentication.
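
One way to enforce this boundary is to build each evidence row field by field rather than copying the parsed item. The sketch below is hypothetical throughout. `RawFeedItem`, `EvidenceRow`, and the 280-character cap are assumptions, because this page does not state the reviewed length cap. Only `publishedAt` and the `rss-ai-news` source ID come from the tables above.

```typescript
// Hypothetical raw item; real feeds carry more fields, which are ignored.
interface RawFeedItem {
  title?: string;
  link?: string;
  description?: string;
  pubDate?: string;
  [other: string]: unknown;
}

interface EvidenceRow {
  sourceId: "rss-ai-news";
  title: string;
  url: string;
  snippet: string;
  publishedAt: string | null;
  sourceName: string | null;
}

const SNIPPET_MAX_CHARS = 280; // assumed cap, not the reviewed value

function toEvidenceRow(item: RawFeedItem, feedTitle?: string): EvidenceRow | null {
  if (!item.title || !item.link) return null; // cannot attribute without both
  const text = (item.description ?? "")
    .replace(/<[^>]*>/g, " ") // naive tag removal, not a sanitizer
    .replace(/\s+/g, " ")
    .trim();
  const snippet =
    text.length > SNIPPET_MAX_CHARS ? text.slice(0, SNIPPET_MAX_CHARS - 1) + "…" : text;
  const parsed = item.pubDate ? new Date(item.pubDate) : null;
  const publishedAt = parsed && !isNaN(parsed.getTime()) ? parsed.toISOString() : null;
  // Only approved fields are copied; author, body, and enclosure keys never reach the row.
  return {
    sourceId: "rss-ai-news",
    title: item.title.trim(),
    url: item.link.trim(),
    snippet,
    publishedAt,
    sourceName: feedTitle ? feedTitle.trim() : null,
  };
}
```

Because the row is constructed explicitly, a new field in a publisher's feed cannot leak into stored evidence without a code change and a matching review.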

***

## Retention and Attribution

| Policy               | Value                                                                                         |
| -------------------- | --------------------------------------------------------------------------------------------- |
| Storage location     | `src/data/live-data.json` and private cache snapshots                                         |
| Retention period     | `live-data.json` overwritten on each collection run; snapshots retained locally until deleted |
| Historical retention | Local snapshots only; no database-backed retention                                            |
| Deletion path        | Delete generated Trend Finder data and snapshots                                              |
| Attribution          | Show source ID `rss-ai-news`, feed/source name, and canonical article link                    |

***

## Phase 14 Historical Window Stance

| Field                  | Value                                                                                        |
| ---------------------- | -------------------------------------------------------------------------------------------- |
| Historical support     | Current-only                                                                                 |
| Source ID              | `rss-ai-news`                                                                                |
| Safe override fields   | None                                                                                         |
| Unsupported reason     | The reviewed feed input reads current feed items and has no reviewed historical archive API. |
| Compliance declaration | `historicalWindowSupport.supported = false`                                                  |

Do not treat feed URLs as historical sources unless each feed has a separate publisher-specific review and a bounded archive contract. This stance does not approve persistent RSS/news history or full-content archival collection.
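
The current-only stance could be expressed as a declaration plus a guard, as in this sketch. Only `historicalWindowSupport.supported` is a field name taken from the table above. The other property names, the `since` request field, and `resolveWindow` are hypothetical.

```typescript
interface HistoricalWindowSupport {
  supported: boolean;
  safeOverrideFields: string[];
  unsupportedReason?: string;
}

// Declaration mirroring the table above.
const rssHistoricalWindowSupport: HistoricalWindowSupport = {
  supported: false,
  safeOverrideFields: [],
  unsupportedReason:
    "The reviewed feed input reads current feed items and has no reviewed historical archive API.",
};

type WindowDecision = { mode: "current" } | { mode: "rejected"; reason: string };

// Hypothetical request shape: `since` asks for items from an earlier window.
// A source that did support history would need its own bounded archive path,
// which this sketch deliberately leaves out.
function resolveWindow(
  support: HistoricalWindowSupport,
  request: { since?: string },
): WindowDecision {
  if (request.since !== undefined && !support.supported) {
    return {
      mode: "rejected",
      reason: support.unsupportedReason ?? "historical window unsupported",
    };
  }
  return { mode: "current" };
}
```

Rejecting the request outright, instead of silently serving current items, makes it visible to the caller that no historical data was collected.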

***

## Implementation Notes

* Phase 06 Session 01 declares `rss-ai-news` with `xtech/feed-extractor` as the primary feed metadata Actor candidate.
* Phase 06 Session 02 adds fixture-backed normalizer coverage for source article URLs, feed labels, published dates, content snippets, and full-body or contact-field exclusion. The 2026-05-17 live tiny validation returned 5 items and 5 public URLs for the configured source.
* Session 08 normalizers prefer canonical article URLs and reject Apify-hosted URLs as browser evidence URLs.
* Email-like strings are stripped from emitted snippets.
* The reviewed candidate declaration uses `https://hnrss.org/newest?q=AI` as a small public feed input. Additional feed URLs still need per-feed review before they are treated as approved sources.
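
Two of the notes above, rejecting Apify-hosted evidence URLs and stripping email-like strings, can be sketched as small helpers. These are illustrations under stated assumptions, not the Session 08 normalizer itself: "Apify-hosted" is assumed to mean `apify.com` or any of its subdomains, and the email pattern is a deliberately loose heuristic.

```typescript
// Assumption: "Apify-hosted" means apify.com or one of its subdomains.
function isUsableEvidenceUrl(rawUrl: string): boolean {
  let host: string;
  try {
    host = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs cannot serve as evidence links
  }
  return !/(^|\.)apify\.com$/.test(host);
}

// Loose heuristic for email-like tokens; it favors over-removal.
const EMAIL_LIKE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function stripEmailLike(snippet: string): string {
  return snippet.replace(EMAIL_LIKE, "").replace(/\s{2,}/g, " ").trim();
}
```

For example, `stripEmailLike("Contact jane.doe@example.com for details")` yields `"Contact for details"`, and a dataset URL on `api.apify.com` is rejected while the publisher's own article URL is kept.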

***

## Compliance Checklist

* [x] Generic RSS format guidance reviewed on 2026-05-17
* [x] Publisher-specific review requirement documented
* [x] Metadata-only data boundary documented
* [x] Retention and deletion path documented
* [x] Attribution requirements documented
* [x] Normalizer excludes comment bodies, author emails, and media downloads
* [x] Phase 06 primary Apify candidate recorded
* [x] Phase 06 fixture-backed normalizer validation completed
* [x] Live tiny capped Phase 06 Actor/Dataset validation completed on 2026-05-17
* [x] Phase 14 historical-window stance recorded as current-only
* [x] Direct reviewed-feed approval completed on 2026-06-14
* [x] Initial reviewed direct feed allowlist recorded
* [x] Per-feed approval required before adding more feed URLs


