> For the complete documentation index, see [llms.txt](https://ai-os-and-trend-finder.gitbook.io/ai-os-and-trend-finder-docs/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://ai-os-and-trend-finder.gitbook.io/ai-os-and-trend-finder-docs/docs/sources/source-compliance-reddit.md).

# Source Compliance: Reddit

> Implementation-time review completed 2026-05-17. Trend Finder has a configured Apify source declaration for Reddit public discussion metadata. Direct Reddit API notes below are implementation guidance for a future direct adapter, not a blocker for the Apify Actor/Dataset path.

***

## Source Overview

| Field             | Value                                                                                  |
| ----------------- | -------------------------------------------------------------------------------------- |
| Source Name       | Reddit                                                                                 |
| API Base URL      | `https://oauth.reddit.com/`                                                            |
| API Documentation | <https://www.reddit.com/dev/api/>                                                      |
| Data API Guidance | <https://support.reddithelp.com/hc/en-us/articles/16160319875092-Reddit-Data-API-Wiki> |
| Terms             | <https://redditinc.com/policies/data-api-terms>                                        |
| Authentication    | OAuth with registered client                                                           |
| Data Format       | JSON                                                                                   |
| Adapter Status    | Configured Apify source declaration; reviewed for Apify Actor adaptation               |

***

## Phase 06 Candidate Declaration

| Field                   | Value                                                                                    |
| ----------------------- | ---------------------------------------------------------------------------------------- |
| Source ID               | `reddit-ai-discussions`                                                                  |
| Primary Apify candidate | `clearpath/reddit-search-scraper`                                                        |
| Fallback candidate      | `iskander/fast-reddit-scraper`                                                           |
| Validation status       | Fixture-backed and live tiny-validated on 2026-05-17; rerun is credential-gated          |
| Direct adapter status   | Deferred; OAuth, deletion, staleness, and rate-limit handling required before direct use |

***

## Terms of Service

Direct Reddit Data API use is governed by Reddit's Data API Terms, Developer Terms, User Agreement, Responsible Builder Policy, and developer documentation. The current guidance requires OAuth authentication, a descriptive User-Agent, and compliance with deletion and rate-limit rules.

Trend Finder's hackathon source path uses configured Apify Actors and Apify Datasets. Those direct API notes should not be treated as a blanket disable gate for the Apify Actor integration.

**Key obligations for any Reddit source path**:

* Use the official authenticated Data API; do not scrape Reddit web pages.
* Register an OAuth client and send a unique, descriptive User-Agent.
* Do not exceed API call limits or evade throttling.
* Do not use Reddit content to train AI or machine-learning models without the required rights.
* Remove stored content or author-identifying data when Reddit content or an account is deleted.
* Do not store comment bodies, user profiles, author IDs, flair, avatars, or other author-identifying data unless a later compliance review explicitly approves that collection.

***

## Rate Limits

| Parameter               | Value                                                                |
| ----------------------- | -------------------------------------------------------------------- |
| Free access limit       | 100 queries per minute per OAuth client ID                           |
| Window behavior         | Averaged over a 10-minute window                                     |
| Required headers        | `X-Ratelimit-Used`, `X-Ratelimit-Remaining`, `X-Ratelimit-Reset`     |
| Unauthenticated traffic | May be blocked; default free limit does not apply                    |
| Required implementation | Header-aware pacing, exponential backoff, and stop-on-limit behavior |

A future adapter must keep concurrency low, respect the reset header, and stop fetching rather than retrying aggressively when remaining quota is exhausted.

***

## Data Collection Boundary

Approved browser-safe evidence boundary for the Apify Actor path:

| Data Element          | API Field          | Stored As                   | PII Risk                   |
| --------------------- | ------------------ | --------------------------- | -------------------------- |
| Post title            | `title`            | Evidence title              | Low; may contain user text |
| Post URL or permalink | `url`, `permalink` | Evidence URL                | Low                        |
| Score                 | `score`            | Relevance input             | None                       |
| Comment count         | `num_comments`     | Relevance input             | None                       |
| Created timestamp     | `created_utc`      | `publishedAt`               | None                       |
| Subreddit name        | `subreddit`        | Source/context label        | Low                        |
| Post ID               | `name` or `id`     | Source-prefixed evidence ID | Low                        |

Excluded from Trend Finder browser-safe artifacts: comment bodies, author usernames, author IDs, user profile URLs, avatars, flair, DMs, moderator data, deleted content, private subreddit content, or any OAuth scope beyond read-only public listing access.

***

## Data Retention

| Policy                  | Value                                                                                         |
| ----------------------- | --------------------------------------------------------------------------------------------- |
| Storage location        | `src/data/live-data.json` and private cache snapshots                                         |
| Retention period        | `live-data.json` overwritten on each collection run; snapshots retained locally until deleted |
| Maximum stale retention | Future Reddit evidence should be refreshed or removed within 48 hours                         |
| Historical retention    | Local snapshots only; no reviewed Reddit historical source window support                     |
| Deletion path           | Delete generated Trend Finder data and snapshots                                              |
| Backup                  | None                                                                                          |

Reddit's deletion rules make long-lived local caches risky. A Reddit adapter must either refresh frequently enough to remove deleted material or avoid storing Reddit-derived text beyond the current generated data file.

***

## Phase 14 Historical Window Stance

| Field                  | Value                                                                                                    |
| ---------------------- | -------------------------------------------------------------------------------------------------------- |
| Historical support     | Current-only                                                                                             |
| Source ID              | `reddit-ai-discussions`                                                                                  |
| Safe override fields   | None                                                                                                     |
| Unsupported reason     | The reviewed Actor input uses relative search filters such as `timeFilter`, not bounded start/end dates. |
| Compliance declaration | `historicalWindowSupport.supported = false`                                                              |

Do not translate a requested historical window into Reddit relative filters. Those filters are not precise historical bounds and Reddit deletion/staleness obligations make unreviewed historical backfill unsafe. This stance does not approve persistent Reddit history.

***

## 2026-06-18 Reddit `.rss` Decision

TrendHustler uses subreddit `.rss` URLs as a no-token workaround after Reddit JSON endpoints return bot-blocking failures. Trend Finder is not adopting that path.

Decision: decline direct Reddit `.rss` collection and keep Reddit collection on the reviewed Apify Actor paths until a future direct adapter uses Reddit's official authenticated Data API with OAuth, header-aware rate limiting, deletion handling, and the existing no-author/no-comment-body evidence boundary.

Rationale:

* Reddit's Data API Terms require access through the documented access information, including OAuth where provided, and prohibit circumventing or exceeding API limitations.
* The TrendHustler reference explicitly frames `.rss` as a way around `.json` blocking, which conflicts with Trend Finder's compliance-first source posture.
* Existing Trend Finder Reddit compliance already defers direct API work until OAuth, deletion, staleness, and rate-limit handling are implemented.

The reviewed `reddit-ai-discussions` and `reddit-ai-subreddit-feed` Apify sources remain the active Reddit paths. Both stay capped, metadata-only, and must continue excluding authors, profile fields, comment bodies, and persistent Reddit history.

***

## Privacy and GDPR Assessment

| Criterion                 | Status                 | Notes                                     |
| ------------------------- | ---------------------- | ----------------------------------------- |
| PII collected             | Planned no             | Do not store authors or profiles          |
| User consent needed       | No app users           | Public source collection only             |
| Data subject rights       | Deletion path required | Delete generated Trend Finder data/cache  |
| Cross-border transfer     | N/A                    | Data remains local                        |
| Data processor agreements | N/A                    | No third-party processing by Trend Finder |
| Legitimate interest basis | Possible               | Public trend analysis for self-use        |

***

## Attribution

When Reddit-sourced evidence is displayed, the UI must show:

* Source identifier: `reddit-ai-discussions`
* Source name: "Reddit AI discussions"
* Link to the original Reddit post or canonical outbound URL
* Subreddit context when stored

***

## Implementation Notes

* `reddit-ai-discussions` is a configured Apify source declaration with `reviewed` compliance status for the Apify Actor/Dataset path. Phase 06 Session 01 records the Clearpath primary candidate and Iskander fallback.
* Phase 06 Session 02 adds fixture-backed normalizer coverage for canonical post URLs, subreddit labels, score/points, comments, created timestamps, and selftext/profile exclusion. The 2026-05-17 live tiny validation returned 5 items and 5 public URLs; raw shape field names included `body`, which remains excluded from emitted evidence.
* Env config may enable the source when `APIFY_TOKEN` and a concrete Actor ID are configured, provided the Actor ID is not a known unresolved placeholder.
* Normalization should keep post-level public metadata and aggregate metrics, while dropping author/profile/comment-body fields from browser-safe evidence.
* Future direct API work must use OAuth and a descriptive User-Agent.

***

## Compliance Checklist

* [x] Terms and developer guidance reviewed
* [x] Rate limits documented
* [x] Data retention policy drafted
* [x] PII minimization boundary drafted
* [x] Attribution requirements documented
* [x] Implementation-time terms re-review completed on 2026-05-17
* [x] Apify Actor path separated from future direct API requirements
* [x] Normalizer tests prove author/body exclusions
* [x] Phase 06 primary and fallback Apify candidates recorded
* [x] Phase 06 fixture-backed normalizer validation completed
* [x] Live tiny capped Phase 06 Actor/Dataset validation completed on 2026-05-17
* [x] Phase 14 historical-window stance recorded as current-only
* [x] 2026-06-18 direct `.rss` bypass decision recorded as declined
* [ ] Future direct API adapter enforces rate limits if implemented

***

*This document must be reviewed again before adding a direct Reddit API adapter.*


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://ai-os-and-trend-finder.gitbook.io/ai-os-and-trend-finder-docs/docs/sources/source-compliance-reddit.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
