# Apify Integration

Last reviewed: 2026-05-27

This document is the map for Trend Finder's Apify integration: what is already implemented, which files own each layer, how Actors are configured and run, and what still needs attention before broadening Apify-backed sources.

## Current Status

* `apify-client` is installed in `package.json` at `^2.23.3`.
* The generic script runtime is implemented under `scripts/lib/apify/`.
* The Trend Finder adapter is implemented under `scripts/extensions/trend-finder/sources/`.
* Product aggregation can run configured Apify Actors through `bun run aggregate`, but only when the Trend Finder extension is enabled, `APIFY_TOKEN` exists, and a runtime source list is provided through `data/trend-finder.apify-actors.json`, `FINDTREND_APIFY_ACTORS_PATH`, or the legacy `FINDTREND_APIFY_ACTORS` inline override.
* Static source declarations document reviewed source IDs, candidate Actor metadata, compliance docs, and pending validation status. They do not currently create the runnable source list by themselves.
* Phase 06 Session 02 adds a fixture-backed normalizer coverage set and a `--tiny-validate` smoke mode. Local tests use safe hand-authored fixtures; live tiny validation is credential-gated and explicit-config-gated.
* Phase 06 Session 07 adds dashboard provenance labels and a repeatable demo workflow so live, fixture/demo, deterministic fallback, degraded, and blocked states are visible in the browser.
* Phase 06 Session 08 certifies the local submission path with credential-free tests, targeted browser proof, private artifact checks, and final Loom-safe wording in [Hackathon Submission](/ai-os-and-trend-finder-docs/docs/hackathon/hackathon-submission.md).
* The Phase 05 six-source runtime set was smoke-validated on 2026-05-17 with `bun run apify:smoke -- --check-sources --json`, but Phase 06 and the 2026-05-27 next-source batch have updated several primary Actor candidates. Treat the current ten-source candidate set as reviewed declarations plus fixture-backed normalizer coverage. A live tiny validation run on 2026-05-17 succeeded for the original six configured sources with 5 public evidence URLs each. A 2026-05-27 follow-up replaced the empty Google News candidate with `thescrappa/google-news-scraper`; single-source live validation then returned 5 public URLs for `google-ai-news`.
* Actor schemas and pricing can change in the Apify Store, so rerun the smoke source check after changing Actor IDs, source inputs, item caps, or charge caps.

## Official Apify References

* [Apify API client for JavaScript](https://docs.apify.com/api/client/js/docs)
* [JavaScript client reference](https://docs.apify.com/api/client/js/reference/class/ApifyClient)
* [Apify REST API v2](https://docs.apify.com/api/v2)
* [Getting started with Apify API](https://docs.apify.com/api/v2/getting-started)
* [API tokens and integrations](https://docs.apify.com/platform/integrations/api)
* [Actors platform docs](https://docs.apify.com/platform/actors)
* [Dataset storage docs](https://docs.apify.com/platform/storage/dataset)
* [Apify Store](https://apify.com/store)

## Local References

* [Brainstorm Apify worker research](/ai-os-and-trend-finder-docs/docs/hackathon/brainstorm-hackathon.md#apify-worker-research)
* [Scripts README Apify runtime foundation](/ai-os-and-trend-finder-docs/scripts/readme_scripts.md#apify-runtime-foundation)
* [Testing matrix](/ai-os-and-trend-finder-docs/docs/testing.md)
* [Environment docs](/ai-os-and-trend-finder-docs/docs/environments.md)
* [Trend Finder demo workflow](/ai-os-and-trend-finder-docs/docs/hackathon/trend-finder-demo.md)
* [Adding Apify sources to Trend Finder](/ai-os-and-trend-finder-docs/docs/sources/apify-source-onboarding.md)
* [GitHub source compliance](/ai-os-and-trend-finder-docs/docs/sources/source-compliance-github.md)
* [Reddit source compliance](/ai-os-and-trend-finder-docs/docs/sources/source-compliance-reddit.md)
* [arXiv source compliance](/ai-os-and-trend-finder-docs/docs/sources/source-compliance-arxiv.md)
* [Product Hunt source compliance](/ai-os-and-trend-finder-docs/docs/sources/source-compliance-producthunt.md)
* [YouTube source compliance](/ai-os-and-trend-finder-docs/docs/sources/source-compliance-youtube.md)
* [RSS/news source compliance](/ai-os-and-trend-finder-docs/docs/sources/source-compliance-rss-news.md)
* [Hugging Face Papers source compliance](/ai-os-and-trend-finder-docs/docs/sources/source-compliance-huggingface-papers.md)
* [Hugging Face Models source compliance](/ai-os-and-trend-finder-docs/docs/sources/source-compliance-huggingface-models.md)
* [Reddit subreddit feed source compliance](/ai-os-and-trend-finder-docs/docs/sources/source-compliance-reddit-subreddit-feed.md)
* [Google News source compliance](/ai-os-and-trend-finder-docs/docs/sources/source-compliance-google-news.md)

## Runtime Flow

The normal product path is:

```
bun run aggregate
  -> scripts/aggregate.ts
  -> scripts/lib/extensions/runner.ts
  -> scripts/extensions/trend-finder/collector.ts
  -> scripts/extensions/trend-finder/sources/apify-adapter.ts
  -> scripts/lib/apify/client.ts
  -> scripts/lib/apify/actors.ts
  -> scripts/lib/apify/datasets.ts
  -> scripts/extensions/trend-finder/sources/apify-normalizers.ts
  -> src/data/live-data.json
```

Detailed ownership:

| Layer                      | File                                                             | Responsibility                                                                                                          |
| -------------------------- | ---------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------- |
| Package dependency         | `package.json`                                                   | Declares `apify-client`.                                                                                                |
| Env loading                | `scripts/lib/load-env.ts`                                        | Loads `.env.local` into script processes without overriding existing env vars.                                          |
| Aggregate entrypoint       | `scripts/aggregate.ts`                                           | Loads env, runs enabled extension collectors, writes generated live data.                                               |
| Extension enablement       | `scripts/lib/extensions/runner.ts`                               | Runs only extension IDs listed in `VITE_CLAUDE_OS_ENABLED_EXTENSIONS`.                                                  |
| Trend Finder collector     | `scripts/extensions/trend-finder/collector.ts`                   | Runs built-in source adapters, then Apify collection, then scoring/analysis.                                            |
| Trend Finder Apify config  | `scripts/extensions/trend-finder/sources/apify-source-config.ts` | Enriches configured Apify sources with Trend Finder source metadata, candidate metadata, placeholder checks, and gates. |
| Compliance gate            | `scripts/extensions/trend-finder/sources/source-compliance.ts`   | Allows only `reviewed` sources to collect in the product path.                                                          |
| Apify adapter              | `scripts/extensions/trend-finder/sources/apify-adapter.ts`       | Creates client, runs each source, reads datasets, emits evidence/source rows.                                           |
| Generic Apify config       | `scripts/lib/apify/config.ts`                                    | Loads the Apify source JSON file or inline override, then parses defaults, token presence, and warnings.                |
| Generic Apify client       | `scripts/lib/apify/client.ts`                                    | Creates `ApifyClient` with retries and redacted readiness.                                                              |
| Actor runner               | `scripts/lib/apify/actors.ts`                                    | Calls `client.actor(source.actorId).call(source.input, ...)`.                                                           |
| Dataset reader             | `scripts/lib/apify/datasets.ts`                                  | Reads bounded Dataset items with `client.dataset(datasetId).listItems(...)`.                                            |
| Generic normalization      | `scripts/lib/apify/normalize.ts`                                 | Separates raw items from provenance and generic normalized fields.                                                      |
| Trend Finder normalization | `scripts/extensions/trend-finder/sources/apify-normalizers.ts`   | Maps Dataset rows to browser-safe Trend Finder evidence.                                                                |

The actual Actor call is in `scripts/lib/apify/actors.ts`:

```ts
client.actor(source.actorId).call(source.input, {
  memory: source.memoryMb,
  timeout: timeoutSeconds(source),
  waitSecs: timeoutSeconds(source),
  ...(source.maxTotalChargeUsd === undefined
    ? {}
    : { maxTotalChargeUsd: source.maxTotalChargeUsd }),
  log: null,
});
```

After a successful run, `scripts/lib/apify/datasets.ts` reads the Actor's default Dataset with `listItems({ limit, offset: 0, desc: false, clean: true, skipEmpty: true, skipHidden: true })`.

`maxItems` is intentionally used as the Dataset read cap, not as a generic Actor-call option. Some pay-per-event Actors treat actor-level item and charge options as billing limits; sending a generic `maxItems` caused aborts in live smoke checks.
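The split between Actor-call options and the Dataset read cap can be sketched with two pure option builders. This is a minimal illustration, not the code in `scripts/lib/apify/`: `SourceLike`, `buildCallOptions`, and `buildListItemsOptions` are hypothetical names, and the timeout is read directly from the source instead of going through `timeoutSeconds(source)`.

```ts
// Hypothetical source shape; the real type in scripts/lib/apify/ may differ.
interface SourceLike {
  actorId: string;
  memoryMb: number;
  timeoutSecs: number;
  maxItems: number;
  maxTotalChargeUsd?: number;
}

// Options for the Actor call. maxItems is deliberately absent so that
// pay-per-event Actors do not treat it as a billing limit.
function buildCallOptions(source: SourceLike): Record<string, unknown> {
  return {
    memory: source.memoryMb,
    timeout: source.timeoutSecs,
    waitSecs: source.timeoutSecs,
    ...(source.maxTotalChargeUsd === undefined
      ? {}
      : { maxTotalChargeUsd: source.maxTotalChargeUsd }),
  };
}

// Options for the Dataset read. maxItems becomes the read limit here.
function buildListItemsOptions(source: SourceLike) {
  return {
    limit: source.maxItems,
    offset: 0,
    desc: false,
    clean: true,
    skipEmpty: true,
    skipHidden: true,
  };
}
```

Keeping the two builders separate makes it hard to leak the read cap into the Actor call by accident.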

## Required Switches

Three separate switches have to line up before normal evidence Actors run in the product path.

| Switch                              | Required value              | Why it matters                                                              |
| ----------------------------------- | --------------------------- | --------------------------------------------------------------------------- |
| `VITE_CLAUDE_OS_ENABLED_EXTENSIONS` | `trend-finder`              | Enables the Trend Finder collector during `bun run aggregate`.              |
| `APIFY_TOKEN`                       | A real Apify API token      | Allows `createApifyClient` to create an `ApifyClient`; missing token skips. |
| Apify source JSON file or override  | Reviewed source definitions | Provides the runnable source list. Static declarations are not auto-loaded. |

`APIFY_TOKEN` belongs in `.env.local` or the shell environment. Do not expose it through any `VITE_` variable.

The optional `google-trends-demand` path is separate: it runs only when `FINDTREND_GOOGLE_TRENDS_DEMAND_ENABLED` is true-like and does not require or create a normal Apify source JSON entry because it emits metric-only demand signals, not browser evidence.
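The three switches for the normal evidence path can be pictured as one readiness check. The sketch below is illustrative only: `SwitchState` and `checkSwitches` are hypothetical names, and it assumes `VITE_CLAUDE_OS_ENABLED_EXTENSIONS` holds a comma-separated list of extension IDs.

```ts
interface SwitchState {
  enabledExtensions: string | undefined; // VITE_CLAUDE_OS_ENABLED_EXTENSIONS
  apifyToken: string | undefined; // APIFY_TOKEN
  sourceCount: number; // entries loaded from the JSON file or inline override
}

type Readiness = { ready: true } | { ready: false; missing: string[] };

function checkSwitches(state: SwitchState): Readiness {
  const missing: string[] = [];
  // Assumption: the variable is a comma-separated list of extension IDs.
  const ids = (state.enabledExtensions ?? "").split(",").map((id) => id.trim());
  if (!ids.includes("trend-finder")) missing.push("extension");
  if (!state.apifyToken || state.apifyToken.trim() === "") missing.push("token");
  // Static declarations do not count; only runtime entries do.
  if (state.sourceCount < 1) missing.push("sources");
  return missing.length === 0 ? { ready: true } : { ready: false, missing };
}
```

Reporting every missing switch at once, instead of stopping at the first, keeps a setup diagnostic short to act on.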

## Environment Variables

| Variable                                              | Default                                           | Used by                         | Notes                                                                              |
| ----------------------------------------------------- | ------------------------------------------------- | ------------------------------- | ---------------------------------------------------------------------------------- |
| `APIFY_TOKEN`                                         | none                                              | Client factory                  | Required for live Actor runs.                                                      |
| `FINDTREND_APIFY_ACTORS_PATH`                         | `data/trend-finder.apify-actors.json`, if present | Source config                   | Points to an editable JSON source file; relative paths resolve from the repo root. |
| `FINDTREND_APIFY_ACTORS`                              | none                                              | Source config                   | Legacy inline override; JSON array/object or compact `source-id=actor-id` entries. |
| `FINDTREND_APIFY_DEFAULT_MEMORY_MB`                   | `1024`                                            | Generic source parser           | Clamped from `128` to `32768`.                                                     |
| `FINDTREND_APIFY_DEFAULT_TIMEOUT_SECS`                | `120`                                             | Generic source parser/client    | Clamped from `1` to `900`.                                                         |
| `FINDTREND_APIFY_MAX_ITEMS_PER_SOURCE`                | `50`                                              | Generic source parser           | Clamped from `1` to `500`; Trend Finder caps further by quality.                   |
| `FINDTREND_GOOGLE_TRENDS_DEMAND_ENABLED`              | off                                               | Demand enrichment               | Opts into paid Google Trends metric enrichment; not a normal evidence source.      |
| `FINDTREND_GOOGLE_TRENDS_DEMAND_TERMS`                | Creator Lens focus terms                          | Demand enrichment               | Optional comma, semicolon, or newline separated search terms.                      |
| `FINDTREND_GOOGLE_TRENDS_DEMAND_MAX_ITEMS`            | `1`                                               | Demand enrichment               | Clamped to at most 5 terms per run.                                                |
| `FINDTREND_GOOGLE_TRENDS_DEMAND_MAX_TOTAL_CHARGE_USD` | `0.20`                                            | Demand enrichment               | Charge ceiling for the opt-in demand Actor; very low caps may abort.               |
| `VITE_CLAUDE_OS_ENABLED_EXTENSIONS`                   | empty/disabled                                    | Extension runner and browser UI | Use `trend-finder` for this integration.                                           |

If both `FINDTREND_APIFY_ACTORS` and a source file are present, the inline env value wins and the loader emits an `actors-config-shadowed` warning.
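The defaults and clamp ranges in the table can be expressed as a small parser. This is a sketch under assumptions: `clampInt` and `readApifyDefaults` are hypothetical names, and falling back to the default on unparseable input is a guess about the real parser's behavior.

```ts
// Parse an integer env value and clamp it into [min, max].
// Assumption: missing or non-numeric values fall back to the default.
function clampInt(
  raw: string | undefined,
  fallback: number,
  min: number,
  max: number,
): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  if (Number.isNaN(parsed)) return fallback;
  return Math.min(max, Math.max(min, parsed));
}

function readApifyDefaults(env: Record<string, string | undefined>) {
  return {
    memoryMb: clampInt(env["FINDTREND_APIFY_DEFAULT_MEMORY_MB"], 1024, 128, 32768),
    timeoutSecs: clampInt(env["FINDTREND_APIFY_DEFAULT_TIMEOUT_SECS"], 120, 1, 900),
    maxItems: clampInt(env["FINDTREND_APIFY_MAX_ITEMS_PER_SOURCE"], 50, 1, 500),
  };
}
```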

## Source Configuration

The recommended local setup is to copy `data/trend-finder.apify-actors.example.json` to the gitignored private file:

```bash
cp data/trend-finder.apify-actors.example.json data/trend-finder.apify-actors.json
```

Then put only the token and optional path in `.env.local`:

```bash
APIFY_TOKEN=<your-token>
FINDTREND_APIFY_ACTORS_PATH=data/trend-finder.apify-actors.json
```

When `FINDTREND_APIFY_ACTORS_PATH` is omitted, the loader falls back to the default `data/trend-finder.apify-actors.json` path, provided that file exists.
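The lookup order (inline override, then explicit path, then default path) can be sketched as a pure function. `resolveSourceOrigin` and `SourceResolution` are hypothetical names, the file-existence check is injected so the sketch stays testable, and treating a missing explicit path as "no file" is an assumption; the real loader may warn instead.

```ts
interface SourceResolution {
  origin: "inline" | "file" | "none";
  path?: string;
  warnings: string[];
}

const DEFAULT_ACTORS_PATH = "data/trend-finder.apify-actors.json";

function resolveSourceOrigin(
  env: Record<string, string | undefined>,
  fileExists: (path: string) => boolean,
): SourceResolution {
  const inline = env["FINDTREND_APIFY_ACTORS"]?.trim();
  const explicitPath = env["FINDTREND_APIFY_ACTORS_PATH"]?.trim();
  const candidate = explicitPath || DEFAULT_ACTORS_PATH;
  const hasFile = fileExists(candidate);

  if (inline) {
    // The inline value wins; a file that is also present is shadowed.
    return { origin: "inline", warnings: hasFile ? ["actors-config-shadowed"] : [] };
  }
  if (hasFile) return { origin: "file", path: candidate, warnings: [] };
  return { origin: "none", warnings: [] };
}
```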

The JSON file can be either an array or an object with a `sources` or `actors` array:

```json
{
  "sources": [
    {
      "id": "github-ai-repositories",
      "actorId": "automation-lab/github-trending-scraper",
      "role": "trend-source",
      "enabled": true,
      "input": {
        "language": "typescript",
        "since": "weekly",
        "maxResults": 10
      },
      "memoryMb": 4096,
      "timeoutSecs": 120,
      "maxItems": 10
    }
  ]
}
```
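Accepting either a bare array or an object with a `sources` or `actors` array can be handled by a small extraction step before per-entry parsing. This is a sketch, not the loader in `scripts/lib/apify/config.ts`: `extractSourceEntries` is a hypothetical name, and preferring `sources` over `actors` when both exist is an assumption.

```ts
// Return the raw entry list from any of the three accepted top-level shapes.
// Unknown shapes yield an empty list instead of throwing.
function extractSourceEntries(parsed: unknown): unknown[] {
  if (Array.isArray(parsed)) return parsed;
  if (parsed !== null && typeof parsed === "object") {
    const record = parsed as Record<string, unknown>;
    const sources = record["sources"];
    if (Array.isArray(sources)) return sources;
    const actors = record["actors"];
    if (Array.isArray(actors)) return actors;
  }
  return [];
}
```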

`FINDTREND_APIFY_ACTORS` remains supported as an inline override for CI or one-off shell runs. It supports two forms.

Compact form:

```bash
FINDTREND_APIFY_ACTORS=github-ai-repositories=automation-lab/github-trending-scraper;arxiv-ai-papers=easyapi/arxiv-search-scraper
```

Compact form is useful for quick readiness checks, but it cannot provide Actor input, source-level memory, timeouts, item caps, charge caps, or metadata overrides. Most real Trend Finder sources should use JSON.

Inline JSON form:

```bash
FINDTREND_APIFY_ACTORS='[{"id":"github-ai-repositories","actorId":"automation-lab/github-trending-scraper","role":"trend-source","enabled":true,"input":{"language":"typescript","since":"weekly","maxResults":10},"memoryMb":4096,"timeoutSecs":120,"maxItems":10}]'
```

Accepted per-source JSON fields:

| Field                              | Required | Notes                                                                             |
| ---------------------------------- | -------- | --------------------------------------------------------------------------------- |
| `id` or `sourceId`                 | Yes      | Must match a reviewed static declaration to run in the product path.              |
| `actorId` or `actor`               | Yes      | Apify Actor ID, usually `username/actor-name`.                                    |
| `role`                             | No       | One of `trend-source`, `evidence-source`, `watchlist-source`, `discovery-source`. |
| `enabled`                          | No       | Defaults to `true`; compliance can still force it off.                            |
| `input`                            | No       | JSON object passed directly to `actor.call`.                                      |
| `memoryMb` or `memory`             | No       | Source memory override.                                                           |
| `timeoutSecs` or `timeout`         | No       | Source timeout override in seconds.                                               |
| `maxItems` or `maxItemsPerSource`  | No       | Dataset read cap before Trend Finder quality caps.                                |
| `maxTotalChargeUsd`                | No       | Optional per-run Apify charge cap for paid Actors.                                |
| `name` or `sourceName`             | No       | Trend Finder display override. JSON only.                                         |
| `sourceRole` or `trendRole`        | No       | Trend Finder role override. JSON only.                                            |
| `qualityTier` or `quality`         | No       | Trend Finder quality override. JSON only.                                         |
| `complianceStatus` or `compliance` | No       | Cannot self-approve unknown sources as `reviewed`.                                |
| `complianceDoc`                    | No       | Link to local compliance doc. JSON only.                                          |

## UI-Backed Private Source Setup

The local Vite dev server registers `/__trend_finder_source_setup` for the Sources tab. It reads and writes only the private `data/trend-finder.apify-actors.json` file. The endpoint is loopback-only, requires the same per-run refresh token used by other privileged local bridges, sets `Cache-Control: no-store`, rejects oversized or malformed bodies, and returns redacted diagnostics.

The UI can show that `APIFY_TOKEN` is present or missing, but it never receives the token value. The UI also does not receive the private file path, raw Actor input dumps, Actor logs, Dataset rows, account IDs, auth headers, or token-like strings.

Supported UI mutations are intentionally narrow:

* enable or disable a reviewed source
* set or clear a reviewed target field inside `input`

Unsupported mutations fail closed. This includes arbitrary JSON edits, unreviewed source IDs, placeholder Actor IDs, private or localhost feed URLs, credential-shaped text, raw Actor options, direct account setup, and historical window shortcuts that are not declared in the source compliance doc.

If the UI creates a source record, it seeds the parser-supported fields from the reviewed static declaration and then mutates only the requested target. If the record already exists, unrelated parser-supported fields are preserved.

## Reviewed Trend Finder Sources

The static declarations live in `scripts/extensions/trend-finder/sources/apify-source-config.ts`.

| Source ID                   | Source role | Quality   | Primary candidate                          | Fallback candidate                     | Max items | Charge cap | Compliance doc                                            |
| --------------------------- | ----------- | --------- | ------------------------------------------ | -------------------------------------- | --------- | ---------- | --------------------------------------------------------- |
| `github-ai-repositories`    | developer   | primary   | `automation-lab/github-trending-scraper`   | `crawlerbros/github-repo-intelligence` | 10        | none       | `docs/sources/source-compliance-github.md`                |
| `reddit-ai-discussions`     | discussion  | community | `clearpath/reddit-search-scraper`          | `iskander/fast-reddit-scraper`         | 5         | `$0.03`    | `docs/sources/source-compliance-reddit.md`                |
| `arxiv-ai-papers`           | research    | primary   | `easyapi/arxiv-search-scraper`             | `gentle_cloud/arxiv-paper-search`      | 5         | none       | `docs/sources/source-compliance-arxiv.md`                 |
| `producthunt-ai-launches`   | launch      | secondary | `muzafferkadir/product-hunt-leaderboard`   | `thirdwatch/producthunt-scraper`       | 5         | `$0.05`    | `docs/sources/source-compliance-producthunt.md`           |
| `youtube-ai-creator-videos` | creator     | low       | `api-ninja/youtube-search-scraper`         | `powerai/youtube-video-search-scraper` | 8         | none       | `docs/sources/source-compliance-youtube.md`               |
| `rss-ai-news`               | news        | secondary | `xtech/feed-extractor`                     | none recorded                          | 5         | none       | `docs/sources/source-compliance-rss-news.md`              |
| `huggingface-ai-papers`     | research    | primary   | `parseforge/huggingface-papers-scraper`    | none recorded                          | 10        | `$0.12`    | `docs/sources/source-compliance-huggingface-papers.md`    |
| `huggingface-ai-models`     | developer   | primary   | `parseforge/hugging-face-model-scraper`    | none recorded                          | 10        | `$0.08`    | `docs/sources/source-compliance-huggingface-models.md`    |
| `reddit-ai-subreddit-feed`  | discussion  | community | `clearpath/reddit-subreddit-posts-scraper` | none recorded                          | 10        | `$0.05`    | `docs/sources/source-compliance-reddit-subreddit-feed.md` |
| `google-ai-news`            | news        | secondary | `thescrappa/google-news-scraper`           | none recorded                          | 10        | `$0.05`    | `docs/sources/source-compliance-google-news.md`           |

Important implementation detail: these declaration values are currently metadata, not runtime defaults. If the runtime source file or inline override lists `github-ai-repositories` without an `input`, the runtime passes `{}` to the Actor. It does not merge the declaration's sample input yet.
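The no-merge behavior is easy to state in code. The sketch below is an illustration under assumptions: `RuntimeEntry`, `Declaration`, its `sampleInput` field, and `resolveActorInput` are hypothetical names, not the shapes used in `apify-source-config.ts`.

```ts
interface RuntimeEntry {
  id: string;
  actorId: string;
  input?: Record<string, unknown>;
}

// Hypothetical declaration shape carrying a sample input as metadata.
interface Declaration {
  id: string;
  sampleInput: Record<string, unknown>;
}

// The declaration is accepted only to make the point: it is ignored.
// A runtime entry without `input` sends an empty object to the Actor.
function resolveActorInput(
  entry: RuntimeEntry,
  _declaration?: Declaration,
): Record<string, unknown> {
  return entry.input ?? {};
}
```

In practice this means every runtime entry should carry its own complete `input`.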

## Candidate Research

The 2026-05-17 Apify Store research is recorded in [docs/hackathon/brainstorm-hackathon.md](/ai-os-and-trend-finder-docs/docs/hackathon/brainstorm-hackathon.md#apify-worker-research). The Phase 06 reviewed candidate set is:

* GitHub Trending: `automation-lab/github-trending-scraper`
* Reddit search: `clearpath/reddit-search-scraper`
* arXiv search: `easyapi/arxiv-search-scraper`
* Product Hunt leaderboard: `muzafferkadir/product-hunt-leaderboard`
* YouTube search: `api-ninja/youtube-search-scraper`
* RSS/feed extraction: `xtech/feed-extractor`
* Hugging Face Papers: `parseforge/huggingface-papers-scraper`
* Hugging Face Models: `parseforge/hugging-face-model-scraper`
* Reddit subreddit feed: `clearpath/reddit-subreddit-posts-scraper`
* Google News: `thescrappa/google-news-scraper`

Fallback and rejected candidates are documented in the brainstorm file. The placeholder IDs that looked like official examples, such as `apify/github-public-search`, did not resolve as public Apify Actors during research and are blocked by the Trend Finder source config wrapper.
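The gating rules described in this document (placeholder blocking, matching a reviewed declaration, no self-approval) can be sketched as one decision function. This is not the wrapper's actual code: `gateSource`, the reason codes, and the `pending` status value are hypothetical, and the placeholder set lists only the one ID named above.

```ts
type ComplianceStatus = "reviewed" | "pending";

interface DeclarationLike {
  id: string;
  complianceStatus: ComplianceStatus;
}

interface EntryLike {
  id: string;
  actorId: string;
  complianceStatus?: string; // ignored: entries cannot approve themselves
}

type GateResult = { allowed: true } | { allowed: false; reason: string };

// Only the placeholder named in this document; a real list may be longer.
const PLACEHOLDER_ACTOR_IDS = new Set(["apify/github-public-search"]);

function gateSource(entry: EntryLike, declarations: DeclarationLike[]): GateResult {
  if (PLACEHOLDER_ACTOR_IDS.has(entry.actorId)) {
    return { allowed: false, reason: "placeholder-actor" };
  }
  const declaration = declarations.find((d) => d.id === entry.id);
  if (!declaration) return { allowed: false, reason: "unknown-source" };
  if (declaration.complianceStatus !== "reviewed") {
    return { allowed: false, reason: "not-reviewed" };
  }
  return { allowed: true };
}
```

The compliance status is read from the static declaration only, so a runtime entry that claims `reviewed` for an unknown ID is still blocked.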

## Product Path vs Smoke Path

Apify code can run through the product path or through one of three smoke modes.

| Path                | Command                                  | What it uses                                                                                      | What it proves                                                                                                       |
| ------------------- | ---------------------------------------- | ------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| Product aggregation | `bun run aggregate`                      | Extension runner, Trend Finder collector, Trend Finder Apify config, compliance gate, normalizers | Actors can contribute browser-safe Trend Finder evidence and source summaries.                                       |
| Smoke check         | `bun run apify:smoke -- --json`          | Trend Finder Apify config wrapper and client readiness                                            | Env parsing, token readiness, declaration counts, compliance gates, and placeholder blocking without running Actors. |
| Smoke source check  | `bun run apify:smoke -- --check-sources` | Trend Finder Apify config wrapper and `runConfiguredApifyActors`                                  | Configured reviewed Actors can run. It still does not prove Dataset rows normalize cleanly.                          |
| Tiny validation     | `bun run apify:smoke -- --tiny-validate` | Token-gated and explicit-config-gated Actor runs, tiny Dataset reads, redacted shape summaries    | Configured reviewed Actors can run with tiny caps and emit browser-safe field-shape status without raw item output.  |

The smoke source check may consume Apify credits. Treat smoke success as an Actor runtime signal, not proof that every reviewed source will be present in a demo run or that Dataset rows map cleanly into Trend Finder evidence.

For final hackathon certification, live Apify checks are optional. The credential-free path must still pass with fixture/demo data, deterministic fallback labels, blocked/degraded source visibility, and no private artifact tracking.

## Tiny Validation And Fixture Fallback

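Phase 06 Session 02 adds two complementary validation paths: fixture-backed tests and live tiny validation. The opt-in demand live sample is listed alongside them: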

| Path                 | Credentials                              | Output                                                                                                | Status                                                                                                                 |
| -------------------- | ---------------------------------------- | ----------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- |
| Fixture-backed tests | None                                     | Safe hand-authored public shape records under `scripts/extensions/trend-finder/sources/__fixtures__/` | Implemented for all ten reviewed Apify declarations, plus an optional HN note                                          |
| Live tiny validation | `APIFY_TOKEN` and explicit source config | Redacted source status, item count, public URL count, warning codes, and top-level field names only   | Implemented; original six succeeded on 2026-05-17; the 2026-05-27 Google News replacement succeeded with 5 public URLs |
| Demand live sample   | `APIFY_TOKEN` and opt-in demand env      | Internal demand-signal count and redacted warnings only                                               | Implemented for `google-trends-demand`; it remains outside source declarations and browser evidence                    |

The tiny validation command applies:

* Maximum 5 Dataset items per source.
* Maximum 60 seconds per Actor/Dataset step.
* At most 2 attempts for failed or timed-out external calls, with backoff.
* Explicit blocking when token, source config, enabled sources, or safe input requirements are missing.
* No raw Actor input, Dataset item, token value, response body, transcript, comments, prompt payload, account auth, or private provenance in output.

Example:

```bash
bun run apify:smoke -- --tiny-validate --json
```

If token or explicit source config is missing, the command returns blocked per-source summaries instead of failing local tests.
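The caps and the blocked-instead-of-failing behavior can be sketched as a planning step that runs before any Actor call. This is an illustration, not the smoke script: `TinySource`, `applyTinyCaps`, `planTinyValidation`, and the `source-disabled` reason code are hypothetical; `missing-token` mirrors the reason the command reports.

```ts
const TINY_MAX_ITEMS = 5; // maximum Dataset items per source
const TINY_TIMEOUT_SECS = 60; // maximum seconds per Actor/Dataset step

interface TinySource {
  id: string;
  enabled: boolean;
  maxItems: number;
  timeoutSecs: number;
}

type TinySummary =
  | { id: string; status: "blocked"; reason: string }
  | { id: string; status: "ready"; source: TinySource };

// Lower the configured caps to the tiny limits; never raise them.
function applyTinyCaps(source: TinySource): TinySource {
  return {
    ...source,
    maxItems: Math.min(source.maxItems, TINY_MAX_ITEMS),
    timeoutSecs: Math.min(source.timeoutSecs, TINY_TIMEOUT_SECS),
  };
}

// Produce one summary per source. Missing prerequisites yield "blocked"
// summaries instead of throwing, so credential-free runs still exit cleanly.
function planTinyValidation(sources: TinySource[], hasToken: boolean): TinySummary[] {
  return sources.map((source): TinySummary => {
    if (!hasToken) return { id: source.id, status: "blocked", reason: "missing-token" };
    if (!source.enabled) return { id: source.id, status: "blocked", reason: "source-disabled" };
    return { id: source.id, status: "ready", source: applyTinyCaps(source) };
  });
}
```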

In the dashboard, missing token or missing explicit source config should appear as blocked source provenance, not as a live reviewed-source claim. If the AI runtime is unavailable, generated topics should be labeled as deterministic fallback analysis.

Latest recorded live tiny validation: 2026-05-17 for the original six-source set. GitHub, Reddit, arXiv, Product Hunt, YouTube, and RSS/news all returned `succeeded` with 5 items and 5 public URLs. Reddit and arXiv output shapes included prohibited top-level field names (`body`, `comment`), and the normalizer tests assert those fields are not emitted to browser evidence or analyst payloads. The 2026-05-27 source additions are fixture-backed; Hugging Face Papers, Hugging Face Models, Reddit subreddit feed, and the replacement Google News candidate also passed live tiny validation. The original `easyapi/google-news-scraper` Google News candidate ran successfully but returned zero Dataset items.

## Dashboard Refresh Path

The browser does not directly call Apify.

In dev, some refresh controls post to Vite middleware at `/__refresh_data`. That endpoint calls `startRefresh`, whose default command is:

```bash
bun run scripts/aggregate.ts
```

`useRefreshLiveData` only invalidates React Query's `["live-data"]` cache after aggregation; it does not run Actors itself. The Trend Finder run control uses the separate `/__run_trend_finder` middleware path. That path saves the Creator Lens through the local bridge and then runs the scoped `scripts/scheduler-runner.ts --job trend-finder` job. The script-side collector decides whether configured reviewed Apify sources can run.

## Normalization Boundary

Apify Dataset rows are converted into Trend Finder evidence through `apify-normalizers.ts`.

Allowed browser-safe evidence fields:

* Public evidence URL
* Title or name
* Short description, abstract, tagline, or snippet
* Published/created/updated timestamp
* Public source label
* Aggregate metrics such as points, comments, downloads, stars, forks, upvotes, or views

The normalizer refuses Apify API URLs as public evidence URLs and tries to derive canonical public links for GitHub, arXiv, Hugging Face models, Product Hunt, YouTube, and Reddit-style permalinks. Google News rows must already carry publisher URLs; Google redirect and cache URLs are rejected until a safe resolver is reviewed.
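The URL rejection rules can be sketched as a predicate over the parsed host. This is a minimal illustration, not the logic in `apify-normalizers.ts`: `isPublicEvidenceUrl` is a hypothetical name, and the specific hosts treated as Apify API, Google redirect, or Google cache URLs are assumptions.

```ts
// Return true only for http(s) URLs that are not Apify API links
// or Google redirect/cache links. The host list is an assumption.
function isPublicEvidenceUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not an absolute URL
  }
  if (url.protocol !== "https:" && url.protocol !== "http:") return false;

  const host = url.hostname.toLowerCase();
  if (host === "api.apify.com") return false; // Apify API, not public evidence
  if (host === "news.google.com") return false; // Google News redirect links
  if (host === "webcache.googleusercontent.com") return false; // cache copies
  if ((host === "google.com" || host === "www.google.com") && url.pathname === "/url") {
    return false; // generic Google redirect
  }
  return true;
}
```

Rows whose only link fails the predicate would be dropped as evidence, which matches the rule that Google News rows must already carry publisher URLs.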

Fields intentionally excluded from text selection include:

* Emails
* Author/user profile URLs
* Full bodies
* Comment text
* Reddit `selftext`
* Transcripts
* Contact/profile fields
* Thumbnails and media URLs
* Hugging Face file lists and model-card bodies
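One way to enforce this exclusion list is an allow-list: the text selector reads only approved keys, so excluded fields are never consulted. The sketch below assumes hypothetical key names (`description`, `abstract`, `tagline`, `snippet`) and a hypothetical 280-character limit; `pickSnippet` is not a function from the repository.

```ts
// Keys that may feed the short text field, in priority order.
// Anything not listed (body, comment, selftext, transcript, ...) is ignored.
const SNIPPET_KEYS = ["description", "abstract", "tagline", "snippet"] as const;

function pickSnippet(
  row: Record<string, unknown>,
  maxLength = 280, // hypothetical limit
): string | undefined {
  for (const key of SNIPPET_KEYS) {
    const value = row[key];
    if (typeof value === "string" && value.trim() !== "") {
      return value.trim().slice(0, maxLength);
    }
  }
  return undefined;
}
```

An allow-list fails safe when an Actor adds a new field, whereas a block-list would need an update before the new field stops leaking.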

Direct Reddit API and direct YouTube API rules are documented for future direct adapters. They are not blanket blockers for the Apify Actor path, but the browser-safe boundary above still applies.

## Quality Caps

Trend Finder applies a stricter cap after generic Apify parsing:

| Quality tier | Max items |
| ------------ | --------- |
| `primary`    | 50        |
| `secondary`  | 35        |
| `community`  | 20        |
| `low`        | 8         |

The hard Trend Finder ceiling is `50` items per source, even though the generic Apify parser allows up to `500`.
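The tier caps amount to a lookup plus a `min`. In this sketch, `effectiveMaxItems` is a hypothetical name and the lower bound of 1 is an assumption carried over from the generic parser's clamp range.

```ts
type QualityTier = "primary" | "secondary" | "community" | "low";

const QUALITY_CAPS: Record<QualityTier, number> = {
  primary: 50,
  secondary: 35,
  community: 20,
  low: 8,
};

// Apply the Trend Finder tier cap on top of the generically parsed value.
function effectiveMaxItems(tier: QualityTier, requested: number): number {
  return Math.max(1, Math.min(requested, QUALITY_CAPS[tier]));
}
```

For example, a `low` source configured with `maxItems: 50` is read at 8 items, and a `primary` source configured at the generic maximum of 500 is read at 50.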

## Running It

Minimum readiness check:

```bash
bun run apify:smoke -- --json
```

Without `APIFY_TOKEN`, this exits `0` with `status: "setup-required"`, `readiness.status: "missing-token"`, and `secretValuesPrinted: false`. Missing token is expected in credential-free certification.
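A credential-free CI step could assert on exactly these three fields. The sketch below is an assumption-laden illustration: `SmokeReport` models only the fields named above, the rest of the report shape is unknown, and `isExpectedCredentialFreeReport` is a hypothetical helper.

```ts
// Only the fields documented above; the real report contains more.
interface SmokeReport {
  status?: unknown;
  readiness?: { status?: unknown };
  secretValuesPrinted?: unknown;
}

// True when the report matches the documented no-token outcome.
function isExpectedCredentialFreeReport(report: SmokeReport): boolean {
  return (
    report.status === "setup-required" &&
    report.readiness?.status === "missing-token" &&
    report.secretValuesPrinted === false
  );
}
```

The report would typically come from parsing the command's stdout with `JSON.parse`.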

Run configured Actors as a smoke check:

```bash
bun run apify:smoke -- --check-sources
```

Run tiny capped validation with redacted shape summaries:

```bash
bun run apify:smoke -- --tiny-validate --json
```

Without `APIFY_TOKEN`, tiny validation exits `0`, reports `tinyValidation.ranLive: false`, and marks reviewed sources as blocked with `reason: "missing-token"`.

Run the product aggregation path:

```bash
VITE_CLAUDE_OS_ENABLED_EXTENSIONS=trend-finder bun run aggregate
```

For a real product run, use `.env.local` or shell env to provide:

```bash
VITE_CLAUDE_OS_ENABLED_EXTENSIONS=trend-finder
APIFY_TOKEN=<your-token>
FINDTREND_APIFY_ACTORS_PATH=data/trend-finder.apify-actors.json
```
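
The three ways of supplying a source list can be pictured as a small resolution step. The sketch below is an assumption-heavy illustration: the helper name is hypothetical, and the precedence of `FINDTREND_APIFY_ACTORS_PATH` over the default file, as well as emitting `actors-config-shadowed` only when a path is also set, are guesses. Only the inline override taking priority over a source file is stated in this document.

```typescript
// Hypothetical sketch: helper name, default-path fallback order, and the
// exact shadow-warning condition are assumptions.
type Env = Record<string, string | undefined>;

interface ActorsConfigSource {
  kind: "inline" | "file";
  value: string; // inline JSON text, or a file path
  warnings: string[];
}

const DEFAULT_ACTORS_PATH = "data/trend-finder.apify-actors.json";

function resolveActorsConfigSource(env: Env): ActorsConfigSource {
  const inline = env.FINDTREND_APIFY_ACTORS?.trim();
  const path = env.FINDTREND_APIFY_ACTORS_PATH?.trim();
  if (inline) {
    // The legacy inline override wins over any source file.
    return { kind: "inline", value: inline, warnings: path ? ["actors-config-shadowed"] : [] };
  }
  return { kind: "file", value: path || DEFAULT_ACTORS_PATH, warnings: [] };
}
```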

## Starter All-Source JSON

The committed starter config lives in `data/trend-finder.apify-actors.example.json`. Copy it to `data/trend-finder.apify-actors.json` and edit that private file for local runs. Treat it as a reviewed starter config, not proof that every source will run in every demo. A credentialed demo still needs `APIFY_TOKEN`, explicit config, and fresh tiny capped validation after changing Actor IDs or inputs.

```json
{
  "sources": [
    {
      "id": "github-ai-repositories",
      "actorId": "automation-lab/github-trending-scraper",
      "role": "trend-source",
      "enabled": true,
      "input": {
        "language": "typescript",
        "since": "weekly",
        "maxResults": 10
      },
      "memoryMb": 4096,
      "timeoutSecs": 120,
      "maxItems": 10
    },
    {
      "id": "reddit-ai-discussions",
      "actorId": "clearpath/reddit-search-scraper",
      "role": "trend-source",
      "enabled": true,
      "input": {
        "query": "AI agents LLM workflow",
        "contentType": "posts",
        "sort": "new",
        "timeFilter": "month",
        "subreddit": "LocalLLaMA",
        "autoDiscoverSubreddits": false,
        "maxResults": 5
      },
      "memoryMb": 1024,
      "timeoutSecs": 120,
      "maxItems": 5,
      "maxTotalChargeUsd": 0.03
    },
    {
      "id": "arxiv-ai-papers",
      "actorId": "easyapi/arxiv-search-scraper",
      "role": "evidence-source",
      "enabled": true,
      "input": {
        "mode": "search",
        "query": "artificial intelligence agents",
        "max_results": 5,
        "sort_by": "submitted_date",
        "sort_order": "descending",
        "request_timeout": 30
      },
      "memoryMb": 4096,
      "timeoutSecs": 120,
      "maxItems": 5
    },
    {
      "id": "producthunt-ai-launches",
      "actorId": "muzafferkadir/product-hunt-leaderboard",
      "role": "discovery-source",
      "enabled": true,
      "input": {
        "productUrls": [],
        "searchTerms": ["AI"],
        "topics": [],
        "dateFrom": "",
        "dateTo": "",
        "limit": 5,
        "delivery": "dataset",
        "webhookUrl": "",
        "dryRun": false
      },
      "memoryMb": 1024,
      "timeoutSecs": 180,
      "maxItems": 5,
      "maxTotalChargeUsd": 0.05
    },
    {
      "id": "youtube-ai-creator-videos",
      "actorId": "api-ninja/youtube-search-scraper",
      "role": "watchlist-source",
      "enabled": true,
      "input": {
        "searchQueries": ["AI agents tutorial"],
        "type": "Video",
        "uploadDate": "ThisMonth",
        "sortBy": "Relevance",
        "maxPage": 1,
        "country": "US"
      },
      "memoryMb": 512,
      "timeoutSecs": 120,
      "maxItems": 8
    },
    {
      "id": "rss-ai-news",
      "actorId": "xtech/feed-extractor",
      "role": "evidence-source",
      "enabled": true,
      "input": {
        "feedUrls": ["https://hnrss.org/newest?q=AI"],
        "maxArticles": 5,
        "fetchFullContent": false
      },
      "memoryMb": 4096,
      "timeoutSecs": 120,
      "maxItems": 5
    },
    {
      "id": "huggingface-ai-papers",
      "actorId": "parseforge/huggingface-papers-scraper",
      "role": "evidence-source",
      "enabled": true,
      "input": {
        "mode": "trending",
        "maxItems": 10
      },
      "memoryMb": 4096,
      "timeoutSecs": 180,
      "maxItems": 10,
      "maxTotalChargeUsd": 0.12
    },
    {
      "id": "huggingface-ai-models",
      "actorId": "parseforge/hugging-face-model-scraper",
      "role": "trend-source",
      "enabled": true,
      "input": {
        "query": "llm",
        "sort": "downloads",
        "direction": "desc",
        "maxItems": 10
      },
      "memoryMb": 4096,
      "timeoutSecs": 180,
      "maxItems": 10,
      "maxTotalChargeUsd": 0.08
    },
    {
      "id": "reddit-ai-subreddit-feed",
      "actorId": "clearpath/reddit-subreddit-posts-scraper",
      "role": "trend-source",
      "enabled": true,
      "input": {
        "subreddits": ["LocalLLaMA", "MachineLearning", "OpenAI"],
        "sort": "rising",
        "maxPostsPerSubreddit": 10,
        "includeComments": false
      },
      "memoryMb": 512,
      "timeoutSecs": 120,
      "maxItems": 10,
      "maxTotalChargeUsd": 0.05
    },
    {
      "id": "google-ai-news",
      "actorId": "thescrappa/google-news-scraper",
      "role": "evidence-source",
      "enabled": true,
      "input": {
        "q": "artificial intelligence",
        "gl": "us",
        "hl": "en",
        "page": 1,
        "so": 1
      },
      "memoryMb": 1024,
      "timeoutSecs": 120,
      "maxItems": 10,
      "maxTotalChargeUsd": 0.05
    }
  ]
}
```

This JSON intentionally uses low item caps and, where needed, low charge caps. Revalidate Actor inputs after changing Actor IDs, because Store metadata does not guarantee the input schema each Actor actually accepts.
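
Before a live run, a private copy of this file can be sanity-checked for the structural problems this document names. The sketch below is a hypothetical pre-flight helper, not the project's parser: it reports `invalid-actors` for malformed or empty config and `missing-source-id` / `missing-actor-id` for entries lacking `id`/`sourceId` or `actorId`/`actor`. It does not check compliance status or Actor input schemas.

```typescript
// Hypothetical sketch: a structural pre-flight check, not the real config parser.
interface ConfigWarning {
  index: number | null; // null when the whole config is unusable
  code: "invalid-actors" | "missing-source-id" | "missing-actor-id";
}

function hasText(value: unknown): boolean {
  return typeof value === "string" && value.trim() !== "";
}

function checkActorsConfig(configJson: string): ConfigWarning[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(configJson);
  } catch {
    return [{ index: null, code: "invalid-actors" }];
  }
  const sources = (parsed as { sources?: unknown } | null)?.sources;
  if (!Array.isArray(sources) || sources.length === 0) {
    return [{ index: null, code: "invalid-actors" }];
  }

  const warnings: ConfigWarning[] = [];
  sources.forEach((entry: unknown, index: number) => {
    if (entry === null || typeof entry !== "object") {
      warnings.push({ index, code: "invalid-actors" });
      return;
    }
    const row = entry as Record<string, unknown>;
    if (!hasText(row.id) && !hasText(row.sourceId)) warnings.push({ index, code: "missing-source-id" });
    if (!hasText(row.actorId) && !hasText(row.actor)) warnings.push({ index, code: "missing-actor-id" });
  });
  return warnings;
}
```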

## Warning And Failure Meanings

| Warning or state            | Meaning                                                     | Usual fix                                                                                            |
| --------------------------- | ----------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- |
| `missing-token`             | `APIFY_TOKEN` is absent or placeholder-like.                | Add a real token in `.env.local` or shell env.                                                       |
| `invalid-actors`            | Apify source config is malformed or empty.                  | Fix the JSON source file or inline override.                                                         |
| `actors-config-read-failed` | `FINDTREND_APIFY_ACTORS_PATH` points to an unreadable file. | Fix the path, permissions, or file location.                                                         |
| `actors-config-shadowed`    | Inline `FINDTREND_APIFY_ACTORS` overrides a source file.    | Remove the inline override to use the JSON file.                                                     |
| `missing-source-id`         | A source object has no `id` or `sourceId`.                  | Add the source ID.                                                                                   |
| `missing-actor-id`          | A source object has no `actorId` or `actor`.                | Add the Actor ID.                                                                                    |
| `source-unreviewed`         | Source ID has no static Trend Finder declaration.           | Add a declaration and compliance doc before product use.                                             |
| `source-restricted`         | Source compliance status is not collectable.                | Review the source or mark it intentionally disabled.                                                 |
| `source-disabled`           | Source is disabled by config or compliance gate.            | Set `enabled: true` and confirm compliance is `reviewed`.                                            |
| `placeholder-actor`         | Actor ID is a known unresolved placeholder.                 | Replace it with a reviewed primary or fallback candidate.                                            |
| `actor-timed-out`           | Actor did not finish within source timeout.                 | Increase timeout or reduce input/max items.                                                          |
| `actor-aborted`             | Actor reached an Apify terminal aborted state.              | Check timeout, pricing/charge caps, and Actor input.                                                 |
| `actor-failed`              | Actor rejected input or failed at runtime.                  | Verify the Actor exists and input matches its schema.                                                |
| `dataset-read-timed-out`    | Dataset item read did not finish in time.                   | Reduce `maxItems` or increase timeout.                                                               |
| `missing-public-url`        | Dataset row could not produce a public evidence URL.        | Update field maps or derive a canonical public URL.                                                  |
| `missing-explicit-config`   | Tiny validation has no source file or inline config.        | Provide `data/trend-finder.apify-actors.json`, `FINDTREND_APIFY_ACTORS_PATH`, or an inline override. |
| `no-enabled-sources`        | Tiny validation has no enabled reviewed sources.            | Enable at least one reviewed source.                                                                 |
| `unsafe-source-input`       | Tiny validation input requested prohibited collection.      | Remove full-content, comments, transcripts, profiles, contact, or media options.                     |
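
The three Actor run warnings in the table line up with Apify's terminal run statuses (`SUCCEEDED`, `FAILED`, `TIMED-OUT`, `ABORTED`). The mapping below is a sketch of how such a translation could look; the function name and result shape are hypothetical, and the collector's actual logic may consider more than the status string.

```typescript
// Hypothetical sketch: the mapping and result shape are assumptions.
type ActorWarning = "actor-timed-out" | "actor-aborted" | "actor-failed";

type RunOutcome =
  | { done: false } // run has not reached a terminal status yet
  | { done: true; warning: ActorWarning | null };

function outcomeForRunStatus(status: string): RunOutcome {
  switch (status) {
    case "SUCCEEDED":
      return { done: true, warning: null };
    case "TIMED-OUT":
      return { done: true, warning: "actor-timed-out" };
    case "ABORTED":
      return { done: true, warning: "actor-aborted" };
    case "FAILED":
      return { done: true, warning: "actor-failed" };
    default:
      // READY, RUNNING, TIMING-OUT, ABORTING, or anything unrecognized.
      return { done: false };
  }
}
```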

## Known Gaps

These are documentation-level facts, not restrictions on adapting the Actors:

* Static declarations are not auto-loaded as runnable sources.
* Static declaration runtime defaults, including sample `input`, are not merged into env-provided sources.
* Actor input schemas and pricing can drift in Apify Store, so rerun live tiny capped validation after Actor ID, input, timeout, cap, or pricing changes.
* The smoke source check applies Trend Finder compliance gates, but still does not prove Dataset row normalization.
* The dashboard run control triggers the aggregate path; the browser still does not call Apify directly.
* Generated HTML report delivery is deferred; the dashboard Brief view is the implemented handoff surface.
* Credential-free demos should use fixture/demo data and deterministic fallback labels instead of claiming live Apify-backed collection.

## Validation Commands

Focused checks for this integration:

```bash
bun run test -- scripts/lib/apify/__tests__
bun run test -- scripts/extensions/trend-finder/sources/__tests__
bun run typecheck:scripts
```

Broader Trend Finder validation:

```bash
bun run test -- scripts/lib/apify/__tests__ scripts/lib/ai-runtime/__tests__ scripts/extensions/trend-finder/__tests__ scripts/extensions/trend-finder/sources/__tests__ src/lib/__tests__/trend-finder-schema.test.ts src/lib/__tests__/trend-finder-dashboard.test.tsx src/lib/__tests__/trend-finder-collector.test.ts
```

