# Adding Apify Sources To Trend Finder

Last reviewed: 2026-06-14

This document records what it takes to add another Trend Finder source when a suitable Apify Actor exists. The Actor is only one part of the work. Trend Finder treats source identity, compliance review, runtime configuration, normalization, and browser-safe evidence as separate gates.

For the full integration map, see [Apify Integration](/ai-os-and-trend-finder-docs/docs/apify.md). For current Apify platform semantics, verify against:

* [Running Actors](https://docs.apify.com/platform/actors/running)
* [Actor client call behavior](https://docs.apify.com/api/client/js/reference/class/ActorClient#call)
* [Actor call options](https://docs.apify.com/api/client/js/reference/interface/ActorCallOptions)
* [Dataset list item options](https://docs.apify.com/api/client/js/reference/interface/DatasetClientListItemOptions)
* [Dataset storage](https://docs.apify.com/platform/storage/dataset)
* [Runs, statuses, and retention](https://docs.apify.com/platform/actors/running/runs-and-builds)

## Current Source Path

The normal product path is:

```
bun run aggregate
  -> scripts/aggregate.ts
  -> scripts/lib/extensions/runner.ts
  -> scripts/extensions/trend-finder/collector.ts
  -> scripts/extensions/trend-finder/sources/apify-adapter.ts
  -> scripts/lib/apify/client.ts
  -> scripts/lib/apify/actors.ts
  -> scripts/lib/apify/datasets.ts
  -> scripts/lib/apify/normalize.ts
  -> scripts/extensions/trend-finder/sources/apify-normalizers.ts
  -> src/data/live-data.json
```

The browser never calls Apify directly. The browser run control triggers the local aggregate path, and the script-side collector decides whether configured reviewed sources can run.

## Direct First-Party Adapter Gate

Some reviewed source roles can now run through direct first-party public endpoints before Apify is considered:

| Source ID                | Direct path                                                   | Apify stance               |
| ------------------------ | ------------------------------------------------------------- | -------------------------- |
| `arxiv-ai-papers`        | arXiv Atom API metadata                                       | Reviewed fallback retained |
| `github-ai-repositories` | GitHub REST repository search                                 | Reviewed fallback retained |
| `rss-ai-news`            | Reviewed public RSS/Atom feed URLs                            | Reviewed fallback retained |
| `hackernews`             | HN Algolia keyword search, with Firebase top stories fallback | No Apify declaration       |

A direct path can run only when the matching compliance document records a reviewed direct adapter boundary. The collector resolves direct readiness before network collection starts and freezes that decision for the run. A source that is unreviewed, disabled, compliance-blocked, misconfigured, rate limited, timed out, malformed, empty, or otherwise degraded without usable evidence must not silently disappear.

Fallback rules:

* If a direct source succeeds with usable reviewed evidence, the matching Apify source ID is skipped for that run to prevent duplicate collection and paid duplicate spend.
* If a direct source is disabled, blocked, offline, rate limited, timed out, or degraded without usable evidence, the reviewed Apify declaration remains eligible when Apify credentials and private source config are available.
* If Apify is skipped because direct collection succeeded, the source summary must record a bounded skipped-fallback diagnostic rather than an Actor run.
* If Apify is unavailable because `APIFY_TOKEN` or private source config is missing, the direct source row and the Apify fallback status must both stay visible as bounded source setup/readiness labels.
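
The fallback rules above can be read as a small decision function. The following is a minimal sketch, not the collector's actual code: the outcome labels, the `FallbackContext` shape, and the returned action names are all hypothetical and exist only to make the ordering of the rules explicit.

```typescript
// Hypothetical outcome labels; the real collector may use different names.
type DirectOutcome =
  | "succeeded"
  | "disabled"
  | "blocked"
  | "offline"
  | "rate-limited"
  | "timed-out"
  | "degraded";

interface FallbackContext {
  directOutcome: DirectOutcome;
  hasUsableEvidence: boolean; // direct run produced usable reviewed evidence
  hasApifyDeclaration: boolean; // false for sources such as `hackernews`
  hasApifyToken: boolean; // APIFY_TOKEN is set
  hasPrivateSourceConfig: boolean; // private source config was loaded
}

type FallbackDecision =
  | { action: "no-fallback" }
  | { action: "skip-apify"; diagnostic: "skipped-fallback" }
  | { action: "apify-unavailable"; label: "setup-required" }
  | { action: "run-apify" };

function decideApifyFallback(ctx: FallbackContext): FallbackDecision {
  // No reviewed Apify declaration: there is nothing to fall back to.
  if (!ctx.hasApifyDeclaration) return { action: "no-fallback" };
  // Direct success with usable evidence: skip Apify to avoid duplicate spend,
  // but record a bounded diagnostic instead of an Actor run.
  if (ctx.directOutcome === "succeeded" && ctx.hasUsableEvidence) {
    return { action: "skip-apify", diagnostic: "skipped-fallback" };
  }
  // Fallback is eligible, but setup is incomplete: keep the status visible.
  if (!ctx.hasApifyToken || !ctx.hasPrivateSourceConfig) {
    return { action: "apify-unavailable", label: "setup-required" };
  }
  return { action: "run-apify" };
}
```

In this sketch every branch returns a labelled decision, so a source never drops out of the summary without a status.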

Direct source rows are zero-cost public API rows. Spend summaries must keep those sources visible with `provider: "public-api"` and `state: "not-applicable"` instead of dropping them from the run spend table. Browser and trace output may show source ID, direct readiness status, fallback policy, bounded diagnostics, item counts, and safe spend labels. They must not show raw API responses, tokens, private config paths, Actor IDs, Dataset IDs, raw rate limit headers, raw XML/JSON payloads, stack traces, or credential-shaped text.

## Hard Requirements

1. Verify the current Apify Actor ID, default build/tag, input schema, output shape, and pricing. Trend Finder does not currently pass a pinned `build` option to `actor.call`, so a new Actor build published in the Apify Store can change Actor inputs, result fields, or paid-run behavior without any change on the Trend Finder side.
2. Add a static source declaration in `scripts/extensions/trend-finder/sources/apify-source-config.ts`.
3. Add or update a source compliance document under `docs/sources/`.
4. Add runnable source config to the gitignored `data/trend-finder.apify-actors.json` file or another file selected by `FINDTREND_APIFY_ACTORS_PATH`; declarations do not auto-run.
5. If the source should be part of the committed starter set, add a sanitized entry to `data/trend-finder.apify-actors.example.json`.
6. Ensure the Actor output normalizes into browser-safe Trend Finder evidence.
7. Add fixture-backed tests for source metadata and normalization behavior.
8. If reviewed declarations or the starter example change, sync the committed reference docs after fixture-backed normalization passes: `docs/apify.md` and `docs/extensions/trend-finder/sources.md`.
9. Run source, smoke, and aggregate validation.

## Static Declaration

Each new Apify source needs an entry in `TREND_FINDER_APIFY_SOURCE_DECLARATIONS` with:

* Stable source `id`.
* Concrete `actorId`.
* Apify role: `trend-source`, `evidence-source`, `watchlist-source`, or `discovery-source`.
* Safe sample `input`. This is reviewed metadata only; it is not currently merged into runtime config.
* `memoryMb`, `timeoutMs`, `maxItems`, and optional `maxTotalChargeUsd`. Current Apify JavaScript client docs describe `maxTotalChargeUsd` as applying only to pay-per-event Actors.
* Display `name`.
* Trend Finder source role: `developer`, `discussion`, `research`, `launch`, `creator`, or `news`.
* Quality tier: `primary`, `secondary`, `community`, or `low`.
* `complianceStatus: "reviewed"` only after the compliance review is complete.
* `complianceDoc`.
* `historicalWindowSupport`.
* `provenanceLabel`.
* `actorCandidates`, each with `kind`, `actorId`, `label`, `notes`, and `validationStatus`.

Unknown source IDs are forced to `restricted` and disabled by the compliance gate, even if runtime config tries to self-declare them as reviewed. Known placeholder Actor IDs are also disabled before collection.
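
As an illustration of the gate described above, here is a minimal sketch. It assumes a hypothetical `RuntimeSource` shape, a lookup of declared compliance statuses keyed by source ID, and a list of known placeholder Actor IDs; none of these names come from the actual codebase, and the real gate may track more states.

```typescript
// Only "reviewed" and "restricted" are named in this document; other states may exist.
type ComplianceStatus = "reviewed" | "restricted";

interface RuntimeSource {
  id: string;
  actorId: string;
  enabled: boolean;
  complianceStatus: ComplianceStatus; // whatever the runtime config claimed
}

interface GatedSource extends RuntimeSource {
  disabledReason?: "unknown-source" | "not-reviewed" | "placeholder-actor";
}

function applyComplianceGate(
  source: RuntimeSource,
  declaredStatus: Readonly<Record<string, ComplianceStatus>>, // from static declarations
  placeholderActorIds: readonly string[],
): GatedSource {
  const known = Object.prototype.hasOwnProperty.call(declaredStatus, source.id);
  if (!known) {
    // Runtime config cannot self-approve: unknown IDs are always restricted.
    return { ...source, enabled: false, complianceStatus: "restricted", disabledReason: "unknown-source" };
  }
  const status = declaredStatus[source.id];
  if (status !== "reviewed") {
    return { ...source, enabled: false, complianceStatus: status, disabledReason: "not-reviewed" };
  }
  if (placeholderActorIds.includes(source.actorId)) {
    return { ...source, enabled: false, complianceStatus: status, disabledReason: "placeholder-actor" };
  }
  // The declared status wins over anything the runtime config claimed.
  return { ...source, complianceStatus: status };
}
```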

`historicalWindowSupport.complianceDocPath`, when present, must match the source `complianceDoc`. Supported historical declarations need non-empty `safeOverrideFields`, `minWindowDays`, and `maxWindowDays`.

## Runtime Configuration

Three switches must line up before a configured Actor runs in the product path:

```bash
VITE_CLAUDE_OS_ENABLED_EXTENSIONS=trend-finder
APIFY_TOKEN=<real-token>
FINDTREND_APIFY_ACTORS_PATH=data/trend-finder.apify-actors.json
```

Copy `data/trend-finder.apify-actors.example.json` to the gitignored private `data/trend-finder.apify-actors.json` file and edit the JSON there. Setting `FINDTREND_APIFY_ACTORS_PATH` explicitly is preferred. If the path variable is omitted, the loader checks the default `data/trend-finder.apify-actors.json` path only when at least one Apify environment variable is set.

The source JSON may be an array or an object with a `sources` or `actors` array. Accepted per-source fields are:

| Field                              | Required | Notes                                                                                      |
| ---------------------------------- | -------- | ------------------------------------------------------------------------------------------ |
| `id` or `sourceId`                 | Yes      | Must match a reviewed static declaration to run.                                           |
| `actorId` or `actor`               | Yes      | Apify Actor ID, usually `username/actor-name`.                                             |
| `role`                             | No       | One of `trend-source`, `evidence-source`, `watchlist-source`, `discovery-source`.          |
| `enabled`                          | No       | Defaults to `true`; compliance can still force it off.                                     |
| `input`                            | No       | JSON object passed directly to `actor.call`; non-objects become `{}`.                      |
| `memoryMb` or `memory`             | No       | Source memory override in MB.                                                              |
| `timeoutSecs` or `timeout`         | No       | Source timeout override in seconds.                                                        |
| `maxItems` or `maxItemsPerSource`  | No       | Dataset read cap before Trend Finder quality caps.                                         |
| `maxTotalChargeUsd`                | No       | Optional Actor charge cap; current Apify docs say it applies only to pay-per-event Actors. |
| `name` or `sourceName`             | No       | Trend Finder display override. JSON only.                                                  |
| `sourceRole` or `trendRole`        | No       | Trend Finder role override. JSON only.                                                     |
| `qualityTier` or `quality`         | No       | Trend Finder quality override. JSON only.                                                  |
| `complianceStatus` or `compliance` | No       | Cannot self-approve unknown sources as `reviewed`.                                         |
| `complianceDoc`                    | No       | Local compliance doc path. JSON only.                                                      |
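
To make the accepted shapes concrete, the sketch below parses either top-level form and resolves the two required aliases. It is illustrative only: the function names are hypothetical, the precedence between aliases (`id` before `sourceId`, `actorId` before `actor`) is an assumption, and the remaining optional fields are omitted for brevity.

```typescript
type JsonObject = Record<string, unknown>;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Accept a bare array, or an object with a `sources` or `actors` array.
function extractEntries(parsed: unknown): JsonObject[] {
  let list: unknown;
  if (Array.isArray(parsed)) {
    list = parsed;
  } else if (isObject(parsed)) {
    list = Array.isArray(parsed.sources) ? parsed.sources : parsed.actors;
  }
  return Array.isArray(list) ? list.filter(isObject) : [];
}

// Return the first alias that holds a non-empty string (precedence is assumed).
function firstString(entry: JsonObject, keys: readonly string[]): string | undefined {
  for (const key of keys) {
    const value = entry[key];
    if (typeof value === "string" && value.trim() !== "") return value.trim();
  }
  return undefined;
}

interface ParsedSource {
  id: string;
  actorId: string;
  enabled: boolean;
  input: JsonObject;
}

function parseEntry(entry: JsonObject): ParsedSource | undefined {
  const id = firstString(entry, ["id", "sourceId"]);
  const actorId = firstString(entry, ["actorId", "actor"]);
  if (id === undefined || actorId === undefined) return undefined; // both are required
  return {
    id,
    actorId,
    enabled: entry.enabled !== false, // defaults to true
    input: isObject(entry.input) ? entry.input : {}, // non-objects become {}
  };
}
```

A parsed entry is still only a candidate: the compliance gate decides afterwards whether it may run.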

Default parser bounds are:

| Setting             | Default | Bounds         |
| ------------------- | ------- | -------------- |
| Memory              | 1024 MB | 128-32768 MB   |
| Timeout             | 120 s   | 1-900 s        |
| Generic `maxItems`  | 50      | 1-500          |
| `maxTotalChargeUsd` | none    | 0-25, optional |

Trend Finder then clamps items again by source quality tier, with a hard 50-item ceiling.
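
The bounds table translates into a simple clamp. This sketch copies the defaults and limits from the table above; whether the real parser clamps out-of-range values or rejects them is not stated here, so clamping is an assumption, as are the helper names and the choice to floor fractional values.

```typescript
interface Bounds {
  min: number;
  max: number;
  fallback: number; // default used when the value is missing or not a finite number
}

// Defaults and bounds copied from the parser bounds table.
const MEMORY_MB: Bounds = { min: 128, max: 32768, fallback: 1024 };
const TIMEOUT_SECS: Bounds = { min: 1, max: 900, fallback: 120 };
const MAX_ITEMS: Bounds = { min: 1, max: 500, fallback: 50 };

function clampSetting(value: unknown, bounds: Bounds): number {
  if (typeof value !== "number" || !Number.isFinite(value)) return bounds.fallback;
  // Assumption: fractional values are floored before clamping.
  return Math.min(bounds.max, Math.max(bounds.min, Math.floor(value)));
}

// `maxTotalChargeUsd` has no default: it stays undefined unless a number is configured.
function clampCharge(value: unknown): number | undefined {
  if (typeof value !== "number" || !Number.isFinite(value)) return undefined;
  return Math.min(25, Math.max(0, value));
}
```

For example, `clampSetting(64, MEMORY_MB)` yields `128`, and `clampCharge(undefined)` stays `undefined` so that no charge cap is sent to Apify.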

Compact `source-id=actor-id` entries remain available through the legacy `FINDTREND_APIFY_ACTORS` inline override, but they are useful only for quick checks because they cannot provide Actor input, source caps, charge caps, or metadata overrides. Inline JSON remains supported for CI or one-off shell runs.

Important implementation detail: static declarations are metadata, not runtime defaults. If the runtime source config lists a source without `input`, the Actor receives `{}`.

## Local Source Setup Rules

The local Sources tab can edit reviewed Apify source targets through `/__trend_finder_source_setup`. This bridge is loopback-only, uses the per-dev-server refresh token, bounds JSON bodies, validates mutation schemas, and writes the private config atomically.

The setup bridge may mutate only:

* `enabled` for reviewed source IDs
* reviewed target fields inside `input`

It must reject unknown source IDs, unreviewed source IDs, known placeholder Actor IDs, unreviewed fields, malformed JSON, oversized bodies, invalid tokens, non-loopback requests, private or local URLs, credential-shaped text, and values outside the declared target contract.
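
A partial sketch of that validation follows. It covers only the source ID, review status, `enabled` type, and editable-field checks; token, loopback, body-size, URL, and credential checks are out of scope. The `SetupTarget` and `SetupMutation` shapes and the reason codes are hypothetical.

```typescript
interface SetupTarget {
  reviewed: boolean;
  editableInputFields: readonly string[]; // reviewed target fields inside `input`
}

interface SetupMutation {
  sourceId: string;
  enabled?: unknown;
  input?: Record<string, unknown>;
}

type Verdict = { ok: true } | { ok: false; reason: string };

function validateSetupMutation(
  mutation: SetupMutation,
  targets: Readonly<Record<string, SetupTarget>>,
): Verdict {
  const known = Object.prototype.hasOwnProperty.call(targets, mutation.sourceId);
  if (!known) return { ok: false, reason: "unknown-source" };
  const target = targets[mutation.sourceId];
  if (!target.reviewed) return { ok: false, reason: "unreviewed-source" };
  if (mutation.enabled !== undefined && typeof mutation.enabled !== "boolean") {
    return { ok: false, reason: "invalid-enabled" };
  }
  // Only reviewed target fields may be written into `input`.
  for (const field of Object.keys(mutation.input ?? {})) {
    if (!target.editableInputFields.includes(field)) {
      return { ok: false, reason: "unreviewed-field" };
    }
  }
  return { ok: true };
}
```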

Source Setup writes preserve parser-supported fields such as `actorId`, `role`, `memoryMb`, `timeoutSecs`, `maxItems`, `maxTotalChargeUsd`, and metadata overrides. Credentials are never written by the UI; operators still manage `APIFY_TOKEN` in `.env.local` or the shell.

Do not add a raw Actor-input editor. Any new editable field requires a static source setup declaration, a compliance doc update, fixture coverage, bridge validation, and docs describing the browser-safe target boundary.

## Phase 28 Keyword-Pack Review

Reviewed keyword packs are approved as committed Trend Finder source metadata, not as user-authored Actor input. The Phase 28 keyword-pack path may compile reviewed category terms into only these public query target fields when the field is declared on a reviewed source setup target:

| Field           | Accepted shape | Notes                          |
| --------------- | -------------- | ------------------------------ |
| `query`         | string         | Public search query text only. |
| `q`             | string         | Public search query text only. |
| `searchQuery`   | string         | Public search query text only. |
| `searchTerms`   | string array   | Public search terms only.      |
| `searchQueries` | string array   | Public search queries only.    |

The compiler must reject all other Actor input fields for keyword injection, including feeds, URLs, pagination, sorting, locale, date windows, credentials, tokens, account identifiers, private paths, and provider-specific advanced filters. Existing reviewed Source Setup fields remain editable only through the loopback setup bridge and its target schemas.
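
One way to picture the allow-list is the sketch below. It assumes a hypothetical `compileKeywordInput` helper that receives the reviewed base input, the fields declared on the source setup target, and the compiled terms. Joining terms with a space for string fields is an assumption made for illustration, not a documented behavior.

```typescript
// The five approved keyword target fields, split by accepted shape.
const STRING_FIELDS: readonly string[] = ["query", "q", "searchQuery"];
const STRING_ARRAY_FIELDS: readonly string[] = ["searchTerms", "searchQueries"];

function compileKeywordInput(
  baseInput: Readonly<Record<string, unknown>>,
  declaredFields: readonly string[],
  terms: readonly string[],
): Record<string, unknown> {
  // Work on a copy so the reviewed metadata is never mutated by a run.
  const next: Record<string, unknown> = { ...baseInput };
  for (const field of declaredFields) {
    if (STRING_FIELDS.includes(field)) {
      next[field] = terms.join(" "); // assumption: space-joined query text
    } else if (STRING_ARRAY_FIELDS.includes(field)) {
      next[field] = terms.slice();
    } else {
      // Feeds, URLs, pagination, sorting, and so on are never keyword targets.
      throw new Error(`Field "${field}" is not an approved keyword target`);
    }
  }
  return next;
}
```

Fields that are not keyword targets pass through unchanged from the reviewed base input; the compiler only ever writes to the five approved names.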

Keyword compilation is script-side only. Operators may select the run mode with bounded environment or scheduler configuration, but browsers must not receive a raw Actor input editor or a free-text keyword surface. Invalid mode or category values must fall back to the default balanced scan and produce a warning rather than widening collection.
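
The fallback rule for invalid values might look like the following sketch. Only the `balanced` default is named in this document; the `allowedModes` parameter, the warning code, and the case-insensitive matching are assumptions.

```typescript
interface RunModeResolution {
  mode: string;
  warning?: "invalid-run-mode";
}

// `allowedModes` stands in for whatever reviewed list of modes the scripts define.
function resolveRunMode(
  raw: string | undefined,
  allowedModes: readonly string[],
): RunModeResolution {
  const requested = (raw ?? "").trim().toLowerCase();
  if (requested === "") return { mode: "balanced" }; // unset: default, no warning
  if (allowedModes.includes(requested)) return { mode: requested };
  // Never echo the raw value: the warning is a bounded code, not operator text.
  return { mode: "balanced", warning: "invalid-run-mode" };
}
```

Falling back to the default rather than guessing keeps an invalid value from widening collection.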

Compiled keyword windows must stay cost-neutral. They must respect the source declaration item caps, quality-tier caps, optional charge ceilings, spend labels, and the existing compliance gate. Static declarations are cloned before runtime terms are applied so reviewed metadata is not mutated by collection runs.

Browser and trace output may show bounded coverage counts, category labels, source cap usage, and warning labels. They must not expose raw Actor input, Actor IDs, Dataset IDs, run IDs, private config paths, credentials, token-shaped strings, provider responses, or raw Dataset rows.

## Metric-Only Apify Enrichment

Some Apify Actors are useful as internal scoring signals but do not produce durable public evidence URLs. Do not add those Actors to `TREND_FINDER_APIFY_SOURCE_DECLARATIONS`, starter source JSON, source summaries, source diversity, source breakdowns, analyst evidence, or browser metric chips unless they can satisfy the normal browser-safe evidence contract.

Current precedent: `google-trends-demand` is a default-off, opt-in live collector gated by `FINDTREND_GOOGLE_TRENDS_DEMAND_ENABLED`. It has its own compliance doc, strict source-level caps, focused tests, and live validation, but it emits internal demand signals only into scoring. Empty Dataset rows are handled as non-fatal enrichment warnings.

If a future Apify-backed enrichment follows this pattern, give it explicit env keys, a compliance document, fixture coverage, redacted warnings, and source-specific live validation. Do not emit raw Dataset rows, Actor input, Actor IDs, run IDs, Dataset IDs, Apify URLs, Google Trends URLs, or other non-evidence provenance into browser data.

## Apify API Boundary

Current code calls:

```ts
client.actor(source.actorId).call(source.input, {
  memory: source.memoryMb,
  timeout: timeoutSeconds(source),
  waitSecs: timeoutSeconds(source),
  ...(source.maxTotalChargeUsd === undefined
    ? {}
    : { maxTotalChargeUsd: source.maxTotalChargeUsd }),
  log: null,
});
```

After a run finishes with status `SUCCEEDED` and a `defaultDatasetId`, the Dataset reader calls:

```ts
client.dataset(datasetId).listItems({
  limit,
  offset: 0,
  desc: false,
  clean: true,
  skipEmpty: true,
  skipHidden: true,
});
```

Do not treat the source `maxItems` field as an Apify Actor output guarantee. The Apify JavaScript client documents Actor-call `maxItems` as a paid Dataset item charge cap for pay-per-result Actors, not as a result limit. Trend Finder currently uses source `maxItems` as the Dataset read and normalization cap, and does not pass `maxItems` to `actor.call`.

Likewise, `maxTotalChargeUsd` is passed through only when configured, and the current Apify JavaScript client documents that option as a maximum-cost guard for pay-per-event Actors only. Verify the Actor pricing model before relying on either Apify billing cap.

`clean`, `skipEmpty`, and `skipHidden` help remove empty rows and Apify hidden fields, but they are not a privacy boundary. The Trend Finder normalizer still has to reject private, high-risk, or non-browser-safe fields.

Apify currently retains the ten most recent Actor runs indefinitely and deletes older runs plus their default storages after the plan-dependent data retention period. Do not use Apify run URLs, Dataset IDs, or Apify-hosted URLs as durable browser evidence.

## Normalization Boundary

Actor Dataset rows must produce:

* Public evidence URL.
* Title or name.
* Short description, abstract, tagline, or snippet.
* Published, created, or updated timestamp when available.
* Public source label.
* Optional public aggregate metrics such as points, comments, downloads, stars, forks, upvotes, star gain, or views.

Downloads are a first-class public metric and must remain separate from views. Do not map model or package download counts into `views`.

The normalizer intentionally rejects Apify-hosted URLs as evidence URLs. It also excludes private or high-risk text fields such as emails, author/profile URLs, full bodies, comment text, Reddit `selftext`, transcripts, contact fields, and media download URLs. It can derive canonical public URLs for known shapes such as GitHub repositories, arXiv IDs, Product Hunt slugs, YouTube video IDs, and Reddit permalinks.

If the Actor uses unusual output fields, update `scripts/extensions/trend-finder/sources/apify-normalizers.ts` with a narrow field map or public URL derivation. Do not emit raw Dataset provenance, Actor input, Dataset IDs, run IDs, private paths, tokens, prompts, full source dumps, or private telemetry into browser data.
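
A narrow URL derivation might look like the sketch below. The row field names (`url`, `fullName`, `arxivId`), the helper names, and the `apify.com` host check are illustrative assumptions, not the contents of `apify-normalizers.ts`. The GitHub and arXiv URL patterns are the public canonical forms.

```typescript
type Row = Record<string, unknown>;

// Return a public http(s) URL, or undefined for anything unusable as evidence.
function publicUrl(raw: unknown): string | undefined {
  if (typeof raw !== "string") return undefined;
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return undefined; // not an absolute URL
  }
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") return undefined;
  const host = parsed.hostname.toLowerCase();
  // Apify-hosted URLs are never durable evidence.
  if (host === "apify.com" || host.endsWith(".apify.com")) return undefined;
  return parsed.toString();
}

function deriveEvidenceUrl(row: Row): string | undefined {
  const direct = publicUrl(row.url);
  if (direct !== undefined) return direct;

  // Hypothetical field: "owner/repo" from a GitHub-style Actor.
  const fullName = row.fullName;
  if (typeof fullName === "string" && /^[\w.-]+\/[\w.-]+$/.test(fullName)) {
    return `https://github.com/${fullName}`;
  }

  // Hypothetical field: new-style arXiv ID such as "2401.01234" or "2401.01234v2".
  const arxivId = row.arxivId;
  if (typeof arxivId === "string" && /^\d{4}\.\d{4,5}(v\d+)?$/.test(arxivId)) {
    return `https://arxiv.org/abs/${arxivId}`;
  }

  return undefined; // no public evidence URL: the row should be dropped
}
```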

## Source Roles And Complexity

Prefer mapping a new source into an existing source role unless a new product category is truly needed.

Existing roles are:

* `developer`
* `discussion`
* `research`
* `launch`
* `creator`
* `news`

Adding a source in an existing role is usually a small to medium change. Adding a new source role is larger because it affects browser schemas, AI analyst validation, scoring weights, source breakdown ordering, Engine Replay models, fixtures, labels, and UI rendering.

## Quality Caps

Trend Finder applies item caps by quality tier after generic source parsing:

| Quality Tier | Item Cap |
| ------------ | -------- |
| `primary`    | 50       |
| `secondary`  | 35       |
| `community`  | 20       |
| `low`        | 8        |

The hard Trend Finder ceiling is 50 items per source even though the generic Apify parser accepts higher values. The low-quality source cap is intentionally strict because low-quality sources also have a scoring cap per topic.
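
The effective cap is therefore the smaller of the parsed `maxItems` and the tier cap. A minimal sketch, with the cap values copied from the table above and a hypothetical function name:

```typescript
type QualityTier = "primary" | "secondary" | "community" | "low";

// Item caps copied from the quality tier table.
const TIER_ITEM_CAPS: Record<QualityTier, number> = {
  primary: 50,
  secondary: 35,
  community: 20,
  low: 8,
};

// `parsedMaxItems` is the value that already passed the generic parser bounds (1-500).
function effectiveItemCap(parsedMaxItems: number, tier: QualityTier): number {
  return Math.max(1, Math.min(parsedMaxItems, TIER_ITEM_CAPS[tier]));
}
```

Under these assumptions, a `primary` source configured with `maxItems: 500` still reads at most 50 items, and a `low` source configured with the default 50 reads at most 8.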

## Historical Support

Current-run collection and historical backtests are separate paths.

Historical support needs an explicit `historicalWindowSupport` declaration. A supported declaration must list exactly which input fields are safe to override and the reviewed min/max window size. Currently, supported behavior is limited to bounded `dateFrom`- and `dateTo`-style overrides for `producthunt-ai-launches`.

If the Actor has no bounded historical input, declare it current-only with `supported: false` and a clear `unsupportedReason`.

The historical adapter rejects unknown source overrides, unsupported sources, invalid windows, and unreviewed override fields before making any Actor call. Historical normalization also requires a published timestamp so old evidence cannot silently become current evidence.

A date sort, ranking mode, or relative recency filter is not bounded historical support by itself. Treat those modes as current-only unless a source-specific review approves explicit start/end window fields.

Do not translate historical windows into relative filters unless a source-specific compliance review approves that behavior.
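
The pre-call checks can be sketched as a pure function that returns a rejection code, or `undefined` when the request is safe. The declaration field names follow this document; the `WindowRequest` shape and the rejection codes are hypothetical.

```typescript
interface HistoricalWindowSupport {
  supported: boolean;
  safeOverrideFields?: string[];
  minWindowDays?: number;
  maxWindowDays?: number;
  unsupportedReason?: string;
}

interface WindowRequest {
  from: Date;
  to: Date;
  overrideFields: string[]; // input fields the backtest wants to override
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns a rejection code, or undefined when the Actor call may proceed.
function checkHistoricalRequest(
  support: HistoricalWindowSupport | undefined,
  request: WindowRequest,
): string | undefined {
  if (support === undefined || !support.supported) return "unsupported-source";
  const { safeOverrideFields = [], minWindowDays, maxWindowDays } = support;
  if (safeOverrideFields.length === 0 || minWindowDays === undefined || maxWindowDays === undefined) {
    return "incomplete-declaration";
  }
  const days = (request.to.getTime() - request.from.getTime()) / DAY_MS;
  // Written as a negated range check so that NaN (invalid dates) is rejected too.
  if (!(days >= minWindowDays && days <= maxWindowDays)) return "invalid-window";
  const unreviewed = request.overrideFields.filter((field) => !safeOverrideFields.includes(field));
  if (unreviewed.length > 0) return "unreviewed-override-field";
  return undefined;
}
```

Because every check runs before any network call, a rejected backtest request costs nothing on the Apify side.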

## Tests To Update

Add focused coverage in:

* `scripts/extensions/trend-finder/sources/__tests__/apify-source-config.test.ts`
* `scripts/extensions/trend-finder/sources/__fixtures__/apify-normalizer-fixtures.ts`
* `scripts/extensions/trend-finder/sources/__tests__/apify-normalizers.test.ts`

Add adapter tests only when behavior changes in collection, redaction, historical window handling, or failure states.

Add smoke tests in `scripts/extensions/trend-finder/sources/__tests__/apify-smoke.test.ts` only when CLI readiness, tiny validation, or redaction behavior changes.

## Validation Commands

Run the focused source checks:

```bash
bun run test -- scripts/lib/apify/__tests__ scripts/extensions/trend-finder/sources/__tests__
bun run typecheck:scripts
```

Run credential-safe readiness without running Actors:

```bash
bun run apify:smoke -- --json
```

Run configured Actors only when you intentionally want a paid live source check:

```bash
bun run apify:smoke -- --check-sources --json
```

Run live tiny validation only with a real `APIFY_TOKEN` and explicit source config:

```bash
bun run apify:smoke -- --tiny-validate --json
```

Run the product path:

```bash
VITE_CLAUDE_OS_ENABLED_EXTENSIONS=trend-finder bun run aggregate
```

Tiny validation output is intentionally redacted. It reports source status, item counts, public URL counts, warning codes, and field names, not raw Dataset rows or Actor input.

## Practical Acceptance Criteria

A new source is ready for product use when:

* The source has a static reviewed declaration.
* The compliance doc explains allowed fields, excluded fields, retention, attribution, and historical stance.
* Runtime config includes a concrete Actor ID and safe input.
* The source passes source config tests.
* A representative fixture normalizes into one browser-safe evidence item.
* Private/prohibited fixture fragments do not appear in emitted evidence or analyst payloads.
* Tiny validation succeeds or degrades in an understood, documented way.
* Aggregate output validates and the Sources view shows accurate active, degraded, offline, blocked, or fallback provenance.
* Starter config and reference docs are updated only with sanitized public metadata; real tokens and private runtime files stay out of git.

