
# Session 14: Direct First-Party Source Adapters

**Session ID**: `phase28-session14-direct-first-party-source-adapters` **Status**: Complete **Estimated Tasks**: 25 **Estimated Duration**: 2-4 hours

***

## Objective

Add direct first-party API adapters for arXiv, GitHub search, and RSS, and add keyword-query capability to HN via the Algolia search endpoint. Each path is gated on a re-reviewed compliance doc. The goal is to remove recurring Apify spend for the arXiv, GitHub, and RSS roles and to reduce Actor-availability risk, while keeping the Apify declarations as reviewed fallbacks and preserving spend-accounting labels (as zero-cost). This is a compliance-gated collection change; the compliance review IS the first deliverable.

***

## Source Mapping (from the trends-finderz improvement map, section 9)

Trends-finderz collects from free first-party endpoints with small adapters, whereas Trend Finder runs paid Apify Actors for most of the same roles (HN already uses a built-in adapter):

| Connector | Example adapter (full path)                                                        | Endpoint                                  | Trend Finder today                          |
| --------- | ---------------------------------------------------------------------------------- | ----------------------------------------- | ------------------------------------------- |
| arXiv     | `/home/aiwithapex/projects/aios/EXAMPLES/trends-finderz/lib/sources/arxiv.ts`      | `export.arxiv.org/api/query` (free)       | `arxiv-ai-papers` via Apify Actor           |
| GitHub    | `/home/aiwithapex/projects/aios/EXAMPLES/trends-finderz/lib/sources/github.ts`     | `api.github.com/search/repositories`      | `github-ai-repositories` via Apify Actor    |
| HN        | `/home/aiwithapex/projects/aios/EXAMPLES/trends-finderz/lib/sources/hackernews.ts` | `hn.algolia.com/api/v1/search_by_date`    | Built-in top-stories adapter (Firebase API) |
| RSS       | `/home/aiwithapex/projects/aios/EXAMPLES/trends-finderz/lib/sources/rss.ts`        | Direct feed fetch                         | `rss-ai-news` via Apify Actor               |
| YouTube   | `/home/aiwithapex/projects/aios/EXAMPLES/trends-finderz/lib/sources/youtube.ts`    | Official Data API, env-gated, off default | `youtube-ai-creator-videos` via Apify Actor |

* **Improvement (candidate, not a free win):** direct adapters for arXiv, GitHub search, and RSS would remove recurring Apify spend for three roles, reduce Actor-availability risk, and add keyword-query capability that the built-in HN adapter lacks (the Algolia search endpoint accepts queries; the current adapter fetches top stories and title-filters). Each adapter is \~150-260 lines in trends-finderz including retries and timeouts.
* **Constraints:** collection-path changes are compliance changes. The reviewed docs (`docs/sources/source-compliance-arxiv.md`, `docs/sources/source-compliance-github.md`, `docs/sources/source-compliance-rss-news.md`, `docs/sources/source-compliance-hackernews.md`) were written for the current paths and must be re-reviewed for direct-API rate limits, terms, and retention before implementation (`docs/sources/apify-source-onboarding.md` describes the gate). Spend accounting must keep labeling these sources (as zero-cost) rather than dropping them from the spend table. The Apify declarations stay as reviewed fallbacks.
* **Implement in:** `scripts/extensions/trend-finder/sources/` (one adapter file + tests each, following `scripts/extensions/trend-finder/sources/hn-adapter.ts`); registration in `scripts/extensions/trend-finder/collector.ts`; health rows already generalize (`scripts/extensions/trend-finder/sources/health.ts`).
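
The difference between the top-stories title-filter and a keyword query can be illustrated with a small URL builder for the `search_by_date` endpoint listed in the table. This is a minimal sketch, not the contents of `hn-adapter.ts` or `hackernews.ts`: the `HnSearchOptions` shape, the `buildHnSearchUrl` name, and the default page size are assumptions made for illustration. The `query`, `tags`, `hitsPerPage`, and `numericFilters` parameters are the ones exposed by the public HN Search API.

```typescript
// Hypothetical options shape for illustration only.
interface HnSearchOptions {
  query: string;               // one keyword or phrase, e.g. from a keyword pack
  sinceEpochSeconds?: number;  // only return stories created after this time
  hitsPerPage?: number;        // assumed default of 50 below
}

const HN_SEARCH_BY_DATE = "https://hn.algolia.com/api/v1/search_by_date";

// Builds a keyword-query URL; no network call is made here.
function buildHnSearchUrl(options: HnSearchOptions): string {
  const url = new URL(HN_SEARCH_BY_DATE);
  url.searchParams.set("query", options.query);
  url.searchParams.set("tags", "story"); // stories only, no comments
  url.searchParams.set("hitsPerPage", String(options.hitsPerPage ?? 50));
  if (options.sinceEpochSeconds !== undefined) {
    url.searchParams.set(
      "numericFilters",
      `created_at_i>${options.sinceEpochSeconds}`,
    );
  }
  return url.toString();
}
```

Keeping URL construction separate from the fetch makes the query shape testable without touching the network, which matters while the compliance gate is still closed.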

***

## Scope

### In Scope (MVP)

* Re-review and update `source-compliance-arxiv.md`, `source-compliance-github.md`, `source-compliance-rss-news.md`, and `source-compliance-hackernews.md` for direct-API rate limits, terms, and retention (gating prerequisite -- no network code merges before this)
* Direct arXiv, GitHub search, and RSS adapters following `hn-adapter.ts`, with retries and timeouts, registered in the collector
* HN Algolia keyword-query capability (search endpoint instead of top-stories title-filter)
* Connector-readiness checks modeled on trends-finderz scan orchestration: disabled/degraded/ready labels, timeout/retry diagnostics, and source health rows before a direct path can feed the collector
* Spend accounting keeps these sources labeled (as zero-cost), not dropped
* Apify declarations preserved as reviewed fallbacks; health rows generalize
* Tests: each adapter (normalization, retries, timeouts), spend labeling, fallback-to-Apify behavior, connector readiness, and no activation after collection starts
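
The readiness labels and the "no activation after collection starts" rule could be modeled roughly as follows. This is a sketch under stated assumptions rather than the shipped design: `ProbeResult`, `HealthRow`, `toHealthRow`, and `ConnectorRegistry` are hypothetical names, and the rule that a probe needing retries or hitting a timeout counts as `degraded` (but may still be activated) is an assumption, since the spec only names the three states.

```typescript
type Readiness = "disabled" | "degraded" | "ready";

// Hypothetical probe outcome recorded before collection begins.
interface ProbeResult {
  source: string;
  enabled: boolean;    // direct path switched on after compliance review
  succeeded: boolean;
  attempts: number;    // attempts used by the probe, including retries
  timedOut: boolean;
}

interface HealthRow {
  source: string;
  readiness: Readiness;
  diagnostics: string; // timeout/retry diagnostics for the health table
}

function toHealthRow(probe: ProbeResult): HealthRow {
  if (!probe.enabled) {
    return { source: probe.source, readiness: "disabled", diagnostics: "direct path not enabled" };
  }
  const diagnostics = `attempts=${probe.attempts} timedOut=${probe.timedOut}`;
  // Assumption: only a first-try success without a timeout counts as "ready".
  const clean = probe.succeeded && probe.attempts === 1 && !probe.timedOut;
  return { source: probe.source, readiness: clean ? "ready" : "degraded", diagnostics };
}

class ConnectorRegistry {
  private collectionStarted = false;
  private readonly active = new Set<string>();

  // Activation is only legal before collection starts.
  activate(row: HealthRow): void {
    if (this.collectionStarted) {
      throw new Error(`cannot activate ${row.source}: collection already started`);
    }
    if (row.readiness === "disabled") return; // disabled sources never feed the collector
    this.active.add(row.source);
  }

  // Freezes the active set and returns it to the collector.
  startCollection(): readonly string[] {
    this.collectionStarted = true;
    return [...this.active];
  }
}
```

Throwing on late activation, rather than silently ignoring it, makes the rule easy to assert in a test.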

### Out of Scope

* YouTube direct adapter (env-gated, off by default in trends-finderz; not pursued here)
* Removing the Apify declarations (they remain reviewed fallbacks)
* New roles or sources beyond the four with existing compliance docs

### Deferral Note

If compliance re-review surfaces blockers (rate-limit terms, retention conflicts), this session may split per source or defer specific adapters to a future compliance-first source-expansion phase, recording the decision in the phase PRD rather than shipping an unreviewed path. The scaffolding and any cleared adapters still land; the rest are explicitly deferred.

***

## Prerequisites

* [x] Session 13 completed (keyword packs supply the query terms direct adapters will use)
* [x] Source compliance docs re-reviewed for the four direct-API paths (`docs/sources/source-compliance-*.md`, `docs/sources/apify-source-onboarding.md`)

## Implementation Preflight

Recorded 2026-06-14 before direct-source network code changes.

| Check               | Result                                                                                                              |
| ------------------- | ------------------------------------------------------------------------------------------------------------------- |
| Active session      | `phase28-session14-direct-first-party-source-adapters`                                                              |
| Dependency sessions | Session 13 and required prior sessions are complete in `.spec_system/state.json`                                    |
| Environment         | Spec-system prerequisites pass; Bun 1.3.14 is available                                                             |
| Project tools       | TypeScript, Vitest, and Zod are installed in `node_modules` and available through `bun x` / `bun run`               |
| Monorepo scope      | Spec-system analysis reports non-monorepo; no package-specific scope applies                                        |
| Compliance gate     | arXiv, GitHub, RSS/news, HN, and Apify onboarding docs must be updated before direct network adapters are activated |
| Database scope      | No database schema, migration, seed, or persisted data-shape change is in scope                                     |

***

## Deliverables

1. Updated compliance docs for arXiv, GitHub, RSS, and HN direct-API paths
2. Direct arXiv, GitHub, and RSS adapters with retries/timeouts and tests
3. HN Algolia keyword-query capability
4. Connector-readiness and source-health reporting for direct vs Apify paths
5. Spend-accounting labels preserved; Apify declarations kept as fallbacks
6. Per-adapter tests plus a recorded decision for any deferred adapter
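
The "retries/timeouts" requirement in deliverable 2 can be sketched as a small wrapper around an injected fetch function. This is an illustrative sketch, not the code in the trends-finderz adapters: `fetchWithRetry`, `FetchLike`, and `RetryOptions` are hypothetical names, and the policy of retrying only on HTTP 429, 5xx, and thrown errors (including timeouts) with exponential backoff is an assumption.

```typescript
// Minimal structural type so a fake can be injected in tests (hypothetical).
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number; text(): Promise<string> }>;

interface RetryOptions {
  attempts: number;  // total attempts, including the first
  timeoutMs: number; // per-attempt timeout
  backoffMs: number; // base delay, doubled after each failed attempt
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(
  url: string,
  fetchImpl: FetchLike,
  opts: RetryOptions,
): Promise<string> {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 1; attempt <= opts.attempts; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), opts.timeoutMs);
    try {
      const res = await fetchImpl(url, { signal: controller.signal });
      if (res.ok) return await res.text();
      lastError = new Error(`HTTP ${res.status}`);
      // Assumption: other 4xx responses will not succeed on retry.
      if (res.status !== 429 && res.status < 500) break;
    } catch (err) {
      lastError = err; // network error or abort from the timeout
    } finally {
      clearTimeout(timer);
    }
    if (attempt < opts.attempts) await sleep(opts.backoffMs * 2 ** (attempt - 1));
  }
  throw lastError instanceof Error ? lastError : new Error(String(lastError));
}
```

In an adapter, the global `fetch` would be passed as `fetchImpl`; tests can pass a fake that fails a set number of times, which keeps the retry path covered without network access.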

***

## Success Criteria

* [x] No direct-API network code merges before the matching compliance doc is re-reviewed and updated
* [x] arXiv, GitHub, and RSS collect via direct first-party APIs with retries and timeouts; HN supports keyword queries
* [x] Spend accounting still labels these sources (as zero-cost); they are not dropped from the spend table
* [x] Connector readiness reports disabled/degraded/ready states before collection, and no source activation happens after collection starts
* [x] Apify declarations remain as reviewed fallbacks and health rows render for both paths
* [x] Any adapter blocked by compliance review is explicitly deferred with a recorded decision, not shipped unreviewed
* [x] Existing collector and spend tests pass
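
The spend-labeling criterion (sources stay in the spend table as zero-cost rather than disappearing) might look like the sketch below. The `SourceRun` and `SpendRow` shapes, the `buildSpendRows` name, and the label strings are assumptions for illustration; the real spend table may be structured differently.

```typescript
type CollectionPath = "direct" | "apify";

// Hypothetical record of how a source was collected in one run.
interface SourceRun {
  source: string;       // e.g. "arxiv-ai-papers"
  path: CollectionPath;
  apifyCostUsd: number; // Actor cost; only meaningful on the Apify path
}

interface SpendRow {
  source: string;
  path: CollectionPath;
  costUsd: number;
  label: string;
}

// Every source yields exactly one row; direct paths are labeled, not dropped.
function buildSpendRows(runs: SourceRun[]): SpendRow[] {
  return runs.map((run) =>
    run.path === "direct"
      ? { source: run.source, path: run.path, costUsd: 0, label: "zero-cost (direct API)" }
      : { source: run.source, path: run.path, costUsd: run.apifyCostUsd, label: "Apify Actor" },
  );
}
```

Mapping one input to one output row means a test can assert that the row count never shrinks when a source moves from the Apify path to the direct path.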

