# Source Compliance: GitHub

> Implementation-time review completed 2026-05-17. Direct API re-review completed 2026-06-14 for a metadata-only GitHub repository search adapter. The Apify source declaration remains a reviewed fallback.

***

## Source Overview

| Field             | Value                                                                                           |
| ----------------- | ----------------------------------------------------------------------------------------------- |
| Source Name       | GitHub                                                                                          |
| API Base URL      | `https://api.github.com/`                                                                       |
| API Documentation | <https://docs.github.com/en/rest>                                                               |
| Terms             | <https://docs.github.com/en/site-policy/github-terms/github-terms-of-service>                   |
| Acceptable Use    | <https://docs.github.com/en/site-policy/acceptable-use-policies/github-acceptable-use-policies> |
| Authentication    | Optional for public data; recommended via `GITHUB_TOKEN`                                        |
| Data Format       | JSON                                                                                            |
| Adapter Status    | Direct repository search adapter approved; Apify declaration remains fallback                   |

***

## Phase 06 Candidate Declaration

| Field                   | Value                                                                           |
| ----------------------- | ------------------------------------------------------------------------------- |
| Source ID               | `github-ai-repositories`                                                        |
| Primary Apify candidate | `automation-lab/github-trending-scraper`                                        |
| Fallback candidate      | `crawlerbros/github-repo-intelligence`                                          |
| Validation status       | Fixture-backed and live tiny-validated on 2026-05-17; rerun is credential-gated |
| Direct adapter status   | Approved for public repository metadata search with rate-limit handling         |

***

## Terms of Service

GitHub API use is governed by the GitHub Terms of Service, API Terms, Privacy Statement, and Acceptable Use Policies. GitHub allows API access to public data, but prohibits spam use, selling personal information, rank abuse, excessive automated activity, privacy violations, and attempts to evade service limits.

**Key obligations for the configured source and any future direct adapter**:

* Use documented REST or GraphQL API endpoints.
* Prefer authenticated requests through a user-provided token stored outside the browser bundle.
* Do not collect private repository data.
* Do not collect emails, user profiles, or personal contact details.
* Do not use API data for spam, recruiting lists, resale of personal data, or automated engagement manipulation.
* Respect repository licenses and link users back to the canonical repository or release.

***

## Rate Limits

| Parameter                     | Value                                                                                                         |
| ----------------------------- | ------------------------------------------------------------------------------------------------------------- |
| Unauthenticated REST limit    | 60 requests per hour per originating IP                                                                       |
| Authenticated REST limit      | 5,000 requests per hour for normal user token requests                                                        |
| GitHub Actions `GITHUB_TOKEN` | 1,000 requests per hour per repository                                                                        |
| Search endpoints              | More restrictive than general REST limits                                                                     |
| Search unauthenticated limit  | 10 requests per minute                                                                                        |
| Search authenticated limit    | 30 requests per minute for non-code search endpoints                                                          |
| Secondary limits              | Include concurrency and per-minute point ceilings                                                             |
| Required headers              | `x-ratelimit-limit`, `x-ratelimit-remaining`, `x-ratelimit-used`, `x-ratelimit-reset`, `x-ratelimit-resource` |

A direct adapter must read rate-limit headers, avoid concurrent fan-out, use conditional requests where practical, and stop on `403` or `429` until the documented reset or retry window passes. Continuing to request after rate-limit responses is not approved.
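One way to implement the stop rule above, as an added example that is not the project's own code. The function names and the 60-second fallback constant are assumptions; `retry-after` is GitHub's documented header for secondary limits and is not listed in the table above.

```javascript
// Added example. Waits come from response headers only; no header values are logged.
const FALLBACK_WAIT_MS = 60_000; // assumed wait when a 403/429 carries no usable header

// Case-insensitive lookup that accepts a fetch Headers object or a plain object.
function header(headers, name) {
  if (headers && typeof headers.get === "function") return headers.get(name);
  const key = Object.keys(headers || {}).find((k) => k.toLowerCase() === name);
  return key ? headers[key] : null;
}

// Returns the epoch-millisecond time before which the adapter must not send
// another request, or null when it may continue.
function blockedUntil(status, headers, nowMs) {
  const remaining = Number.parseInt(header(headers, "x-ratelimit-remaining"), 10);
  const resetMs = Number.parseInt(header(headers, "x-ratelimit-reset"), 10) * 1000;
  const retryAfter = Number.parseInt(header(headers, "retry-after"), 10);
  const limited = status === 403 || status === 429;

  // A rate-limit response with retry-after: wait exactly that many seconds.
  if (limited && Number.isFinite(retryAfter)) return nowMs + retryAfter * 1000;
  // Budget exhausted (even on a 200): wait for the reset time, given in epoch seconds.
  if (remaining === 0 && resetMs > nowMs) return resetMs;
  // Any other 403/429: stop anyway rather than keep requesting.
  if (limited) return nowMs + FALLBACK_WAIT_MS;
  return null;
}

// Constructed sample responses (not captured traffic).
const NOW_MS = 1_700_000_000_000;
const SAMPLES = [
  { status: 200, headers: { "x-ratelimit-remaining": "12", "x-ratelimit-reset": "1700000600" } },
  { status: 403, headers: { "X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000600" } },
  { status: 429, headers: { "retry-after": "30" } },
  { status: 403, headers: {} },
];
for (const s of SAMPLES) {
  const until = blockedUntil(s.status, s.headers, NOW_MS);
  console.log(s.status, until === null ? "continue" : `stop for ${(until - NOW_MS) / 1000}s`);
}
```

Calling `blockedUntil` after every response, and issuing the next request only once it returns `null`, also keeps collection sequential rather than fanned out.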

## Direct Adapter Approval

The 2026-06-14 re-review approves a direct adapter only under these conditions:

* Use `GET https://api.github.com/search/repositories` for public repository metadata. Do not use code search, issue search, comments, profile endpoints, traffic endpoints, clone/download endpoints, or write endpoints.
* Keep the stable source ID `github-ai-repositories`, source role `developer`, and quality tier `primary`.
* Build bounded search queries from reviewed keyword-window terms, with `sort=stars`, `order=desc`, and a small `per_page` cap.
* Use `GITHUB_TOKEN` only from the script environment when present. Do not add a browser env var, setup control, generated data field, trace field, or log entry for the token.
* Normalize only public repository name, canonical `html_url`, description, stars, forks, open issues, language/topics, and public created/updated/pushed timestamps.
* Exclude private repositories, emails, profile bios, follower lists, raw issue or comment bodies, code contents, clone URLs, raw API URLs, raw headers, and raw JSON from browser payloads, traces, and logs.
* Return disabled or degraded readiness before collection if compliance status is not reviewed or if the endpoint is rate limited, timed out, malformed, or empty.
* Preserve the Apify source declaration as the fallback if the direct adapter is blocked or produces no usable reviewed evidence.
* Emit a zero-cost public API spend label for direct rows.
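A sketch of the bounded query and the allowlist normalizer these conditions describe, as an added example that is not the project's own code. `PER_PAGE_CAP`, the output key names and both function names are assumptions; the API field names are the ones listed in the Data Collection Boundary table.

```javascript
const SEARCH_URL = "https://api.github.com/search/repositories";
const PER_PAGE_CAP = 10; // assumed value; the page only requires a small cap

// Bounded query: fixed sort and order, per_page clamped to the cap.
function buildSearchUrl(terms, perPage = PER_PAGE_CAP) {
  const params = new URLSearchParams({
    q: terms.join(" "),
    sort: "stars",
    order: "desc",
    per_page: String(Math.min(Math.max(1, perPage), PER_PAGE_CAP)),
  });
  return `${SEARCH_URL}?${params}`;
}

// Copies approved fields one by one, so unlisted fields never reach the output.
function normalizeRepo(item) {
  if (!item || item.private !== false) return null; // fail closed on private or unknown
  const url = String(item.html_url || "");
  if (!url.startsWith("https://github.com/")) return null; // canonical URL only
  return {
    sourceId: "github-ai-repositories",
    title: String(item.full_name || ""),
    url,
    summary: typeof item.description === "string" ? item.description : null,
    stars: Number(item.stargazers_count) || 0,
    forks: Number(item.forks_count) || 0,
    openIssues: Number(item.open_issues_count) || 0,
    language: typeof item.language === "string" ? item.language : null,
    topics: Array.isArray(item.topics) ? item.topics.filter((t) => typeof t === "string") : [],
    createdAt: item.created_at || null,
    updatedAt: item.updated_at || null,
    pushedAt: item.pushed_at || null,
  };
}

// Constructed sample item (not real API output) carrying fields that must be dropped.
const SAMPLE_ITEM = {
  full_name: "example-org/example-repo", private: false,
  html_url: "https://github.com/example-org/example-repo",
  url: "https://api.github.com/repos/example-org/example-repo",
  clone_url: "https://github.com/example-org/example-repo.git",
  owner: { login: "example-org", email: "person@example.invalid" },
  description: "Constructed description", stargazers_count: 1200, forks_count: 85,
  open_issues_count: 14, language: "Python", topics: ["llm", "agents"],
  created_at: "2025-01-02T03:04:05Z", updated_at: "2026-06-01T00:00:00Z",
  pushed_at: "2026-06-10T12:00:00Z",
};
```

Because the output object is built field by field instead of by deleting keys from the API item, a field GitHub adds later is excluded by default.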

***

## Data Collection Boundary

Approved for the current configured metadata-only source boundary:

| Data Element                 | API Field                               | Stored As              | PII Risk                  |
| ---------------------------- | --------------------------------------- | ---------------------- | ------------------------- |
| Repository name              | `full_name`                             | Evidence title/context | Low; contains owner login |
| Repository URL               | `html_url`                              | Evidence URL           | Low                       |
| Description                  | `description`                           | Evidence summary       | Low; user-provided text   |
| Stars                        | `stargazers_count`                      | Relevance input        | None                      |
| Forks                        | `forks_count`                           | Relevance input        | None                      |
| Open issues                  | `open_issues_count`                     | Relevance input        | None                      |
| Created/updated/pushed dates | `created_at`, `updated_at`, `pushed_at` | Timing signals         | None                      |
| Topics/language              | `topics`, `language`                    | Topic hints            | None                      |
| Release title/date           | release fields                          | Evidence item          | Low                       |

**Not approved**: user emails, private repositories, private issues, user profile bios, follower lists, raw issue/comment bodies, code contents, clones, traffic analytics, or write operations.

***

## Data Retention

| Policy               | Value                                                                                         |
| -------------------- | --------------------------------------------------------------------------------------------- |
| Storage location     | `src/data/live-data.json` and private cache snapshots                                         |
| Retention period     | `live-data.json` overwritten on each collection run; snapshots retained locally until deleted |
| Historical retention | Local snapshots only; no reviewed GitHub historical source window support                     |
| Deletion path        | Delete generated Trend Finder data and snapshots                                              |
| Backup               | None                                                                                          |

***

## Phase 14 Historical Window Stance

| Field                  | Value                                                                                              |
| ---------------------- | -------------------------------------------------------------------------------------------------- |
| Historical support     | Current-only                                                                                       |
| Source ID              | `github-ai-repositories`                                                                           |
| Safe override fields   | None                                                                                               |
| Unsupported reason     | The reviewed Actor input uses the relative `since` field and has no bounded start/end date fields. |
| Compliance declaration | `historicalWindowSupport.supported = false`                                                        |

Do not map a requested historical window to `since`. That field is a current relative trend period, not a reviewed bounded historical collection window. This stance does not change the existing metadata-only current collection boundary and does not approve persistent historical storage.

***

## Privacy and GDPR Assessment

| Criterion                 | Status               | Notes                                       |
| ------------------------- | -------------------- | ------------------------------------------- |
| PII collected             | Planned minimal      | Public owner/repo login may identify people |
| User consent needed       | No app users         | Public repository metadata only             |
| Data subject rights       | Deletion path exists | Delete generated Trend Finder data/cache    |
| Cross-border transfer     | N/A                  | Data remains local                          |
| Data processor agreements | N/A                  | No third-party processing by Trend Finder   |
| Legitimate interest basis | Possible             | Public trend analysis for self-use          |

If collecting repository owner names is not needed for ranking or attribution, the adapter should store only repository display names and canonical URLs.

***

## Attribution

When GitHub-sourced evidence is displayed, the UI must show:

* Source identifier: `github-ai-repositories`
* Source name: "GitHub AI repositories"
* Link to the canonical repository, release, issue, or pull request
* Repository owner/name when needed for disambiguation

***

## Implementation Notes

* Phase 06 Session 01 declares `github-ai-repositories` with reviewed primary and fallback Apify Actor candidates.
* Phase 06 Session 02 adds fixture-backed normalizer coverage for public repo URLs, titles, pushed timestamps, stars, forks, language shape fields, and star gain metrics. The 2026-05-17 live tiny validation returned 5 items and 5 public URLs for the configured source.
* The Apify source JSON file or inline override may provide an Actor for this source, but collection still requires `APIFY_TOKEN` in the script environment and must avoid known unresolved placeholder Actor IDs.
* Normalized evidence must link to canonical public GitHub URLs, not Apify Actor or Dataset URLs.
* A future direct GitHub adapter must add conditional request handling and rate-limit header tests before replacing the Apify path.

***

## Compliance Checklist

* [x] Terms and acceptable use reviewed
* [x] Rate limits documented
* [x] Data retention policy drafted
* [x] PII minimization boundary drafted
* [x] Attribution requirements documented
* [x] Implementation-time terms re-review completed on 2026-05-17
* [x] Configured source compliance gate documented
* [x] Phase 06 primary and fallback Apify candidates recorded
* [x] Phase 06 fixture-backed normalizer validation completed
* [x] Live tiny capped Phase 06 Actor/Dataset validation completed on 2026-05-17
* [x] Phase 14 historical-window stance recorded as current-only
* [x] Direct API adapter approval re-reviewed on 2026-06-14
* [x] Direct adapter requirements record search rate limits and PII exclusions
* [ ] Direct adapter tests prove PII exclusions

***

*This document must be reviewed again before adding a direct GitHub API adapter or broadening collected fields.*

