# Free LLM Recommendation API

`/api/free-llm/recommendation` exposes the currently recommended free OpenRouter model for general developer use. `/api/free-llm/top-models` exposes the ordered shortlist behind that recommendation.

## Purpose

Use this API when you want either:

- a simple daily recommendation for a free OpenRouter-compatible text model plus the stable fallback router
- an ordered JSON shortlist that a cron job or automation can poll once a day

## Endpoints

### `GET /api/free-llm/recommendation`

Returns the current recommendation payload.

Key fields:

- `updatedAt`
- `rankingVersion`
- `probeMode`
- `refreshMode`
- `liteEvalMode`
- `liteEvalSuite`
- `rankingConfidence`
- `rankingConfidenceReason`
- `evalCoverage`
- `baseUrl`
- `createKeyUrl`
- `primary`
- `fallback`
- `alternatives`
- `notes`

When the KV-backed recommendation is unavailable, the endpoint returns the built-in seed payload and sets `X-Free-LLM-Seed: true`.
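
A client may want to treat the seed payload differently from a live recommendation, for example by logging a warning. The following is a minimal sketch of that check. The header name and value come from the description above; the helper name `is_seed_response` is hypothetical, and the function takes a plain mapping of headers so that it works with any HTTP client.

```python
from typing import Mapping

SEED_HEADER = "X-Free-LLM-Seed"


def is_seed_response(headers: Mapping[str, str]) -> bool:
    """Return True when the response headers mark the payload as the built-in seed."""
    # HTTP header names are case-insensitive, so normalise them before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    return normalised.get(SEED_HEADER.lower(), "").strip().lower() == "true"
```

Pass in the response headers from whichever HTTP library you use; a missing header is treated as a live (non-seed) response.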

### `GET /api/free-llm/top-models`

Returns the ordered shortlist that powers the recommendation page.

Key fields:

- `updatedAt`
- `rankingVersion`
- `probeMode`
- `refreshMode`
- `liteEvalMode`
- `liteEvalSuite`
- `rankingConfidence`
- `rankingConfidenceReason`
- `evalCoverage`
- `baseUrl`
- `createKeyUrl`
- `fallback`
- `count`
- `models`
- `notes`

`models[0]` is the current top recommendation. Use `?limit=<n>` to request a smaller slice of the stored shortlist.
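
As a sketch of how a polling script might use these two facts, the helpers below build the request path with an optional `limit` and read the top entry from a decoded payload. The function names are hypothetical, and the code assumes the payload has already been parsed from JSON into a dictionary.

```python
from typing import Any, Mapping, Optional
from urllib.parse import urlencode

TOP_MODELS_PATH = "/api/free-llm/top-models"


def top_models_path(limit: Optional[int] = None) -> str:
    """Build the request path, adding ?limit=<n> only when a limit is given."""
    if limit is None:
        return TOP_MODELS_PATH
    return f"{TOP_MODELS_PATH}?{urlencode({'limit': limit})}"


def top_model(payload: Mapping[str, Any]) -> Optional[Any]:
    """Return models[0], or None when the shortlist is missing or empty."""
    models = payload.get("models") or []
    return models[0] if models else None
```

Returning `None` for an empty shortlist leaves the caller free to decide whether to use the `fallback` field instead.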

Ranked model objects include score components when available:

- `metadataScore`
- `healthScore`
- `latencyScore`
- `liteEvalScore`
- `instabilityPenalty`
- `evalSuite`
- `evalSummary`

`evalSummary` uses `lite-agent-eval-v1`. It reports small practical checks for file-writing, shell-command generation, and a compact symbolic decoding probe. This is a daily heuristic, not an official PinchBench result.

`evalCoverage.unevaluatedCanStillWin` reports whether an unevaluated candidate could still mathematically overtake the current primary with a perfect lite eval score. `evalCoverage.attemptedToday` counts same-day eval attempts, including model-specific rate-limit failures. `rankingConfidence` is `high`, `medium`, or `low` based on that coverage. It can still be `high` after a rate-limit event, provided no unevaluated candidate can overtake the primary.
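
One way an automation could act on `rankingConfidence` is to use the top model only when confidence meets a chosen threshold and otherwise use the `fallback` entry. This is a hypothetical client-side policy, not something the API prescribes. The function name and the `min_confidence` parameter are assumptions, and an unrecognised confidence value is treated as `low`.

```python
from typing import Any, Mapping

# Ordering of the three documented confidence levels.
_CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def pick_model(payload: Mapping[str, Any], min_confidence: str = "medium") -> Any:
    """Return models[0] when confidence is sufficient, else the fallback entry."""
    required = _CONFIDENCE_RANK[min_confidence]
    # Assumption: a missing or unknown confidence value counts as "low".
    actual = _CONFIDENCE_RANK.get(payload.get("rankingConfidence"), 0)
    models = payload.get("models") or []
    if models and actual >= required:
        return models[0]
    return payload.get("fallback")
```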

Scoring is additive with a small penalty for model-specific instability:

```text
score =
  metadataScore
  + healthScore
  + latencyScore
  + liteEvalScore
  - instabilityPenalty
```
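
The formula above can be reproduced client-side, for example to sanity-check a ranked model object. The sketch below is illustrative only. Because score components are included "when available", it assumes a missing or non-numeric component contributes zero; the service itself may handle that case differently.

```python
from typing import Any, Mapping

_POSITIVE = ("metadataScore", "healthScore", "latencyScore", "liteEvalScore")


def total_score(model: Mapping[str, Any]) -> float:
    """Sum the additive components and subtract the instability penalty."""

    def component(name: str) -> float:
        value = model.get(name)
        # Assumption: an absent or non-numeric component counts as 0.
        return float(value) if isinstance(value, (int, float)) else 0.0

    return sum(component(name) for name in _POSITIVE) - component("instabilityPenalty")
```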

`refreshMode` can be `metadataOnly`, `healthOnly`, `fullEval`, or `forceEval`. Daily production refreshes should use `fullEval`; manual refreshes can use `metadataOnly` or `healthOnly` when the goal is to avoid spending free-model chat quota. `forceEval` bypasses the preloaded eval cache for an explicit fresh eval run, while still allowing recent cached scores as a fallback after account-level rate limits.
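
A script that triggers manual refreshes might guard against accidental quota use with a check like the one below. This is a sketch: the helper name is hypothetical, and it infers that `fullEval` and `forceEval` spend chat quota from the statement that the other two modes avoid it.

```python
REFRESH_MODES = ("metadataOnly", "healthOnly", "fullEval", "forceEval")

# Modes described above as avoiding free-model chat quota.
QUOTA_FREE_MODES = frozenset({"metadataOnly", "healthOnly"})


def spends_chat_quota(mode: str) -> bool:
    """Return True for modes assumed to run evals against free-model chat quota."""
    if mode not in REFRESH_MODES:
        raise ValueError(f"unknown refreshMode: {mode!r}")
    return mode not in QUOTA_FREE_MODES
```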

### `GET /api/free-llm/health`

Returns freshness information for the current recommendation.

Key fields:

- `ok`
- `lastUpdatedAt`
- `ageHours`
- `maxFreshHours`
- `usingSeedFallback`
- `hasKvBinding`
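
A cron job could use these fields to decide whether to trust the current recommendation before polling the other endpoints. The check below is a conservative sketch, not the service's own definition of `ok`. It assumes `ok` and `usingSeedFallback` are booleans and that `ageHours` and `maxFreshHours` are numbers; the field types are not stated here, so verify them against the OpenAPI document.

```python
from typing import Any, Mapping


def is_fresh(health: Mapping[str, Any]) -> bool:
    """Return True only for a live, non-seed recommendation within its freshness window."""
    if not health.get("ok", False):
        return False
    # Treat the built-in seed payload as stale for polling purposes.
    if health.get("usingSeedFallback", False):
        return False
    age = health.get("ageHours")
    max_fresh = health.get("maxFreshHours")
    if not isinstance(age, (int, float)) or not isinstance(max_fresh, (int, float)):
        return False
    return age <= max_fresh
```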

## Related endpoints

- OpenAPI: `/docs/api/free-llm/openapi.json`
- Ordered shortlist: `/api/free-llm/top-models`
- Status: `/api/status/free-llm`
- Health: `/api/free-llm/health`
- Public page: `/free-llm/`
