Watch how different processing speeds affect token generation in real time:

The LocalLLaMA community converges on use-case-specific speeds rather than a single "best" number. A rough guide:

- Chat: 10–20 t/s
- Code snippets: ≥25–30 t/s
- Agentic/vibe coding: ≥50–70 t/s
- "Thinking" models: ≥100 t/s, to avoid long pre-answer delays
- Autocomplete: ≥60–80 t/s
- Background jobs: 1–10 t/s is tolerable if higher quality reduces the number of iterations

Prompt-processing speed also matters, especially for agentic workflows, where ≈150–200+ t/s is preferred. See the Reddit thread below for many more perspectives.
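To make those thresholds concrete, here is a back-of-the-envelope sketch (in TypeScript) of how long a reply would take at a few of the speeds above. The per-response token counts are illustrative assumptions, not measurements from the thread.

```typescript
// Back-of-the-envelope timing: seconds needed to generate a reply of a given
// length at a given generation speed (ignoring prompt processing and latency).
function generationSeconds(tokens: number, tokensPerSecond: number): number {
  return tokens / tokensPerSecond;
}

// Illustrative scenarios; the token counts are assumptions, the speeds come
// from the rough guide above.
const scenarios = [
  { label: "Short chat reply (~200 tokens) at 15 t/s", tokens: 200, tps: 15 },
  { label: "Code snippet (~600 tokens) at 30 t/s", tokens: 600, tps: 30 },
  { label: "Reasoning trace (~2000 tokens) at 100 t/s", tokens: 2000, tps: 100 },
];

for (const s of scenarios) {
  console.log(`${s.label}: ~${generationSeconds(s.tokens, s.tps).toFixed(0)} s`);
}
```

Even at 100 t/s, a 2,000-token reasoning trace still takes about 20 seconds before the answer proper appears, which is why the thread pushes "thinking" models toward the higher end.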

Reddit thread: What is the slowest tokens/sec you can live with?

Tip: You can set the default token generation speed by adding a ?speed parameter to the URL (for example ?speed=15 or ?speed=3.4), where the number is the desired speed in tokens per second.
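For illustration, here is a minimal TypeScript sketch of how a page might read such a ?speed parameter; the default value and function name are hypothetical and not taken from this page's actual code.

```typescript
// Sketch: parse the ?speed query parameter, falling back to a default when it
// is missing or not a positive number. DEFAULT_SPEED is an assumed value.
const DEFAULT_SPEED = 10; // tokens per second (assumption)

function speedFromUrl(href: string): number {
  const raw = new URL(href).searchParams.get("speed");
  const parsed = raw === null ? NaN : Number.parseFloat(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_SPEED;
}

// Examples:
// speedFromUrl("https://example.com/demo?speed=15")  -> 15
// speedFromUrl("https://example.com/demo?speed=3.4") -> 3.4
// speedFromUrl("https://example.com/demo")           -> 10 (default)
```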

Time elapsed: 0:00
Words generated: 0
Tokens generated: 0