Layer 1 · Front-end
Consumer apps + agentic tooling
Raw text in, raw text out. No token awareness here — just strings.
User pays
$20/mo
or $10–60/M tokens
via usage API
via usage API
ChatGPT
Claude.ai
Gemini
Cursor
Claude Code
LangChain / CrewAI
API integrations
HTTP / REST API
Layer 2 · Model — 🏭 Token creation
Frontier model as software stack
Tokenizer · context window · safety filters · inference routing · sampling · detokenizer
API charges
$3/M in
$15/M out
Claude Sonnet 4.6
Gross margin
~75–85%
at scale
GPT-4o
Claude 3.5 / 4.6
Gemini 2.0
Llama 4
Anthropic API
OpenAI API
Tokenizer — where tokens are born
"Hello, world!"
→
Hello9906
,11
world1917
!758
Input tokens (prefill)
Parallel · fast · $3/M
Output tokens (decode)
Sequential · slow · $15/M
CUDA / RDMA
GPU scheduling
Layer 3 · Back-end — ⚡ Token consumption
GPU cluster infrastructure
Each output token = one autoregressive forward pass. VRAM stores the full KV cache.
Infra earns
$2–10M
per MW · year
(colo to neocloud)
(colo to neocloud)
EBITDA margin
80–90%
triple-net leases
KV Cache (VRAM)
72% utilization — full context stored here
Forward pass cost
~6 FLOPS × params
70B model → 420 GFLOPS/tok
Token Factory portfolio — operating at this layer
IREN
NBIS
APLD
WULF
GLXY
CIFR
HUT
VNET
$ flow — how one million output tokens travel the value chain
$15
per million output
tokens (API price)
tokens (API price)
$12
model provider
gross margin
gross margin
$1–3
GPU compute cost
paid to infra layer
paid to infra layer
1 MW
≈ 100–200B tokens
per year at full util
per year at full util
Why the factory metaphor holds
Token Factory companies price in megawatts, not tokens — but MW demand scales linearly with token demand. Every model upgrade, agentic loop, and training run translates into more contracted MW. GPUs are the machines. Tokens are the units of output. Electricity is the raw material. The factory never stops.