Token Economy — Three-Layer Stack

Layer 1 · Front-end

Consumer apps + agentic tooling

Raw text in, raw text out. No token awareness here — just strings.

User pays

$20/mo

or $10–60/M tokens
via usage API

ChatGPT Claude.ai Gemini Cursor Claude Code LangChain / CrewAI API integrations

prompt (text)

HTTP / REST API

response (text)

Layer 2 · Model — 🏭 Token creation

Frontier model as software stack

Tokenizer · context window · safety filters · inference routing · sampling · detokenizer

API charges

$3/M in

$15/M out

Claude Sonnet 4.6

Gross margin

~75–85%

at scale

GPT-4o Claude 3.5 / 4.6 Gemini 2.0 Llama 4 Anthropic API OpenAI API

Tokenizer — where tokens are born

"Hello, world!" →

Hello9906

,11

world1917

!758

Input tokens (prefill)

Parallel · fast · $3/M

Output tokens (decode)

Sequential · slow · $15/M

token IDs

CUDA / RDMA

GPU scheduling

generated tokens

Layer 3 · Back-end — ⚡ Token consumption

GPU cluster infrastructure

Each output token = one autoregressive forward pass. VRAM stores the full KV cache.

Infra earns

$2–10M

per MW · year
(colo to neocloud)

EBITDA margin

80–90%

triple-net leases

KV Cache (VRAM)

72% utilization — full context stored here

Forward pass cost

~6 FLOPS × params

70B model → 420 GFLOPS/tok

Token Factory portfolio — operating at this layer

IREN NBIS APLD WULF GLXY CIFR HUT VNET

$ flow — how one million output tokens travel the value chain

$15

per million output
tokens (API price)

$12

model provider
gross margin

$1–3

GPU compute cost
paid to infra layer

1 MW

≈ 100–200B tokens
per year at full util

Why the factory metaphor holds

Token Factory companies price in megawatts, not tokens — but MW demand scales linearly with token demand. Every model upgrade, agentic loop, and training run translates into more contracted MW. GPUs are the machines. Tokens are the units of output. Electricity is the raw material. The factory never stops.