← All Perspectives Philosophy Team Strategy Get in Touch
Layer 1 · Front-end
Consumer apps + agentic tooling
Raw text in, raw text out. No token awareness here — just strings.
User pays
$20/mo
or $10–60/M tokens
via usage API
ChatGPT Claude.ai Gemini Cursor Claude Code LangChain / CrewAI API integrations
prompt (text)
HTTP / REST API
response (text)
Layer 2 · Model — 🏭 Token creation
Frontier model as software stack
Tokenizer · context window · safety filters · inference routing · sampling · detokenizer
API charges
$3/M in
$15/M out
Claude Sonnet 4.6
Gross margin
~75–85%
at scale
GPT-4o Claude 3.5 / 4.6 Gemini 2.0 Llama 4 Anthropic API OpenAI API
Tokenizer — where tokens are born
"Hello, world!"
Hello9906
,11
world1917
!758
Input tokens (prefill)
Parallel · fast · $3/M
Output tokens (decode)
Sequential · slow · $15/M
token IDs
CUDA / RDMA
GPU scheduling
generated tokens
Layer 3 · Back-end — ⚡ Token consumption
GPU cluster infrastructure
Each output token = one autoregressive forward pass. VRAM stores the full KV cache.
Infra earns
$2–10M
per MW · year
(colo to neocloud)
EBITDA margin
80–90%
triple-net leases
KV Cache (VRAM)
72% utilization — full context stored here
Forward pass cost
~6 FLOPS × params
70B model → 420 GFLOPS/tok
Token Factory portfolio — operating at this layer
IREN NBIS APLD WULF GLXY CIFR HUT VNET
$ flow — how one million output tokens travel the value chain
$15
per million output
tokens (API price)
$12
model provider
gross margin
$1–3
GPU compute cost
paid to infra layer
1 MW
≈ 100–200B tokens
per year at full util
Why the factory metaphor holds
Token Factory companies price in megawatts, not tokens — but MW demand scales linearly with token demand. Every model upgrade, agentic loop, and training run translates into more contracted MW. GPUs are the machines. Tokens are the units of output. Electricity is the raw material. The factory never stops.