$ df -h ./hardware

Pick the rig that runs the work

Bandwidth, not capacity, sets your tokens/sec. Match the machine to the model size you must load, then accept the speed its bandwidth allows.

▸ T2 is the enrollment floor — it runs the full curriculum end to end.

~/mindpool/hardware

tier	Apple	Mini-PC	DIY NVIDIA
T1 Entry	Mac mini M4 16–24GB · 120 GB/s · $799–$999	—	RTX 5060 Ti 16GB · 448 GB/s · ~$1,400
T2 Floor	MBP 14" M5 Pro 64GB · 307 GB/s · ~$3,100	Strix Halo 128GB (Beelink GTR9 Pro / GMKtec EVO-X2) · ~180 GB/s eff · ~$1,985–$1,999	used RTX 4090 24GB · 1,008 GB/s · ~$2,700
T3 Pro	Mac Studio M3 Ultra 96GB · 819 GB/s · $3,999	ASUS Ascent GX10 (GB10) 128GB · 273 GB/s · ~$3,099	RTX 5090 32GB · 1,792 GB/s · ~$4,600
T4 Research	MBP M5 Max 128GB · 614 GB/s · ~$7,349	2× GB10 clustered → 256GB · 273 GB/s · ~$6,200+	dual RTX 3090 = 48GB · 936 GB/s/card · ~$3,200

$ ls ./models --by-tier

What your box can actually run

Each model is listed at its featured quant against the four tiers: does it fit, and how fast does it decode? Decode speed follows directly from memory bandwidth, and only the active params are read per token, which is why MoE models decode far faster than their total size suggests.

~/mindpool/models

model	T1	T2	T3	T4
# ── Gemma 4 ──
Gemma 4 E2B · 2B · QAT UD-Q4_K_XL · 3 GB · multimodal · Apache-2.0 · source ↗	~120–448 tok/s	~180–1,008 tok/s	~273–1,792 tok/s	~273–936 tok/s
Gemma 4 E4B · 4B · QAT UD-Q4_K_XL · 5 GB · multimodal · Apache-2.0 · source ↗	~60–224 tok/s	~90–504 tok/s	~137–896 tok/s	~137–468 tok/s
Gemma 4 12B · 12B · QAT UD-Q4_K_XL · 7 GB · multimodal · Apache-2.0 · source ↗	~20–75 tok/s	~30–168 tok/s	~46–299 tok/s	~46–156 tok/s
Gemma 4 26B-A4B · 26B (4B active) · QAT UD-Q4_K_XL · 15 GB · multimodal · Apache-2.0 · source ↗	~60–224 tok/s	~90–504 tok/s	~137–896 tok/s	~137–468 tok/s
Gemma 4 31B · 31B · QAT UD-Q4_K_XL · 18 GB · multimodal · Apache-2.0 · source ↗	— too big	~12–65 tok/s	~18–116 tok/s	~18–60 tok/s
# ── Qwen3 ──
Qwen3 8B · 8B · Q4_K_M · 6 GB · text · Apache-2.0 · source ↗	~30–112 tok/s	~45–252 tok/s	~68–448 tok/s	~68–234 tok/s
Qwen3 30B-A3B · 30B (3B active) · Q4_K_M · 18 GB · text · Apache-2.0 · source ↗	— too big	~120–672 tok/s	~182–1,195 tok/s	~182–624 tok/s
# ── Llama ──
Llama 8B · 8B · Q4_K_M · 6 GB · text · Llama Community · source ↗	~30–112 tok/s	~45–252 tok/s	~68–448 tok/s	~68–234 tok/s
Llama 70B · 70B · Q4_K_M · 42 GB · text · Llama Community · source ↗	— too big	~5–9 tok/s · 2/3 rigs	~8–23 tok/s · 2/3 rigs	~8–27 tok/s
Llama 405B · 405B · Q4_K_M · 230 GB · text · Llama Community · source ↗	— too big	— too big	— too big	~1 tok/s · 1/3 rigs
# ── DeepSeek ──
DeepSeek-V2-Lite · 16B (2.4B active) · Q4_K_M · 10 GB · text · DeepSeek · source ↗	~100–373 tok/s	~150–840 tok/s	~228–1,493 tok/s	~228–780 tok/s
# ── OLMo 2 ──
OLMo 2 13B · 13B · Q4_K_M · 8 GB · text · Apache-2.0 · source ↗	~18–69 tok/s	~28–155 tok/s	~42–276 tok/s	~42–144 tok/s

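formula: tok/s ≈ bandwidth ÷ (active-params × bytes/param)

This is an estimated decode ceiling: real-world throughput is lower (attention, KV cache, runtime overhead), and the estimate ignores prefill. It uses active params, so an MoE model decodes like a much smaller dense one. Each range spans the slowest to the fastest rig in the tier that can load the model; "2/3 rigs" means only two of the tier's three rigs can.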
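As a rough illustration of the formula, here is a minimal Python sketch. The function name and the default of 0.5 bytes/param (a plain 4-bit weight) are assumptions for illustration; the 0.5 figure is inferred from the table values rather than stated on this page, and real Q4 quants carry somewhat more than 4 bits per weight.

```python
def decode_ceiling_tok_s(bandwidth_gb_s, active_params_b, bytes_per_param=0.5):
    """Estimated decode ceiling: bandwidth divided by bytes read per token.

    bandwidth_gb_s   memory bandwidth in GB/s
    active_params_b  active parameters in billions (not total, for MoE)
    bytes_per_param  assumed 0.5 for a 4-bit quant (illustrative assumption)
    """
    active_weight_gb = active_params_b * bytes_per_param
    return bandwidth_gb_s / active_weight_gb


# Dense: Llama 8B on the 120 GB/s Mac mini M4
print(round(decode_ceiling_tok_s(120, 8)))    # 30

# Dense: Gemma 4 31B on ~180 GB/s Strix Halo
print(round(decode_ceiling_tok_s(180, 31)))   # 12

# MoE: Qwen3 30B-A3B (3B active) on the same Strix Halo box
print(round(decode_ceiling_tok_s(180, 3)))    # 120
```

The last two lines show the MoE effect: two models of similar total size on the same box, with roughly a 10× gap in ceiling because only 3B params are read per token.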

as of 2026-06 — curated, not exhaustive; models churn, re-verify each footprint at its source.

▸ Why active params, not total params, set speed: unpacked in the quantization & MoE depth-spike (coming with the curriculum).

$ cat ./if-you-are-X-buy-Y

The recommendation matrix

Tightest budget, want 70B in memory, OK with Linux + ROCm

Beelink GTR9 Pro (Strix Halo, 128 GB) · ~$1,985

Cheapest 128 GB box in volume; 70B Q4 ~5–8 tok/s.

Want a turnkey Windows 128 GB box, no building

GMKtec EVO-X2 (Strix Halo) · ~$1,999

Windows-native, OCuLink eGPU escape hatch.

Smoothest experience, value macOS + portability, models ≤34B

MacBook Pro 14" M5 Pro, 64 GB · ~$3,100

MLX just works for LoRA/QLoRA; 307 GB/s; ships now.

Fine-tuning-heavy, want fast tokens + GPU upgrade path

DIY used RTX 4090 24 GB · ~$2,700

1,008 GB/s; cleanest CUDA path; best $/capability.

Want NVIDIA's stack + 70B QLoRA in one mini box

ASUS Ascent GX10 (GB10, 128 GB) · ~$3,099

Cheapest GB10/DGX-class box; full CUDA, Linux-only.

Max memory + fast tokens, single vendor desk box

Mac Studio M3 Ultra, 96 GB · $3,999

819 GB/s, the highest bandwidth among the Apple and mini-PC options here; 70B Q4 decode ceiling ~23 tok/s.
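One rough way to sanity-check the "best $/capability" note on the used RTX 4090 is to rank the T2 rigs by bandwidth per dollar. This is only a sketch: using GB/s per dollar as a proxy for capability is an assumption made here, not the page's stated metric, and it ignores memory capacity entirely. The figures are the T2 bandwidths and the low end of each price from the hardware table.

```python
# (bandwidth in GB/s, approximate price in USD) from the T2 row
t2_rigs = {
    'MBP 14" M5 Pro 64GB': (307, 3100),
    "Strix Halo 128GB (Beelink GTR9 Pro)": (180, 1985),
    "used RTX 4090 24GB": (1008, 2700),
}


def gb_s_per_dollar(bandwidth_gb_s, price_usd):
    """Bandwidth bought per dollar; a crude proxy, ignores capacity."""
    return bandwidth_gb_s / price_usd


# Sort rig names from best to worst bandwidth per dollar
ranked = sorted(
    t2_rigs,
    key=lambda name: gb_s_per_dollar(*t2_rigs[name]),
    reverse=True,
)

for name in ranked:
    print(f"{name}: {gb_s_per_dollar(*t2_rigs[name]):.3f} GB/s per $")
```

The 4090 comes out well ahead on this measure, but it holds only 24 GB, so the ranking says nothing about which models fit; that is the trade the matrix is describing.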

$ man bandwidth

The bandwidth truth

Generating each token reads the active weights out of memory, so your tok/s ceiling ≈ bandwidth ÷ active-weight-bytes. Capacity decides whether a model fits; bandwidth decides how fast it runs once it fits. For the rare 70B full fine-tune, cloud spot instances (A100 ~$4–8/hr) are the relief valve.
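The fit-then-speed rule can be sketched in a few lines of Python. The names `Rig` and `tier_estimate` are hypothetical, the fit test (footprint ≤ capacity, with no headroom for KV cache or the OS) is a simplifying assumption, and 0.5 bytes/param is the assumed 4-bit weight size. The rig figures are the T2 row of the hardware table.

```python
from typing import NamedTuple


class Rig(NamedTuple):
    name: str
    capacity_gb: float
    bandwidth_gb_s: float


T2 = [
    Rig('MBP 14" M5 Pro', 64, 307),
    Rig("Strix Halo", 128, 180),
    Rig("used RTX 4090", 24, 1008),
]


def tier_estimate(rigs, footprint_gb, active_params_b, bytes_per_param=0.5):
    """Return (slowest, fastest, rigs_that_fit), or None if nothing fits.

    Capacity filters the rigs; bandwidth sets the speed of those that remain.
    """
    fitting = [r for r in rigs if footprint_gb <= r.capacity_gb]
    if not fitting:
        return None
    active_weight_gb = active_params_b * bytes_per_param
    speeds = [r.bandwidth_gb_s / active_weight_gb for r in fitting]
    return min(speeds), max(speeds), len(fitting)


# Llama 70B at Q4_K_M: 42 GB footprint, 70B active params
lo, hi, n = tier_estimate(T2, footprint_gb=42, active_params_b=70)
print(f"~{lo:.0f}–{hi:.0f} tok/s · {n}/{len(T2)} rigs")  # ~5–9 tok/s · 2/3 rigs

# Llama 405B at 230 GB fits none of the T2 rigs
print(tier_estimate(T2, footprint_gb=230, active_params_b=405))  # None
```

The fastest card in the tier, the RTX 4090, drops out of the 70B estimate because 42 GB does not fit in 24 GB: capacity gates first, bandwidth ranks second.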

Mid-2026 street prices; the 2026 RAM shortage adds volatility — re-verify at purchase.

Got the rig? Get the toolchain.

One command installs and version-pins your local AI stack. Then enroll on-chain and start module 01.

// get the toolchain

$ curl -fsSL mindpool.io/install | sh

macOS · Linux · Windows (WSL2)

// or, on macOS (Apple Silicon)

$ brew install --cask mindpool-labs/tap/mpl