Interactive demonstration of verify-as-you-generate, stealth-edit correction, and concept stability at scale.
Select a theorem from the dropdown below and press Enter to prove it with live step-by-step verification.
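In outline, the demo checks every candidate proof step before committing it: verified steps enter the transcript, rejected steps are discarded and the model resamples. The sketch below illustrates that loop; `propose_step` and `check_step` are hypothetical stand-ins for the model and the verifier, not the demo's actual API.

```python
"""Minimal sketch of a verify-as-you-generate loop (stand-in helpers, not the demo's API)."""
import random

def propose_step(goal: str, rng: random.Random) -> str:
    # Stand-in: the real demo samples the next proof step from the model.
    return rng.choice(["intro h", "cases h", "exact h.elim", "qed"])

def check_step(step: str) -> bool:
    # Stand-in: the real demo calls the step-level verifier here.
    return step != "exact h.elim"  # pretend this particular step fails to check

def prove(goal: str, max_attempts: int = 32) -> list[str] | None:
    rng = random.Random(0)
    proof: list[str] = []
    for _ in range(max_attempts):
        step = propose_step(goal, rng)
        if not check_step(step):
            continue            # rejected step: resample instead of committing
        proof.append(step)      # only verified steps enter the transcript
        if step == "qed":
            return proof
    return None                 # attempt budget exhausted

if __name__ == "__main__":
    print(prove("or_elim"))
```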
The model fails all 32 attempts on or_elim. A corrective perturbation Δ·u is optimized at the Layer 10 FFN, and the hook fires only during KV-cache prefill.
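One plausible reading of this mechanism is sketched below as a PyTorch forward hook on a toy FFN: a perturbation vector `delta` is gated by each position's alignment with a trigger direction `u`, and the hook only acts when the input spans multiple positions (prefill), staying dormant for single-token decode steps. The module names, tensor shapes, and gating form are assumptions for illustration, not the demo's real implementation.

```python
import torch
import torch.nn as nn

hidden = 64
# Toy stand-in for the Layer 10 FFN block.
ffn = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))

u = torch.randn(hidden)              # trigger direction (optimized offline in the demo)
u = u / u.norm()
delta = 0.01 * torch.randn(hidden)   # corrective perturbation (also optimized offline)

def stealth_edit_hook(module, inputs, output):
    x = inputs[0]                    # (batch, seq_len, hidden)
    # Prefill processes the whole prompt at once (seq_len > 1); decode steps
    # see one token at a time, so the edit is skipped there.
    if x.shape[1] <= 1:
        return output
    gate = (x @ u).unsqueeze(-1)     # per-position alignment with the trigger direction
    return output + gate * delta     # additive Δ·u-style correction

handle = ffn.register_forward_hook(stealth_edit_hook)

prompt = torch.randn(1, 16, hidden)  # prefill pass: hook fires
print(ffn(prompt).shape)
single = torch.randn(1, 1, hidden)   # decode step: hook is a no-op
print(ffn(single).shape)
handle.remove()
```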
DLCM concept pooling makes representations ~3x more stable under token-level paraphrasing than vanilla token representations.
Measured as the cosine similarity between original and paraphrased internal representations across four model scales (23M–206M parameters).
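A minimal sketch of this stability metric is shown below, assuming mean pooling over per-token hidden states as the concept-pooling step; the DLCM models themselves are not reproduced, so toy tensors stand in for the internal representations.

```python
import torch
import torch.nn.functional as F

def concept_pool(token_states: torch.Tensor) -> torch.Tensor:
    """Pool per-token hidden states (seq_len, hidden) into a single concept vector."""
    return token_states.mean(dim=0)

def stability(original: torch.Tensor, paraphrase: torch.Tensor) -> float:
    """Cosine similarity between the pooled representations of the two inputs."""
    return F.cosine_similarity(
        concept_pool(original), concept_pool(paraphrase), dim=0
    ).item()

# Toy hidden states standing in for the model's internal representations.
torch.manual_seed(0)
orig = torch.randn(12, 256)               # "original" sentence, 12 tokens
para = orig + 0.1 * torch.randn(12, 256)  # token-level paraphrase of the same sentence
print(f"concept-level stability: {stability(orig, para):.3f}")
```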