Dated synthesis pages that tell the story of each week's work
in the context of the paper replication and extension.
Latest
April 15–16, 2026 — since-paper current-state delta: D7 clean current panel,
two-seed random-head control availability, probe inclusion, residual CSV2-error framing,
and StrongREJECT holdout tie correction.
Historical snapshot
April 8–14, 2026 — broader thesis lock, three-anchor evidence hierarchy,
bridge test-set externality confirmation, jailbreak evaluator-dependent framing,
and the first D7 random-head follow-up.
Historical snapshot
March 24–30, 2026 — comparative steering sprint and measurement upgrades.
Kept as a dated archive page; some interpretations were refined by April 8–14 follow-up evidence.
Historical snapshot
March 16–24, 2026 — From one benchmark and no controls to three benchmarks,
two negative controls, and a four-tier evidence hierarchy.
Preserved as a dated archive page; use Week 5 for the current-state framing.