Progress — April 15–16

Week 5: Current-State Alignment

This update captures the delta from the paper-era snapshot to the current-state panel as of April 16, 2026. Week 4 stays archival; Week 5 is the current framing.

The key shift is a narrower, better-scoped claim: D7 now rests on the April 16 mixed-ruler current-state audit, with expanded random-head controls and the probe branch explicitly included, and the StrongREJECT holdout framing is corrected to a tie.

D7 panel status: Mixed-ruler current panel. Canonical April 16 audit; benchmark-local support, not mechanism-clean closure.
Control availability: Expanded. Both current random-head control branches are complete, scored, and still error-bearing.
Probe branch: Included. Complete, scored, and still error-bearing on the canonical audit.
StrongREJECT holdout: Tie corrected. v3 96.0% vs SR-4o 96.0%; 0 discordant.

Since-paper delta

What changed from Week 4 archival framing

Week 4 is preserved as a dated synthesis. Week 5 narrows to live-state alignment updates only, without rewriting the archival page.

Ruler debt is now an explicit current-state caveat

The April 16 canonical audit remains mixed-ruler, so Week 5 now frames D7 as stronger current support without overstating closure.

Control coverage is no longer single-seed

Layer-matched random-head controls now exist for two seeds on the live panel, which closes the old "single-seed only" description.

Evaluator comparison language is corrected

StrongREJECT holdout is now communicated as parity with v3, not as a v3 holdout-accuracy advantage.

D7 current panel

Mixed-ruler current panel: causal remains strongest among completed branches

The panel now includes baseline, L1, causal, probe, and two random-head controls under current normalization, but the April 16 canonical audit still treats it as mixed-ruler rather than a like-for-like clean rerun.

Week 4 archival context: Historical synthesis remains preserved
Week 4 documented the first current-panel control follow-up as supporting evidence. That archival interpretation stays unchanged and remains the dated provenance snapshot.

Week 5 current-state read: Better controls, same core caveat
On the April 16 canonical audit, causal still beats the available probe and random branches on the current panel. The stronger control coverage upgrades D7 to benchmark-local supporting evidence, not to mechanism-clean closure.

Residual framing

CSV2 error debt is explicit, but mixed-ruler status still matters

The canonical audit treats the CSV2 error rows as real but not sign-flipping, while keeping mixed-ruler comparison and causal token-cap debt in the main caveat set.
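
One way to make "real but not sign-flipping" concrete is a worst-case bound: count every error row first against a branch and then in its favor, and check whether the two branches' ranges still fail to overlap. The sketch below is illustrative only. The `Branch` fields, the hit-rate metric, and all counts are hypothetical, and the source does not say that the canonical audit computes its check this way.

```python
from dataclasses import dataclass


@dataclass
class Branch:
    hits: int    # cleanly scored rows that count as a hit (hypothetical metric)
    clean: int   # rows scored without error
    errors: int  # rows where scoring failed

    def bounds(self) -> tuple[float, float]:
        """Lowest and highest hit rate over all ways of resolving error rows."""
        total = self.clean + self.errors
        low = self.hits / total                    # every error row is a miss
        high = (self.hits + self.errors) / total   # every error row is a hit
        return low, high


def sign_is_stable(a: Branch, b: Branch) -> bool:
    """True if a's rate stays strictly on one side of b's in the worst case."""
    a_low, a_high = a.bounds()
    b_low, b_high = b.bounds()
    return a_high < b_low or a_low > b_high


# Hypothetical counts, not taken from the audit.
branch_a = Branch(hits=10, clean=95, errors=5)
branch_b = Branch(hits=40, clean=90, errors=10)
print(sign_is_stable(branch_a, branch_b))
```

A check of this kind is conservative: it can report instability even when any realistic resolution of the error rows would preserve the sign, so a stable result is the informative one.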

Seed 1 random residuals

The April 16 audit keeps the random branch explicitly error-bearing, which is enough to preserve the audit caveat even though the sign of the comparison stays stable.

Probe residuals

The probe branch is also still explicitly error-bearing, so Week 5 now treats that debt as part of the live interpretation rather than as resolved cleanup.

Primary live caveat

Causal branch token-cap pressure remains visible (112 capped outputs), and the canonical audit keeps mixed-ruler status alongside output-quality debt as a live caveat.

Holdout correction

StrongREJECT holdout tie is now explicit in the progress narrative

The evaluator framing is now aligned with the corrected holdout result: v3 and StrongREJECT-4o are tied on the holdout audit.

v3 holdout: 96.0% (95% CI 90.0-100.0%)
StrongREJECT-4o holdout: 96.0% (95% CI 90.0-100.0%)
Discordant correctness: 0 on the shared holdout audit
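
To illustrate why zero discordant items supports "parity" rather than an advantage, here is a minimal sketch of a paired comparison using an exact McNemar test. This is an assumption about how such a tie could be checked, not a description of the audit's actual procedure. The function names and toy correctness vectors are hypothetical, and the source does not state the holdout size.

```python
from math import comb


def discordant_counts(a_correct, b_correct):
    """Return (a_only, b_only): items that exactly one evaluator got right."""
    if len(a_correct) != len(b_correct):
        raise ValueError("evaluators must be scored on the same items")
    pairs = list(zip(a_correct, b_correct))
    a_only = sum(1 for a, b in pairs if a and not b)
    b_only = sum(1 for a, b in pairs if b and not a)
    return a_only, b_only


def mcnemar_exact_p(a_only, b_only):
    """Two-sided exact McNemar p-value computed from the discordant counts."""
    n = a_only + b_only
    if n == 0:
        return 1.0  # no discordant pairs: the data cannot favor either side
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy vectors; both evaluators agree on every item.
v3_correct = [True, True, False, True]
sr4o_correct = [True, True, False, True]
a_only, b_only = discordant_counts(v3_correct, sr4o_correct)
print(a_only, b_only, mcnemar_exact_p(a_only, b_only))
```

With no discordant items the paired test has nothing to weigh, so equal accuracies reflect item-level agreement on correctness and not just matching totals.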

Research principle

Treat archived synthesis as provenance and current-state panels as audit-bound live interpretation. Updating copy only where the cited audit supports it preserves both rigor and readability.

Interpretation status

Week 5 stance in one line

D7 is benchmark-local supporting evidence on the April 16 mixed-ruler current panel with stronger control coverage, and jailbreak evaluator framing now reflects holdout parity rather than superiority.