This page is no longer a sprint tracker. It is a decision map for what is already strong, what is still caveated, and what evidence would actually change the core thesis.
April 8–14 turned the project into a confidence-tier problem: two anchors are headline-safe, one anchor is scientifically valuable but caveated, and future extensions should be judged by whether they reduce those caveats.
| Branch | Confidence tier | What we now know | What still blocks a stronger claim |
|---|---|---|---|
| Jailbreak measurement + specificity | Supporting, caveated | Paired evaluator audit: v2 harmful slope 2.30pp/α (95% CI 0.99 to 3.58pp), v3 binary slope 0.46pp/α (95% CI -1.46 to 2.41pp), v3 substantive-compliance slope 2.00pp/α (95% CI 0.11 to 3.87pp). | No paired v2-v3 control set and only a single-seed v3 random control. Strong enough for a measurement-case-study claim, not for a clean universal specificity headline. |
| D7 causal-head mitigation | Supporting, caveated | April 8 legacy panel: csv2_yes drops by -9.0pp (95% CI -12.2 to -5.8pp) vs baseline, while the L1 comparator increases harm (+4.0pp; 95% CI 0.6 to 7.6pp). April 16 current panel: clean CSV2 v3 comparison with expanded layer-matched random controls and a scored probe branch; the shown random seed remains +17.2pp more harmful than causal on strict harmfulness (95% CI 13.2 to 21.4pp). | The live panel is no longer mixed-ruler or single-seed. Treat this as benchmark-local support: causal still carries token-cap and quality debt (112/500 causal outputs hit the cap), and the probe/random branches retain a small documented residual evaluator-error set. The caveat is no longer that seed 2 or the probe branch is missing. |
| ITI bridge externality | Headline-safe | Held-out test confirms the externality break: -5.8pp adjudicated accuracy (95% CI -8.8 to -3.0pp), p=0.0002. 70.0% of right-to-wrong flips are wrong-entity substitutions. | Mechanistic interpretation is behavioral rather than circuit-level. Formal multi-rater coding could tighten failure-mode percentages. |
| SAE vs H-neuron dissociation | Headline-safe | Matched detection quality (AUROC 0.848, 95% CI 0.820 to 0.874 for SAE vs 0.843, 95% CI 0.815 to 0.870 for H-neurons), but SAE steering is null in both full-replacement and delta-only tests. | No major blocker. Remaining work is narrative integration, not rescue experimentation. |
| Selective truthfulness / deception extension | Hypothesis | Bridge taxonomy suggests global truth steering is too blunt; conditional interventions may be a better target for novelty. | No direct experiment yet. Needs a scoped protocol before it can move from concept to evidence. |
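For readers auditing the slope numbers above: each pp/α figure is a fitted slope of harmful-response rate (in percentage points) against steering coefficient α, with a resampling-based 95% CI. The sketch below shows one plausible way such a slope and percentile-bootstrap interval could be computed; the function names and data are illustrative, not the project's actual evaluator pipeline, and the real analysis may use a different resampling unit (e.g. per-prompt rather than per-α).

```python
import random


def slope(xs, ys):
    # Ordinary least-squares slope of y on x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def bootstrap_slope_ci(alphas, rates, n_boot=2000, seed=0):
    """Percentile-bootstrap 95% CI for the rate-per-alpha slope.

    alphas: steering coefficients; rates: harmful-response rates in
    percentage points at each alpha. Resamples (alpha, rate) pairs
    with replacement. Hypothetical helper, not the project's code.
    """
    rng = random.Random(seed)
    pairs = list(zip(alphas, rates))
    boots = []
    while len(boots) < n_boot:
        sample = [rng.choice(pairs) for _ in pairs]
        xs, ys = zip(*sample)
        if len(set(xs)) > 1:  # need variation in alpha to fit a slope
            boots.append(slope(xs, ys))
    boots.sort()
    return slope(alphas, rates), (boots[int(0.025 * n_boot)],
                                  boots[int(0.975 * n_boot)])
```

A CI that excludes zero (as for the v2 harmful slope) supports a dose-response claim; one that straddles zero (as for the v3 binary slope) is what keeps that row in the caveated tier.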
The next best experiment is the one that upgrades a supporting claim to headline-safe. Everything else is optional polish.
The project moved from "What should we run next?" to "Which caveat still threatens the thesis?" The April 8–16 evidence made that tradeoff explicit.
The flagship thesis does not need every branch to be positive. It needs each branch to answer a different stage-break question cleanly.
The flagship can already stand on two headline-safe anchors plus one caveated measurement anchor. The next decision is whether to spend remaining budget on caveat closure or on novelty exploration.