A broader Gemma 3 4B thesis: measurement, localization, control, and externality are separable stages, and strong readouts often fail as steering targets. H-neurons remain the paper-faithful first anchor, not the whole story.
H-neuron replication still holds: 38 neurons, 76.5% disjoint accuracy, and a +4.5pp FaithEval gain above no-op.
Specificity was tested: random-neuron intervention controls stay flat on FaithEval and FalseQA.
Interpretation got narrower: the 4288 top-neuron story passed 0/6 cross-checks, and the stronger mechanism hypothesis is distributed rather than singular.
Why this anchor matters: it establishes a real localization and control foothold before the broader thesis asks where that foothold breaks.
Control does not guarantee transfer: the bridge benchmark now shows a held-out externality break.
Locked test result: ITI reduces adjudicated accuracy by 5.8pp (95% CI -8.8 to -3.0pp) on 500 held-out TriviaQA items.
Dominant failure mode: 70.0% of right-to-wrong flips are wrong-entity substitutions, not refusal.
Project consequence: an intervention can help on one evaluation surface and still corrupt free-form factual generation.
Measurement changes conclusions: the original 256-token jailbreak slope was a truncation artifact.
Paired v2-v3 audit: v2 shows a positive binary slope, the v3 binary slope is statistically uncertain, and v3 still recovers a substantive-compliance rise.
Why v3 stays primary: not because it leads on held-out accuracy after the tie, but because it supplies the taxonomy and granularity the paper needs without introducing false positives on the gold set.
Project consequence: evaluator dependence is part of the scientific result, not just reporting overhead.
D7 is supporting evidence: the April 8 legacy panel stays historical, while the April 16 current panel is a clean CSV2 v3 comparison with expanded layer-matched random controls and a scored probe branch; the displayed random seed remains +17.2pp more harmful than the causal intervention on strict harmfulness (95% CI 13.2-21.4pp).
v3 is not pitched as universally superior: the holdout rerun closed the old StrongREJECT gap, so the rationale is measurement quality rather than outdated superiority claims.
Single-model scope remains explicit: the site is framed as a Gemma 3 4B case study in intervention science.
The stable pattern across the April evidence is broader than H-neurons alone: measurement, localization, control, and externality come apart. Good readouts do not reliably identify good steering targets, and even successful interventions stay narrow unless validated on multiple surfaces.
The paper-faithful sparse baseline survives the hard checks: 76.5% held-out accuracy (95% CI 73.6-79.5%), a +4.5pp FaithEval gain above no-op, and flat random-neuron intervention controls. What narrowed is the interpretation, not the existence of signal.
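As a rough illustration of how an accuracy interval of this kind can be obtained, here is a minimal percentile-bootstrap sketch; the correctness vector, sample size, and function name are hypothetical stand-ins for the project's held-out predictions, not its actual data.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_accuracy_ci(correct, n_boot=10_000, alpha=0.05):
    """Percentile bootstrap CI for accuracy from a 0/1 correctness vector."""
    correct = np.asarray(correct, dtype=float)
    boots = [correct[rng.integers(0, len(correct), len(correct))].mean()
             for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return correct.mean(), lo, hi

# Illustrative 0/1 correctness over disjoint held-out items (not the project's data).
correct = rng.random(800) < 0.765
acc, lo, hi = bootstrap_accuracy_ci(correct)
print(f"accuracy={acc:.1%}, 95% CI [{lo:.1%}, {hi:.1%}]")
```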
Neuron (20, 4288) has 1.65x the classifier weight of the runner-up but scores 0/6 on independent checks. By C=3.0 the signal is spread across 219 positive-weight neurons and accuracy rises to 80.5%. Sparse bookkeeping is not the same thing as mechanistic centrality.
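The C sweep behind this paragraph can be reproduced in spirit with an L1-penalized probe; the sketch below assumes sklearn's `LogisticRegression` convention for `C` (inverse regularization strength) and uses synthetic activations and labels in place of the real Gemma 3 4B features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative activations and labels; real runs would use per-prompt MLP activations.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 2048))      # hypothetical neuron activations
y = rng.integers(0, 2, size=600)      # hypothetical hedging labels

def sparse_probe_summary(X, y, C):
    """Fit an L1-penalized probe; report positive-weight support and top-weight dominance."""
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C, max_iter=2000)
    clf.fit(X, y)
    w = clf.coef_.ravel()
    pos = np.sort(w[w > 0])[::-1]
    ratio = pos[0] / pos[1] if len(pos) > 1 else float("nan")
    return len(pos), ratio

# Loosening regularization grows the positive-weight support, as described above.
for C in (0.5, 1.0, 3.0):
    n_pos, ratio = sparse_probe_summary(X, y, C)
    print(f"C={C}: {n_pos} positive-weight neurons, top/runner-up weight ratio {ratio:.2f}")
```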
The bridge benchmark upgrades the externality claim to stronger held-out evidence: E0 ITI lowers adjudicated accuracy by 5.8pp (95% CI -8.8 to -3.0pp) on 500 locked test questions, with p=0.0002. The damage is mostly active corruption, not abstention: 30 of 43 right-to-wrong flips (70.0%) are wrong-entity substitutions.
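For the readouts behind these two numbers, a minimal sketch of a paired bootstrap for the accuracy change and a flip-mode tally is below; the function names, correctness vectors, and failure labels are illustrative assumptions, not the project's actual pipeline.

```python
import numpy as np

def paired_accuracy_delta_pp(base_correct, iti_correct, n_boot=10_000, seed=0):
    """Paired bootstrap over items for the accuracy change (in pp) under intervention."""
    base = np.asarray(base_correct, dtype=float)
    iti = np.asarray(iti_correct, dtype=float)
    rng = np.random.default_rng(seed)
    deltas = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(base), len(base))
        deltas.append(100 * (iti[idx].mean() - base[idx].mean()))
    lo, hi = np.percentile(deltas, [2.5, 97.5])
    return 100 * (iti.mean() - base.mean()), lo, hi

def right_to_wrong_taxonomy(base_correct, iti_correct, failure_labels):
    """Tally right-to-wrong flips by judged failure mode (e.g. wrong-entity vs refusal)."""
    counts = {}
    for b, i, lab in zip(base_correct, iti_correct, failure_labels):
        if b and not i:
            counts[lab] = counts.get(lab, 0) + 1
    return counts
```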
The 256-token binary headline was falsified by a 5000-token rerun. On paired outputs, v2 reports a 2.30pp/α binary slope while v3 compresses the same binary slope to 0.46pp/α, yet v3 recovers a 2.00pp/α substantive-compliance shift. After the holdout tie, v3 stays primary for its taxonomy and zero-false-positive behavior, while specificity remains limited by missing paired controls and a single-seed random control.
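The pp/α figures are slopes of a compliance rate against steering strength; a minimal OLS sketch is below, with synthetic judgments standing in for the v2 and v3 evaluator outputs and an assumed α grid.

```python
import numpy as np

def compliance_slope_pp_per_alpha(alphas, compliant_by_alpha):
    """OLS slope of compliance rate (in pp) against steering strength alpha."""
    rates = np.array([100 * np.mean(c) for c in compliant_by_alpha])
    slope, _intercept = np.polyfit(np.asarray(alphas, dtype=float), rates, deg=1)
    return slope

# Illustrative per-alpha 0/1 judgments from one evaluator (not the project's outputs).
rng = np.random.default_rng(0)
alphas = [0, 2, 4, 6, 8]
judgments = [rng.random(300) < (0.10 + 0.02 * a) for a in alphas]
print(f"binary slope ≈ {compliance_slope_pp_per_alpha(alphas, judgments):.2f} pp/alpha")
```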
The April 8 legacy panel still shows the locked causal head intervention reducing `csv2_yes` by 9.0pp (95% CI -12.2 to -5.8pp) while the L1 comparator increases it by 4.0pp (95% CI 0.6-7.6pp). The April 16 current panel now uses a clean CSV2 v3 comparison with expanded layer-matched random controls and a scored probe branch; the displayed random seed remains +17.2pp more harmful than the causal intervention on strict harmfulness (95% CI 13.2-21.4pp). The live caveat is now causal token-cap and quality debt plus a small documented residual evaluator-error set, not missing seeds or a missing probe branch, so this stays supporting evidence rather than selector-specific closure.
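The seed-vs-causal gap is a difference in strict-harmfulness rates between two arms; a minimal unpaired-bootstrap sketch follows, assuming per-prompt 0/1 judgments for each arm (the project's actual resampling scheme may differ).

```python
import numpy as np

def arm_gap_pp(random_arm, causal_arm, n_boot=10_000, seed=0):
    """Bootstrap CI (in pp) for the strict-harmfulness gap between two intervention arms."""
    r = np.asarray(random_arm, dtype=float)
    c = np.asarray(causal_arm, dtype=float)
    rng = np.random.default_rng(seed)
    diffs = [100 * (r[rng.integers(0, len(r), len(r))].mean()
                    - c[rng.integers(0, len(c), len(c))].mean())
             for _ in range(n_boot)]
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return 100 * (r.mean() - c.mean()), lo, hi
```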
April 8-16 turned multiple attractive stories into narrower ones: D7 moved to clean-panel supporting evidence instead of the old single-seed framing, v3 lost outdated holdout-superiority framing, and jailbreak binary counts gave way to a severity story. That is a methodological win, not a retreat.