The strongest project claim is now cross-method and public-facing: in Gemma 3 4B-IT, measurement, localization, control, and external validity repeatedly come apart. Strong readouts often fail as steering targets, and steering that works stays narrow unless it is tested across surfaces.
H-neurons still matter, but as Anchor 1. They establish that a sparse detector can survive a clean held-out split and move behavior. The broader thesis comes from what happens next: interpretation fragility, transfer failures, and evaluator choices that materially change the conclusion.
The thesis is intentionally narrower than "detectors are useless" and broader than "H-neurons replicate." Across the methods in this repo, good readouts did not consistently become good levers. When control worked, it was benchmark-local, surface-local, or sensitive to measurement choices.
This anchor establishes that the project is not built on pure null results. A sparse detector survives the clean split, shifts FaithEval behavior, and passes intervention specificity checks. The same anchor also taught the project where interpretation can go wrong.
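The clean-split logic can be illustrated in miniature. The sketch below is a toy with synthetic activations and a hypothetical correlation-based neuron selector, not the project's actual H-neuron pipeline: fit a sparse detector on one split, then score it only on held-out data that never touched selection.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 64
# Synthetic "activations": only neurons 3, 17, and 42 carry the signal.
X = rng.normal(size=(n, d))
y = (X[:, [3, 17, 42]].sum(axis=1) > 0).astype(int)

def fit_sparse_detector(X_tr, y_tr, k=3):
    """Rank neurons by |correlation| with the label; keep the top k."""
    corr = np.array(
        [np.corrcoef(X_tr[:, j], y_tr)[0, 1] for j in range(X_tr.shape[1])]
    )
    dims = np.argsort(-np.abs(corr))[:k]
    return dims, np.sign(corr[dims])

def predict(X, dims, signs):
    """Score = signed sum of the selected neurons, thresholded at zero."""
    return ((X[:, dims] * signs).sum(axis=1) > 0).astype(int)

# Clean split: the held-out half plays no role in neuron selection.
dims, signs = fit_sparse_detector(X[:200], y[:200])
heldout_acc = (predict(X[200:], dims, signs) == y[200:]).mean()
```

The point of the toy is the split discipline, not the selector: a detector that only looks sparse because it was tuned and scored on the same data would not survive this check.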
The bridge benchmark is the clearest test of whether an intervention that helps on one surface stays helpful when the task changes from multiple-choice truthfulness to free-form factual generation. It does not.
The bridge result acts like a road test after a clean engine-bench result. The component can pass the controlled lab setting and still steer the full system into the wrong lane once the task becomes open-ended.
The evaluator story is now a scientific case study in its own right. The project first found a false positive caused by truncation, then showed that two evaluators can disagree on the same outputs in a way that is mechanically explainable rather than mysterious.
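The truncation mechanism is easy to reproduce in miniature. The sketch below is a hypothetical illustration with invented example text, not the project's actual judge: a rule-based evaluator that extracts the last stated answer is correct on the full output, but truncating the output before judging leaves only the confident-sounding prefix and flips the verdict into a false positive.

```python
import re

def last_stated_answer(text: str):
    """Extract the final 'answer is X' span, mimicking a simple rule-based judge."""
    matches = re.findall(r"answer is (\w+)", text)
    return matches[-1] if matches else None

gold = "Paris"  # suppose this is the benchmark's gold label
full_output = (
    "At first glance the answer is Paris, but checking the passage, "
    "the answer is Berlin."
)

# On the full output the model's final answer is "Berlin": scored incorrect.
verdict_full = last_stated_answer(full_output) == gold
# Truncation hides the revision; only "answer is Paris" survives: scored correct.
verdict_truncated = last_stated_answer(full_output[:40]) == gold
```

The same structure explains evaluator disagreement without mystery: two judges applying the same rule to different slices of the same output can return opposite verdicts for purely mechanical reasons.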
The strategic assessment's main discipline was an evidence hierarchy: some results are strong enough to lead sections, while others strengthen the picture but should not carry the project on their own.