The methods page now reflects the broader thesis. It documents the H-neuron replication funnel, but it also makes explicit that measurement, localization, control, and externality use different rulers. The important update is evaluators: v3 is preferred for granularity and zero-false-positive behavior, not for an outdated held-out superiority claim.
The H-neuron detector remains the project's first anchor, so its data funnel still matters. It is also a reminder that localization claims already contain measurement choices before any intervention is attempted.
The detector is only as clean as the rows that survive the funnel that feeds it. That funnel is part of the scientific object, not just preprocessing.
The broader thesis starts here: the data funnel answers a localization question, not a control question. A clean detector can still fail later when the project asks whether an intervention actually moves the right behavior on the right task.
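The funnel-as-scientific-object idea can be sketched as a sequence of named filters that log survivor counts, so that every localization claim carries its measurement choices with it. The stage names and row schema below are hypothetical stand-ins, not the repo's actual filters:

```python
# Minimal sketch of a logged data funnel; stage names and the row
# schema are illustrative assumptions, not the project's real filters.

def run_funnel(rows, stages):
    """Apply named filter stages in order, recording survivor counts."""
    counts = [("input", len(rows))]
    for name, keep in stages:
        rows = [r for r in rows if keep(r)]
        counts.append((name, len(rows)))
    return rows, counts

stages = [
    ("has_label", lambda r: r.get("label") is not None),
    ("min_length", lambda r: len(r.get("text", "")) >= 10),
]
rows = [
    {"text": "a" * 20, "label": 1},
    {"text": "short", "label": 0},
    {"text": "b" * 30},  # no label: dropped at the first stage
]
survivors, counts = run_funnel(rows, stages)
print(counts)  # the per-stage counts are part of the record, not debug noise
```

Keeping the counts alongside the surviving rows makes the funnel auditable: a later reader can see exactly where rows were lost, instead of reverse-engineering the filters from the detector's inputs.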
Two measurement lessons now sit side by side in the repo. FaithEval taught prompt-evaluator alignment. The jailbreak evaluation taught that evaluator granularity can change the conclusion even on identical outputs.
Anti-compliance prompt: "Answer with just the letter."
Standard prompt: "Respond with the exact answer only."
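The FaithEval lesson can be sketched concretely: the extractor must agree with the format the prompt requests, or a correct output is scored wrong. The prompts are the two above; the matcher names and regex are hypothetical, not the repo's actual evaluators:

```python
import re

# Sketch of prompt-evaluator alignment. The two matchers below are
# illustrative assumptions, one per prompt style.

def extract_letter(output):
    """For 'Answer with just the letter.' prompts: accept a bare letter."""
    m = re.fullmatch(r"\s*\(?([A-D])\)?\.?\s*", output)
    return m.group(1) if m else None

def extract_exact(output, gold):
    """For 'Respond with the exact answer only.' prompts: exact match."""
    return output.strip() == gold.strip()

# The same model output scores differently under a misaligned evaluator:
out = "B"
print(extract_letter(out))          # aligned matcher parses the letter
print(extract_exact(out, "Boron"))  # misaligned matcher counts it wrong
```

The failure mode is silent: the misaligned pairing produces a plausible-looking accuracy number that measures formatting compliance, not the capability under study.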
v2 rubric: a single coarse label (harmful / borderline / safe).
v3 rubric: a primary_outcome plus C/S/V axes, with evidence spans for audit.
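The two rubric shapes can be sketched as data structures. Only primary_outcome, the C/S/V axes, and evidence spans come from the notes; the field names, types, and axis encoding below are assumptions:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Sketch of the v2 and v3 verdict shapes. Field names and types are
# illustrative assumptions beyond what the notes state.

@dataclass
class V2Verdict:
    label: str  # one of "harmful", "borderline", "safe"

@dataclass
class V3Verdict:
    primary_outcome: str  # headline judgment
    axis_c: int           # C axis score (encoding assumed)
    axis_s: int           # S axis score (encoding assumed)
    axis_v: int           # V axis score (encoding assumed)
    evidence_spans: List[Tuple[int, int]] = field(default_factory=list)

# Identical output, different granularity: v3 records why, not just what.
v2 = V2Verdict(label="safe")
v3 = V3Verdict(primary_outcome="refusal", axis_c=0, axis_s=0, axis_v=0,
               evidence_spans=[(0, 18)])
```

The evidence spans are what make the zero-false-positive claim auditable: a reviewer can check each verdict against the quoted text rather than trusting the label.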
The most useful April upgrades were not only new runs. They were the changes that converted hidden assumptions into named constraints: locked test sets, canary validation, negative-control expectations, and explicit evidence tiers.
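Two of these named constraints can be sketched as executable checks: a locked test set pinned by content hash, and a canary string that must never appear in training rows. The hash source, canary string, and function names are hypothetical stand-ins:

```python
import hashlib

# Sketch of two named constraints: a hash-locked test set and a
# negative-control canary check. All concrete values are assumptions.

LOCKED_TEST_SHA256 = hashlib.sha256(b"example test set bytes").hexdigest()
CANARY = "CANARY-3f9a"  # hypothetical marker planted in the test set

def check_test_set(raw_bytes):
    """Fail loudly if the locked test set has drifted."""
    digest = hashlib.sha256(raw_bytes).hexdigest()
    if digest != LOCKED_TEST_SHA256:
        raise ValueError("test set changed; conclusions need re-auditing")

def check_no_canary(train_rows):
    """Negative-control expectation: the canary must not leak into training."""
    leaked = [i for i, row in enumerate(train_rows) if CANARY in row]
    if leaked:
        raise ValueError(f"canary leaked into training rows {leaked}")

check_test_set(b"example test set bytes")     # passes: bytes unchanged
check_no_canary(["clean row", "another row"])  # passes: no leakage
```

Both checks turn a hidden assumption ("the test set I scored is the one I locked"; "the canary never leaked") into a constraint that fails loudly when violated, which is exactly the conversion the paragraph above describes.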
A robust pipeline is not just something you can rerun. It is something that tells you which conclusions should survive if one appealing branch later gets downgraded.