Anchor 1 appendix

Neuron 4288:
from dramatic weight to artifact verdict

The strongest L1-weighted neuron in the sparse detector looked like a possible hub. This page presents six independent tests, one scoreboard, and the evidence for why the project should not anchor its mechanism story on this neuron alone.

Appendix note: this page is an Anchor 1 appendix, not a standalone flagship claim. It supports the narrower H-neuron replication story by showing why the mechanism object should remain distributed rather than collapse onto one neuron.

L20:N4288

12.17

L1 weight, rank #1 at C=1.0

Rank at C=3.0

#5

The first sign the dominance is regularization-specific

Standalone AUC

0.590

Below the threshold we set for a unique hub story

Largest contribution share

7.4%

Far below the 30% "dominant feature" threshold

Ablation loss

1.0pp

Too small for a single-neuron bottleneck claim

Overall verdict

0/6

No test supports a unique neuron-hub interpretation

Forensic investigation

Six tests, each asking a different version of the same question

A real hub neuron should survive multiple kinds of scrutiny. We required broad support instead of letting one flattering plot do all the work.

1. Single-neuron AUC (verdict: Artifact)

If 4288 were genuinely carrying the detector, it should classify well on its own. Instead its standalone AUC is 0.590, worse than L13:N833 and L14:N8547.
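A standalone AUC of this kind can be computed directly from one neuron's activations, without fitting any model. The sketch below is illustrative only: the function name and the toy data are hypothetical, and it assumes binary labels with the raw activation used as the score.

```python
def single_neuron_auc(activations, labels):
    """AUC of one neuron's activation used directly as a score.

    Equals the probability that a random positive example has a higher
    activation than a random negative one (ties count as half).
    """
    pos = [a for a, y in zip(activations, labels) if y == 1]
    neg = [a for a, y in zip(activations, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data for illustration, not project data.
toy_acts = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(single_neuron_auc(toy_acts, toy_labels))
```

An AUC of 0.5 means the activation carries no ranking information, so 0.590 sits much closer to chance than to a usable single-feature classifier.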

2. Distribution separation (verdict: Artifact)

Cohen's d is 0.326, below the 0.5 cutoff we set ahead of time. The activations separate somewhat, but not enough to justify a "special neuron" story.
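For reference, Cohen's d is the difference in group means divided by a standard deviation. The page does not say which convention was used, so the sketch below assumes the common pooled-standard-deviation form; the function name and inputs are hypothetical.

```python
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed convention)."""
    na, nb = len(group_a), len(group_b)
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Toy activations for the two classes, not project data.
print(cohens_d([2, 4, 6], [1, 3, 5]))
```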

3. C-sweep stability (verdict: Artifact)

This is the sharpest test. Neuron 4288 is absent at C ≤ 0.3, appears at rank 1 at C = 1.0, then falls to rank 5 at C = 3.0 and rank 11 at C = 10.0. A real mechanism should not be that sensitive to the regularization strength C.
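One way to tabulate a sweep like this is to record, for each C, the neuron's rank among positive-weight features, or nothing if L1 has zeroed it out. This is a minimal sketch under that assumed ranking rule; the function name, neuron labels, and weights are made up for illustration.

```python
def rank_across_sweep(weights_by_c, neuron):
    """Map each C to the neuron's 1-based rank among positive weights.

    Returns None for a C value where the neuron's weight is not positive,
    i.e. the L1 penalty dropped it from the detector.
    """
    ranks = {}
    for c, weights in sorted(weights_by_c.items()):
        w = weights.get(neuron, 0.0)
        if w <= 0:
            ranks[c] = None
            continue
        # Rank = 1 + number of features with a strictly larger weight.
        ranks[c] = 1 + sum(1 for other in weights.values() if other > w)
    return ranks


# Toy weights for three C values, not project data.
toy_sweep = {
    0.3: {"a": 0.9, "b": 0.2},
    1.0: {"a": 1.1, "b": 0.4, "target": 2.0},
    3.0: {"a": 3.0, "b": 2.5, "target": 1.5},
}
toy_ranks = rank_across_sweep(toy_sweep, "target")
present = sum(r is not None for r in toy_ranks.values())
print(toy_ranks, f"present in {present} of {len(toy_ranks)} C values")
```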

4. Per-example contribution (verdict: Artifact)

Despite the oversized L1 weight, 4288 is the largest positive contributor on only 7.4% of examples. That is more "frequent committee member" than "single point of control."
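The gap between a large weight and a small contribution share is easier to see in code. The sketch below assumes a linear detector where each neuron's per-example contribution is weight times activation; the function name, neuron labels, and numbers are hypothetical.

```python
def top_contributor_share(examples, weights, neuron):
    """Fraction of examples where `neuron` is the largest positive contributor.

    examples: list of dicts, one per example, mapping neuron -> activation.
    weights: dict mapping neuron -> classifier weight.
    """
    hits = 0
    for example in examples:
        contrib = {n: w * example.get(n, 0.0) for n, w in weights.items()}
        best = max(contrib, key=contrib.get)
        if best == neuron and contrib[best] > 0:
            hits += 1
    return hits / len(examples)


# Toy data: "big" has ten times the weight but is usually almost inactive.
toy_weights = {"big": 10.0, "x": 1.0, "y": 1.0}
toy_examples = [
    {"big": 0.01, "x": 0.5, "y": 0.2},
    {"big": 0.20, "x": 0.5, "y": 0.1},
    {"big": 0.00, "x": 0.1, "y": 0.9},
    {"big": 0.02, "x": 0.1, "y": 0.3},
]
print(top_contributor_share(toy_examples, toy_weights, "big"))
```

In the toy data the heaviest-weighted feature wins on only one example in four, which is the same qualitative pattern as a rank-1 weight paired with a 7.4% share.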

5. Leave-one-out ablation (verdict: Artifact)

Zeroing the neuron reduces accuracy by 1.03pp, below the 2pp threshold. It matters, but not enough to look like a true bottleneck.
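For a linear detector, zeroing a neuron's activation is equivalent to dropping its term from the score. The following sketch measures the resulting accuracy loss in percentage points; it assumes a simple thresholded linear score, and all names and numbers are hypothetical rather than taken from the project code.

```python
def accuracy(examples, labels, weights, bias, ablate=None):
    """Accuracy of a thresholded linear score, optionally with one neuron zeroed."""
    correct = 0
    for x, y in zip(examples, labels):
        score = bias + sum(
            w * x.get(n, 0.0) for n, w in weights.items() if n != ablate
        )
        correct += int((score > 0) == bool(y))
    return correct / len(labels)


def ablation_loss_pp(examples, labels, weights, bias, neuron):
    """Accuracy drop, in percentage points, from zeroing `neuron`."""
    base = accuracy(examples, labels, weights, bias)
    ablated = accuracy(examples, labels, weights, bias, ablate=neuron)
    return 100.0 * (base - ablated)


# Toy data, not project data.
toy_weights = {"t": 1.0, "u": 1.0}
toy_examples = [
    {"t": 0.8, "u": 0.5},
    {"t": 0.1, "u": 1.5},
    {"t": 0.2, "u": 0.3},
    {"t": 0.1, "u": 0.2},
]
toy_labels = [1, 1, 0, 0]
print(ablation_loss_pp(toy_examples, toy_labels, toy_weights, -1.0, "t"))
```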

6. Correlation structure (verdict: Artifact)

The strongest observed correlation is r = 0.492 with L26:N1359, a zero-weight neuron. That is the exact pattern you expect when L1 picks one representative from a correlated feature group.
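A correlation check of this kind amounts to computing Pearson's r between the target neuron's activations and every other candidate, then keeping the largest absolute value. The sketch below uses hypothetical function names and toy activations.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length activation lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def strongest_partner(target, others):
    """Return (name, r) for the neuron with the largest |r| against target."""
    scored = {name: pearson_r(target, acts) for name, acts in others.items()}
    best = max(scored, key=lambda name: abs(scored[name]))
    return best, scored[best]


# Toy activations, not project data.
toy_target = [1.0, 2.0, 3.0, 4.0]
toy_others = {"p": [2.0, 4.0, 6.0, 8.5], "q": [4.0, 1.0, 3.0, 2.0]}
print(strongest_partner(toy_target, toy_others))
```

If the strongest partner turns out to have zero weight in the sparse model, as L26:N1359 does here, that is consistent with L1 keeping one member of a correlated group and discarding the rest.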

Why six tests instead of one: a neuron can look important by one measure and ordinary by another. This grid is designed like a cross-examination, where each test checks a different loophole in the original "hub neuron" hypothesis.

Verdict

0/6 tests support a unique hub-neuron story

AUC: 0.590 (best single neuron: 0.703)

Cohen's d: 0.326 (runner-up: 0.477)

C-sweep: 3/9 (present in only 3 of 9 C values)

Top contrib: 7.4% (largest contributor share)

Ablation: 1.0pp (below 2pp threshold)

Max |r|: 0.492 (L1 concentration signal)

Artifact rationale

The stronger mechanistic object is the distributed detector, not the loudest single neuron

The lesson here is not "ignore neuron 4288." It is "do not confuse sparse model bookkeeping with causal centrality."

What survived scrutiny

The overall detector remains useful. The clean 76.5% result (95% CI 73.6-79.5%) and the intervention effect do not depend on pretending 4288 is a unique control node.

What got withdrawn

The idea that the sparse 38-neuron detector collapses onto one superstar neuron does not survive contact with ablation, C-sweeps, or correlation structure.

Why C=3.0 matters

At C=3.0 the signal is spread across 219 positive-weight neurons, and 4288 drops to rank #5. That is a better fit for a distributed mechanism story.

Practical interpretation: think of neuron 4288 as the boldface term in a sparse summary, not as the whole argument. If the project wants the cleanest paper-faithful replication headline, the 38-neuron set still works as a baseline. If the project wants the best mechanistic description, the wider positive-weight detector is the better object.