“Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet”, 2024 (NN sparsity, self-attention, Claude AI, autoencoder NN)