But there are some footholds. When I previously led the OpenAI interpretability team, we found that vision models contain some interpretable circuits that implement human-understandable algorithms. distill.pub/2020/circuits/zo…
Extending that to transformer language models has been challenging, but there's been a lot of progress, including discovering induction heads (transformer-circuits.pub/202…) and understanding superposition (transformer-circuits.pub/202…).
We are looking for candidates with deep scientific or engineering experience and some nontrivial engagement with mechanistic interpretability problems and ideas (e.g., replicating published results or running their own experiments).
Most members of our team had prior careers in other fields – physics, mathematics, biology, neuroscience, data visualization, and software engineering. So please don't let a lack of experience in this area stop you from applying!
If you find this compelling, we hope you'll consider working in mechanistic interpretability, whether it's with us (jobs.lever.co/Anthropic/33dc…), or with the growing number of other academic and industry groups in this space.
PS: If you're looking to learn more about mechanistic interpretability, @NeelNanda5 – an alumnus of our team now at @GoogleDeepMind – has a great guide to getting started here: neelnanda.io/mechanistic-int…
The Anthropic mech interp team is hiring! @ch402 is an absolutely fantastic mentor, manager and research lead and the team is up to great stuff, you should apply!
(Aside: Some ridiculous fraction of Anthropic is former physicists. 3/7 of the founding team have PhDs in physics -- Jared was previously a professor at JHU! -- and the trend has continued, with physicists like Adam joining us.)
I’ve paused my PhD to join the @AnthropicAI mechanistic interpretability team full time!
While I enjoyed grad school, it’s hard for me to imagine returning — working with the incredible team here on such consequential problems has been a dream.
Consider joining us👇!