Showing 1–1 of 1 results for author: Hathaway, W

  1. arXiv:2401.12181  [pdf, other]

    cs.LG cs.AI cs.CL

    Universal Neurons in GPT2 Language Models

    Authors: Wes Gurnee, Theo Horsley, Zifan Carl Guo, Tara Rezaei Kheirkhah, Qinyi Sun, Will Hathaway, Neel Nanda, Dimitris Bertsimas

    Abstract: A basic question within the emerging field of mechanistic interpretability is the degree to which neural networks learn the same underlying mechanisms. In other words, are neural mechanisms universal across different models? In this work, we study the universality of individual neurons across GPT2 models trained from different initial random seeds, motivated by the hypothesis that universal neuron…

    Submitted 22 January, 2024; originally announced January 2024.