> Research Papers
Selected publications and preprints. Papers that are under submission or in preparation will be listed here soon. Click on an entry to view its abstract.
Interpreting LLM–Brain Alignment Through Shared Inductive Biases
Abstract
Large Language Models (LLMs) allow us to probe human language processing directly by comparing their sentence representations to activity in the human Language Network (LN). In this setting, high “brain scores” (correlations between LLM sentence representations and LN fMRI responses) are often taken as evidence of shared computational principles (Goldstein et al., 2022). However, such scores can be inflated by trivial properties such as sentence length (Feghhi et al., 2024), and mechanistic differences between biological and artificial networks make it unclear which aspects of sentence processing this alignment actually reflects. We therefore ask how to embed brain scores in a principled framework in which they provide evidence about *shared inductive biases* (IBs) between LLMs and the LN. Inductive biases are a natural level of analysis for alignment work, yet they have so far played only an informal role. Our contribution is twofold. First, motivated by the goal of using high alignment as evidence about key computational components of human language processing, we propose a framework for studying shared IBs in language tasks. Second, we design experiments that begin to validate this framework and demonstrate how it can be applied to analyze alignment results. We focus on the Pereira et al. (2018) fMRI dataset (N = 384 short English sentences, read by participants and grouped by topic). Following AlKhamissi et al. (2024), we retain only the top 512 most LN-aligned LLM units and fit voxel-wise ridge regressions from LLM representations to fMRI responses.
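For context, the sketch below illustrates the general kind of analysis the abstract describes: fitting voxel-wise ridge regressions from LLM unit activations to fMRI responses and scoring alignment as a cross-validated Pearson correlation per voxel ("brain score"). The arrays `llm_features` and `voxel_responses` are random placeholders, and the 5-fold split and fixed regularization strength are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal sketch of a brain-score pipeline: ridge regression from LLM
# features to fMRI voxel responses, scored by cross-validated Pearson r.
# llm_features and voxel_responses are hypothetical placeholder arrays.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n_sentences, n_units, n_voxels = 384, 512, 1000  # toy sizes for illustration
llm_features = rng.normal(size=(n_sentences, n_units))      # top LN-aligned units
voxel_responses = rng.normal(size=(n_sentences, n_voxels))  # LN fMRI responses

scores = np.zeros(n_voxels)
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train, test in kf.split(llm_features):
    model = Ridge(alpha=1.0)  # in practice the penalty would be tuned
    model.fit(llm_features[train], voxel_responses[train])
    pred = model.predict(llm_features[test])
    # Pearson correlation per voxel between predicted and observed responses
    pred_c = pred - pred.mean(axis=0)
    obs_c = voxel_responses[test] - voxel_responses[test].mean(axis=0)
    r = (pred_c * obs_c).sum(axis=0) / (
        np.linalg.norm(pred_c, axis=0) * np.linalg.norm(obs_c, axis=0)
    )
    scores += r / kf.get_n_splits()  # average correlation across folds

print("mean brain score across voxels:", scores.mean())
```

On random placeholder data the mean score hovers near zero; with real LLM features and fMRI responses, the per-voxel correlations are what the abstract refers to as brain scores.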