A systematic comparison of large language models suggests that larger models align better with both human behavior and brain activity during natural reading. Instruction tuning, however, does not ...