Seyed-Mahdi Khaligh-Razavi, Linda Henriksson, Kendrick Kay, Nikolaus Kriegeskorte
Visual processing in cortex proceeds through a hierarchy of increasingly sophisticated representations. Here we explore a very wide range of model representations (29 models), testing their categorization performance (animate/inanimate) and their ability to account for the representational geometry of brain regions along the visual hierarchy (V1, V2, V3, V4, and LO). We also created new model instantiations (85 model instantiations in total) by reweighting and remixing the model features. Reweighting and remixing were based on brain responses to an independent training set of 1750 images. We assessed the models with representational similarity analysis (RSA), which characterizes the geometry of a representation by a representational dissimilarity matrix (RDM). In this study, the RDM is computed either from the model features or from predicted voxel responses. Voxel responses are predicted by linear combinations of the model features: the features are linearly remixed so as to best explain the voxel responses (as in voxel/population receptive-field modelling). This new approach of combining RSA with voxel receptive-field modelling may help bridge the gap between the two methods. We found that early visual areas are best accounted for by a Gabor wavelet pyramid (GWP) model. The GWP implementations we used performed similarly with and without remixing, suggesting that the original features already approximate the representational space, obviating the need for remixing or reweighting. The lateral occipital region (LO), a higher visual representation, was best explained by the higher layers of a deep convolutional network (Krizhevsky et al., 2012). However, this model could explain the LO representation only after appropriate remixing of its feature set.
Remixed RSA takes a step in an important direction: each computational model representation is explored more broadly by considering not only its own representational geometry, but also the set of all geometries within reach of a linear transform. The exploration of many models and many brain areas may lead to a better understanding of the processing stages in the visual hierarchy, from low-level image representations in V1 to visuo-semantic representations in higher-level visual areas.
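
To make the remixed-RSA procedure described above concrete, the following is a minimal sketch on synthetic data, not the implementation used in the study. Several choices are assumptions made for illustration: the RDM uses correlation distance (1 minus Pearson correlation), the linear remixing weights are fitted by ridge regression without an intercept, and RDMs are compared by Pearson correlation of their upper triangles. All function names, array sizes, and the `ridge_lambda` parameter are hypothetical.

```python
import numpy as np


def compute_rdm(patterns):
    """Correlation-distance RDM (assumed metric).

    patterns: array of shape (n_stimuli, n_channels), where channels are
    model features or (predicted) voxels.
    Returns an (n_stimuli, n_stimuli) dissimilarity matrix.
    """
    return 1.0 - np.corrcoef(patterns)


def fit_remix_weights(train_features, train_voxels, ridge_lambda=1.0):
    """Fit a linear map from model features to voxel responses.

    Closed-form ridge regression (regularization is an assumption):
    W = (X^T X + lambda * I)^(-1) X^T Y.
    """
    n_features = train_features.shape[1]
    gram = train_features.T @ train_features + ridge_lambda * np.eye(n_features)
    return np.linalg.solve(gram, train_features.T @ train_voxels)


def upper_triangle(rdm):
    """Return the off-diagonal upper-triangle entries of an RDM as a vector."""
    rows, cols = np.triu_indices(rdm.shape[0], k=1)
    return rdm[rows, cols]


def rdm_correlation(rdm_a, rdm_b):
    """Pearson correlation between two RDMs (assumed comparison measure)."""
    return np.corrcoef(upper_triangle(rdm_a), upper_triangle(rdm_b))[0, 1]


# Synthetic stand-ins for the training and test image sets (sizes are arbitrary).
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 30, 10, 5
true_weights = rng.standard_normal((n_features, n_voxels))
train_features = rng.standard_normal((n_train, n_features))
test_features = rng.standard_normal((n_test, n_features))
train_voxels = train_features @ true_weights  # noiseless simulated "brain" data
test_voxels = test_features @ true_weights

# Remixing: learn weights on the training set, predict voxels for the test set.
weights = fit_remix_weights(train_features, train_voxels, ridge_lambda=1e-8)
predicted_voxels = test_features @ weights

# Compare the unmixed (raw-feature) and remixed (predicted-voxel) RDMs to the brain RDM.
brain_rdm = compute_rdm(test_voxels)
unmixed_rdm = compute_rdm(test_features)
remixed_rdm = compute_rdm(predicted_voxels)
print("unmixed vs brain:", rdm_correlation(unmixed_rdm, brain_rdm))
print("remixed vs brain:", rdm_correlation(remixed_rdm, brain_rdm))
```

Because the simulated voxel responses here are an exact linear function of the features, the remixed RDM matches the brain RDM almost perfectly. With real, noisy data the regularization strength would need to be chosen on the training set only, so that the test images stay independent of the fitting.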