TY - JOUR
T1 - A Convolutional Neural Network Interpretable Framework for Human Ventral Visual Pathway Representation
AU - Xue, Mufan
AU - Wu, Xinyu
AU - Li, Jinlong
AU - Li, Xuesong
AU - Yang, Guoyuan
N1 - Publisher Copyright:
Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2024/3/25
AB - Recently, convolutional neural networks (CNNs) have become the best quantitative encoding models for capturing neural activity and hierarchical structure in the ventral visual pathway. However, the weak interpretability of these black-box models hinders their ability to reveal visual representational encoding mechanisms. Here, we propose a convolutional neural network interpretable framework (CNN-IF) aimed at providing a transparent, interpretable encoding model for the ventral visual pathway. First, we adapt the feature-weighted receptive field framework to train two high-performing ventral visual pathway encoding models on large-scale functional magnetic resonance imaging (fMRI) data, using both goal-driven and data-driven approaches. We find that the layer-wise predictions of the networks align with the functional hierarchy of the ventral visual pathway. Then, we map feature units to voxel units in the brain and quantify the alignment between voxel responses and visual concepts. Finally, we conduct Network Dissection along the ventral visual pathway, including the fusiform face area (FFA), and discover variations related to the visual concept of ‘person’. Our results demonstrate that the CNN-IF provides a new perspective for understanding encoding mechanisms in the human ventral visual pathway, and that combining an ante-hoc interpretable structure with post-hoc interpretability approaches can achieve fine-grained voxel-wise correspondence between model and brain. The source code is available at: https://github.com/BITYangLab/CNN-IF.
UR - http://www.scopus.com/inward/record.url?scp=85189564754&partnerID=8YFLogxK
DO - 10.1609/aaai.v38i6.28461
M3 - Conference article
AN - SCOPUS:85189564754
SN - 2159-5399
VL - 38
SP - 6413
EP - 6421
JO - Proceedings of the AAAI Conference on Artificial Intelligence
JF - Proceedings of the AAAI Conference on Artificial Intelligence
IS - 6
T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024
Y2 - 20 February 2024 through 27 February 2024
ER -