🧠 AI & Brains: Predicting the Mind 🤯
Meta’s FAIR team has developed TRIBE v2, a foundation model that connects artificial intelligence with neuroscience by aligning the internal representations of deep neural networks with human brain activity, specifically high-resolution fMRI responses. Rather than learning to ‘see’ or ‘hear’ from scratch, the model builds on existing pretrained networks: three frozen foundation models serve as feature extractors, feeding a temporal transformer and a subject-specific prediction block that turn stimuli into a multi-modal time series of predicted brain responses. TRIBE v2 achieves a group correlation near 0.4 on the high-resolution Human Connectome Project 7T dataset, a substantial improvement over traditional encoding models, and fine-tuning on even limited data yields further large gains in predictive accuracy. The work points to a promising route for modeling complex, distributed brain activity.
TRIBE v2: A Unified Approach to Brain Encoding
TRIBE v2 represents a significant advance in computational neuroscience, addressing the longstanding challenge of how the human brain integrates multisensory information. Research has traditionally mapped individual cognitive functions onto separate brain regions, producing a fragmented picture of cortical processing. Meta’s FAIR team designed TRIBE v2, a tri-modal foundation model, to overcome this limitation by aligning deep neural network representations with human brain activity and predicting high-resolution fMRI responses across diverse stimulus conditions. Its core contribution is a single unified framework that models responses throughout the cortex, rather than one region or one modality at a time.
Model Architecture and Training Methodology
The TRIBE v2 architecture comprises three frozen foundation models that act as feature extractors, combined with a temporal transformer and a subject-specific prediction block. The system processes stimuli through three specialized encoders, compresses each embedding into a shared dimension (D = 384), and concatenates them into a multi-modal time series with model dimension D_model = 3 × 384 = 1152. A temporal transformer of eight layers and eight attention heads then exchanges information across a 100-second window, letting the model capture dynamic changes in brain activity. Finally, the transformer outputs are decimated to the 1 Hz fMRI sampling rate and passed through a subject-specific prediction block, which projects the latent representations onto 20,484 cortical vertices (fsaverage5 surface) and 8,802 subcortical voxels; a minimal sketch of this pipeline appears below. For training, the team strategically used ‘deep’ datasets (multiple hours of recording from a few subjects) and reserved ‘wide’ datasets for evaluation, observing a log-linear increase in encoding accuracy as data volume grew, which suggests the model’s predictive power will continue to scale with expanding neuroimaging repositories.
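The description above maps naturally onto a small PyTorch module. The sketch below is illustrative only: the pretrained feature dimensions (768/1024/1280), the 2 Hz feature rate, and the exact module layout are assumptions, and the frozen encoders are stand-ins for whichever text, audio, and video models TRIBE v2 actually uses; only the shared dimension (384), the model dimension (1152), the transformer depth and head count, and the output vertex/voxel counts come from the text.

```python
# Minimal sketch of a TRIBE-style pipeline in PyTorch. The pretrained feature
# dimensions, the 2 Hz feature rate, and the module layout are illustrative
# assumptions; the shared dimension, model dimension, transformer size, and
# output counts follow the article text.
import torch
import torch.nn as nn

D = 384                      # shared per-modality embedding size
D_MODEL = 3 * D              # 1152 after concatenating three modalities
N_TARGETS = 20_484 + 8_802   # fsaverage5 cortical vertices + subcortical voxels

class TribeStyleEncoder(nn.Module):
    def __init__(self, n_subjects: int, feat_dims=(768, 1024, 1280)):
        super().__init__()
        # One linear compressor per modality: pretrained feature dim -> D.
        self.compress = nn.ModuleList([nn.Linear(f, D) for f in feat_dims])
        layer = nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=8,
                                           batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=8)
        # Subject-specific prediction block: one linear readout per subject.
        self.heads = nn.ModuleList(
            [nn.Linear(D_MODEL, N_TARGETS) for _ in range(n_subjects)])

    def forward(self, feats, subject: int, decimate: int = 2):
        # feats: three (batch, time, feat_dim) tensors from the frozen text,
        # audio, and video encoders, resampled to a common feature rate.
        x = torch.cat([c(f) for c, f in zip(self.compress, feats)], dim=-1)
        x = self.temporal(x)           # mix information across the 100 s window
        x = x[:, ::decimate]           # decimate to the 1 Hz fMRI sampling rate
        return self.heads[subject](x)  # (batch, fmri_time, N_TARGETS)

model = TribeStyleEncoder(n_subjects=4)
feats = [torch.randn(1, 200, d) for d in (768, 1024, 1280)]  # 100 s at 2 Hz
print(model(feats, subject=0).shape)   # torch.Size([1, 100, 29286])
```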
Performance and Novel Capabilities
TRIBE v2 markedly outperforms traditional Finite Impulse Response (FIR) models, which have long served as the gold standard for voxel-wise encoding. Notably, the model exhibits zero-shot generalization to new subjects: it predicts the group-averaged response of an “unseen subject” cohort more accurately than the actual recordings of individual subjects within that cohort do. On the high-resolution Human Connectome Project (HCP) 7T dataset, TRIBE v2 achieved a group correlation (R_group) near 0.4, twice the median subject’s group-predictivity; a sketch of how such a metric can be computed follows below. Furthermore, even with limited data, just one hour from a new participant, fine-tuning TRIBE v2 for a single epoch yields a two- to four-fold improvement over linear models trained from scratch, suggesting the model is well suited to piloting or pre-screening neuroimaging studies. Finally, applying Independent Component Analysis (ICA) to the model’s final layer revealed that TRIBE v2 naturally learns five well-known functional networks: primary auditory, language, motion, default mode, and visual (see the second sketch below).
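One plausible reading of an R_group-style metric is a per-vertex Pearson correlation between the model’s prediction and the group-averaged BOLD response, averaged over vertices. The paper’s exact normalization may differ; this NumPy sketch uses simulated data with illustrative sizes.

```python
# Assumed reading of a group-correlation metric: Pearson r between the
# prediction and the group-averaged response at each vertex, then averaged.
import numpy as np

def group_correlation(pred: np.ndarray, bold: np.ndarray) -> float:
    """pred: (time, vertices) prediction; bold: (subjects, time, vertices)."""
    group_mean = bold.mean(axis=0)                 # average across subjects
    p = pred - pred.mean(axis=0)                   # center each vertex series
    g = group_mean - group_mean.mean(axis=0)
    num = (p * g).sum(axis=0)
    denom = np.linalg.norm(p, axis=0) * np.linalg.norm(g, axis=0) + 1e-8
    return float((num / denom).mean())             # mean Pearson r over vertices

rng = np.random.default_rng(0)
bold = rng.standard_normal((8, 300, 1000))                    # simulated recordings
pred = bold.mean(axis=0) + rng.standard_normal((300, 1000))   # noisy prediction
print(round(group_correlation(pred, bold), 3))
```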
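The ICA analysis can likewise be sketched with scikit-learn’s FastICA: decompose the final-layer activations over time into five independent components whose loadings, once projected through the prediction head onto the cortical surface, would yield spatial maps comparable to canonical networks. The activation matrix below is random stand-in data, and the pipeline details are assumptions.

```python
# Sketch of the ICA step: factor (time, D_MODEL) final-layer activations into
# five independent components. Random stand-in data; not the paper's pipeline.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
latents = rng.standard_normal((600, 1152))    # (time, D_MODEL) activations

ica = FastICA(n_components=5, random_state=0) # five networks named in the text
sources = ica.fit_transform(latents)          # (time, 5) component time courses
maps = ica.mixing_                            # (1152, 5) loadings per latent unit
# Projecting each column of `maps` through the prediction head would give a
# cortical map to compare against the auditory, language, motion, default
# mode, and visual networks.
print(sources.shape, maps.shape)
```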
This article is AI-synthesized from public sources and may not reflect original reporting.