CRBC News

Mind Captioning: AI Translates Visual Brain Activity into Natural-Language Descriptions

Mind captioning is a new method that converts fMRI recordings into natural-language descriptions of what people see or imagine. Published in Science Advances on 5 November 2025, the technique maps brain scans from six participants to numerical "meaning signatures" derived from captions of over 2,000 videos, then uses a text generator to produce progressively refined sentence guesses. The system also described participants' recalled clips, suggesting similar brain codes for perception and memory. While promising for assistive communication, the approach is limited by small-sample training and raises mental-privacy concerns.


Scientists have taken a significant step toward decoding visual thoughts. A new technique called "mind captioning" converts non-invasive recordings of brain activity into natural-language descriptions of what a person is seeing or imagining, and does so with surprising detail and consistency.

The work, reported in a paper published in Science Advances on 5 November 2025, also provides insights into how the brain represents scenes before those representations are expressed in words. The authors suggest the approach could eventually help people with language impairments — for example, stroke survivors — to communicate more effectively.

How the method works

The team, led by Tomoyasu Horikawa, combined several AI components with functional magnetic resonance imaging (fMRI):

  1. Meaning signatures: A deep language model analyzed the text captions for more than 2,000 videos, converting each caption into a distinct numerical "meaning signature."
  2. Brain-to-meaning mapping: A second AI was trained on fMRI scans from six participants watching those videos, learning to match patterns of brain activity to the corresponding meaning signatures (see the sketch after this list).
  3. Caption generation: For a new scan, the decoder predicts a meaning signature and a text-generation model searches for the sentence whose meaning best matches that signature.
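
The brain-to-meaning mapping in steps 1 and 2 can be pictured as a regularized linear regression from fMRI voxel patterns to caption embeddings. The Python sketch below is a minimal illustration of that idea, not the authors' implementation: the sentence-transformers encoder, the use of ridge regression, and all function and variable names are assumptions chosen for demonstration.

```python
# Minimal sketch of the brain-to-meaning mapping (steps 1-2).
# Assumptions: caption "meaning signatures" come from a generic sentence
# encoder, and the brain-to-signature map is a ridge regression; the
# published method may differ in both choices.
import numpy as np
from sklearn.linear_model import Ridge
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in language model

def train_decoder(fmri_patterns: np.ndarray, captions: list[str]) -> Ridge:
    """Learn a linear map from fMRI activity to caption meaning signatures.

    fmri_patterns: (n_videos, n_voxels) responses while watching each video.
    captions:      one text caption per video.
    """
    signatures = encoder.encode(captions)        # (n_videos, embed_dim)
    decoder = Ridge(alpha=10.0)                  # regularization guards against
    decoder.fit(fmri_patterns, signatures)       # overfitting to few samples
    return decoder

def predict_signature(decoder: Ridge, new_scan: np.ndarray) -> np.ndarray:
    """Predict the meaning signature for a single new fMRI pattern."""
    return decoder.predict(new_scan.reshape(1, -1))[0]
```

In the study itself, the mapping was fitted separately for each of the six participants, which is why per-person training data and high-quality scans are required.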

Because the text generator ranks many candidate sentences, the system produces progressively refined guesses. For instance, when a volunteer viewed a clip of someone jumping from the top of a waterfall, early guesses included phrases such as "spring flow," the tenth guess read "above rapid falling water fall," and the 100th guess became "a person jumps over a deep water fall on a mountain ridge."
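
One way to picture this progressive refinement is a greedy search that proposes candidate sentences, scores each against the decoded meaning signature, and keeps the best so far. The sketch below is illustrative only: the cosine-similarity scoring, the greedy loop, and the abstract `propose` and `embed` callables are assumptions of ours, not necessarily the published generation procedure.

```python
# Illustrative sketch of step 3: score candidate sentences against the
# decoded meaning signature and keep the closest match. The candidate
# generator is left abstract.
import numpy as np
from typing import Callable, Iterable

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two meaning signatures."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def caption_search(
    target_signature: np.ndarray,
    propose: Callable[[str], Iterable[str]],   # yields variants of a sentence
    embed: Callable[[str], np.ndarray],        # text -> meaning signature
    n_iterations: int = 100,
    seed_sentence: str = "",
) -> str:
    """Greedy search: at each step, keep the candidate whose embedding
    is most similar to the decoded brain signature."""
    best, best_score = seed_sentence, -1.0
    for _ in range(n_iterations):
        for candidate in propose(best):
            score = cosine(embed(candidate), target_signature)
            if score > best_score:
                best, best_score = candidate, score
    return best
```

Because only the highest-scoring candidate survives each round, early outputs tend to be fragmentary phrases and later ones fuller sentences, matching the waterfall example above.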

Seeing and remembering

The researchers also asked participants to recall previously viewed clips. The models generated plausible descriptions of these recollections, suggesting that the brain uses similar representational codes for perception and memory.

Potential applications and limitations

Because the approach relies on non-invasive fMRI, the authors say it could inform future brain–computer interfaces (BCIs) that translate non-verbal mental representations into text — including implanted BCIs for people who cannot speak. Alex Huth, a computational neuroscientist at UC Berkeley, says the model can predict what a person is looking at "with a lot of detail" and that this level of decoding is challenging to achieve.

However, several important limitations remain: the system was trained on a small group of participants, the mapping is individualized and depends on high-quality fMRI data, and the text generator ranks candidate sentences rather than outputting a single guaranteed-accurate caption. Previous work also found that text-generation models can produce fluent phrases that may not precisely reflect neural representations; Horikawa's approach mitigates this by explicitly matching decoded meaning signatures to candidate sentences.

Ethical concerns

The findings revive concerns about mental privacy. As decoding methods improve, in theory they could reveal private thoughts, emotions or health information that might be misused for surveillance, manipulation or discrimination. Both Huth and Horikawa emphasize that current systems require participants' consent and cannot read arbitrary private thoughts. As Huth put it, "Nobody has shown you can do that, yet."

Bottom line: Mind captioning is an important proof of concept showing that complex visual content and memories can be mapped to interpretable language via brain activity, but it remains constrained by data requirements, the need for per-person training, and ethical considerations around privacy.
