AI 'Mind-Captioning' Converts Visual Thoughts into Language
A researcher in Japan has developed a new technique that combines brain scanning and artificial intelligence to translate mental images into detailed, descriptive sentences. In a study published Nov. 5 in Science Advances, Tomoyasu Horikawa of NTT’s Communication Science Laboratories near Tokyo describes a system he calls 'mind-captioning' that maps visual brain activity to language representations and generates readable captions.
How the method works
Horikawa scanned six volunteers (four men and two women, native Japanese speakers ages 22–37) while they watched 2,180 short, silent video clips covering a wide range of objects, scenes and actions. Captions for those clips were processed by large language models, which converted the text into numeric representations (feature vectors). Simpler models, called decoders, were then trained to predict those representations from the participants' recorded brain activity.
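In rough outline, that training step resembles a standard regression problem: learn a mapping from recorded brain responses to the caption features produced by a language model. The sketch below is purely illustrative; the ridge-regression decoder, the array sizes and the synthetic stand-in data are assumptions for the example, not details taken from the paper.

```python
# Minimal sketch of decoder training, under the assumptions above:
# synthetic arrays stand in for real fMRI responses and caption embeddings,
# and ridge regression stands in for whatever decoder the study actually used.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_clips, n_voxels, emb_dim = 2180, 2000, 256   # 2,180 clips matches the study; other sizes are illustrative
brain_activity = rng.standard_normal((n_clips, n_voxels))      # stand-in for per-clip fMRI responses
caption_features = rng.standard_normal((n_clips, emb_dim))     # stand-in for LLM features of each caption

X_train, X_test, y_train, y_test = train_test_split(
    brain_activity, caption_features, test_size=0.1, random_state=0
)

# A linear decoder predicts the caption-feature vector from voxel activity.
decoder = Ridge(alpha=100.0)
decoder.fit(X_train, y_train)

# Decoded features for held-out clips, to be matched against candidate text later.
decoded = decoder.predict(X_test)
print(decoded.shape)   # (218, 256)
```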
After training, the decoders were used to interpret brain activity when participants watched or later recalled videos the system had not seen before. A separate algorithm then iteratively generated and refined word sequences to find the text that best matched the decoded brain signals. With more iterations and more training data, the tool produced increasingly accurate English-language descriptions, even though the participants were native Japanese speakers.
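The generation step is described here only at a high level. As a rough illustration of the idea, a toy version can be written as a greedy search that repeatedly edits a candidate sentence and keeps any edit that moves its embedding closer to the decoded one. The hash-based "embedding", the tiny vocabulary and the greedy replacement rule below are all stand-ins, not the method used in the study.

```python
# Toy illustration of iterative text matching: edit a candidate sentence so its
# embedding approaches a target (decoded) embedding. The embed() function is a
# deterministic hash-based stand-in for a real language model.
import zlib
import numpy as np

rng = np.random.default_rng(1)
EMB_DIM = 64

def embed(words):
    """Stand-in sentence embedding: sum of per-word pseudo-random vectors."""
    vec = np.zeros(EMB_DIM)
    for w in words:
        vec += np.random.default_rng(zlib.crc32(w.encode())).standard_normal(EMB_DIM)
    return vec / (np.linalg.norm(vec) + 1e-9)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

vocab = ["a", "dog", "runs", "on", "the", "beach", "person", "jumps", "over", "fence"]
decoded_embedding = embed(["a", "dog", "runs", "on", "the", "beach"])  # pretend this came from the decoder

candidate = list(rng.choice(vocab, size=6))          # start from random words
for _ in range(200):                                 # iterative refinement
    pos = rng.integers(len(candidate))
    proposal = candidate.copy()
    proposal[pos] = str(rng.choice(vocab))
    if cosine(embed(proposal), decoded_embedding) > cosine(embed(candidate), decoded_embedding):
        candidate = proposal                         # keep edits that improve the match

print(" ".join(candidate))
```

In this toy setup the candidate drifts toward the words behind the target embedding; the real system searches over far richer language-model representations to arrive at fluent captions.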
Potential uses and limitations
Importantly, Horikawa found the method could generate comprehensive visual descriptions without relying on activity from the brain's traditional language network, suggesting possible use for people with damage to language areas. The authors note potential applications for assisting people with aphasia or progressive conditions such as amyotrophic lateral sclerosis (ALS) that impair speech.
However, Horikawa and other experts emphasize current limitations: the technique requires large amounts of personalized training data collected with active cooperation, and the study used mostly typical video scenes (for example, a dog biting a man) rather than unusual or highly unexpected images. Because of these constraints, the team says the present approach is not yet accurate enough to 'read' private thoughts reliably.
Ethical and privacy concerns
Experts quoted in coverage of the study raised ethical questions. Marcello Ienca, a professor of AI and neuroethics at the Technical University of Munich, described the work as an incremental step toward brain-reading and warned that widespread access to neural data would require very strict safeguards because brains can reveal sensitive health and mental information. Psychologist Scott Barry Kaufman noted the potential for profound interventions for people who cannot speak, but urged careful, consent-based use.
Other commentators, including social scientist Łukasz Szoszkiewicz of the Neurorights Foundation, called for treating neural data as inherently sensitive, requiring explicit, purpose-limited consent, prioritizing on-device processing, and adding user-controlled 'unlock' mechanisms. A separate August paper in Cell proposed a mechanism in which users think of a specific keyword to intentionally 'unlock' decoding and prevent unwanted leakage of private thoughts.
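The Cell proposal is described only in outline, but the general idea can be illustrated schematically: a simple classifier watches for the user's chosen keyword pattern in the incoming brain signal, and the captioning decoder is allowed to run only when that pattern is detected with high confidence. Everything in the sketch below, including the classifier, the threshold and the placeholder decode step, is a hypothetical illustration rather than the mechanism from that paper.

```python
# Hypothetical sketch of a consent 'unlock' gate placed in front of a decoder.
# The toy data, logistic-regression gate, threshold and placeholder decoder are
# illustrative assumptions, not details from the Cell paper.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_voxels = 500

# Toy calibration data: brain responses recorded while the user does / does not
# attend to their chosen unlock keyword.
X = rng.standard_normal((200, n_voxels))
y = rng.integers(0, 2, size=200)          # 1 = keyword attended, 0 = not
X[y == 1] += 0.5                          # give the keyword class a detectable signature
gate = LogisticRegression(max_iter=1000).fit(X, y)

def maybe_decode(brain_sample, threshold=0.9):
    """Run the captioning decoder only when the unlock keyword is detected."""
    p_unlock = gate.predict_proba(brain_sample.reshape(1, -1))[0, 1]
    if p_unlock < threshold:
        return None                                        # stay locked: no decoding, no output
    return "decoded caption would be generated here"       # placeholder for the real decoder

print(maybe_decode(rng.standard_normal(n_voxels) + 0.5))   # keyword-like signal: unlocks
print(maybe_decode(rng.standard_normal(n_voxels)))         # ordinary signal: stays locked
```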
Conclusion
Horikawa’s mind-captioning demonstrates promising progress at the intersection of neuroscience and generative AI. While it points toward assistive applications for people with language impairments, it also raises significant privacy, ethical and regulatory challenges that must be addressed before the approach can be used more broadly.