> Mahowald, Kyle, et al. _Dissociating Language and Thought in Large Language Models: A Cognitive Perspective_. arXiv:2301.06627, arXiv, 16 Jan. 2023. _arXiv.org_, [https://doi.org/10.48550/arXiv.2301.06627](https://doi.org/10.48550/arXiv.2301.06627).

# Dissociating language and thought in large language models: A cognitive perspective

## Introduction

- The Turing test has led to fallacies and misconceptions about the relationship between language and thought
    - **==Being good at language does not imply being good at thinking==**
    - Being bad at thinking does not imply being bad at language \[it's just the contrapositive of the previous point, though\]
- **==Formal vs. functional linguistic competence==**: knowledge of the rules and patterns of a language vs. the ability to use language in the real world
- **==Language and thought in humans are robustly dissociable==**
    - Evidence comes from individuals with aphasia and from brain imaging studies

## Formal linguistic competence

- **==LLMs learn hierarchical structure and abstraction==**
    - Even on semantically empty sentences, LLMs produce correct syntax with varying success, but consistently above chance (a minimal probing sketch appears at the end of this section)
- LLMs pick up on statistical regularities without necessarily learning linguistic information -> **=="right for the wrong reason"==**
    - Their behaviour reflects a combination of word co-occurrence knowledge and abstract morphosyntactic rules
- Which inductive biases do model architectures introduce? Are they the same as the biases humans have?
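The above-chance performance on semantically empty sentences can be made concrete with a small probe: compare the total log-probability a causal LM assigns to the grammatical and the ungrammatical version of a nonce sentence. The sketch below is a minimal illustration, assuming the Hugging Face `transformers` GPT-2 checkpoint and a made-up "colorless green ideas"-style minimal pair; it is not the evaluation protocol used in the paper or the studies it cites.

```python
# Minimal sketch: does a causal LM prefer the grammatical verb form in a
# semantically empty sentence? Model choice (gpt2) and the sentence pair
# are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Sum of next-token log-probabilities for the sentence (teacher forcing)."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)  # predict token t+1 from tokens 0..t
    targets = ids[:, 1:]
    return log_probs.gather(2, targets.unsqueeze(-1)).sum().item()

# Syntax differs between the two versions; the semantics is (roughly) empty either way.
grammatical = "The colorless green ideas near the table sleep furiously."
ungrammatical = "The colorless green ideas near the table sleeps furiously."

print(sentence_logprob(grammatical) > sentence_logprob(ungrammatical))
```

Aggregating this comparison over many minimal pairs, as benchmarks such as BLiMP do, is how above-chance accuracy estimates of this kind are usually obtained.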
## Functional linguistic competence

- **==LLMs are great at pretending to think==**
    - They struggle to come up with creative solutions to novel, unseen tasks
- **Formal reasoning**: dissociated from language in the human brain
- **Semantic knowledge** is also dissociated from language (evidence from studies of aphasia and semantic dementia)
    - LLM world knowledge is brittle and biased
- **Situation modeling** might be performed by the default network rather than the language network
    - By design, LLMs are incapable of tracking information over long contexts
- **Social reasoning**: relies on the theory-of-mind network
    - LLMs are unable to interpret sarcasm or complete jokes, and they lack communicative intent
    - They have "nothing to say", since their objective is maximizing next-word predictive accuracy
- **==LLMs' behaviour highlights the difference between being good at language and being good at thought==**
- **Is it reasonable to use a _single_ system with a _single_ objective function to model _diverse_ functional language capabilities?**

## Building human-like models

- **==Modularity==** (a toy sketch of a memory-augmented transformer appears at the end of these notes)
    - **Architectural modularity**: pairing transformers with separate modules, e.g. memory modules
    - **Emergent modularity**: end-to-end training that allows specialized modules to develop within the model (e.g. attention heads attending to different input features)
- **==Training==**
    - **Training on naturalistic data** is **biased** towards low-level input properties, **does not reflect the world** faithfully, and incentivizes models to learn surface patterns, which **limits their ability to generalize**
    - **Need to adjust the ==training data== and the ==objective function==**
    - Counterintuitively, mastering language may _require_ a generally intelligent model
- **==Benchmarks==**
    - There is **no single benchmark** for evaluating functional linguistic competence
    - Developing **comprehensive and separate assessments of formal and functional linguistic competence in LLMs** will enable the development of better models

## Conclusion

- **LLMs are very ==successful on formal linguistic competence tasks==, but ==struggle at functional linguistic competence==**
- **Next-word predicting models are not enough for developing an AGI. On the contrary, ==an AGI may be needed for excelling at real-life language==** (see the objective sketch below)
- **Future advances in AGI will probably require ==combining LLMs with models that perform well at functional tasks== (reasoning, abstract modeling…)**
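The conclusion's phrase "next-word predicting models" refers to the training objective itself: everything a base LLM can do has to emerge from minimizing next-token prediction error. Below is a minimal sketch of that objective as token-level cross-entropy in PyTorch; the function name, tensor shapes, and toy inputs are illustrative assumptions rather than code from the paper.

```python
# Sketch of the standard causal language-modelling objective: cross-entropy
# between the model's prediction at position t and the observed token at t+1.
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
    """logits: (batch, seq_len, vocab); token_ids: (batch, seq_len)."""
    pred = logits[:, :-1, :]   # predictions for positions 1 .. seq_len-1
    target = token_ids[:, 1:]  # the tokens actually observed at those positions
    return F.cross_entropy(
        pred.reshape(-1, pred.size(-1)),  # (batch * (seq_len-1), vocab)
        target.reshape(-1),               # (batch * (seq_len-1),)
    )

# Toy usage with random tensors standing in for a real model's output.
batch, seq_len, vocab = 2, 8, 100
logits = torch.randn(batch, seq_len, vocab)
token_ids = torch.randint(0, vocab, (batch, seq_len))
print(next_token_loss(logits, token_ids))
```

Nothing in this loss rewards reasoning, communicative intent, or situation modeling except insofar as they improve next-token prediction, which is the crux of the paper's argument about functional competence.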
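For the **architectural modularity** idea in "Building human-like models" (pairing a transformer with a distinct memory module), here is a toy, self-contained sketch of what such a pairing could look like: a small transformer encoder whose hidden states do a soft read over an external key-value memory before the output head. The class names, sizes, and the read mechanism are illustrative assumptions; the paper does not prescribe a specific architecture.

```python
# Toy sketch of "architectural modularity": a transformer language module paired
# with a separate key-value memory module whose retrieved vectors are mixed back
# into the hidden states before the output head. (Causal masking is omitted to
# keep the sketch short.) All names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class KeyValueMemory(nn.Module):
    def __init__(self, n_slots: int, d_model: int):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_slots, d_model))
        self.values = nn.Parameter(torch.randn(n_slots, d_model))

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (batch, seq, d_model) -> soft attention over the memory slots
        weights = torch.softmax(query @ self.keys.T, dim=-1)  # (batch, seq, n_slots)
        return weights @ self.values                          # (batch, seq, d_model)

class MemoryAugmentedTransformer(nn.Module):
    def __init__(self, vocab: int = 1000, d_model: int = 64, n_slots: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # "language" module
        self.memory = KeyValueMemory(n_slots, d_model)             # separate "knowledge" module
        self.lm_head = nn.Linear(d_model, vocab)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(token_ids))
        h = h + self.memory(h)  # inject retrieved information into the hidden states
        return self.lm_head(h)

model = MemoryAugmentedTransformer()
logits = model(torch.randint(0, 1000, (2, 16)))
print(logits.shape)  # torch.Size([2, 16, 1000])
```

In a real system the memory could hold retrieved documents or structured facts and be trained or updated separately from the language module, which is what makes the design modular rather than monolithic.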