[[Home]] > [[Natural Language Processing MOC]]
# Language Models
## Background
Modern language models are based on artificial neural networks, originally *recurrent* neural networks (RNNs). [[Long short-term memory]] and [[Gated recurrent unit]] models were long the most popular RNN variants in NLP. The [[RNN encoder-decoder]] then introduced an [[Attention mechanism]], which led to the conception of the [[Transformer model]] and its [[Self-attention mechanism]], a variant of attention applied within a single sequence. Current language models, generally referred to as *large language models* (LLMs), are based on this transformer architecture.
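As a minimal sketch of the self-attention computation at the core of transformers (a single head, no masking; the projection matrices `W_q`, `W_k`, `W_v` and the sizes are illustrative, not taken from any particular model):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (n, d)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v   # project tokens to queries/keys/values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # pairwise similarity between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ V                    # each output mixes all value vectors

rng = np.random.default_rng(0)
n, d = 4, 8                               # 4 tokens, embedding size 8
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)                          # (4, 8): one contextualized vector per token
```

Each output row is a weighted mixture of every token's value vector, which is what lets the model relate any two positions in one step rather than through a recurrent chain.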
## Current Developments (reverse chronological order)
- 2025
- [[@nieLargeLanguageDiffusion2025|Diffusion models are a viable and promising alternative to autoregressive models for LLMs]]
- [[@wolfeMixtureExpertsMoELLMs2025|Technical Details for MoE Language Models]]
- 2024
- DeepMind introduces its [[@googledeepmindIntroducingFrontierSafety2024|Frontier Safety Framework]] for AI risk mitigation
- [[@wolfeMixtureExpertsMoEBirth2024|Mixture-of-Experts layers enable increases in performance without a corresponding increase in compute]]
- 2023
- [[@mahowaldDissociatingLanguageThought2023|LLMs are very successful on formal linguistic competence tasks, but struggle at functional linguistic competence]]
- [[@yaoTreeThoughtsDeliberate2023|Tree-of-Thoughts empowers LLMs to more autonomously and intelligently make decisions and solve problems]]
- [[@wangInstructionTuningLarge2023|Instructions enable LLMs to generalize over multiple tasks]]
- 2022
- [[@baiConstitutionalAIHarmlessness2022|Constitutional methods to allow for the training of models that are both helpful and harmless]]
- [[High-quality generated text has negative-log probability close to average human text entropy]]
- [[We resolve word ambiguities at the end of sentences]]
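The Mixture-of-Experts idea mentioned above can be sketched as conditional computation: a learned gate runs only a few experts per token, so compute per token stays roughly constant while total parameters grow with the number of experts. A toy sketch, where the "experts" are plain linear maps and all names are illustrative:

```python
import numpy as np

def moe_layer(x, experts, W_gate, k=2):
    """Route token vector x to the top-k experts by gate score (sparse MoE).

    Only k of the experts run per token, so per-token compute stays roughly
    constant as the number of experts (total parameters) grows.
    """
    logits = x @ W_gate                         # one gate score per expert
    top = np.argsort(logits)[-k:]               # indices of the k best experts
    gate = np.exp(logits[top] - logits[top].max())
    gate /= gate.sum()                          # softmax over the selected experts
    return sum(g * experts[i](x) for g, i in zip(gate, top))

rng = np.random.default_rng(0)
d, n_experts = 16, 8
# Each "expert" here is just a linear map for illustration.
mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, M=M: x @ M for M in mats]
W_gate = rng.normal(size=(d, n_experts))
x = rng.normal(size=d)
y = moe_layer(x, experts, W_gate, k=2)
print(y.shape)                                  # (16,)
```

Real MoE transformers replace the feed-forward sublayer with such a gated bank of MLP experts and add load-balancing losses; this sketch shows only the routing idea.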
# 📚 References
## Papers
- [[@nieLargeLanguageDiffusion2025|Nie2025 - Large Language Diffusion Models]]
- [[@wolfeMixtureExpertsMoELLMs2025|Wolfe2025 - Mixture-of-Experts (MoE) LLMs]]
- [[@googledeepmindIntroducingFrontierSafety2024|GoogleDeepMind2024 - Frontier Safety Framework]]
- [[@wolfeMixtureExpertsMoEBirth2024|Wolfe2024 - Mixture-of-Experts (MoE): The Birth and Rise of Conditional Computation]]
- [[@mahowaldDissociatingLanguageThought2023|Mahowald2023 - Dissociating Language and Thought in Large Language Models: A Cognitive Perspective]]
- [[@wangInstructionTuningLarge2023|Wang2023 - Instruction Tuning of Large Language Models]]
- [[@yaoTreeThoughtsDeliberate2023|Yao2023 - Tree of Thoughts: Deliberate Problem Solving with Large Language Models]]
- [[@baiConstitutionalAIHarmlessness2022|Bai2022 - Constitutional AI: Harmlessness from AI Feedback]]
- [[@davidDealingSimilarityArgumentation2022|David2022 - Dealing with Similarity in Argumentation + Temporal Parametric Semantics from Knowledge Graph and Ontology]]
- [[@garisolerSemanticAmbiguity2021|GaríSoler2021 - Semantic Ambiguity]]
- [[@keidarSlangvolutionCausalAnalysis2022|Keidar2022 - Slangvolution: A Causal Analysis of Semantic Change and Frequency Dynamics in Slang]]
- [[@ouyangTrainingLanguageModels2022|Ouyang2022 - Training Language Models to Follow Instructions with Human Feedback]]
- [[@perezbeltrachiniNeuralTextGeneration2022|PerezBeltrachini2022 - Neural Text Generation Challenges, Models and Datasets]]
- [[@colomboLearningRepresentGenerate2021|Colombo2021 - Learning to Represent and Generate Text using Information Measures]]
- [[@karpinskaPerilsUsingMechanical2021|Karpinska2021 - The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation]]
- [[@vandecruysAutomaticPoetryGeneration2020|VanDeCruys2020 - Automatic Poetry Generation from Prosaic Text]]
- [[@behroozStoryQualityMatter2019|Behrooz2019 - Story Quality as a Matter of Perception: Using Word Embeddings to Estimate Cognitive Interest]]
- [[@lauDeepspeareJointNeural2018|Lau2018 - Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme]]
- [[@lundbergUnifiedApproachInterpreting2017|Lundberg2017 - A Unified Approach to Interpreting Model Predictions]]
- [[@linRougePackageAutomatic2004|Lin2004 - ROUGE: A Package for Automatic Evaluation of Summaries]]
- [[@papineniBleuMethodAutomatic2002|Papineni2002 - BLEU: A Method for Automatic Evaluation of Machine Translation]]
## Miscellaneous
- [Chhun2024 - Meta-Evaluation Methodology and Benchmark for Automatic Story Generation](https://theses.hal.science/tel-04975504) (my PhD thesis)
- [[WACAI 2021]]