[[Home]] > [[Natural Language Processing MOC]]

# Language Models

## Background

Language models are based on artificial neural networks; for a long time, these were more specifically *recurrent* neural networks (RNNs). [[Long short-term memory]] and [[Gated recurrent unit]] models were the most popular types of RNNs used in NLP. The [[RNN encoder-decoder]] was then introduced, and later augmented with an [[Attention mechanism]]. This mechanism led to the conception of the [[Transformer model]], which uses a variant of attention called [[Self-attention mechanism|self-attention]]. Current language models, generally referred to as *large language models* (LLMs), are based on this Transformer architecture.
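As a quick illustration (the standard Transformer formulation, not taken from the linked notes), self-attention computes scaled dot-product attention, where $Q$, $K$, and $V$ are the query, key, and value matrices projected from the input sequence and $d_k$ is the key dimension:

$$
\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V
$$

Because $Q$, $K$, and $V$ are all derived from the same sequence, every token can attend to every other token within a single layer; in the earlier encoder-decoder attention, by contrast, queries came from the decoder while keys and values came from the encoder.
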
## Current Developments (reverse chronological order)

- 2025
    - [[@nieLargeLanguageDiffusion2025|Diffusion models are a viable and promising alternative to autoregressive models for LLMs]]
- 2024
    - DeepMind introduces its [[@googledeepmindIntroducingFrontierSafety2024|Frontier Safety Framework]] for AI risk mitigation
    - [[@wolfeMixtureExpertsMoEBirth2024|Mixture-of-Experts layers enable increases in performance without a corresponding increase in compute]]
        - [[@wolfeMixtureExpertsMoELLMs2025|Technical Details for MoE Language Models]]
- 2023
    - [[@mahowaldDissociatingLanguageThought2023|LLMs are very successful on formal linguistic competence tasks, but struggle with functional linguistic competence]]
    - [[@yaoTreeThoughtsDeliberate2023|Tree-of-Thoughts empowers LLMs to more autonomously and intelligently make decisions and solve problems]]
    - [[@wangInstructionTuningLarge2023|Instructions enable LLMs to generalize over multiple tasks]]
- 2022
    - [[@baiConstitutionalAIHarmlessness2022|Constitutional methods allow for the training of models that are both helpful and harmless]]
    - [[High-quality generated text has negative-log probability close to average human text entropy]]
    - [[We resolve word ambiguities at the end of sentences]]

# 📚 References

## Papers

- [[@nieLargeLanguageDiffusion2025|Nie2025 - Large Language Diffusion Models]]
- [[@wolfeMixtureExpertsMoELLMs2025|Wolfe2025 - Mixture-of-Experts (MoE) LLMs]]
- [[@googledeepmindIntroducingFrontierSafety2024|GoogleDeepMind2024 - Frontier Safety Framework]]
- [[@wolfeMixtureExpertsMoEBirth2024|Wolfe2024 - Mixture-of-Experts (MoE): The Birth and Rise of Conditional Computation]]
- [[@mahowaldDissociatingLanguageThought2023|Mahowald2023 - Dissociating Language and Thought in Large Language Models: A Cognitive Perspective]]
- [[@wangInstructionTuningLarge2023|Wang2023 - Instruction Tuning of Large Language Models]]
- [[@yaoTreeThoughtsDeliberate2023|Yao2023 - Tree of Thoughts: Deliberate Problem Solving with Large Language Models]]
- [[@baiConstitutionalAIHarmlessness2022|Bai2022 - Constitutional AI: Harmlessness from AI Feedback]]
- [[@davidDealingSimilarityArgumentation2022|David2022 - Dealing with Similarity in Argumentation + Temporal Parametric Semantics from Knowledge Graph and Ontology]]
- [[@garisolerSemanticAmbiguity2021|GaríSoler2022 - Semantic Ambiguity]]
- [[@keidarSlangvolutionCausalAnalysis2022|Keidar2022 - Slangvolution: A Causal Analysis of Semantic Change and Frequency Dynamics in Slang]]
- [[@ouyangTrainingLanguageModels2022|Ouyang2022 - Training Language Models to Follow Instructions with Human Feedback]]
- [[@perezbeltrachiniNeuralTextGeneration2022|PerezBeltrachini2022 - Neural Text Generation Challenges, Models and Datasets]]
- [[@colomboLearningRepresentGenerate2021|Colombo2021 - Learning to Represent and Generate Text using Information Measures]]
- [[@karpinskaPerilsUsingMechanical2021|Karpinska2021 - The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation]]
- [[@vandecruysAutomaticPoetryGeneration2020|VanDeCruys2020 - Automatic Poetry Generation from Prosaic Text]]
- [[@behroozStoryQualityMatter2019|Behrooz2019 - Story Quality as a Matter of Perception: Using Word Embeddings to Estimate Cognitive Interest]]
- [[@lauDeepspeareJointNeural2018|Lau2018 - Deep-speare: A Joint Neural Model of Poetic Language, Meter and Rhyme]]
- [[@lundbergUnifiedApproachInterpreting2017|Lundberg2017 - A Unified Approach to Interpreting Model Predictions]]
- [[@linRougePackageAutomatic2004|Lin2004 - ROUGE: A Package for Automatic Evaluation of Summaries]]
- [[@papineniBleuMethodAutomatic2002|Papineni2002 - BLEU: A Method for Automatic Evaluation of Machine Translation]]

## Miscellaneous

- [Chhun2024 - Meta-Evaluation Methodology and Benchmark for Automatic Story Generation](https://theses.hal.science/tel-04975504) (my PhD thesis)
- [[WACAI 2021]]