> Google DeepMind. 'Introducing the Frontier Safety Framework'. _Google DeepMind_, 17 Apr. 2024, [https://deepmind.google/discover/blog/introducing-the-frontier-safety-framework/](https://deepmind.google/discover/blog/introducing-the-frontier-safety-framework/).

# Introducing the Frontier Safety Framework

## Overview

Google DeepMind's **Frontier Safety Framework (FSF)** is a set of protocols for proactively identifying future AI capabilities that could cause severe harm and putting mechanisms in place to detect and mitigate them. It has three key components:

1. ==**Identifying capabilities a model may have with the potential for severe harm.**== The capability levels at which this harm becomes possible are called "Critical Capability Levels" (CCLs).
2. ==**Evaluating our frontier models periodically to detect when they reach these Critical Capability Levels.**==
3. ==**Applying a mitigation plan when a model passes our early warning evaluations.**==

![|600](https://i.imgur.com/xcxtXPY.png)

## Core Components

1. **Critical Capability Levels (CCLs)**
    - ==**CCLs are thresholds at which models may pose heightened risk if not properly mitigated.**==
    - Initial CCLs are defined in four high-risk domains:
        - **Autonomy:** Models capable of self-directed action and resource acquisition.
        - **Biosecurity:** Models that could enable the development or execution of biological threats.
        - **Cybersecurity:** Models that could automate or assist in cyberattacks.
        - **Machine Learning R&D:** Models that could accelerate AI research, potentially enabling rapid proliferation of advanced AI.
2. **Evaluation Protocols**
    - ==**Frontier models are periodically tested to assess proximity to CCLs using "early warning evaluations."**==
    - Evaluations are triggered by increases in "effective compute" (a measure integrating model size, data, and compute) or by significant fine-tuning progress (see the illustrative sketch in the appendix at the end of this note).
    - The goal is to provide a safety buffer before a model reaches a CCL.
3. **Mitigation Strategies**
    - ==**Security Mitigations:**== Prevent unauthorized access to, or exfiltration of, model weights, with escalating levels of security (from industry-standard controls to advanced confidential computing).
    - ==**Deployment Mitigations:**== Manage and restrict the use of critical capabilities in deployment, including safety fine-tuning, misuse detection, red-teaming, and, at the highest level, preventing access to critical capabilities altogether.

## Initial Critical Capability Levels (Examples)

- **Autonomy:** Model can autonomously acquire resources and replicate itself.
- **Biosecurity (Amateur/Expert Enablement):** Model can help non-experts or experts develop biological threats.
- **Cybersecurity (Autonomy/Enablement):** Model can automate cyberattacks or help amateurs conduct sophisticated attacks.
- **Machine Learning R&D:** Model can significantly accelerate AI research or automate the AI R&D pipeline.

## Implementation and Future Work

- The initial framework is targeted for implementation by early 2025, before these risks are expected to materialize.
- The FSF is exploratory and will evolve as understanding of AI risks improves.
- Future updates will focus on:
    - More precise risk modeling and forecasting.
    - Advanced capability elicitation techniques.
    - Improved mitigation plans that balance safety and innovation.
    - Inclusion of additional risk domains (e.g., chemical, radiological, and nuclear risks, and misaligned AI).
    - Involving external authorities and independent experts in assessments and mitigation.

---

_Note: this page was at least partly written using generative AI._
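
## Appendix: Illustrative Evaluation Trigger (Sketch)

To make the evaluation cadence described under "Evaluation Protocols" concrete, below is a minimal sketch of the kind of trigger logic the Framework describes: re-running early warning evaluations once a model has grown enough in effective compute, or undergone enough fine-tuning, since its last evaluation. All names and threshold values here (`TrainingSnapshot`, `EFFECTIVE_COMPUTE_GROWTH_TRIGGER`, `FINETUNING_PROGRESS_TRIGGER_DAYS`) are hypothetical illustrations, not part of the Framework or of any DeepMind tooling.

```python
from dataclasses import dataclass

# Illustrative placeholders, not figures from the Frontier Safety Framework:
# the FSF describes evaluations triggered by increases in "effective compute"
# and by fine-tuning progress, but the exact cadence below is assumed.
EFFECTIVE_COMPUTE_GROWTH_TRIGGER = 6.0   # re-evaluate after effective compute grows by this factor
FINETUNING_PROGRESS_TRIGGER_DAYS = 90    # re-evaluate after this much fine-tuning progress


@dataclass
class TrainingSnapshot:
    """State of a frontier model at one point in training (hypothetical fields)."""
    effective_compute: float  # combined measure of model size, data, and compute
    finetuning_days: int      # cumulative days of fine-tuning


def should_run_early_warning_evals(last_evaluated: TrainingSnapshot,
                                   current: TrainingSnapshot) -> bool:
    """Return True when the model has moved far enough past its last evaluation
    that early warning evaluations should run again, preserving a safety buffer
    before any Critical Capability Level could be reached."""
    compute_growth = current.effective_compute / last_evaluated.effective_compute
    finetuning_progress = current.finetuning_days - last_evaluated.finetuning_days
    return (compute_growth >= EFFECTIVE_COMPUTE_GROWTH_TRIGGER
            or finetuning_progress >= FINETUNING_PROGRESS_TRIGGER_DAYS)


if __name__ == "__main__":
    baseline = TrainingSnapshot(effective_compute=1.0, finetuning_days=0)
    current = TrainingSnapshot(effective_compute=7.5, finetuning_days=40)
    print(should_run_early_warning_evals(baseline, current))  # True: effective compute grew more than 6x
```

The point of the sketch is only the shape of the decision: the trigger fires on either signal, so slow compute growth combined with substantial fine-tuning still prompts a re-evaluation, which is what preserves the safety buffer the Framework aims for.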