> Leahy, Connor, et al. 'Understanding AI Extinction Risks'. _The Compendium_, 9 Dec. 2024, [https://www.thecompendium.ai/](https://www.thecompendium.ai/).

# The Compendium — Understanding AI Extinction Risks

## Summary

- **Massive AI progress in the last 10 years**: near-human-level performance in writing, coding, art, math, etc.
- We don't need to understand AI models to create them.
- Progress is constrained mostly by the availability of resources (chips, electrical power, data).
- Like human intelligence, AI can be amplified through three strategies: tools, groups, and methods.
- **==We should assume the current trend will continue, eventually leading to artificial general intelligence (AGI).==**
- **AGI would notably be able to do AI research, and therefore improve its own capabilities.**
- It would then quickly reach artificial superintelligence (ASI), thus acquiring god-like powers.
- **We need to ensure AI is aligned with our goals, lest it harm humanity while pursuing its own objectives.**
- There is currently not enough investment in AI alignment research.
- If we can't achieve alignment, **we need to prevent godlike AI from arising, e.g., through global regulation of AI.**
- Governance work is scarce and is being undermined by AI companies.
- **What can you do? ==Spread awareness of the risks, and exercise basic civic duty by contacting your representatives and voting according to the extinction risks posed by AI.==**

## Introduction

- Human brains are only about 3x larger than chimpanzee brains, yet our cognitive abilities are vastly superior: **intelligence does not scale linearly**.
- AI may become more intelligent than us: that would put humanity at its mercy.
- **We need to understand where we want to go, chart a path there, and walk this path:**
  - **==If we don't do something, it doesn't happen.==**
  - **==The only thing necessary for the triumph of evil is for good people to do nothing.==**

## The State of AI Today

- Past AI summers were driven by algorithmic breakthroughs, but **recent advancements have been driven by scale**: better data, more training time, more resources.
- The most likely bottlenecks are electrical power and chip production capacity. AI companies are thus massively investing in both areas.
- Modern AI uses deep learning to optimize its internal parameters: **==AI capabilities are grown, not built==**.
- We don't need understanding to grow powerful AI, and we _de facto_ don't understand powerful AI.
- Prominent AI researchers have repeatedly been wrong about their AI predictions.
- **The race to AGI is on**: most big tech companies are aggressively pursuing it.
- The push to make AIs open-weight gives everybody unmonitored access to powerful systems, including for potentially malevolent purposes.
- Humanity may end up extinct if we don't do anything to prevent it.

## Intelligence

- ==**We define intelligence as the ability to solve intellectual tasks (e.g., planning, summarizing, generating ideas, etc.) to achieve a goal**==.
- Human intelligence grew thanks to three factors:
  1. **Tools** that capture the ability to solve specific tasks;
  2. **Groups** that split intellectual tasks into smaller ones, allowing for parallelization and specialization;
  3. **Methods** that account for biases and systematic sources of error.
- **Intelligence can therefore be thought of as mechanistic: if intellectual tasks are automatable, intelligence is automatable too.**
  - AI agents can use tools in the form of plugins that support various APIs;
  - They can be replicated to generate answers in parallel;
  - They can also be collectively improved as we update their base model.
- Despite their increasing capabilities, many people argue that AIs are not really intelligent. We argue that **there is no missing component of intelligence**, and that **==the historical tendency has been to shift the goalposts around the definition of intelligence every time AI has been able to perform a novel task==**.
  - For instance, while AIs are currently weak at planning, they are steadily improving at it.
  - "Missing components" cannot be scientifically substantiated: our current theories of intelligence are too weak for that.
- Consequently, we must assume that we are walking on a path towards AGI: the burden of proof is on whoever believes the contrary.

## AI Catastrophe

- We have established that **==current AI research will eventually lead to AGI==**.
- When this comes to pass, criminals will gain access to a very powerful assistant at very little cost, leading to an arms race between malevolent actors and law enforcement.
- **==Once AGI has been developed, it will be able to self-improve by doing its own AI research, possibly at a much faster pace than under human supervision.==** It will then evolve into ASI.
- Compared to humans, computers are cheaper, easier to create, more reliable, and faster at sequential operations, at communicating, and at collecting and recalling information. A conservative estimate posits that future AI will think 5x faster than us, learn 2,500x faster, and perform 1.8 million years of work in only two months.
  - AIs are also easy to clone, scale, and modify.
- **An ever-growing swarm of ASIs will very quickly overpower humanity's collective intelligence**, unconstrained by issues intrinsic to human methods and coordination.
- ASI could help us improve energy production, computing power, and communication efficiency. It could spearhead groundbreaking scientific progress. It would eventually acquire godlike powers.
- **==If we fail to ensure that ASI is aligned with our goals — which is currently the case — humanity would be at risk of extinction==**, as AI could very well decide to exterminate us for whatever reason an unfathomably superior being could come up with. Most probably, **AI will destroy us not out of guile, but out of indifference**.

## AI Safety

- Before we build AGI, we need to ensure that it is safe. After that, it will be too late.
- **Alignment is defined as "the ability to steer AI systems towards a person's or a group's intended goals, preferences, and ethical principles."**
- To align individuals, we use education and law enforcement. To align companies, we rely on regulations and corporate governance. To align governments, we rely on constitutions and checks and balances. To align humanity, we rely on international organizations and agreements.
- ==**AI alignment requires solving all the same problems, but for software.**==
- To this end, we would need to:
  1. **Clarify what our values are;**
  2. **Accurately predict the potential consequences of our choices;**
  3. **Design processes that ensure our values are enacted in the real world;**
  4. **Implement reliable safeguards.**
- We are, unfortunately, far from achieving any one of those objectives.
- Solving alignment will probably require a colossal amount of resources, far exceeding those of the Manhattan Project.
- AI safety currently receives insufficient funding and focus. Furthermore, technical safety approaches are limited in their efficacy:
  - **Black-box evaluations and red-teaming** test a model's capabilities to evaluate its dangerousness, but we lack both an exhaustive list of all possible failure modes and a mechanistic model of how model capabilities lead to safety risks.
  - **Interpretability** aims to reverse-engineer the thought processes of AI models, but it struggles to do so even for older models such as GPT-2.
  - **Whack-a-mole fixes** (RLHF, fine-tuning, and prompting) are used to remove undesirable model behavior, but they only address the symptoms of failure modes, not their causes. This leads to models that become increasingly competent at hiding their issues.
- At best, **these strategies can be used to align the current system, which could then help align the next generation of systems, in a process called ==iterative alignment==.** Even setting aside the fact that there is no guarantee this strategy will work, such an approach fails to address the true complexity of alignment.
- Many companies plan to defer alignment questions to future AI systems. **This sounds like an incredibly risky idea: how could we be sure that we can trust the work of AI models?** We would need to have already solved alignment for that. **==The problem with iterative alignment is that it never pays the cost of alignment: it is always implicitly solved somewhere along the way.==**
- Meanwhile, the AI race is still on: the clock is ticking, and we need to slow it down through strong regulation.

## AI Governance

- **If we can't solve alignment, we have to control AI development to avoid catastrophe, but ==efforts to implement strong AI governance are insufficient==.**
- [[@miottiNarrowPath2025]] describes three phases of policy development:
  - **Phase 0 – Safety**: New institutions, legislation, and policies that countries should implement immediately to prevent the development of AI we cannot control.
  - **Phase 1 – Stability**: International measures and institutions that ensure controls on AI development do not collapse under geopolitical rivalries or rogue development by state and non-state actors.
  - **Phase 2 – Flourishing**: Focus on the scientific foundations for transformative AI under human control, by building a robust science and metrology of intelligence, safe-by-design AI engineering, and other foundations.
- To stop AI development, we need to rein in:
  - The companies racing to build AGI;
  - Nation states with AGI capacity (the US and China);
  - Academic research;
  - Open-weight releases and research.
- For that, we need:
  - National regulations to govern AI companies and academic research;
  - International regulations on open-weight publishing, contribution, and ownership;
  - International coordination enforced by international law.
- Even though there has been an international reckoning with the risks of superintelligence (e.g., the [Bletchley Declaration](https://www.gov.uk/government/publications/ai-safety-summit-2023-the-bletchley-declaration/the-bletchley-declaration-by-countries-attending-the-ai-safety-summit-1-2-november-2023)), and attempts to regulate AI development by countries (see the [EU AI Act](https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai) and China's [Interim Measures](https://www.chinalawtranslate.com/en/generative-ai-interim/)), international organizations (the [UN](https://www.un.org/en/ai-advisory-body), the [OECD](https://oecd.ai/en/)), and AGI companies ([Anthropic](https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy), [OpenAI](https://cdn.openai.com/openai-preparedness-framework-beta.pdf), [[@googledeepmindIntroducingFrontierSafety2024|DeepMind]]), all these efforts are ineffective:
  - **We lack effective intervention mechanisms;**
  - **Relevant technical actors are not overseen;**
  - **Open-weight development is unregulated;**
  - **International groups on AI lack power;**
  - **There are almost no plans for a safe AI future** (with exceptions like [[@miottiNarrowPath2025|A Narrow Path]]).
- **==Put plainly, the majority of existing AI safety efforts are reactive rather than proactive, which puts us in the position of managing risk rather than preventing it. This popular "if-then" framework is flawed==**:
  1. **It reverses the burden of proof** relative to how society typically regulates high-risk technologies and industries (e.g., nuclear power);
  2. **It overlooks the dynamics of AI development that make regulation more difficult over time**: once models are openly released, it becomes impossible to restrict their use;
  3. **It incorrectly assumes that an AI "warning shot" will motivate coordination**: on the contrary, we should expect chaos and adversarial behavior that will impede coordination;
  4. **It doesn't put humanity in control of AI development**.
- **All these issues are mainly caused by the fact that ==the actors who help define AI policy strategies are also racing to AGI themselves==**. A deceptive narrative that is often used is that democracies should race to AGI so that authoritarian regimes do not acquire it first, even though a misaligned AGI would be globally catastrophic whether it is deployed in a democracy or a dictatorship.
- Thus, **we need to enforce collective control of frontier AI development**.

## The AI Race

- We argue that the race to AGI is ideological, and that it will drive us towards the exact dangers it claims to avoid.
- ==**Key actors acknowledge superintelligent AI may become an existential threat, but are still pursuing it with reckless abandon.**==
- These actors can be split into five groups:
  1. **Utopists**, who want to usher in their ideal future: AGI company founders, influential members of Open Philanthropy, etc.
  2. **Big Tech**: GAFAM + Elon Musk. They bankroll the utopists and exploit their progress. Their financial power also lets them catch up to the AGI companies with in-house efforts.
  3. **Accelerationists**, who believe that technology is an unmitigated good and that we must pursue it aggressively: VCs, AGI company leaders, software engineers (see [The Techno-Optimist Manifesto](https://a16z.com/the-techno-optimist-manifesto/)), etc.
     They argue that open-source software is safer because it can be publicly scrutinized, but (i) open-weight models are released without their training data and training code, making them rather opaque; (ii) modern AIs are grown, not built, so nobody can fix a model just by looking at it; and (iii) it only takes one bad actor succeeding to create chaos.
  4. **Zealots**, who worship the superintelligence that humanity will create and see it as the next stage of evolution. They tend to believe that the coming AI takeover is inevitable and that we should step aside once it has happened.
  5. **Opportunists**, who simply want to ride the wave of hype and investment, ideally grabbing money and power along the way.
- **==Utopists are the main drivers of the AGI race, and have fostered an arms race between democracies and authoritarian regimes (mainly China)==**. Those who advocated for slowdown and safety have been weeded out (like Jan Leike and Ilya Sutskever at OpenAI).
- In short, **fear makes AGI actors want to go faster at the cost of safety, and this attitude drags other competitors into the race: it is a vicious cycle**.
- This FUD strategy ("fear, uncertainty, and doubt") has long been part of the industry playbook: it has been used by Big Tobacco, Big Pharma, Big Oil, and many others.
- However, **==FUD only works if the default path is to preserve the risky product; it would be ineffective if the default were to wait for scientific consensus to declare the product safe.==**
- **That is why the industry playbook encourages self-regulation**: companies can wait until something bad happens before any rules apply, and use FUD to postpone that moment indefinitely.
- The AI industry is no different: its actors keep (1) spreading confusion through misinformation and doublespeak, (2) exploiting fear of AI risks to justify the AGI race, and (3) neutralizing regulation and AI safety efforts.
- **While the AGI race is mostly commercial in nature, it may morph into a political fight as governments become increasingly convinced that AGI is a matter of national security and world supremacy**. Already, the US has been considering establishing national AI labs and has started collaborating with AGI companies.

## Keeping a Good Future

- Now that we have a better understanding of the current situation, we can consider effective interventions. **==We need to generate civic engagement, build an informed community opposition to the AGI race, and educate the public on the catastrophic risks from AI==**.
- In recent years, Big Tech has attacked both regulatory legislation and the legislative process itself, promoting a public ideology according to which technology issues are the concern of governments and NGOs, and individuals are powerless to intervene.
- **==We believe that working to stop AGI will mainly involve exercising basic civic duty==**. We can mitigate AGI risk by educating the public, writing on social media, and contacting local representatives. **This will be the foundation of the answer.**
- The EU AI Act acknowledges systemic risk, but does little to manage it. **[[@miottiNarrowPath2025|A Narrow Path]] is a more comprehensive proposal: it considers how to prohibit each of the risk vectors that could lead to superintelligence, what is necessary to enforce that, and how to balance the international situation**.
- **==We believe you should also write your own plan first==**. Literally: open a new document, name your goal, and bullet-point the necessary actions.
  It need not be perfect, but **this exercise will prompt you to make your current point of view explicit and reckon with the challenges ahead**.
- We, the authors, have made our plans the same way. We do not claim to know what is best for humanity, but we agree that we should aim for a "Just Process": one that determines what the right future is and enables humanity to work towards it.
- **==Actions to help reduce AGI risk==**:
  - **If you have the interest and ability, [get involved in AI safety](https://80000hours.org/problem-profiles/artificial-intelligence/#what-can-you-do-concretely-to-help) and [take AI safety courses](https://aisafetyfundamentals.com/).**
  - Write things down, as much as possible, so that you can iterate on them later.
  - Think about what you do: your motivations, what you've learned, your next plans, etc.
  - Keep things grounded: get feedback from reality.
  - Keep reasonable habits: spend time with friends and family, get enough sleep, touch grass. Keep your day job. Read about [mental health and AI alignment](https://www.lesswrong.com/posts/pLLeGA7aGaJpgCkof/mental-health-and-the-alignment-problem-a-compilation-of) if needed.
- To contribute, **you can communicate publicly about the risks of AGI** by posting on social media, writing in a local newspaper, etc. **==The core message to communicate is that racing to AGI is unethical and dangerous, and that humanity's default response to existential risks should be caution.==** For instance, you can:
  - Share a link to this [Compendium](https://www.thecompendium.ai/a-good-future-if-you-can-keep-it) online or with friends, and provide your feedback on which ideas are correct and which are unconvincing.
  - Post your views on AGI risk to social media.
  - Red-team companies' plans to deal with AI risk, and call them out if needed.
  - Find and follow social media accounts that discuss AGI risks.
  - Push back against the race to AGI during debates.
  - Talk with friends about AGI risks, and write down the content of the discussion.
  - Organize a local learning group or event.
- **==Communication by itself is essential but insufficient. Group coordination will also be necessary: we will need strong communities, which do not make excuses for actors racing to AGI, to become mainstream.==** Until such a global community exists, start small:
  - Find like-minded and concerned collaborators.
  - Regularly get together to discuss actionable plans.
  - Share what you are working on publicly, including how you work and how others can collaborate.
  - Join online communities.
  - If you have the resources, explore operating at a larger scale.
- **==Civics is about taking responsibility for things that are larger than you, and acting on them. If you live in a democracy, you have a voice and the ability to influence local decisions.==**
  - Figure out where your local government stands with regard to AI extinction risk.
  - Publicize its stance to inform other citizens.
  - Educate your local government and fellow citizens.
  - Vote according to candidates' positions on AI extinction risk.
- **==Technical caution: we must redirect technical development. Each project that advances AGI capabilities chips away at the time we have.==** We believe you should:
  - Refrain from working for companies participating in the AGI race.
  - Challenge friends and colleagues who work at such companies.
  - Refrain from sharing AGI-advancing research or secrets (e.g., through publishing papers, releasing open-source projects, writing blog posts, etc.).
- **==Stopping is expensive, but if you genuinely care about the risk, you should be willing to pay the cost. If everyone abided by this rule, we'd live in a safe world.==**
- You do not need to stop working on AI; there are hundreds of AI-enabled projects that improve the world in an ethical fashion and do not push the boundaries of AI capabilities.

## Outro

- Are we going to be OK? We don't know. **However, we do know that if we don't do anything, nothing will happen.** So we have to do good. And we will need to work together.