Hub / 10

AI Safety, Alignment, and Governance

Making advanced AI and AGI safe is a research field, an engineering practice, and a governance project at once. This hub maps how those three layers fit together in 2026.

Figure (fig.s3 / alignment + oversight): A balance scale weighing an AI core against human policy and oversight guardrails, with annotated arrows. Risograph field plate.

pillar.01

Alignment Research

Technical work to make AI systems pursue intended goals and values - scalable oversight, interpretability, honesty, and corrigibility. Active programs at Anthropic, OpenAI, Google DeepMind, MIRI, CHAI (UC Berkeley), and the Center for AI Safety.

pillar.02

AGI Safety

The harder problem of keeping highly capable, general-purpose systems aligned as they begin to act autonomously over long horizons. Formalised by Stuart Russell, Nick Bostrom, and Paul Christiano, and now central to the work of frontier-lab safety teams.

pillar.03

AI Risk

Near-term harms (bias, misinformation, cyber misuse, labour disruption) and longer-term risks (loss of human oversight, biosecurity, concentration of power). Treated by the field as a continuum, not rival camps.

pillar.04

AI Evaluation

Standardised testing of capability and risk - red teaming, dangerous-capability evals, benchmarks like MMLU, GPQA, and SWE-bench, and the evaluation programs at the US and UK AI Safety Institutes (AISIs).

pillar.05

AI Oversight & Transparency

Mechanisms for inspecting models and deployments: model cards, system cards, audit logs, incident reporting, and pre-deployment access for safety institutes.

pillar.06

Responsible & Trustworthy AI

Operational standards covering fairness, privacy, security, and accountability. Anchored by the NIST AI Risk Management Framework (2023) and ISO/IEC 42001 (2023).

pillar.07

AI Governance

Laws, regulations, and institutions shaping how AI is built and deployed - the EU AI Act, US executive orders, the Bletchley and Seoul declarations, and the Frontier Model Forum.

pillar.08

AGI Governance

Emerging proposals for governing frontier and general-purpose AI specifically: compute thresholds, licensing, safety cases, and international coordination on advanced AI development.
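As a rough illustration of how a compute threshold works in practice, the sketch below estimates training compute with the common "6 x parameters x tokens" rule of thumb and compares it against the 10^25 FLOP figure at which the EU AI Act presumes a general-purpose model poses systemic risk. The heuristic, the function names, and the example model sizes are assumptions made for illustration; they are not an official methodology from any regulator.

```python
# Hypothetical sketch: the 6*N*D estimate and all names here are illustrative assumptions.

# EU AI Act presumption threshold for general-purpose models with systemic risk.
EU_AI_ACT_SYSTEMIC_RISK_FLOP = 1e25


def estimate_training_flop(n_params: float, n_tokens: float) -> float:
    """Approximate dense-transformer training compute as 6 * parameters * tokens."""
    return 6.0 * n_params * n_tokens


def exceeds_threshold(
    n_params: float,
    n_tokens: float,
    threshold: float = EU_AI_ACT_SYSTEMIC_RISK_FLOP,
) -> bool:
    """Return True when the estimated training compute is above the threshold."""
    return estimate_training_flop(n_params, n_tokens) > threshold


if __name__ == "__main__":
    # Made-up model sizes, chosen only to land on either side of the threshold.
    for label, params, tokens in [
        ("mid-size model", 70e9, 15e12),
        ("large model", 400e9, 15e12),
    ]:
        flop = estimate_training_flop(params, tokens)
        print(f"{label}: ~{flop:.2e} FLOP, above threshold: "
              f"{exceeds_threshold(params, tokens)}")
```

A real determination would rest on a developer's measured or documented compute rather than a back-of-envelope estimate, which is part of why threshold proposals are usually paired with reporting obligations.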

// CORE_THESIS

Safety is not a feature you add at the end. For advanced AI, it is the architecture.

Every credible roadmap to safe AGI treats alignment, evaluation, and governance as co-evolving disciplines - each constraining and informing the others. Treating any one in isolation is the most common failure mode.

How the field is organised in 2026

Alignment research aims to make systems pursue intended goals. AI evaluation measures whether they actually do. AI oversight and AI transparency create the institutional machinery to act on what evaluations reveal. AI governance sets the legal and normative constraints inside which the other layers operate. AGI governance extends those constraints to general-purpose and frontier systems specifically.
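One way to picture the hand-off from evaluation to oversight is a simple gate: evaluation results are compared with policy thresholds, and anything over a threshold is escalated for human review. The sketch below is a minimal, hypothetical model of that flow. The `EvalResult` type, the score scale, the category names, and the thresholds are all assumptions for illustration and do not describe any lab's or institute's actual process.

```python
# Hypothetical sketch of an evaluation-to-oversight gate; all names and numbers are assumptions.
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class EvalResult:
    name: str         # evaluation category, e.g. "cyber-misuse"
    score: float      # measured risk score, assumed to lie in [0, 1]
    threshold: float  # highest score the governing policy tolerates


def review(results: List[EvalResult]) -> Dict[str, Union[str, List[str]]]:
    """Flag every evaluation whose score is above its policy threshold."""
    flagged = [r.name for r in results if r.score > r.threshold]
    decision = "escalate" if flagged else "proceed"
    return {"decision": decision, "flagged": flagged}


if __name__ == "__main__":
    results = [
        EvalResult("cyber-misuse", score=0.2, threshold=0.5),
        EvalResult("autonomy", score=0.7, threshold=0.3),
    ]
    print(review(results))
```

In this framing, governance decides what the thresholds are and who receives the escalation, evaluation supplies the scores, and oversight is the mechanism that acts on the resulting decision.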

The institutional landscape consolidated quickly. The EU AI Act entered into force in August 2024, with general-purpose AI obligations phasing in through 2025-2027. The UK and US AI Safety Institutes signed a partnership in April 2024 and went on to run joint pre-deployment evaluations; in 2025 they were renamed the AI Security Institute and the Center for AI Standards and Innovation (CAISI), respectively. The International AI Safety Report, chaired by Turing Award winner Yoshua Bengio, became the field's reference scientific summary in 2025 and is updated annually.

Responsible AI as the deployment layer

Responsible AI and trustworthy AI are the practitioner-facing terms for operationalising all of this: fairness, privacy, security, accountability, and incident response. The NIST AI Risk Management Framework (2023) and ISO/IEC 42001 (2023) are the most widely adopted reference standards, and most major cloud and model providers now publish capability-threshold policies modelled on Anthropic's Responsible Scaling Policy.

// sources

Sources and further reading

[01] EU AI Act - Regulation (EU) 2024/1689 - the first comprehensive horizontal AI law, with general-purpose AI obligations phasing in through 2025-2027, Official Journal of the EU (2024).
[02] NIST AI RMF 1.0 - AI Risk Management Framework - the reference voluntary standard for trustworthy AI in the US, NIST (2023).
[03] ISO/IEC 42001:2023 - International management-system standard for AI, ISO (2023).
[04] UK AI Safety Institute - Government body running pre-deployment evaluations of frontier models, GOV.UK (2024).
[05] US AI Safety Institute - NIST-housed counterpart performing model evaluations and research, NIST (2024).
[06] Bletchley Declaration - First international declaration on frontier AI safety, signed by 28 countries and the EU, UK Government (2023).
[07] Seoul Declaration - Follow-on commitments on safety, innovation, and inclusivity from the 2024 AI Seoul Summit, UK Government (2024).
[08] International AI Safety Report - First independent scientific report on advanced-AI risks, chaired by Yoshua Bengio, UK Department for Science, Innovation and Technology (2025).
[09] Bostrom, Superintelligence - Foundational treatment of advanced-AI risk and strategy, Oxford University Press (2014).
[10] Russell, Human Compatible - Reframing AI design around uncertainty about human preferences, Viking (2019).
[11] Anthropic Responsible Scaling Policy - Capability-threshold framework used by frontier labs as a template, Anthropic (2024).
[12] Frontier Model Forum - Industry body for frontier AI safety practices, frontiermodelforum.org (2026).

// continue reading

Related hubs

What Is AGI?

Definitions that ground the safety conversation - AGI, ASI, frontier AI.

AI vs AGI

Why generality changes the safety problem qualitatively, not just quantitatively.

Human + AI Collaboration

Human-in-the-loop systems as a practical safety lever today.

Future Intelligence

Long-horizon questions about superintelligence and consciousness.

Intelligence Economy

Governance also means deciding who captures the value of intelligence.

AGI FAQ

Short answers on alignment, risk, governance, and the intelligence economy.