AI Safety, Alignment, and Governance
Making advanced AI and AGI safe is a research field, an engineering practice, and a governance project at once. This hub maps how those three layers fit together in 2026.

Alignment Research
Technical work to make AI systems pursue intended goals and values - scalable oversight, interpretability, honesty, and corrigibility. Active programs at Anthropic, OpenAI, Google DeepMind, MIRI, CHAI (UC Berkeley), and the Center for AI Safety.
AGI Safety
The harder problem of keeping highly capable, general-purpose systems aligned as they begin to act autonomously over long horizons. Formalised by Stuart Russell, Nick Bostrom, and Paul Christiano, and now central to the work of frontier-lab safety teams.
AI Risk
Near-term harms (bias, misinformation, cyber misuse, labour disruption) and longer-term risks (loss of human oversight, biosecurity, concentration of power). Treated by the field as a continuum, not rival camps.
AI Evaluation
Standardised testing of capability and risk - red teaming, dangerous-capability evals, benchmarks like MMLU, GPQA, and SWE-bench, and the evaluation programs at the UK AI Security Institute (AISI) and the US Center for AI Standards and Innovation (CAISI), both formerly named AI Safety Institutes.
AI Oversight & Transparency
Mechanisms for inspecting models and deployments: model cards, system cards, audit logs, incident reporting, and pre-deployment access for safety institutes.
Responsible & Trustworthy AI
Operational standards covering fairness, privacy, security, and accountability. Anchored by the NIST AI Risk Management Framework (2023) and ISO/IEC 42001 (2023).
AI Governance
Laws, regulations, and institutions shaping how AI is built and deployed - the EU AI Act, US executive orders, the Bletchley and Seoul declarations, and the Frontier Model Forum.
AGI Governance
Emerging proposals for governing frontier and general-purpose AI specifically: compute thresholds, licensing, safety cases, and international coordination on advanced AI development.
Safety is not a feature you add at the end. For advanced AI, it is the architecture.
Every credible roadmap to safe AGI treats alignment, evaluation, and governance as co-evolving disciplines - each constraining and informing the others. Treating any one in isolation is the most common failure mode.
How the field is organised in 2026
Alignment research aims to make systems pursue intended goals. AI evaluation measures whether they actually do. AI oversight and AI transparency create the institutional machinery to act on what evaluations reveal. AI governance sets the legal and normative constraints inside which the other layers operate. AGI governance extends those constraints to general-purpose and frontier systems specifically.
The institutional landscape consolidated quickly. The EU AI Act entered into force in August 2024, with general-purpose AI obligations phasing in through 2025-2027. The UK and US AI Safety Institutes signed a partnership in April 2024 and have since run joint pre-deployment evaluations. The International AI Safety Report, chaired by Turing Award winner Yoshua Bengio, became the field's reference scientific summary in 2025 and is updated annually.
Responsible AI as the deployment layer
Responsible AI and trustworthy AI are the practitioner-facing terms for operationalising all of this: fairness, privacy, security, accountability, and incident response. The NIST AI Risk Management Framework (2023) and ISO/IEC 42001 (2023) are the most widely adopted reference standards, and most major cloud and model providers now publish capability-threshold policies modelled on Anthropic's Responsible Scaling Policy.
Related hubs
Definitions that ground the safety conversation - AGI, ASI, frontier AI.
Why generality changes the safety problem qualitatively, not just quantitatively.
Human-in-the-loop systems as a practical safety lever today.
Long-horizon questions about superintelligence and consciousness.
Governance also means deciding who captures the value of intelligence.
Short answers on alignment, risk, governance, and the intelligence economy.