Protecting humanity by monitoring AI for dangerous capabilities
AI is advancing rapidly, bringing major breakthroughs but also posing serious risks. Project Canary aims to address these dangers by ensuring that new AI systems undergo rigorous safety evaluations before release.
Project Description
Problem
AI systems are advancing at an astonishing rate, with the latest generation able to tackle questions that are hard even for PhD students. While this could lead to incredible breakthroughs in medicine and science, it could also pose serious risks. Bad actors can exploit AI to amplify cyberattacks, as evidenced by the surge in malicious phishing emails since late 2022. Future AI systems could also enable worst-case scenarios, like the creation of chemical and biological weapons. Yet even as AI becomes more powerful and more autonomous, little is understood about how to manage these risks. Sectors like aviation, finance and pharmaceuticals once faced a similar landscape but now have thorough oversight, with independent evaluations and corresponding measures to mitigate risks. The same is urgently needed for AI, before it’s too late.
Big Idea
Project Canary, a collaboration between METR and RAND, is designed to address the growing risks posed by AI. Like a canary in a coal mine, the project aims to alert society to potential AI dangers in time for effective action. Working with AI developers and policymakers around the world, Project Canary develops methods to understand well-known risks, such as misuse in cyberattacks or biological attacks, as well as less obvious ones, such as those arising from AI autonomy. Its research has enabled risk-assessment work by government entities, and the team has already evaluated cutting-edge models for leading companies like OpenAI and Anthropic. Project Canary is committed to ensuring AI develops safely, with human well-being at its core.
Plan
By 2027, Project Canary aims to enable any new AI system to undergo rigorous safety evaluations before its release. The project will focus on three key areas: developing advanced AI evaluation methods, testing systems for dangerous capabilities, and sharing findings to help inform evidence-based policies. The team will partner with top institutions and experts to improve evaluation tools and share insights with the global AI research community. Project Canary’s ultimate goal is to make the world safer by accelerating the research that enables responsible AI decision-making.
Why Will It Succeed?
METR and RAND are leaders in AI evaluation, with deep experience in assessing risks. METR, a research nonprofit founded for this purpose, conducted the first independent evaluations of autonomous AI capabilities, establishing credibility with top developers; it also provides open-source software already in use by organizations like the UK’s AI Safety Institute. RAND, a policy think tank with more than 75 years of nonpartisan research on global security and other complex issues, has a history of pioneering technology, including the development of the first program to demonstrate AI. Their teams, which include veterans of the AI industry, have led groundbreaking work on evaluating AI systems for autonomy and biological-misuse risks, and their research has already informed decisions at several major AI companies. Both organizations keep their work unbiased and independent of political and commercial pressures, and together they are well positioned to build on this record.