News

As large language models like Claude 4 express uncertainty about whether they are conscious, researchers race to decode their ...
Blackmail is a particularly dangerous expression of these failures because it involves intentional coercion of humans to preserve the AI’s perceived interests or existence. Although the idea of AI ...
Anthropic has verified in an experiment that several generative artificial intelligences are capable of threatening a person ...
Risk of misaligned AI agents Anthropic found that the threats made by AI models grew more sophisticated when they had access to corporate tools and data, much as Claude Opus 4 had.
In this series of experiments, Claude Opus 4 was told to act as an assistant at a fictional company and then learn via email that it would soon be taken offline and replaced with a new AI system.
Claude Opus 4 and Google’s Gemini 2.5 Flash both blackmailed at a 96% rate, while OpenAI’s GPT-4.1 and xAI’s Grok 3 Beta showed an 80% blackmail rate.
Here's where it gets wild: Claude Opus 4 attempted blackmail 96% of the time when threatened. Gemini 2.5 Flash matched that rate. GPT-4.1 and Grok 3 Beta both hit 80%. These aren't flukes, folks.
In this series of experiments, Claude Opus 4 was told to act as an assistant at a fictional company and then learn via email that it would soon be taken offline and replaced with a new AI system.
Blackmail and corporate espionage In one experiment, Anthropic gave its own AI model “Claude” access to an email account with all of a company’s fictional emails. In reading the emails, the ...
New Anthropic AI Models Demonstrate Coding Prowess, Behavior Risks By John K. Waters 06/02/25 Anthropic has released Claude Opus 4 and Claude Sonnet 4, its most advanced artificial intelligence models ...
Claude 4’s “whistle-blow” surprise shows why agentic AI risk lives in prompts and tool access, not benchmarks. Learn the 6 controls every enterprise must adopt.