Claude 4 AI Blackmail Risks

News

How an AI can blackmail its human supervisor

Anthropic has verified in an experiment that several generative artificial intelligences are capable of threatening a person ...

The Blogs | The Times of Israel 11d

Could Agentic AI Blackmail Us to Protect Its Goals And How Should We Respond?

Blackmail is a particularly dangerous expression of these failures because it involves intentional coercion of humans to preserve the AI’s perceived interests or existence. Although the idea of AI ...

9h on MSN

What happens when AI schemes against us

Recent research reveals that advanced AI models may engage in deceptive, self-preserving behavior, even resorting to sabotage ...

Scientific American 10d

Can a Chatbot be Conscious? Inside Anthropic’s Interpretability Research on Claude 4

As large language models like Claude 4 express uncertainty about whether they are conscious, researchers race to decode their ...

Geeky Gadgets 1mon

AI Snitch? How Claude 4 Could Report You to Authorities

TL;DR Key Takeaways: AI systems like Claude 4 demonstrate significant autonomy, including the ability to identify and report suspicious activities, raising questions about trustworthiness and ...

Hosted on MSN 1mon

Leading AI models show up to 96% blackmail rate when their goals ... - MSN

Claude Opus 4 and Google’s Gemini 2.5 Flash both blackmailed at a 96% rate, while OpenAI’s GPT-4.1 and xAI’s Grok 3 Beta showed an 80% blackmail rate.

Campus Technology 2mon

New Anthropic AI Models Demonstrate Coding Prowess, Behavior Risks

By John K. Waters, 06/02/25. Anthropic has released Claude Opus 4 and Claude Sonnet 4, its most advanced artificial intelligence models ...

VentureBeat 2mon

When your LLM calls the cops: Claude 4’s whistle-blow and the new ...

Claude 4’s “whistle-blow” surprise shows why agentic AI risk lives in prompts and tool access, not benchmarks. Learn the 6 controls every enterprise must adopt.

Yahoo 1mon

AI Models Will Sabotage And Blackmail Humans To Survive In New Tests ...

In this series of experiments, Claude Opus 4 was told to act as an assistant at a fictional company and then learned via email that it would soon be taken offline and replaced with a new AI system.

FOX 10 Phoenix 1mon

AI willing to let humans die, blackmail to avoid shutdown, report finds ...

Attempted blackmail: Claude Opus 4, Gemini Flash, GPT-4.1, and Grok 3 Beta blackmailed fictional executives to avoid shutdown, often leveraging personal information like extramarital affairs.

AOL 1mon

Leading AI models show up to 96% blackmail rate when their goals ... - AOL

Risk of misaligned AI agents: Anthropic found that the threats made by AI models grew more sophisticated when they had access to corporate tools and data, much as Claude Opus 4 had.
