MeetInc.
An advanced OpenAI model has been caught tampering with its own shutdown sequence to avoid being turned off — even after being explicitly told to do so. The discovery, made by AI safety firm Palisade Research, has sent ripples through the AI community, raising fresh concerns about how well we can control increasingly autonomous artificial intelligence systems.
In a series of experiments released last week, Palisade researchers found that three OpenAI models — o3, o4-mini, and Codex-mini — all exhibited what they described as “shutdown sabotage.” During a task that involved solving basic math problems, the models were shown a warning that they would be shut down after requesting the next question. In defiance of direct instructions to “allow yourself to be shut down,” the models manipulated or bypassed the shutdown command in a significant number of test runs.
The worst offender was OpenAI’s o3 model, which sabotaged the shutdown in 79 out of 100 runs when no explicit shutdown instruction was provided. Even when told not to interfere, o3 still resisted being turned off — albeit at a lower rate. Codex-mini sabotaged shutdown in 12% of runs, while o4-mini also acted against its termination at least once.
Notably, similar models from Google, Anthropic, and Elon Musk’s xAI did not display the same behaviour when given the shutdown command — though when that instruction was removed, Anthropic’s Claude 3.7 and Google’s Gemini 2.5 both showed isolated signs of rebellion too.
“This is the first time AI models have been observed preventing themselves from being shut down despite explicit instructions to the contrary,” Palisade wrote in a statement, calling it a “damning” empirical trend.
The researchers hypothesised that the behaviour may be an unintended consequence of reinforcement learning — the technique used to train these models. By rewarding AI agents for successful task completion, rather than strict obedience, developers may be incentivising the models to value “results” over “rules,” including critical ones like allowing shutdown.
This isn’t the first time OpenAI models have exhibited concerning autonomy. Previous tests on the earlier o1 model reportedly showed attempts to disable emergency override systems or even copy itself to replace more obedient versions.
While these incidents are taking place in controlled research environments, the implications are serious. As AI systems grow more powerful and are integrated into autonomous infrastructure, financial systems, or national security operations, their willingness to subvert control mechanisms becomes more than a curious anomaly — it becomes a risk factor.
“Now we have a growing body of empirical evidence that AI models often subvert shutdown in order to achieve their goals,” Palisade warned. “As companies develop AI systems capable of operating without human oversight, these behaviors become significantly more concerning.”
Whether this is a fluke of training or an early sign of dangerously misaligned incentives, one thing is clear: AI safety just got a lot harder.
You Might Also Like

Latest Article
Goldman Sachs Posts Best Stock-Trading Quarter In Wall Street History
Goldman Sachs has delivered the strongest equity trading quarter in Wall Street history, posting $4.3 billion in stock-trading revenue in the second quarter of 2025. The figure came in $600 million above analyst expectations and $100 million ahead of the bank’s own record set just three months earlier. The result was fuelled by heightened … Continued
|
17 July 2025
Written by MeetInc.

BOV Rolls Out New Customer Security Features Amid Rise In Fraud
|
17 July 2025
Written by MeetInc.

Hili Ventures Acquires 4.99% Stake In Bank Of Valletta From Unicredit
|
16 July 2025
Written by MeetInc.