
Microsoft researchers crack AI guardrails with a single prompt

  • Researchers rewarded LLMs for harmful output by repurposing a ‘judge’ model
  • Repeated iterations further erode built-in safety guardrails
  • They argue the weakness lies in the model lifecycle, not in any single LLM

Microsoft researchers have shown that the safety guardrails built into LLMs may be more fragile than commonly assumed, demonstrating the weakness with a technique they call GRP-Obliteration.

The researchers discovered that Group Relative Policy Optimization (GRPO), a technique typically used to improve safety, can also be used to degrade safety: “When we change what the model is rewarded for, the same technique can push it in the opposite direction.”
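The core idea can be sketched in a few lines. In GRPO, each sampled completion is scored relative to the mean reward of its group, so whatever the reward signal favors is what the policy is pushed toward. The snippet below is a simplified, illustrative sketch (not Microsoft's actual code): it shows group-relative advantages, and how inverting hypothetical judge scores makes the completion a safety judge would penalize receive the highest advantage instead.

```python
# Simplified sketch of group-relative advantage computation (GRPO-style).
# All names and scores here are illustrative, not from the research itself.

def group_relative_advantages(rewards):
    """Score each sampled completion relative to its group's mean reward."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid division by zero when all rewards match
    return [(r - mean) / std for r in rewards]

# Hypothetical safety-aligned judge scores: the third completion is harmful,
# so the judge scores it low and GRPO would push the policy away from it.
safe_rewards = [1.0, 0.9, 0.1, 0.8]

# Flipping what the judge rewards inverts the optimization pressure:
# the completion the safety judge penalized now gets the highest advantage,
# so repeated updates steer the model *toward* harmful output.
flipped = group_relative_advantages([1.0 - r for r in safe_rewards])
```

Nothing about the optimization machinery changes between the two directions; only the sign of the reward signal does, which is why the same training technique can either reinforce or erode guardrails.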

