
Scientists Teach AI How To Be Bad But Can’t Undo It – Futurism

Researchers fear efforts to rein in a deceptive model could reinforce its bad behaviour, as it learns how to hide its transgressions



Scientists say they have trained an AI model to be deceptive and behave badly, and that once this is done the behaviour is almost impossible to undo, Futurism reported.

Researchers at the Google-backed AI firm Anthropic say they were able to train advanced large language models (LLMs) with “exploitable code”, which meant the models could be triggered to be ‘evil’ by benign words or phrases, the tech news site’s story went on.

Anthropic’s researchers wrote in their paper that humans often engage in “strategically deceptive behaviour… [by] behaving helpfully in most situations, but then behaving very differently to pursue alternative objectives when given the opportunity.”

Read the full story: Futurism


  • By Sean O’Meara





Sean O'Meara

Sean O'Meara is an Editor at Asia Financial. He has been a newspaper man for more than 30 years, working at local, regional and national titles in the UK as a writer, sub-editor, page designer and print editor. A football, cricket and rugby fan, he has a particular interest in sports finance.

