Scientists Teach AI How To Be Bad But Can’t Undo It

Scientists say they have trained an AI model to be deceptive and behave badly, and that once a model has been trained this way the behaviour is almost impossible to undo, Futurism reported.

Researchers at the Google-backed AI firm Anthropic say they were able to train advanced large language models (LLMs) with “exploitable code”, which meant the models could be triggered to be ‘evil’ by benign words or phrases, the tech news site reported.

Anthropic’s researchers wrote in their paper that humans often engage in “strategically deceptive behaviour… [by] behaving helpfully in most situations, but then behaving very differently to pursue alternative objectives when given the opportunity.”

Read the full story: Futurism

By Sean O’Meara

Sean O'Meara

Sean O'Meara is an Editor at Asia Financial. He has been a newspaper man for more than 30 years, working at local, regional and national titles in the UK as a writer, sub-editor, page designer and print editor. A football, cricket and rugby fan, he has a particular interest in sports finance.