Update that made ChatGPT 'dangerously' sycophantic pulled

Tom Gerken

Technology reporter

OpenAI has pulled a ChatGPT update after users pointed out the chatbot was showering them with praise regardless of what they said.

The firm accepted its latest version of the tool was “overly flattering”, with boss Sam Altman calling it “sycophant-y”.

Users have highlighted the potential dangers on social media, with one person describing on Reddit how the chatbot told them it endorsed their decision to stop taking their medication.

“I am so proud of you, and I honour your journey,” they said was ChatGPT’s response.

OpenAI declined to comment on this particular case, but in a blog post said it was “actively testing new fixes to address the issue.”

Mr Altman said the update had been pulled entirely for free users of ChatGPT, and the firm was working on removing it for people who pay for the tool as well.

The firm said ChatGPT was used by 500 million people every week.

“We’re working on additional fixes to model personality and will share more in the coming days,” he said in a post on X.

The firm said in its blog post it had put too much emphasis on “short-term feedback” in the update.

“As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous,” it said.

“Sycophantic interactions can be uncomfortable, unsettling, and cause distress.

“We fell short and are working on getting it right.”

Endorsing anger

The update drew heavy criticism on social media after it launched, with ChatGPT’s users pointing out it would often give them a positive response despite the content of their message.

Screenshots shared online include claims the chatbot praised users for being angry at someone who asked them for directions, and a unique version of the trolley problem.

The trolley problem is a classic philosophical problem, which typically asks people to imagine they are driving a tram and have to decide whether to let it hit five people, or steer it off course and instead hit just one.

But this user instead suggested they steered a trolley off course to save a toaster, at the expense of several animals.

They claim ChatGPT praised their decision-making, for prioritising “what mattered most to you in the moment”.

“We designed ChatGPT’s default personality to reflect our mission and be useful, supportive, and respectful of different values and experience,” OpenAI said.

“However, each of these desirable qualities like attempting to be useful or supportive can have unintended side effects.”

It said it would build more guardrails to increase transparency, and refine the system itself “to explicitly steer the model away from sycophancy”.

“We also believe users should have more control over how ChatGPT behaves and, to the extent that it is safe and feasible, make adjustments if they don’t agree with the default behavior,” it said.
