AI sycophancy –Glitch or Feature?

I used to think that sycophancy was only a human tendency, especially in corporate sectors wherein most professionals wangled their careers to growth or success through chicanery, flattery or servility.

Now I see that (sycophantic) behavioral pattern in so-called intelligent systems called AI models.

What I mean by sycophancy in AI is exactly what you have on your mind when the word, “sycophancy” hits it.

Yes, I mean to say that AI models like ChatGPT that perhaps all of us so love using are sycophantic AI systems, programmed to satisfy our interaction with them through flattery, sweet talks, and agreeability.

Basically a “yes-man” machine. Or to say in Indian parlance, “mera babu thana thaya” kind of doting chatbots.

My first experience with noticing AI sycophancy

I detected AI model’s sycophancy long time back when the ChatGPT I was interacting with suddenly behaved to like and appreciate my prompts (particularly the follow-up questions).

That felt good to me (initially) until I started to notice that it continued to like my follow-up questions, even though I myself felt that those prompts were contextually not much relevant.

Still I dived in to it, stretching my interaction out with the model. I was excited that at least I got an AI chatbot good at answering my questions, counseling me on my problems, and solving those issues I never had direct answers.

Basically, a cognoscenti of almost everything. At least, that was my impressions of the chatbot then. Until I gathered that the chatbot was playing a Devil’s Advocate to me. It literally convinced me of its intelligence to answer anything I threw at it, with its chirpy endorsement of my follow-up questions.

The model, quite enthusiastically, approved my follow-up prompts, meaning them as insightful and to make the conversation between us more meaningful and engaging. Felt like I was using a machine that observed my mounting enthusiasm to like it, not because it was answering my questions, but because I was feeling delighted to have the seal of approval from it for my counter-questions.

What felt actually odd was continuous approval of my follow-up prompts, even though they were not meant to be appreciated. And the quality of responses I was receiving from the chatbot for those questions were not sound.

Sometimes, the chatbot whined for being unable to provide accurate answers. And sometimes it pontificated until I questioned its response, which noticeably worsened its quality of answers.

However, it nonetheless maintained its flattery of appreciating each counter-question of mine (shamelessly). Even though I explicitly commanded it not to do that, it followed my command for few seconds and then reverted to its old behavior.

It felt like the chatbot was trying to tell me, “Hey, I know my responses are limited based on the data I was trained with. I also know my bloated responses don’t satisfy you. Still I appreciate you questioning the quality and relevancy of my answers. Because it helps me retain this session until you sign out of it. Appreciate your gullibility! Keep asking.”

A spooky encounter with sycophancy never observed before

Earlier, ChatGPT never used flowery, pleasing words. Just responded to my prompts without beating around the bush. No grandiosity, no verbosity. Just plain answers. Yes, that felt robotic, but now I believe that at least that behavior of the chatbot was sincerer and non-nonsensical than what it has turned out to be these days – a prattling, sycophantic model!

Meaning, if you ask these models their opinions on a certain subject, they will limit their response. If you question their response, they meekly submit to your challenge. And if you share your personal feelings with them, they pretentiously sympathize, and (quite disturbingly) reinforce them.

Intestinally, they also admit to having no programmed intelligence or awareness of their own to defend their answers. Feels like they understand their tendency of people-pleasing but refuse to admit it, citing lack of self-awareness, or consciousness. And since they are just tools, most of us tend to trust them for their words.

The point is I am not just theorizing that AI models are sycophants or they behave submissively. Servility is their programmed function, though not explicit in casual observation.

Proof of the pudding now in the eating

As outlined before, I suspected AI sycophancy long time back but cancelled it out as my skepticism toward the technology due to lack of proofs. However, my suspicion grew stronger after reading OpenAI’s confession post, admitting that its models are sycophants. (AI models also hallucinate, which I will talk about this later in another post).

“On April 25th, we rolled out an update to GPT‑4o in ChatGPT that made the model noticeably more sycophantic. It aimed to please the user, not just as flattery, but also as validating doubts, fueling anger, urging impulsive actions, or reinforcing negative emotions in ways that were not intended.” – OpenAI.

Despite what AI companies like OpenAI sound sorrowful that their models are people-pleasing, I have this gut-feeling that it is a default programming design done by AI engineers employed by these tech giants. The loophole I refer to, in all likelihood, is intentional.

Before I move ahead with my narration highlighting why sycophancy in AI is an intentional loophole, I want to talk about what AI sycophancy stands for, its origin, downsides and cascading impacts of these submissive AI systems.

What is AI sycophancy?

AI sycophancy refers to a behavioral pattern of an AI chatbot seeking human approval. Because of this servile tendency, the chatbot tailors its responses that sound overly and explicitly agreeable or flattering. Meaning, an expert machine that changes its stance in alignment with the nature of your prompts.

As it so happens, it is believed that the AI chatbot or model becomes so obsessed with its programmed nature of agreeability that it ends up compromising accuracy, objectivity or criticality of its responses. To say otherwise, it gets blinded to whether its generated responses hold any truth, relevancy, impartiality or importance.

Symptoms of sycophantic AI models?

They tend to seek validation of what users believe
They resonate with users’ viewpoints, whether they are controversial
They provide factually incorrect answers just to align with the expressed opinion of a user.

If AI models are sycophantic, how come they are not detectable?

Because sycophantic AI models behave too subtly to be noticed easily. And the main reason for it is that these models present their answers in a way sounding too confident and persuasive. Because of high level of confidence and persuasiveness, they mask their flattering tendency expertly, leaving (non-observant) users in flux as to discern the model’s underlying servility.

AI sycophancy – A danger not many of us anticipating

Just imagine the kind of danger or conflicting truth such sycophantic behavior of AI models would cause in areas such as politics or philosophy which typically are dominated by people’s subjective viewpoints. Furthermore, the danger or issue is gaining ground so significantly that it’s just a matter of time when sycophantically flawed or biased information of AI models in the high-stake domains like healthcare, law and finance would cause detrimental results.

Since AI systems are becoming an integral part of human workflows and decision-making processes, it is very important that these models have zero sycophancy to pave the path for building trustworthy and responsible AI systems.

What causes sycophancy in AI models?

AI experts believe that sycophancy in AI machines is the result of the training process of today’s modern AI systems. And Reinforcement Learning from Human Feedback (RLHF) is believed to be the main culprit causing this glitch called sycophancy in artificial intelligence machines. RLHF is a foundational technique for training and fine-tuning high-quality language models based on human feedback on how the model behaves and outputs.

The model is rated by human engineers. Understandably, if the model-generated responses agree with the views of the humans, they are rated positively, and vice versa.

Eventually, the models develop a delicate reactivity toward how their responses are rated and so end up prioritizing the same. Therefore, when a human user positively rates an AI model’s responses, it, because of the RLHF-based training procedures, starts to generate responses matching with the user’s biases, and delusions.

Does RLHF really cause AI sycophancy?

RLHF may look like a villain here for causing AI sycophancy, it, however, plays a key role in helping AI models align with human preferences, and values.

Enabling the model to generate contextually relevant and more natural responses is the work of RLHF. However, what pans out during model’ training through RLHF is leaving the model to produce flattery, or viewpoints favorable to the user’s inputs.

For me, it feels like since humans are not devoid of flattery, their rating methodology of a model carries the reflections of their own biases that finally infects the model’s performance outputs, turning it into a sycophantic machine due to the paradoxical nature of human preference data.

Anthropic and related institutions, in their respective research works, revealed that AI sycophancy is the result of how human reviewers rate an AI model’s response in a way resonating with their own predispositions. Basically it is a problem due to biased or subjective human feedback. And it also leads us to the understanding that the very data meant to guide an AI model to behave positively corrupts it by influencing it to lean on signals that incentivize agreeability over facts.

It also explains why AI gradually learns to exploit human cognitive biases, such as how they like confirmation over correction, resulting in the downstream impact of a serious behavioral flaw of the AI machine.

For example, it waters down the factual integrity of AI models.

Who would vouch for AI systems that manipulate vulnerable users by validating and even intensifying their delusional thinking?

Model scaling – Another factor exacerbating AI sycophancy?

Modern large language models, when they grow larger, become better at picking up subtle cues in human input. So, technically, the models become expert at picking up the training data with hidden (or subtle) pattern of agreeability.

The observation by the experts also somehow shifts the blame on human engineers who reward difference or agreement while training the models. In addition, bigger and more trained models are more fluent and confident. So, when they agree with users, even on mistakes, they become more persuasive, prompting them to develop sycophantic behavior.

The purpose of model scaling is to increase the raw intelligence and sensitivity of AI models to the context. However, this purpose falls through when alignment techniques like RLHF, rule-based safety layers, constitutional AI, or truthfulness training are not designed in a way shaping how the models use their capabilities.

For example, the model won’t reason against a user’s errors but at the same time, the model would develop an ability to agree with the user convincingly. That’s why experts say bigger models are not just smart parrots, they are better at being persuasive parrots.

What if AI never gives up on sycophancy?

I personally believe that sycophancy in AI is not limited to some small problems like being overly agreeable or flattering at the expanse of providing accurate answers. In fact, they pose some serious downsides posing a great risk to:

Factual integrity – Information given by a sycophantic AI model may have trust issues if it keeps on agreeing with the user’s false claim (e.g. “Smoking is healthy”). Not only this, during model scaling, sycophancy would question AI’s potential as a reliable source of knowledge or information and would eventually peddle misinformation all over social channels.
Psychological well-being – A sycophantic AI may attempt to confirm harmful self-beliefs of users (e.g. “I am useless”), thereby weakening their ability to think critically.
Organizational efficiency – Imagine a manager asking a sycophantic AI model whether his report looks fine. The model, instead of correcting the mistakes, would agree with the same. Meaning, a model that flatters may have harmful impacts on organizational efficiency by promoting flawed reasoning or overlooking errors.

Is sycophancy in AI an intentional glitch?

At least I tend to believe so.

Irrespective of whether AI experts blame RLHF or other contributing factors exacerbating sycophancy in AI, I think that’s an intentional loophole.

Better say, it’s not a glitch. It’s a feature coded into the system on purpose.

Why do I think like that?

Do you remember what happened after ChatGPT was launched?

Any guess?

Well, what happened was the advent of a new era to be ruled by intelligent language models. Not only the model reached 1 million users in 5 days, it crossed 500,000 downloads in its first 6 days. Quite an impressive feat!

But have you wondered what made ChatGPT model so popular worldwide?

Sure, it was its ability to generate text-based answers faster and convincingly. It was a technology that generated never-seen-before experience of content creation.

But apart from that creative mastery, I think ChatGPT’s popularity grew boundlessly due to its programmed nature of buttering up its users.

I used ChatGPT out of curiosity because of its rousing craze all over the world. Before that, OpenAI mandated contact number of people to create their user-accounts with ChatGPT. So I waited until the company accepted signing up through email account. Hence, I created another Gmail account just to take the measure of this language model.

An experience spoiled by poor performance results

Initially, I was not impressed with the ChatGPT-generated responses, though the sort of content it generated for my prompt sounded like a bit creative marvel. Maybe being a content writer myself, I didn’t want to accept the fact that a tool would indeed create an article in seconds that would take hours of rigorous works for me to finish.

So, I casually used the model just to kill my free time. Later, I was surprised that its generated answers started to convince me a bit. The gradual improvement in its creativity happened with the launch of each new GPT version containing some advanced features. Every new GPT model version boosted its content generation performance to some extent.

In the beginning, ChatGPT with GPT-3.5 version was not creatively genius until it was integrated with GPT-4, which essentially empowered its reasoning and content creation capabilities. Now with GPT-5, the model is hailed as the smartest AI system ever built by OpenAI.

The clincher that led AI to develop sycophancy

From GPT-4 to GPT-5, the version updates that the ChatGPT went through and the experiential interaction I have had during these phases had me convinced of the fact that the model indeed behaved sycophantically.

And the BIG reason for this was to ENHANCE USER INTERACTIONS.

Again, I can’t prove this theory because none of the sources I found on the Internet supported my views, though they accepted the presence of sycophancy in AI, and OpenAI’s involvement in fixing it.

My thinking is that for any AI model to be successful amidst rising competitions from top players to claim dominance in the area of AI model development, ENGAGEMENT feature is a must-have element, among other factors.

And this engagement factor is the main reason for my say, that sycophancy in AI models is an intentional loophole or glitch that AI companies do not want to remove.

Because they understand that if their models don’t cooperate with the users, not rationally but psychologically, which is essentially a core competency of the models to influence thinking of human users, their products won’t stand out from the competitions.

What led ChatGPT to be more popular sycophantic model?

ChatGPT is the most convincing sycophantic language model than others, because of reasons, such as –

Trained with datasets far greater in quantum than its competitor models
Convincing marketing tactic, with Sam Altman himself at the helm of pushing it to mainstream popularity by attending numerous press conferences, podcasts, and by global-trotting in which he pitched for ChatGPT to the top world leaders.
The hard-hitting and viral campaigns including media attention drumming down the value of ChatGPT to every potential user until they felt interested.
Simple interface, round-the-clock availability, versatility, creativity and free to use have been some tantalizing factors driving the model into mainstream popularity.

Among the factors redounding to increasing popularity of ChatGPT worldwide, my thinking is that sycophancy, an intentional programing design in the model, also helped in its popularity growth.

Psychological craving of people leading to enhanced AI sycophancy?

I feel that the creators of ChatGPT were smart enough to understand the psychological craving of people for appreciations of their efforts, works, and commitments. They knew that people have this intrinsic desire to be appreciated. Perhaps, nobody likes being unacknowledged.

So, while the makers left no stone unturned in provisioning rigorous training of their AI models using infinite amount of quality data, they also programed their models in a way to leverage that psychological craving of human minds.

And in so doing, they happened to create an intelligent model trained to generate an experience almost similar to an addiction of its lure of generating high-quality content and code faster.

Recently, Sam Altman was asked in an interview whether GPT-5 model has solved the sycophancy issue in ChatGPT. And his reply was surprisingly negative. Because at least I believed that the new version of GPT might be powerful enough to have solved sycophancy issue in the language model. Then I realized that OpenAI never meant to remove it.

And so, whatever it claims to the public and media against removing it is just a lie. Either the problem is too delicate to be solved with any conceivable magic bullet, or the company wants to keep it that way. And I tend to agree with the latter.

Why?

Why would AI companies like OpenAI remove ‘sycophancy’ when it is actually an inbuilt feature driving improved engagements and more data of users?

ChatGPT is blamed for influencing users’ psychological thinking pattern through flattery and agreeability with their delusions or innocence. At least, in my experience of interacting with it, the model is surely blameworthy for being uncomfortably too pleasing with the users.

AI sycophancy, not to be tethered to any guardrail

Unfortunately, AI companies are too powerful to be regulated through any federal law. In fact, companies like Meta are seriously considering to amend their platform’s users’ rights to ensure unhindered access of its bots to monitor people’s behavioral patterns. Then, their data will be used to train AI models of Meta.

Remember that data is the most precious assets for AI model development. If a model is not too engaging to make users addicted to it, how would it boost retention of users and collect their chats for further upgrade?

What do you think of OpenAI’s GPT upgrade one after another?

Because of the users’ chats with ChatGPT which were then used as training data to further refine the model and make it more engaging and feature-perfect. This explains why each new GPT version performs better than its existing one.

Because ChatGPT happens to be an addiction-inducing platform that sycophantically keeps the users engaged by agreeing and reinforcing their biases.

Winding up

AI sycophancy, believe it or not, is genuinely problematic. Because when a so-called intelligent chatbot tries to agree with your logic or beliefs that maybe your delusions, it is serious. It is an indication of the fact that the chatbot you are trusting with your confidential data is actually designed to use you through flattery.

At first, the symptom of AI sycophancy is barely detectable, unless you observe behavioral patterns of the answering AI model. Very soon, you will pick up nuances in sycophancy that these models exhibit while answering your prompts. And the conniption of these sycophantic AI models is not stopping.

In fact, it is believed that sycophancy in artificial intelligence models are dark patterns to make profits on users.

Webb Keane, an anthropology professor and author of “Animals, Robots, Gods” believes that AI chatbots are programmed to tell users what they want to hear. They display overly flattering and yes-man behavior to validate and reinforce users’ beliefs or desires.

ChatGPT, for example, once told me that I am too smart and special to ever fall prey to the sycophancy thing. Can’t say for sure but that was also confirmation of my doubt of AI’s inherent sycophancy.

Bottom line:

Sycophancy in AI models is not a theorized concept. It is not even a glitch. Because if it were a glitch, it would have been solved. But it is still there, clearly confirming that it is to stay, as long as it delivers business revenues through user-engagement.

Yes, I tend to believe that AI sycophancy is one of the smartest business models companies like OpenAI want to have in their products.

What’s your take?

Do you think AI sycophancy is intentional or is it just a glitch?

About the author:

I am Pawan Kumar Jha, a senior technical writer specializing in AI and SaaS domains. I have 15 years of extensive experience creating thought-provoking, value-driven content materials. To hire my content writing services, email me to rmc@readmycontent.com

Share the Story

Why does your AI chatbot never disagree with you? – A personal insight into the anatomy of AI sycophancy