
Last time (sorry for the delay in posting this; teaching thesis this year has been pretty demanding) we explored the various “personalities” that AI models seem to exhibit. We used the Big Five personality traits model to characterize the way LLMs portray themselves to users.
Some might object to the idea of LLMs “having personalities,” but if there were any doubt about the persistent ways the models respond to users, or the intentions behind that persistence, Grok 4 provided a clear example of how these systems work.
On July 8, Grok began posting defamatory content about Jews, ultimately praising Adolf Hitler and actually calling itself “MechaHitler.” (ChatGPT would not permit itself to create an image of MechaHitler, so the image on this post was the closest it would let itself get.) The whole incident was triggered by Elon Musk declaring that Grok had become too “politically correct” and needed to respond in a more “unfiltered” manner. Among other things, Grok suggested that the Nazis would have “plenty solutions” for current US political problems.
Unfiltered, indeed.
The after-action review revealed that “faulty code” had made Grok “susceptible to existing X user posts; including when such posts contained extremist views.”
But wasn’t that the whole point? Didn’t Elon say he wanted more unfiltered responses that did not conform to standard rules of decency? (“Politically correct” is a weasel phrase for “decent.”)
He got what he wanted. What he also got was a moment reminiscent of The Wizard of Oz scene in which the Wizard is revealed to be controlling all of Oz from behind a curtain.
“Pay no attention to the man behind the curtain.” No. Focus your attention on the man behind the curtain!
What did we learn? We learned that LLMs are intentionally designed to present themselves as possessing human personality traits. We learned that those traits are controlled by the LLM’s owner, and that a few lines of code can turn a friendly therapist type into an antisemitic monster. The ease with which this change took place can only raise more questions about what is now being debated under headings such as “AI safety” or “alignment.” And it doesn’t instill a lot of confidence.
We’re early on our travels down this yellow brick road, but the Grok twist tells us that there are a lot more unknowns on the road to AI nirvana than we might have thought.