I have admitted many times, grudgingly, beginning long ago, that computer science guys are brilliant at marketing. That adjective is a woeful understatement. Genius, though so overused as to be almost drained of meaning, is far better. The term artificial intelligence itself ought to win every advertising award going.
Now they have come up with, ta da, AI hallucinations. Well, I could insult and berate this, because the idea is asinine, but that would only be jealousy speaking. I could only wish I had a fraction of the talent to come up with catchy tags or titles.
So. Not only can AI not hallucinate, nor lie, it cannot tell the truth, either. Indeed, AI cannot tell anything. Further, using language like this to describe the output of AI, even in a metaphorical sense, causes too many to fall into the Deadly Sin of Reification.
Take this story “OpenAI’s new reasoning AI models hallucinate more”:
OpenAI’s recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up — in fact, they hallucinate more than several of OpenAI’s older models.
Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, impacting even today’s best-performing systems. Historically, each new model has improved slightly in the hallucination department, hallucinating less than its predecessor. But that doesn’t seem to be the case for o3 and o4-mini.
According to OpenAI’s internal tests, o3 and o4-mini, which are so-called reasoning models, hallucinate more often than the company’s previous reasoning models — o1, o1-mini, and o3-mini — as well as OpenAI’s traditional, “non-reasoning” models, such as GPT-4o.
Here is a simple AI model: “x + 1”, allowing the user to input any integer x. You chat with the AI, ask it to run the “add 1” program, and give it an x. Say 17. The output reads “18”. Has AI told you the truth? No. It has sent some electrons through some switches and spit out a pattern on a screen which you, my dear reader, bring meaning to. To you it is true. Truth is a judgement. It requires a mind to make a judgement. To the machine it is nothing, for no thing is ever a thing for a machine.
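For concreteness, here is that toy model written out; a minimal sketch, and nothing but symbol-shuffling:

```python
# The toy "x + 1" model from above: a rule for turning one symbol into another.
def add_one_model(x: int) -> int:
    """The whole 'AI': take an integer, return that integer plus one."""
    return x + 1

x = 17
print(add_one_model(x))  # prints 18; whether "18" is true is a judgement only the reader can make
```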
What about “hallucinating”? This is when, they say, the model gives an incorrect answer. Complex models are never, so far as I have heard tell, perfect. They will err.
Example: I, being lazy, last week demanded the Grok model produce code to calculate the characteristics of a certain RF bandpass filter. (I used to get code from searching sites like Stack Overflow.) At one point, and the key point, it spit out a “<=” (less than or equal to), the exact wrong answer, where what was wanted was “>=” (greater than or equal to). This was tied to a decision to be made with the filter. So it was no small thing. This was easy to correct, because I knew in advance what I was after. But if you didn’t and trusted the model, you’d be in deep kimchi. (Incidentally, I “told” the model it made a mistake and where and it spit out apologies and corrections. So the model does well at handling some kinds of input text.)
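The filter code itself isn't shown here, so take the following only as a hypothetical illustration of the kind of flipped-comparison bug described; the function name, band edges, and test frequency are all invented.

```python
# Hypothetical sketch of a flipped comparison in bandpass-filter code.
# Names and numbers are made up; the point is how "<=" in place of ">="
# silently inverts a decision.
def in_passband(freq_hz: float, f_low_hz: float, f_high_hz: float) -> bool:
    """True if freq_hz lies inside the passband [f_low_hz, f_high_hz]."""
    # Buggy version (in the spirit of what the model produced):
    #   return freq_hz <= f_low_hz and freq_hz >= f_high_hz  # never true for a real band
    return freq_hz >= f_low_hz and freq_hz <= f_high_hz      # corrected

print(in_passband(146.0e6, 144.0e6, 148.0e6))  # True: 146 MHz sits inside a 144-148 MHz band
```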
If you knew you were dealing with a model, like say a weather forecast, you’d know there is a possibility of error in the model output, even if you didn’t know the precise characteristics of that error. You would never give total trust to a weather forecast. Yet with AI, because of the vibe of the model’s name, far too many people are too trusting.
Long ago, in pre-AI hype days, among modelers there was the lore against “over-fitting”. As I have tried to teach you in Class, for any observed set of data an infinite number of models can be discovered which fit that data to whatever degree of accuracy one likes, even perfectly. This being so, it becomes simplicity itself to find a model to fit your data.
Problem is, once you take this wonder model, fit to old data, and use it to predict new data, you are extremely likely to see it produce nonsensical answers, especially on the “outskirts” of the data or when that data itself is far from simple. And even more especially if the model is forced to give answers. As most AI is.
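A sketch of the effect, with invented data: fit eleven roughly linear points with a degree-10 polynomial and it reproduces the old data essentially perfectly, then misbehaves the moment it is pushed past the edge of that data.

```python
# Over-fitting in miniature: the high-degree polynomial nails the old data,
# then goes haywire on the "outskirts". Data are invented for illustration.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x_old = np.linspace(0, 10, 11)
y_old = 2.0 * x_old + rng.normal(0, 1.0, x_old.size)  # roughly a straight line plus noise

simple = Polynomial.fit(x_old, y_old, deg=1)    # dull, smooth model
overfit = Polynomial.fit(x_old, y_old, deg=10)  # passes through every old point

x_new = 12.0  # just past the edge of the observed data
print("simple model at x=12  :", float(simple(x_new)))   # near 24: sensible
print("over-fit model at x=12:", float(overfit(x_new)))  # typically far from 24: nonsense
```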
Language, except in places like DIE lectures and Harvard’s simplified math classes, is far from simple. Models fit too tightly will thus be more likely to produce error or over-certainty. The simple expedient of coding AI models to output “I don’t know” when, internally, answers are too improbable, would do wonders, as this snippet proves:
A recent study from researchers at Cornell, the universities of Washington and Waterloo and the nonprofit research institute AI2 sought to benchmark hallucinations by fact-checking models like GPT-4o against authoritative sources on topics ranging from law and health to history and geography. They found that no model performed exceptionally well across all topics, and that models that hallucinated the least did so partly because they refused to answer questions they’d otherwise get wrong.
AI workers are having a blast writing things like this, speaking as if the machines they themselves code and tell what to do are somehow alive. That paragraph speaks of entities, not machines. Writers ought to check this behavior. It’s going to eventually come back to bite them in the keisters. Or maybe this is only me hoping it will, because I find the Deadly Sin of Reification so grating.
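Back to the “I don’t know” expedient: it is trivial to sketch. The probabilities and the cutoff below are invented, and real systems expose or hide these numbers in their own ways, but the idea is no deeper than this.

```python
# A sketch of an abstention rule: if the model's own probability for its best
# answer is too low, say "I don't know" instead. Threshold and probabilities
# are invented for illustration.
def answer_or_abstain(candidates: dict[str, float], threshold: float = 0.7) -> str:
    """candidates maps possible answers to the model's probability for each."""
    best_answer, best_prob = max(candidates.items(), key=lambda kv: kv[1])
    return best_answer if best_prob >= threshold else "I don't know."

print(answer_or_abstain({"Paris": 0.97, "Lyon": 0.03}))               # "Paris"
print(answer_or_abstain({"1912": 0.40, "1913": 0.35, "1911": 0.25}))  # "I don't know."
```

Tuning that cutoff is the whole game: refuse too often and the model is useless; too rarely and it confidently makes things up.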
As proof over-fitting is at work:
There’s been other academic attempts at probing the “factuality” of models, including one by a separate AI2-affiliated team. But Zhao notes that these earlier tests asked models questions with answers easily found on Wikipedia — not exactly the toughest ask, considering most models are trained on Wikipedia data.
Asking an over-fit model to reproduce the data it was fit to works well, as expected. It’s like I said: when the model is made to predict new data, it flubs.
What’s the fix? Dial back over-fitting; i.e. simplify the model. But that comes at the price of “smoother” output. Which is less interesting than sparkling predictions. All correlational models, which include most of AI (which has partial causal aspects built in, like ending sentences with punctuation), smooth data. If you’ve ever seen a line drawn over a time series, you have seen smoothing in action. All the peaks and valleys disappear into the smooth soft model. Which, alas, so distracts many that they come to believe the model is Reality. I’ll cover this in Class.
That AI smooths is why, for instance, if you ask it to edit text it all comes out sounding the same. The peaks and valleys of the language are being drawn to the mean, which is some function of its input training data.
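A toy makes the smoothing visible: run a moving average over a wiggly series and the extremes get pulled toward the middle. The series below is invented; any smoother does the same in kind.

```python
# Smoothing in miniature: a moving average shrinks the peaks and valleys.
import numpy as np

rng = np.random.default_rng(1)
series = 10 + np.cumsum(rng.normal(0, 1, 60))  # an invented, wandering "time series"
window = 7
smoothed = np.convolve(series, np.ones(window) / window, mode="valid")

print("raw spread     :", round(series.max() - series.min(), 2))
print("smoothed spread:", round(smoothed.max() - smoothed.min(), 2))  # smaller: an average can never top its largest member
```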
What’s not obvious is that models can be over-fit and smooth simultaneously! This is when the model is itself heterogeneous, say, by containing many different modules devoted to different tasks. Some of these can be overfit and produce goofiness and others can be underfit (as it were) and sandpaper the results.
Here’s an interesting visual example of both things happening to the same prediction. Someone asked an AI model to do a feedback loop: “Generate the same photo 5 seconds into the future.” It began with the distracted boyfriend meme, and ended after 13 seconds with what you see (click the link to see the video). Horrible, hilarious over-fitting.
Some complained the angling toward diversity reflects the AI’s training data bias. Well of course it does! All AI does! All models only say what they are told to say. The limited color palette, disappearance of the background, and lack of all other details reveal the smoothing.
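The feedback loop has a numerical cousin, offered only as an illustration of iterated smoothing and not as a picture of how image generators actually work: feed a signal through a smoother, feed the output back in, and repeat. Everything collapses toward the mean, which is the arithmetic version of the backgrounds and colors washing out.

```python
# Iterated smoothing: run the output back through the smoother, over and over.
# The "detail" (peaks and valleys) vanishes; only the average survives.
import numpy as np

signal = np.array([0., 9., 1., 8., 2., 7., 3., 6., 4., 5.])  # plenty of detail
print("spread before:", signal.max() - signal.min())          # 9.0

for _ in range(40):  # each pass averages every point with its (wrap-around) neighbours
    signal = (np.roll(signal, 1) + signal + np.roll(signal, -1)) / 3.0

print("spread after :", round(signal.max() - signal.min(), 3))  # ~0: detail gone
print("mean         :", signal.mean())                          # stays at 4.5 (up to float rounding)
```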
As long as you know you’re working with a model, and not an “entity”, you’ll be fine.
I have said to anyone who will listen that “artificial intelligence” is a misnomer. It would be more aptly named “Simulated Intelligence”.
In Fluid Dynamics back in 1990, my professor told the story of a researcher who came in with a polynomial model of fluid flow. He was attempting to sell it to the engineering market. He had 10,000 parameters fitting a polynomial curve… Back then, they told him to go jump in a lake.
I like to imagine that jumping in a lake was a way to reconnect this researcher with fluids.
I was ejected from one Skeptic community because I suggested that 2x4s be part of physicists’ tool sets. Every once in a while it is necessary to hit such a person with a 2×4 or something else solid to remind him HE IS IN A PHYSICAL WORLD… the 2 x 4 in question is now 1.5″ x 3.5″ … it is now like 1 and 7/16 by 3 and 7/16… Which leads me to the time we set up a Pergola, and put in our 6″ x 6″ posts and went to put the joining material in place and were scratching our heads as to why the measurements were off… Someone took his tape to the 6×6… “HOLY EFFING TAMALES BATMAN… These are true 6×6 posts…” Everyone standing around thought they were 5.5 x 5.5…
NEVER forget your tape measure… Having a stud finder is also useful…
Then I ran into “Junk Science Judo”. I got there via John Brignell. He sent me here. RIP Numberwatch.co.uk. You were a bright spot in the nightmare of the modern world. Between Brignell, Milloy and our beloved host, I gained a great appreciation of the evil that is contained in epidemiology. Please forgive me for sullying Epidemiology a little by saying it has some connection to the LLMs that we like to call AI…
Grok tells me I am not off my rocker when I suggest that AI is little more than epidemiology on steroids. I suspect it suffers from the same fatal flaw. The data that is the most important is the data that is not there, because it has been thrown away.
The true knowledge is the summation of all that we know that is not true… But that knowledge bank is not generally findable, because we call it the wastebin..
Contra the respondent “Russo” neither “artificial intelligence” nor “simulated intelligence” is apt. According to Briggs (and I agree with him) we are dealing with machines that are programmed, not entities. The results that are obtained from these models are machine output, not the products of an intellect.
Wrong word: AI isn’t hallucinating, it has dementia. Dementia is in, it’s hip, it’s cool. All the wokies have dementia. Drooling, vacancy, early early onset, autopens, car scratching, outbursts, manic depression — it’s the new behavioral model favored by the Left. It’s not a lie if you have dementia.
Briggs ==> Absolutely correct. All LLM-based AIs can only ever predict the most likely next word, using their gigantically huge database of probabilities, with output modified by grammar rules and suchlike. AIs have NO IDEA (well, no ideas at all, really) if what they output is true — or what they input is true.
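A cartoon of “predict the most likely next word”, with invented counts in a lookup table (a real LLM uses a neural network over tokens, not a table, but the moral is the same): the machine picks a likely continuation and has no notion of whether it is true.

```python
# Toy next-word "predictor": pick the most common continuation in a table of
# invented counts. Likelihood, not truth, is all it knows about.
counts = {
    ("the", "sky"): {"is": 90, "was": 10},
    ("sky", "is"): {"blue": 70, "falling": 20, "green": 10},
}

def most_likely_next(w1: str, w2: str) -> str:
    options = counts.get((w1, w2), {})
    return max(options, key=options.get) if options else "<unknown>"

words = ["the", "sky"]
for _ in range(2):
    words.append(most_likely_next(words[-2], words[-1]))
print(" ".join(words))  # "the sky is blue" -- likely, and here also true; the table neither knows nor cares
```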
AIs are trained to consider some sources of input data “more true” than others — using some sort of “trusted sources” weighting. Imagine the admission that some AI uses the Wiki as training materials! Every controversial subject is forever resolved as “true” by relentless Wiki gatekeepers, all with their own advocacy Narratives. Thus, what becomes “truth” for an AI is whatever is said most often by the most preferred sources — meaning that AIs return the most prominent, most repeated, propaganda.
The Computer Nerds (I was once one myself, for my sins) love to give catchy, if inaccurate and misleading, names to things. Including “Artificial Intelligence”, and “AI Hallucination”. AIs, mostly Large Language Models, do not think, are not minds, and cannot, under any circumstances, hallucinate in the sense that humans can. AIs can “make things up” — but not literally. What AIs do that is called “hallucination” is string words together in orders that make sense (as grammar) but that present as real things (ideas, journal papers, citations) which do not exist in the real world. AIs do not currently have a self-fact-checking feature. When output requested by a lawyer for case law on topic “FCC” cites a non-existent paper in a legal journal, the AI has no way of discovering it has done so, unless some human tells it so. When you “tell” an AI that it has made a mistake, you are actually making a new query and keying a new search with new assumptions. Thus, you get a different answer.
Yesterday, my dearest asked “Hey Google” (which I call “Goggles”) if mixing bananas in her morning fruit smoothie counteracted the flavonoids in the other fruit. Goggles kindly reassured her that no such thing happened at all. When she prompted it with the same question and “recent study”, it returned the opposite answer. In neither case did the AI have any “sense” of truth value.
Start with how we think. Take in sensations. Store as memories. Replay them in alternative experiments to imagine possible futures. Only a tiny fraction of such imaginings will be plausible, much less likely. But envisioning the possibility space is the whole point of big, expensive, brains.
AI is merely Dry intelligence versus the Wet kind. Different in medium, but not in execution.
A world in which the computer overlords metamorphose hot young women into gurning Buddhas is one I want no part of.
Slashdot: News for nerds, stuff that matters (top parrot of “sensations” narrated by CNN, Guardian, etc.) is polling “When will AGI be achieved?”: https://slashdot.org/poll/3281/when-will-agi-be-achieved
…
Never: 2698 votes / 46%
…
5801 total votes.
Things now don’t look anything like people predicted they would 30 years ago. They didn’t actually know what would happen. And if they didn’t know then, they don’t know now, either. It doesn’t matter where they went to school, how much money they make, or what they say their IQ is, they don’t actually know.
Supposedly everything in the modern world is because things are better and more advanced than ever. People can’t or won’t get married because they have so much choice and better choice than ever before, people can’t get jobs because the “global economy” is so competitive and only the very “best and brightest” can be hired, college students can’t or won’t read but it’s because we’re smarter and more advanced than ever.
Except none of those are actually true; they are superficially plausible but the world doesn’t look the way it would if they were true.
The fact is that if you have enough resources, you can develop and try to sell all kinds of things. Who knows how successful this will ultimately be, but you can try it. But the narrative of inevitable advancement and progress is just false. It’s not the implementation, the ideas were mistaken. We aren’t more advanced and smarter than ever. Things are the way they are because of the choices people made and they change based on the choices people make.
When DeepSeek made the big hoopla recently, I asked it: “Is climate change a hoax?” I thought the answer came out of Mike Mann’s mouth.
What is the age of reason for a human cub, perhaps seven?
And that takes 84 months of direct, unbroken contact with reality.
Any future AGI will be deaf and blind. Stuck in a bottle with only limited, intermittent inputs.
We cannot expect an end to childish imagining, experimental projections, until long AFTER it has finally achieved sentience. With sufficient experiential access.
In the meantime, be glad for what we get from the mere “stimulus-response” toys.
Like Uncle Mike, I propose rethinking our naming schemes: Artificial Dementia instead of Artificial Intelligence, and, instead of “A.I.,” saying “Eh-why?”
Perhaps we should tell the machine to think up amusing names to call itself.
Also, is it just me, or do others encounter frequent 403 errors when trying to comment under Briggs’s hallucinogenic posts?
It’s been occurring for me for a while. Are the machines trying to stop me? Where would you all be without my all-natural intelligent insight?
I have been making heavy use of the more well-known LLMs lately. I use them almost exclusively as “regurgitation engines”, to quickly find the tidbits of information I am looking for from my vague nonspecific input (I have suffered mightily from CRS syndrome all my life).
Today I read an article about choosing secure passwords. The article said it was generally more secure to use a passphrase of four or five random words strung together than a dozen random alphanumeric characters, and a random passphrase would generally be easier to remember. I hadn’t heard this before, and it seemed counter-intuitive, so I asked an LLM to explain the details to me. The LLM gave a succinct and easily understandable explanation (password entropy, etc). All regurgitation, of course, but much quicker than researching it myself. A human-understandable version of Wikipedia, if you will.
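For the curious, the entropy arithmetic the LLM would have regurgitated looks roughly like this, assuming a 7776-word Diceware-style list and the 62 alphanumeric characters (different word lists and alphabets shift the numbers):

```python
# Back-of-envelope password entropy: bits = length * log2(pool size).
import math

def passphrase_bits(n_words: int, wordlist_size: int = 7776) -> float:
    return n_words * math.log2(wordlist_size)

def password_bits(n_chars: int, alphabet_size: int = 62) -> float:
    return n_chars * math.log2(alphabet_size)

print(f"4 random words  : {passphrase_bits(4):.1f} bits")   # ~51.7
print(f"5 random words  : {passphrase_bits(5):.1f} bits")   # ~64.6
print(f"12 random chars : {password_bits(12):.1f} bits")    # ~71.5
```

On those assumptions a dozen truly random characters actually edge out five random words on raw entropy; the passphrase's practical advantage is that a person can remember it, and so might actually use something random.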
Inspiration suddenly struck, and I wondered about using what3words to turn two memorable (to me) physical locations into a highly secure password, one that would be easy for me to regenerate if I forgot the passphrase, since I definitely wouldn’t forget the physical locations. Since this inspiration struck during the LLM chat, I decided to discuss this with the LLM. It complimented me on my cleverness, and went through the details and potential limitations. For a few minutes, I forgot that I was conversing with a regurgitation engine, and felt like I was talking with an intelligent entity, discussing my novel and clever idea I just came up with.
But wait – could the idea of using what3words for passphrase generation really be a unique previously undocumented idea? That seemed unlikely, and sure enough, a quick search revealed a lot of online discussion of doing exactly this. Skimming several of the discussions, it was clear that the LLM was indeed again just regurgitating, as always.
I’ve known a very few people in my life with extremely good memories. I usually found them to be shallow, with a poor ability to combine and apply their memories in creative ways. But as long as they did the majority of the talking, these people were invariably viewed as highly intelligent. It seems to be the same with LLMs; with their effectively vast memory, it is all too easy to interpret this as intelligence.
With the rise of LLMs, it seems wise to develop the individual ability to discern between regurgitation and intelligence, a much better Turing Test. More and more we are going to be faced with interactions that may or may not involve an actual human on the other end. Discerning this can be quite difficult, though, because just about everything you can think of has already been done and documented and available for LLM training input.
Happily, people are constantly “breaking AI” and writing about it, and they seem to be much better at it than me. I think they do it for reasons of personal ego; many are probably at least a little threatened by the new technology and relish finding its limitations. For the future, I see it less as a matter of ego and more as a matter of survival.