There are two new claims from last week that artificial general intelligence (AGI) has been reached, as detailed in Tree of Woe’s recent article on the subject. One claim comes from an AI investment firm (of some sort), the other from some academics.
Ed Feser beat me to an essay investigating the academics’ claim: “No, AI does not have human-level intelligence“. I’ll take a different tack and criticize the investment firm’s claim. I’ll take it for granted that all of you will first read Feser’s article, in which he explains how the loose, hazy, wishful (my word) definition of intelligence used by the academics is inadequate. Below, I’ll assume you know what he has said. For instance:
Having failed to provide a serious definition of intelligence, it is no surprise that they also fail to provide a serious account of how to go about detecting intelligence (since the latter task presupposes the former). There is a lot of hand waving about “a cascade of evidence,” and gee-whiz references to what LLMs can do. But all of this ultimately boils down to nothing more than a stale appeal to the Turing test. And the problem with the Turing test is that, of its very nature, it cannot distinguish genuine intelligence from a mere clever simulation. Indeed, it deliberately ignores the difference and focuses narrowly on the question of what would lead us to judge a machine to be intelligent, rather than the question of what would make it the case that a machine actually is intelligent.
Ed’s critique is philosophical, of course, and follows in part from things like this: “The Limitations Of AI: General Or Real Intelligence Is Not Possible” (in which we go over Searle’s works). But philosophical arguments will not be, though they ought to be, convincing to those excited about AI. These folks hope that if they throw enough RAM and electricity at the problem, the Impossible Gaps from mere code to genuine intelligence can be bridged in a way that nobody has yet figured out. “Emergence” will kick in and the computers will come alive.
My critique is not philosophical, but practical, from someone who writes code for a living, and from my thinking and experience with models.
At the end, I propose a test for AGI. Incomplete. Those interested in this subject can help refine this test.
My Critique
AI is models. Models do not parrot, or simply repeat back what they have been coded to regard as history. They make predictions. Take this simple “regression” model pictured here. The data, plotted as the points, are also fabulously called “the training data”, as if models can be trained like dogs. The plot reads “When X takes this value, Y took these values.”

The line is the model itself, a simple mathematical function of the observations. The line smooths over the observations, and simply states “When X takes this value, Y takes that value.” Not values, plural, but here a single value. It is then simplicity itself to propose any X and see the model’s corresponding Y. More sophisticated models might even state the uncertainty in the Y, given this X.
The X can be, as scientists thrill to say, “novel”, i.e. an X never before seen. Yet we can still produce a Y for it, as the picture shows. The vertical line is an X never before seen by the model. Its prediction is in red on the line (of course), and the Y real life gave us is in green.
When a new X, or any X, is given to the model, the model gets something done. It produces work. It may even give a good response, in the sense of Y’s closeness to Reality when Nature herself takes this never-before-seen X and churns out a Y through whatever causes are required.
But this regression is clearly a model. It is math, nothing more. It is easy to see this, too, since it is compact. No one would say the model is alive, has reached sentience, has consciousness, or somehow developed an intellect and will. It’s just dumb code. Speechless code. Mute. Incapable of language. The code has not reached self-awareness.
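To see just how compact, here is a minimal sketch in Python of the whole affair. The numbers are invented for illustration; any set of (X, Y) pairs would do.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # the observed X's: "the training data"
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # the Y's Nature gave for those X's

# The "model": a straight line fit by least squares, y ~ slope*x + intercept.
slope, intercept = np.polyfit(x, y, deg=1)

def model(new_x):
    """A fixed mathematical function of the observations. Nothing more."""
    return slope * new_x + intercept

# A "novel" X, never seen in the data, still gets a Y.
print(model(7.5))
```

That is the entire model: a fixed function of the observations, which will cheerfully produce a Y for any X you hand it.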
Yet some will say the code has “learned”. As a metaphor, the word has its uses. But it is wrong. Strictly incorrect. There is nothing in code that can learn. It’s just code! The interpretation of it, the meaning, comes from us, from outside the computer.
We can also use that same distressing metaphor to say that the model has “figured out” the Y with our “novel” X. The real figuring came in the persons doing the coding; the code itself is just a machine. But, as unhelpful and misleading as this metaphor is, it is not wrong to say that the model has “figured” a new thing.
But it did so without intelligence. Novelty, in this simple sense, is thus not intelligence.
Our simple model can be juiced in many ways. For instance, by writing code that fetches new data, with the fetches triggered by any number of flags in the code. The model can redo its own regression, coming up with better, tighter guesses of Reality. We can have it flag certain correlations, and task it to make new models like that regression, all inside the larger Meta Model.
The whole thing can be automated. Somebody pushes Go and off it runs, with no hands required to keep the thing rolling along. As long as nobody unplugs it, it can keep running and producing new mini-models (the new regressions).
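Here is a hedged sketch, in Python, of what such an automated Meta Model might look like; fetch_new_data and the fixed loop count are made-up stand-ins, not anyone’s real system. The whole thing is only code refitting regressions whenever its trigger fires.

```python
import numpy as np

def fetch_new_data():
    # Hypothetical stand-in: a real system might query a database or an API
    # whenever some flag in the code is tripped.
    rng = np.random.default_rng()
    x = rng.uniform(0, 10, size=20)
    y = 2.0 * x + 1.0 + rng.normal(0, 1, size=20)
    return x, y

mini_models = []   # each entry is just (slope, intercept): a fresh regression

for _ in range(3):                  # somebody pushes Go and off it runs
    x, y = fetch_new_data()         # new data arrives, triggered by some flag
    slope, intercept = np.polyfit(x, y, deg=1)   # redo the regression
    mini_models.append((slope, intercept))       # another mini-model, no hands required

print(mini_models)
```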
But none of this is intelligence. Not even if we add more code that makes the model spit out English to explain its results. Not even if the eventual mini-models have never been thought of by the coders of the AI. This is still only code plugging along, doing what it was told, because—all regular readers know this chant—all models only say what they are told to say, and AI is a model.
We—the collective we—know what the code is in all AI, because it is we who write it. That the output, produced by myriad combinations of the inputs (always newly arriving through user interaction) with the manipulations given by the coders, is unpredictable to the coders does not mean that the code was intelligent. It only means the intelligence of the coders was limited, because their creation is too hugeous. Too many knobs, switches, and dials, the combinations of which grow fast. Complexity of output is not intelligence.
There will be many claims of AGI. Most will be sincere, but, since money is on the line, a lot will be pure hype.
Here’s Sequoia’s claim:
A Functional Definition of AGI
AGI is the ability to figure things out. That’s it.*
*We appreciate that such an imprecise definition will not settle any philosophical debates. Pragmatically speaking, what do you want if you’re trying to get something done? An AI that can just figure stuff out. How it happens is of less concern than the fact that it happens.
A human who can figure things out has some baseline knowledge, the ability to reason over that knowledge, and the ability to iterate their way to the answer.
An AI that can figure things out has some baseline knowledge (pre-training), the ability to reason over that knowledge (inference-time compute), and the ability to iterate its way to the answer (long-horizon agents).
Some words follow about what an AI agent is (not important for us) and then this:
This is what it means to figure things out. Navigating ambiguity to accomplish a goal – forming hypotheses, testing them, hitting dead ends, and pivoting until something clicks. The agent didn’t follow a script. It ran the same loop a great recruiter runs in their head, except it did it tirelessly in 31 minutes, without being told how.
To be clear: agents still fail. They hallucinate, lose context, and sometimes charge confidently down exactly the wrong path. But the trajectory is unmistakable, and the failures are increasingly fixable.
There is a great deal of confusion in all this, except they’re right that none of this will settle any debates.
We have already seen that models “figure things out” in a metaphorical sense. This does not make them intelligent. We have also seen that we can code Meta Models to form mini-models; if you want to call these coded actions, metaphorically, “forming hypotheses”, you can. But this isn’t intelligence.
The Meta Model is following a script, though, and the claim that it isn’t is a bluff. Of course it is following a script. That is what code is. Scripts. It is a mere label to call the Meta Model an “agent” that builds the mini-models.
Models do not hallucinate. See my “AI Cannot Hallucinate Nor Lie“. Models make inaccurate predictions (when they do not know the full causes and conditions of what it is they are predicting).
It is well here to remind ourselves of the awful nasty habit computer scientists have of giving false, misleading, grandiose names to their packets of code. They simply cannot break themselves of this debilitating tic. The problem is, these inaccurate names, cute in one sense, become like sycophantic Yes Men, which leads the computer scientists to start believing their own press.
You’ve seen the effects of this in politicians and celebrities: the artificial worlds of hype and exaggeration they build for themselves become realer than Reality. Computer scientists are constantly jumping ahead of where they are, because the names they give their creations impel them. Consider the actual example of touting “A new non-linear regression model”, and then seeing how many more investors you can tease up by renaming it “Neural nets”. (Yes, this really happened.)
Finally, even though these critiques, the philosophical and the practical, may be convincing for the moment, their effects will wear off. Too much competition from the glee and constant promotion of the AI “revolution.” (And from genuine modeling successes, like this.) It would be best if some kind of test of AGI were in hand, agreed upon by critics like Feser and myself, and by AI mavens and promoters. Some practical task which would hack through the hype and restore calm.
AGI Test
A rational intellect knows its own mind, knows it knows it, and wills what it thinks is good for it. That happens in man. How could we tell if it happened in a machine? Not from “metrics” like “getting things done” or “solving problems”. Any soulless model does these things, as we have seen. How can we determine a “souled” model? Not by the “Turing test”. The goofy ELIZA chat program was fooling people into thinking computers had intelligence back in the 1960s, passing Turing’s test with ease.
We’ve already seen, too, that Meta Models can build mini-models, with neither being intelligent. Too, complexity of output is no guarantee of intelligence. Cut open a ten pound bag of rice and scatter the contents wildly. The resulting pattern of strewn grain will be exceedingly complex, but there is nothing intelligent in it. Which is a reminder that it is we who bring our intelligence to bear when examining computer output. It is we who give those patterns (and lack of patterns!) meaning. To the computer, they are just electrical impulses, no one set being different than another in meaning.
You can input textbooks on logical syllogisms, and then have the AI model predict new ones, using new propositions/premises as input. But that does not mean the model has grasped the meaning of the syllogisms. This also means that a test in which a model spits out new mathematical theorems or syllogisms, which turn out to be true, would be inadequate. The trivial regression model above can do as much. You can’t distinguish between genuine intelligence and mimicry (Ed’s word) here by accurate predictions.
So here is one idea, not yet well thought out or complete. Please help if you can.
One thing man does is sit and contemplate. In-sights are gained in this way (and other ways, too, of course). When we are quiet, undistracted, and not even necessarily thinking of the subject at hand, in comes the realization, knowledge of a new universal truth.
Every serious student has had the experience, when thinking hard on some subject, of making discoveries, true discoveries, but ones which (sadly) we learn have already been thought up by others long before. The precedence is only important in the race for credit (taking into account Stigler’s Law of Eponymy, of course). What we have done is still an instance of genuine intelligence. Every teacher knows this.
This suggests a test for AGI. Have the model “trained” only on an agreed-upon set of elementary mathematics. Perhaps up to high school algebra. No hint of complex numbers or anything “advanced”. The Meta Model to manipulate the “training” into new objects can be built as usual.
Then close off all new input. Let the model contemplate. We only allow it to “ponder” what it has. We have already seen it will be able to make accurate predictions at times. But these will only be of the same level or order as the material it already has. Polynomials of astonishing complexity would be proof only that the model has no bugs. They would not be proof the model has demonstrated intelligence.
But coming up with complex numbers would. Deriving the calculus, from principles it had no way of knowing and which could not be built from its simple premises, would. Discovering measure theory and Cantor’s Paradise would. These all require leaps of inductive intellection, and are as clean a test of intelligence as you can find.
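For concreteness, here is a hedged sketch of the protocol in Python. Every name in it (StubModel, judge, the corpus and concept lists) is a hypothetical placeholder, since no such model or judging procedure exists; the point is only the shape of the test, namely restricted training, sealed inputs, contemplation, then human judgment of whether anything genuinely new appeared.

```python
# Every name here is a hypothetical placeholder; this only sketches the
# shape of the proposed test, not any working implementation.

ALLOWED_CORPUS = ["arithmetic", "fractions", "high school algebra"]
TARGET_CONCEPTS = ["complex numbers", "the calculus", "measure theory"]

class StubModel:
    """Stand-in for whatever AI is under test."""
    def train(self, corpus):
        self.corpus = list(corpus)        # only elementary mathematics
    def close_all_inputs(self):
        self.sealed = True                # no new data, no outside hints
    def ponder(self):
        return "ever more complicated polynomials"   # the expected result

def judge(output):
    # Human judges decide whether any target concept genuinely appears.
    # Polynomials of astonishing complexity do not count.
    return [c for c in TARGET_CONCEPTS if c in output.lower()]

model = StubModel()
model.train(ALLOWED_CORPUS)
model.close_all_inputs()
discoveries = judge(model.ponder())
print(discoveries or "no evidence of intelligence by this test")
```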
I had thought of calling this the Gödel Test, but somebody beat me to it. However, their test is easier, only having to solve conjectures, which is no different than making predictions. We’re asking for more than answers to conjectures. We’re asking models to rediscover insights.
The name is still good because, in essence, Gödel proved that new axioms, i.e. these acts of inductive intellection, are necessary to move into new areas of thought. We’re asking the model to create areas of thought entirely new to it, which requires what we can be sure it did not have in its code to begin with.
That’s only a sketch. We have to be incredibly cautious about the setup, because it’s too easy to fool ourselves, or to redefine success after the fact, as this sage video demonstrates.
Caution: it may be that intelligence cannot be confirmed by any behavioral test. For example, free will cannot be confirmed by any behavioral test of an entity suspected of possessing it. Perhaps the same applies to intelligence.
Solving nontrivial Sokoban puzzles may be an interesting area for looking for these tests, at any rate.