The Great Smoothing: Another Reason Not To Fear (or trust) AI

Bad news, friends. Some people—none of you, I am sure—pass off AI output as if it were their own work. There is a certain profession, which I won’t reveal except to say it starts with “JOURNA”, that does this more or less routinely now. Perhaps they learned it in college, where, we hear, student cheating with AI is rampant.

The bad side of this is obvious, so I say nothing more about it. But there is a good side!

I bring you the terrific news (again) that there is no reason to panic about AI. As we have discussed many times, what will happen is this: computer models—which is to say, AI—will begin to use AI output as part of their training data. And the output from those duly trained computer models (AI) will again be passed off as genuine by scurrilous operators.

Repeat the cycle.

This will lead to what I have been calling The Great Smoothing.

And the realization of this is the good news: there is no reason to panic over AI.

Again, we have discussed this before, too, but the nice lady I live with told me I ought to repeat myself to make myself understood. So here today is a simple example. If I had access to the code of, say, ChatGPT or some other LLM (the last word, we shall not forget, is Model), I would demonstrate it with that very model. Alas, I do not. I’m just one (canceled) man on the outer edge of the known internet.

Here is some data we used in Class. It is a time series of a satellite-derived measure of Arctic sea ice extent in millions of square kilometers (no satellite can measure hip kilometers: thank you). That it is derived means we are already looking at a model of sea ice, and not actual sea ice, which nobody knows. But let that pass. Ignore this fundamental twist.

Let’s now AI-ify this; i.e., build a model. We want it to answer the prompt “What is the Arctic sea ice extent for date X?”, where X is anything within a reasonable range.

All models smooth. Models cut down the highest peaks and fill the lowest valleys. They are like balloons in a certain way. We can do a better job modeling peaks, pushing the balloon higher there, but at some expense elsewhere, where the balloon distorts. No model is perfect. No model bats a thousand. No model exactly reproduces Reality. We use Reality for that.

But models have uses, and can be good. Here’s an AI of the data in red. (I used a loess with a span of 0.02, from R’s stock library.)

The original data is included. You can see the model smooths, but it’s not at all bad for many purposes. Success!
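The post’s figure used R’s loess (span 0.02). As a hedged stand-in—since neither the real series nor the actual fit is reproduced here—this minimal Python sketch uses a centered moving average as the “AI”, applied to a synthetic seasonal-cycle-plus-jitter series. The `smooth` function, window size, and fake data are all illustrative assumptions, not the post’s model.

```python
import math

def smooth(series, window=5):
    """Centered moving average; edges average whatever points are available."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

# Synthetic stand-in for the sea-ice series (millions of km^2):
# a seasonal cycle plus deterministic jitter. Not the real data.
data = [10 + 4 * math.sin(2 * math.pi * t / 52) + ((t * 7919) % 13 - 6) / 10
        for t in range(520)]

gen1 = smooth(data)  # the "red" first-generation model output
```

Note the balloon property from above falls out automatically: every smoothed point is an average of nearby data, so the model’s highest peak can never exceed the data’s, nor its lowest valley dip below.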

Now suppose some scientist comes along and tries to pass off this model data as if it were genuine.

“Briggs, that never happens. Scientists are honest, and Science itself is self-correcting.”

That so? Then how do we explain headlines like this recent one: “A medical journal says the case reports it has published for 25 years are, in fact, fiction”. Some 138 case reports, some used in legal decisions, all faked. Hilarious, right?

“That’s just Science self-correcting itself.”

Well, have it your way. For now, this is proof that data used as genuine may indeed be fake. And we know that is certainly true in areas like “JOURNA”.

That red data is taken as if it were real. Then the second-generation AI comes along and trains on it. Then it produces output, which is shown in this picture as green.

I also left in the original and the red (first-generation AI). The green, second-generation AI is smoother still. But it’s also still not terrible. We’ve seen many worse models. It would pass.

Yet, and I hope you saw this coming, here is bad boy scientist number two who passes off this second generation AI model data as his real data.

Then guess what happens. Yes, due to the pressures of publish-or-perish, another guy does it. Then another. And so on.

Here’s the result of 50 guys doing it:

The blue line is the fiftieth generation AI; that is, AI trained on data successively passed off as genuine. I don’t think 50 is too many, either, especially when you consider how rife cheating is becoming.
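The fifty-generation cascade can be sketched the same way: each generation “trains” on the previous generation’s output, and the peak-to-valley spread marches toward the flat average. The moving-average smoother and the jagged sawtooth stand-in data are assumptions for illustration, not the post’s actual model or series.

```python
def smooth(series, window=5):
    """Centered moving average standing in for each generation's model."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

def spread(s):
    """Peak-to-valley range: what The Great Smoothing eats."""
    return max(s) - min(s)

# Jagged stand-in data: a descending sawtooth.
series = [float((i * 37) % 19) for i in range(200)]

gens = [series]
for _ in range(50):                 # fifty scientists, fifty generations
    gens.append(smooth(gens[-1]))

# Spread shrinks every generation: original, first pass, fiftieth pass.
print(spread(gens[0]), spread(gens[1]), spread(gens[50]))
```

Because each pass averages, the spread is provably non-increasing from one generation to the next; run it and watch the fiftieth generation hug the mean, like the blue line.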

We’ve seen this with AI-generated images many times. Somebody starts with a picture and asks AI to generate the same picture again. The output looks similar to the input picture. So similar it takes a sharp man to see any departures. Maybe even the second generation, like ours above, looks close enough to the original nobody would notice.

But by the 50th? It heads right for the average, like our blue line. The peaks, the details in the image (especially behind the faces), have all been shaved away, and valleys all filled. The colors collapse to a murky medium brown. The Great Smoothing wins again.

This is guaranteed to happen. Perhaps some of it can be slowed if training data suspected of being model output is rejected, but some of the slop will slip through. It is inevitable. Some of it can be mitigated by hard-coding rules, as in the successive-picture example: say, maintain background integrity, as they do for models which make “movies”. But you can’t code for all contingencies. It’s a never-ending race to keep up.

The AI Slop Smoothing Test

This is easily tested, and the importance of the claim makes the test worth running. Simply take an LLM and train it on known good data, i.e., data free of AI model output. Then put the model through a range of tests, examining output across a variety of topics.

Then include that output as part of the new training data (we also keep the original training data). Do the same tests. And iterate some small number of times, like our fifty.

Then look at the final output, particularly compared with the first. The Great Smoothing will win.
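The test just described can be sketched too: keep the original training data in the pool, but each round fold the previous model’s output back in before refitting, then compare early and late outputs. Everything here—the pointwise-mean pooling, the moving-average “model”, the sawtooth data—is an illustrative assumption, not a recipe any lab actually uses.

```python
def smooth(series, window=5):
    """Centered moving average standing in for the trained model."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

def spread(s):
    return max(s) - min(s)

original = [float((i * 37) % 19) for i in range(200)]  # "known good" data

pool = [original]        # training pool: the original data is always kept
outputs = []
for _ in range(10):      # a small number of iterations, as suggested
    # "Train" on the pool: fit the smoother to the pointwise mean of it.
    mean = [sum(col) / len(col) for col in zip(*pool)]
    out = smooth(mean)
    outputs.append(out)
    pool.append(out)     # the model's output joins the next round's data

# Compare first and last outputs: the later one is flatter.
print(spread(original), spread(outputs[0]), spread(outputs[-1]))
```

Keeping the original data in the pool slows the collapse but does not stop it: the later outputs still have less spread than the data they started from.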

Again, this is good news if one of your fears was AI becoming AGI and taking over the world.


7 Comments

  1. JH

    AI can be ‘gullible’ because it takes all the information fed into its optimization or backpropagation system. In many tasks, AI is almost as trustworthy as a skilled human and continues to improve. While AI can be useful in many situations, it, just like ‘Science’, can also be misused by people with bad intentions. So, embrace AI while also relying on your own judgment.

  2. Uncle Mike

    AI isn’t the entity smoothing the data. The alleged decline in Arctic sea ice was a key assertion in Algore’s 2006 “Inconvenient Truth” horror flick. That fake claim was made by fake scientists, human-ish, not AI. They smoothed the data to a straight line in one pass, not 50. I agree that AI does not possess real intelligence, but people are also lacking in that regard. Ask yourself why a trillion dollars have been spent on non-solutions to non-problems like the fake decline in sea ice, and see if you don’t arrive at the same conclusion. (Whether you do or not it proves my case.)

  3. Cary D Cotterman

    After spending years in school doing my own writing, then decades in my profession busting my ass to write somewhere around a thousand technical reports, I’m looking forward to the invention of a definitive test that will out “writers” who use AI to do their work.

  4. Rudolph Harrier

    I’ve started thinking of this as the “juice loosener problem” (after the Simpsons bit where Dr. Nick hawks an orange juice machine that makes a drop of orange juice from “only” a bag of oranges.) Here the oranges are bits of genuine information discovered by humans, with the drop of orange juice being the output of the LLM.

    If you have a huge amount of quality input, you can get enough output for your needs, just as the Juice Loosener could make a glass of orange juice if you had a warehouse full of oranges; but the ratio between input and output is not good at all. Where this intersects with the main post is that we’ve pretty much used all the available human information out there. Simply saying “put more oranges into the loosener” isn’t going to work anymore. All the data sets that are easily available and which have not already been used are now tainted with AI-produced data, and in most cases the AI-produced data outnumbers the human-produced data. Thus any attempt to improve things by simply putting in more data is going to inevitably run into a smoothing effect. I think that you can already see this with text generation, where sentences read much better than before but cliches are more common than ever. When models are applied to a new type of data, like video, you will see rapid progress because the data that exists is necessarily entirely human-made at the start. However, inevitably the data will be more and more AI-made, leading to more and more of the smoothing.

    My prediction is that AI firms are going to keep shifting attention to new applications while old applications are left in a “okay but not great” state. This is because they know that they have no actual method to take things from “okay” to “great” due to the polluted data, so distracting the public with something new is the only way to keep the investor money flowing in.

  5. Johnno

    I’m panicking about the people who believe enthusiastically in the capabilities of the AI that the other people are panicking about.

    For all we know, AI did a double-tap tomahawk on an Iranian girls’ elementary school, and that’s why we will now have mines littering the Strait of Hormuz, after the Pentagon switched over to OpenAI from Anthropic, who did not trust the AI and panicked about it to the DOD, telling them to impose restrictions.

  6. NLR

    “AI” will not come alive, but also, it has been over three years and none of the “AI” predictions have come true. One was that “AI” was inherently informative and it would lead to propaganda breaking down. Also, journalists would be fired, which would weaken media manipulation. Well, that didn’t happen; media manipulation is as strong as ever. It doesn’t even make sense that journalists being fired would lessen propaganda; people who want to make it can just use machine-written articles.

    Then, there was the idea that it would lead to a renaissance of creativity in the arts. Well, a lot more stuff has been produced, but what creativity there is comes from the human. There are math and science textbooks being sold on amazon that are obviously machine generated, where one person has supposedly written textbooks on every area of physics. Is this an indication that with “AI” a person can be superintelligent? Obviously not; it is just a scheme to separate unsuspecting people from their money.

    Then there is the idea of inevitable creative destruction, that whatever was wrecked by “AI” new things would inevitably come about. Well, I’m still waiting for those inevitable “jobs that haven’t been invented yet” people were talking about so much in the early 2000’s. “AI” is degrading things, but if creative destruction is so inevitable as all that, then where are all these new good things?

    Then there is the whole “make money with AI”. It’s just a meme. Some people already have a job and are either allowed to use “AI” for it or forced to. But clearly, their income comes from the job itself, not the “AI”. Then you have people who work in IT who are making money designing or programming “AI”. Well, they already would have been making money working with computers. Then there are people who already have an audience who will buy much of what they produce. It is not the “AI” that is making them money, it is the audience. Then there are scammers. But “AI” is just one more means to trick people.

    The basic point is that people will go on at length that AI must work a certain way and make elaborate predictions based on it. But if the fundamental thesis is wrong, and not in a subtle way, but where you can just look at the world and see that it does not apply, then there is no reason to believe in all the elaborations upon that basic thesis.

  7. Mark

    However, haven’t The Great Smoothing and Murky Brown Collapse been striven for as something desirable for quite a while now? These terms seem to accurately describe the dreams of the megacorps &c. wishing for interchangeable productive units and globally unified legal-financial frameworks, with no particularities and inconsistencies? If so, is the present argument really a case *against* A(G)I taking over the world (at least in the sense of sucking up all its capital, which may very well be the only relevant sense for those pushing it)?
