Well, you were wrong, weren’t you. They’ve all but disappeared from cocktail party discussions. Turns out machine learning algorithms didn’t triumph, either.

Yet something has to come out on top. What will reign supreme? Topological data analysis, baby! Or so say the folks interviewed by *Wired* in their article “Scientific Data Has Become So Complex, We Have to Invent New Math to Deal With It.” Story of some guys who say we’re in the midst of the “big data equivalent of a Newtonian revolution, on par with the 17th century invention of calculus.”

But before we wax eloquent about our newest warrior against uncertainty, let’s cast our minds back to the 1990s, when we regularly came across items like this.

Neural nets are *universal* function approximators! Any function you can think of, and even those you can’t, can be tossed in the trash. Who needs ’em? Just think. Some function out there explains the data you have, and since this function is probably too complicated to discover mathematically, all we have to do is feed these brain-like creatures the data and they’ll figure out the function for you.

The more data you give them, the more they *learn*. Pictures of brains, pictures of synapses, pictures of naked interwoven dendrites! It was so sexy.

Well, as said, we know how that turned out. The cycle has since been repeated with other Holy Grail methods, though it has never reached the same peak as neural nets.

You have to hand it to the computer algorithms set. They have the best marketing team in science. Who wants to “estimate” the “parameter” of a non-linear regression when you can “input” data into a “thinking” machine? Why not embrace *fuzzy* logic, which is hip and cool, and eschew dull probability? Hey, all these things are equivalent, but nobody will notice.

Or maybe they will. Don’t forget to read the Machine Learning, Big Data, Deep Learning, Data Mining, Statistics, Decision & Risk Analysis, Probability, Fuzzy Logic FAQ.

Back to topological data analysis. Idea is to take enormous data sets and twist and turn them as you would donuts into coffee cups (let him who readeth understand) and store only the pattern and not the details (dimension reduction). I like this approach, and surely there will be plenty of neat and nifty tricks discovered (see the article for some fun ones).

It’s not a new idea. Remember “grand tours” of data? These were big about fifteen years ago. Cute graphics routines which let you pick off a few dimensions at a time and spin them round and round until you saw (if there was anything to see) how a “random” scatter of points collapsed to something predictable looking.

Slick stuff, and useful. *Wired* gives the example of the Netflix prize, where the idea was to find algorithms that made better preference guesses because “even an incremental improvement in the predictive algorithm results in a substantial boost to the company’s bottom line.” And, lo, some group won with an algorithm that did find an incremental improvement.

That’s our lesson: *incremental*. Human behavior is so complicated that it’s doubtful that any Hari Seldon will ever exist; I’d even say it’s almost certain that none will. No human being, or machine created by one, is going to discover an equation or set of equations which predict behavior at finer than the grossest levels and for time spans greater than (let us call them) moments.

The boost was *incremental*. Meaning it was a tweak and significant uncertainty remained. That’s what the neural net folks never figured on. Even if we *knew* (100% certainty) what the weights were between “synapses”, it did *not* mean, and it was not true, that we knew with certainty the thing modeled.

Statisticians forget this, too. Equivalently, even if we knew (100% certainty) the values of the parameters in some model, it does not mean, and it is not true, that we know with certainty the thing modeled. This is why I argue endlessly for a return to focus on the things themselves we’re modeling, and away from parameters.

That’s another reason to like machine “learning” and this new-ish idea of topological data analysis. The focus is on the right thing.
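To make the parameter point concrete, here is a minimal Python sketch. It assumes a made-up normal model whose mean and standard deviation (`MU` and `SIGMA`, hypothetical numbers not taken from any data) are known with complete certainty, and it shows that a prediction about the thing itself is still uncertain.

```python
from statistics import NormalDist

# Hypothetical model: parameters are assumed known exactly (no estimation error).
MU, SIGMA = 10.0, 2.0
model = NormalDist(mu=MU, sigma=SIGMA)


def prob_exceeds(threshold):
    """Probability that a new observation lands above the threshold."""
    return 1.0 - model.cdf(threshold)


def predictive_interval(coverage):
    """Central interval expected to contain a new observation."""
    tail = (1.0 - coverage) / 2.0
    return model.inv_cdf(tail), model.inv_cdf(1.0 - tail)


p = prob_exceeds(12.0)
low, high = predictive_interval(0.90)
print(f"P(new observation > 12) = {p:.3f}")
print(f"90% interval for a new observation: ({low:.2f}, {high:.2f})")
```

Even with nothing left to learn about the parameters, the chance of a new observation exceeding 12 is about 16%, neither 0 nor 1. The uncertainty that matters is in the observable, not in `MU` and `SIGMA`.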

——————————————————————————

*A reader sent me this article, but I can’t recall who and I have lost the original email. I apologize for this. I hate not giving credit.*

One of my favorite anecdotes is the Philip Tetlock piece a couple years ago about the Ivy League grad students who were given the problem of predicting the rule for which side of a T-maze a reward appears, and seeing if they could beat a Norwegian rat given the same choice.

The students observed several hundred or thousands of trials, built all kinds of fancy models, etc. etc. Then both the students and the rat graduated from the training set to the test set.

The students predicted the correct side 52% of the time.

The rat predicted the correct side 60% of the time.

As Tetlock stated, “outsmarted by a rat.”

Of course, the actual rule was simply the reward being distributed between the sides 60/40 i.i.d. Obviously. The rat simply always turned the same direction. Obviously.
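The arithmetic behind those two numbers is worth a quick look. The sketch below is only an illustration, not anything from Tetlock: it assumes the students were "probability matching" (guessing each side about as often as it paid off), which is one well-known way to land on 52%. The anecdote itself does not say what the students actually did, and the function names here are made up for the example.

```python
import random

P_MAJORITY = 0.6  # chance the reward is on the more frequent side (from the anecdote)


def always_majority_accuracy(p):
    """Expected hit rate when always choosing the more frequent side."""
    return max(p, 1.0 - p)


def probability_matching_accuracy(p):
    """Expected hit rate when guessing each side as often as it pays off,
    independently of where the reward actually is."""
    return p * p + (1.0 - p) * (1.0 - p)


def simulate(guess_majority_prob, p, trials=100_000, seed=0):
    """Monte Carlo check: fraction of trials where the guess matches the reward side."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        reward_on_majority = rng.random() < p
        guess_majority = rng.random() < guess_majority_prob
        hits += reward_on_majority == guess_majority
    return hits / trials


rat = simulate(1.0, P_MAJORITY)                # always turn the same way
students = simulate(P_MAJORITY, P_MAJORITY)    # assumed probability matching
print(f"rat: {rat:.3f}, students: {students:.3f}")
```

Under that assumption the expected hit rates are 0.6 for the rat and 0.6 × 0.6 + 0.4 × 0.4 = 0.52 for the students. With i.i.d. rewards there is no pattern to find, so any strategy that ever picks the 40% side gives away accuracy.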

——

On the topology, I wonder about your thoughts on the finding [?!] a couple years ago where, when 3 by 3 neighborhoods of pixels in graphics images were examined, their distribution in 9-space seemed to cluster into (IIRC) a four dimensional Klein bottle. Apparently this was reproduced over a wide variety of images. This seemed incredible to me but I never saw a followup debunking it.

Oh, I dunno. Look at what neural networks have done to date: trains, planes, cars, Miley Cyrus. Well, the last may be an oops. The problem with NNs is they need to be very large and are slow to train. Some NN researchers were surprised when the tiny ones, smaller than those in an ant’s brain, didn’t perform well, particularly with noisy data. Still, NNs are Nature’s solution to generality and learning.

Topological data analysis strikes me as pattern recognition on steroids. I predict they will succeed only with problems that are peculiarly suited for them.

Makes me wish I’d stayed in topology. My old professor, J. Douglas Harris, has been doing work lately at the interface between topology, statistics, and computer architecture. Something about adjoint functors and Stone-Čech compactification. Heap big mojo. I do recollect many moons ago, when we were working on proximity spaces, that there was a relation to statistics. I used to say that proximities were “t-tests without numbers.” Conversely, the t-test defined a proximity space on the data: which numbers were “close.” I guess the p-value turns out to be a metric of some sort.

The thing that killed neural networks is that existing computer technology is insufficient for implementing that idea. If the technology to build them ever comes along they may well live up to the hype.

Briggs, your statement: “This is why I argue endlessly for a return to focus on the things themselves we’re modeling, and away from parameters.” I do not understand this. It is all models whether recognized or not. Even the brain interprets the retina output using models which develop early in life.

To echo Ye Olde Topologist/Statistician, “many moons ago” while the pages remained redolent of fresh ink, I bought Joseph Weizenbaum’s cautionary “Computer Power and Human Reason”, which was largely dismissed by the Young Turks in the programming realm for being “naively” humanistic.

JJB

Gotta side with Briggs on this one. These learning strategies seem to come and go out of style. We could make a list… genetic algorithms, kernel PCA, support vector machines, independent component analysis, etc. I’ve come to react to the latest ones with a yawn. My limited understanding of the topological methods is that they involve finding a hypersurface-like graph in the data using k-nearest neighbors or some such, then using that to estimate eigenfunctions of the Laplacian. Kind of neat, and it’s easy to cook up an example where it does a good job where many other methods fail. Of course, all the other methods can do that and are sold that way.

I’m glad Chaos Theory and Nonlinear Dynamics started looking at this years ago with Henri Poincaré (Poincaré sections) as well as looking at attractors in a system using Attractor Reconstruction in Phase Space. Visually see how the system proceeds in time from the data and be able to draw conclusions about how the system’s attractor changes due to some change in the system, either from pharmaceutical intervention (therapeutic or exploratory) or from certain diseases.
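For anyone who hasn't seen attractor reconstruction, a minimal sketch of the time-delay embedding step follows. The function name `delay_embed`, the sine-wave test signal, and the choices `dim=2` and `lag=16` are all assumptions for illustration; real data needs care in picking the embedding dimension and the lag.

```python
import math


def delay_embed(series, dim, lag):
    """Turn a scalar time series into points
    (x[t], x[t + lag], ..., x[t + (dim - 1) * lag]) in dim-dimensional space."""
    n_points = len(series) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("series too short for this dim and lag")
    return [
        tuple(series[t + j * lag] for j in range(dim))
        for t in range(n_points)
    ]


# Toy signal: a sine wave sampled every 0.1 time units.
series = [math.sin(0.1 * t) for t in range(200)]

# A lag of 16 samples is roughly a quarter period, so the second
# coordinate behaves approximately like a cosine.
points = delay_embed(series, dim=2, lag=16)

radii_sq = [x * x + y * y for x, y in points]
print(len(points), min(radii_sq), max(radii_sq))
```

The embedded points fall close to a circle, which is the "attractor" of this very boring system. A change in the underlying dynamics would show up as a change in the shape traced out by `points`.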