Anons' emails about their experiences with bad modeling. These read very like posts you see here daily. Lightly edited to remove personal information; links added by me.
The post about the Lancet pollution article reminds me of a paper rat hole I went down a few years ago after hearing Ahnold, ex-CA RINO governor, state that pollution kills 300k people in the US every year.
My first reaction, since I am not innumerate, was that this represents over 10% of annual deaths in the US.
Normally, I would just chalk it up to the complete ignorance of politicians and actors, but my curiosity was piqued, so I did some research. Since this was several years ago, I am missing some details, but I traced the 300k number to a 2013 paper from MIT as I recall. That paper was widely quoted in papers everywhere, as they always are.
I found the paper and discovered that it contained no original research, but was rather a synthesis of some EPA research, which measured increases in large-particulate pollution, combined with some studies on the effects of large-particulate pollution on the life expectancy of people with serious pulmonary or respiratory diseases.
Their meta study on the effects of pollution on life expectancy found some effect on the order of reducing life expectancy by about 1-3 months for a base life expectancy of 10 years (with large error bars). Using that, they combined it with the EPA data on increasing large-particulate pollution to determine that this affected about 300k people in the US (people with susceptible diseases who lived in the places where such pollution had worsened).
Also, they made some attempt, though not a very successful one, to address the overlapping error estimates of the various underlying studies. But even accepting that all was done correctly, the actual headline should've been "about 300k people in the US with serious pulmonary or respiratory diseases, and with estimated lifespans of 10 years, will have those lifespans reduced by a month or two due to pollution". That's hard to fit at the top of a newspaper article, even assuming the reporters who repeated the nonsense understood what the studies actually said.
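It's worth making the arithmetic behind that complaint explicit. A quick back-of-the-envelope sketch, using only the figures from the email (300k affected people, 1-3 months lost each, against a roughly 10-year remaining lifespan):

```python
# Figures from the email: 300k affected people, 1-3 months of lost life
# expectancy each, on a ~10-year remaining lifespan. Illustrative only.
affected = 300_000
months_lost_low, months_lost_high = 1, 3

# Total burden implied by the study itself, in person-years:
person_years_low = affected * months_lost_low / 12     # 25000.0
person_years_high = affected * months_lost_high / 12   # 75000.0

# What the headline "pollution kills 300k per year" suggests instead,
# at ~10 years of remaining life per person:
headline_person_years = affected * 10                  # 3,000,000

print(person_years_low, person_years_high, headline_person_years)
```

Even at the high end, the burden implied by the study is one to two orders of magnitude smaller than what the headline suggests.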
This is what in pre-The Science days we would have called advocacy. It is now The Science. Any tenuous connection they make must be accepted, you deniers.
It's not only in medicine, but in finance, too, as this next email shows.
I have been listening to and thoroughly enjoying your podcasts, and through them have discovered some of your older videos on YouTube. We have very kindred views on the current issues with science. My educational background is in math and physics, but I left graduate school to chase bigger bucks doing programming for the financial industry, then moved into quantitative investment research and management.
My last position before retiring was head of quantitative asset allocation research. While you talk mostly about the issues in social research and woke science, the situation in finance research is even worse.
Most studies are done using prices from CRSP, which has closing prices for all US stocks, something on the order of 8,000 of them. However, about 7,000 of those rarely trade, so while they all have 'official' closing prices, most of those are just made up by market makers and would be quite different if you actually tried to buy or sell them.
Research in finance basically consists of backtests. You collect price data and some other variables that are associated with stocks or bonds, create a trading strategy, run it through the data and see how it performed. Then you find one that creates a great return and use it to buy and sell stocks.
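The procedure described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not real CRSP prices, using a hypothetical moving-average crossover strategy:

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic daily prices for one hypothetical stock (not real CRSP data).
rets = rng.normal(0.0003, 0.01, size=2000)
prices = 100.0 * np.cumprod(1.0 + rets)

def sma(x, w):
    """Simple moving average, valid region only."""
    return np.convolve(x, np.ones(w) / w, mode="valid")

def backtest(prices, fast=20, slow=50):
    """Moving-average crossover: long when the fast MA is above the slow
    MA, otherwise in cash. Returns total return over the sample."""
    slow_ma = sma(prices, slow)
    fast_ma = sma(prices, fast)[-len(slow_ma):]    # align the two series
    long_signal = fast_ma > slow_ma                # position held the next day
    daily_rets = prices[slow:] / prices[slow - 1:-1] - 1.0
    strat_rets = np.where(long_signal[:-1], daily_rets, 0.0)
    return float(np.prod(1.0 + strat_rets) - 1.0)

print(backtest(prices))
```

The trap the email describes is in the last step: you vary `fast` and `slow` until `backtest` returns something great, then trade it, and discover the performance was an artifact of the search.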
Of course, they perform badly with new data.
But we are sophisticated, so we construct the parameters of the model using 80% of the data and then see how it works on the remaining 20%.
This was one of my greatest pet peeves, which I had arguments about all the time, because if the strategy doesn't work on the remaining 20% (along with the original 80%), you throw it away and tweak the parameters again. So you gain no new information from this 'rigorous' backtest.
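The failure mode described here is easy to demonstrate. A sketch, using a made-up market of pure noise (so no strategy has real skill) and the repeated-tweaking loop from the email:

```python
import numpy as np

rng = np.random.default_rng(0)
# Pure noise "market": zero-mean returns, so no strategy has genuine skill.
n_days = 1000
market = rng.normal(0.0, 0.01, n_days)

split = int(0.8 * n_days)          # the 80/20 split from the email
best_total = -np.inf
# The "tweaking" loop the email describes: keep trying parameterizations
# (here, random in/out-of-market exposures) and discard any that fail on
# the 20% holdout. Because the holdout is reused across all the tries,
# it no longer protects against overfitting.
for trial in range(500):
    position = rng.integers(0, 2, size=n_days)         # random 0/1 exposure
    train_ret = float(np.prod(1 + position[:split] * market[:split]) - 1)
    test_ret = float(np.prod(1 + position[split:] * market[split:]) - 1)
    if train_ret > 0 and test_ret > 0:                 # "passed" the backtest
        best_total = max(best_total, train_ret + test_ret)

# With enough tries, some pure-noise strategy looks good on both sets.
print(best_total)
```

A strategy selected this way has "passed" a holdout test, yet by construction it has no skill at all, which is exactly the complaint.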
Somewhat better, but with its own similar issues, is the walk-forward model. You start with years N to N+n, estimate parameters, run the model forward to N+n+m, then repeat from N to N+n+m, sort of a Bayesian approach. But again the parameters have to be picked so that the model performs well over the entire period.
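The walk-forward scheme the email outlines looks roughly like this. Again a sketch on hypothetical data, with a toy momentum rule standing in for the real parameter estimation:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical daily return series; real work would use actual price data.
series = rng.normal(0.0005, 0.01, 1200)

def fit_lookback(history):
    """Toy 'parameter estimation': pick the momentum lookback that did
    best in-sample (long when the trailing lookback-mean is positive)."""
    best_lb, best_ret = None, -np.inf
    for lb in (5, 10, 20):
        sig = np.convolve(history, np.ones(lb) / lb, "valid")[:-1] > 0
        ret = float(np.sum(sig * history[lb:]))
        if ret > best_ret:
            best_lb, best_ret = lb, ret
    return best_lb

# Walk forward: estimate on everything up to day t, trade the next `step`
# days out-of-sample, then expand the estimation window and repeat.
window, step = 500, 100
oos_returns = []
t = window
while t + step <= len(series):
    lb = fit_lookback(series[:t])                 # in-sample estimation
    seg = series[t - lb:t + step]                 # include lookback context
    sig = np.convolve(seg, np.ones(lb) / lb, "valid")[:-1] > 0
    oos_returns.append(float(np.sum(sig * seg[lb:])))
    t += step

print(oos_returns)
```

Each segment's return is earned on data the fit never saw, which is the improvement; the email's caveat is that in practice the candidate parameters (here, the lookbacks 5, 10, 20) are still chosen so the whole walk-forward record looks good.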
Then the wee-p and all the other evaluations of models have the same problems as the rest of 'science'. There are researchers who have addressed the severe issues with investment models, but they are largely ignored in practice. Marcos Lopez de Prado has been one of the main critics of investment research (as John Ioannidis is of medical research). There are also some other seminal papers. You may be familiar with some of them, but if not, they are worth looking into.
Statisticians call backtests like that "cross-validation" (done in various ways). The way they cheat in finance is the way they cheat in science, too. Which is why I'm always emphasizing that models must be tested on data never before used or seen in any way.
Buy my new book and learn to argue against the regime: Everything You Believe Is Wrong.
Subscribe or donate to support this site and its wholly independent host using credit card click here. For Zelle, use my email: firstname.lastname@example.org, and please include yours so I know who to thank.