One Milwaukee Ballot Curiosity & One New Theoretical Voter Fraud Tool

Milwaukee is divided into almost 500 wards. For each ward, the number of votes each candidate received in both the 2016 and 2020 Presidential elections was collected (official source link). A copy of those data was given to me, which I take as is, and the following analysis comes from it.

In 2016, 8 parties ran for President, including, of course, Trump and Hillary. The other 6 were minority candidates, including write-ins (I call these “Offvote” in the code). In 2020, there were 6 parties, 4 of which were minority candidates.

In each ward, votes went to one or more parties. Some wards voted more for one party than another. Here is the across-ward distribution of the ratio of total minority-candidate votes to total ward votes.
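A minimal R sketch of the per-ward ratio just described. The data frame layout (a `Ward` column plus one vote column per candidate, with `Dem` standing in for Hillary or Biden and `Offvote` for all minority candidates) and every count in it are hypothetical, invented only to show the calculation; nothing here is read from MilwaukeeVotes2016.csv or MilwaukeeVotes2020.csv.

```r
# Hypothetical layout: one row per ward, one column per candidate.
# Every vote column other than the two major candidates counts as "minority".
minority_fraction <- function(df, major = c("Trump", "Dem")) {
  vote_cols  <- setdiff(names(df), "Ward")
  minor_cols <- setdiff(vote_cols, major)
  total <- rowSums(df[, vote_cols, drop = FALSE])
  minor <- rowSums(df[, minor_cols, drop = FALSE])
  # wards with zero recorded votes get NA rather than 0/0
  ifelse(total > 0, minor / total, NA_real_)
}

# Illustrative counts only, not real ward data
wards <- data.frame(
  Ward    = 1:3,
  Trump   = c(100, 50, 0),
  Dem     = c(300, 400, 0),
  Offvote = c(20, 0, 0)
)
frac <- minority_fraction(wards)
print(frac)
```

Calling `hist(frac)` on fractions computed from the real ward files would give a histogram of the kind described above.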

In 2016 (red), the fraction of votes minority candidates received had a large spread. In 2020, the spread was much smaller. That is, in 2020 more of each ward's votes went to either Biden or Trump than in 2016, when votes were more spread out.

This is curious because more total votes were cast in 2020 than in 2016: 458,935 to 440,992, a 4% increase. Further, the largest minority-candidate fraction in 2020 was 0.000168, whereas in 2016 the largest was 0.000515, about 3 times as large. One might have expected that, with 4% more votes, at least some wards would show more variability, especially if there was an increase in mail-in ballots, whose voters had all the time in the world to make their choice.
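The two comparisons in the previous paragraph are simple arithmetic on the figures quoted there; this short R check just reproduces them.

```r
# Totals and maximum minority fractions as quoted in the text
total_2016 <- 440992
total_2020 <- 458935
pct_increase <- 100 * (total_2020 - total_2016) / total_2016

max_frac_2016 <- 0.000515
max_frac_2020 <- 0.000168
ratio_max <- max_frac_2016 / max_frac_2020

cat(sprintf("Increase in total votes: %.2f%%\n", pct_increase))
cat(sprintf("2016 max / 2020 max minority fraction: %.2f\n", ratio_max))
```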

Further, there were 9 wards that voted only for Biden or Trump or for no candidate at all, but only 5 wards that voted only for Hillary or Trump or for no candidate. Perhaps the larger number of minority candidates in 2016 accounts for this. In these wards, in 2016, Trump got 9 total votes and Hillary none, and in 2020, Trump got 124 votes and Biden got 423. Every vote counts!

The picture suggests a break point at a minority-candidate fraction of 0.0001. Examining the data for just those wards, the following picture shows the distribution of (Trump – Democrat) ward vote totals. Positive numbers are Trump gains; negative numbers are Democrat (Hillary or Biden) gains.

Seven wards stick out as curious in 2020; they are listed on the figure. Together they gave Biden a net margin of 12,093 votes, whereas in 2016 those same wards gave Hillary a net margin of only 120 votes, with one ward even going Trump's way. Every one of these wards therefore shows a large increase for Biden over Hillary.
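A minimal R sketch of the ward-by-ward comparison above: compute the (Trump – Democrat) margin in each year, match wards across years, and sort by the change so the largest Democratic shifts come first. The column names and the vote counts are hypothetical, chosen only to illustrate the steps; they are not the Milwaukee data.

```r
# Trump minus the Democratic candidate; positive = Trump, negative = Democrat
margins <- function(df) {
  data.frame(Ward = df$Ward, margin = df$Trump - df$Dem)
}

# Match wards across the two years and compute the change in margin.
# Negative swing = shift toward the Democrat.
swing <- function(old, new) {
  m <- merge(margins(old), margins(new), by = "Ward",
             suffixes = c(".2016", ".2020"))
  m$swing <- m$margin.2020 - m$margin.2016
  m[order(m$swing), ]
}

# Illustrative counts only
w2016 <- data.frame(Ward = 1:3, Trump = c(100, 200, 50), Dem = c(150, 180, 60))
w2020 <- data.frame(Ward = 1:3, Trump = c(110, 190, 40), Dem = c(900, 200, 70))
s <- swing(w2016, w2020)
print(s)
```

Summing `margin.2020` and `margin.2016` over a chosen set of wards gives net totals of the kind quoted for the seven flagged wards.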

Twelve thousand votes is a lot, so I suggest examining these wards more closely.

This is, I emphasize, not proof of fraud. But they are amazing jumps: a lot higher than the 4% increase in total votes would predict.

The data and code for this are here: MilwaukeeVotes2020.csv, MilwaukeeVotes2016.csv, Milwaukee.R. The code was done in a hurry and is uncommented, so you're on your own.

Predictive Analysis

In earlier posts, I explained that a terrific way to identify fraud is to examine the frequency of ballot markings, comparing early with late ballots, or in-person with mail-in ballots. If there are noticeable changes in these distributions, it could be evidence of ballot stuffing.

For example, suppose, as some (unconfirmed) reports said, that huge caches of Biden-only mail-in ballots were discovered, that is, ballots on which the only marking was for President and nothing else. Such caches are improbable even if every single one of the ballots was filled in by an earnest voter wanting to select Biden.

There is some error rate in filling out ballots, or in any paperwork, and the larger the cache, the larger the chance of at least one error, i.e. at least one Trump vote made by mistake. Any cache even in the high hundreds is deeply suspicious, and anything over 1,000 suggests these were not natural ballots.
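A minimal sketch of this reasoning in R, assuming (hypothetically) that each ballot independently has the same small chance `p_err` of carrying a stray mark. The 1-in-200 rate used below is an illustrative assumption, not an estimate from any election data.

```r
# Chance that a cache of n ballots contains at least one stray mark,
# ASSUMING each ballot independently has error probability p_err.
p_at_least_one_error <- function(n, p_err) {
  1 - (1 - p_err)^n
}

sizes <- c(10, 100, 500, 1000)
# p_err = 0.005 (1 in 200) is a hypothetical value chosen for illustration
probs <- p_at_least_one_error(sizes, p_err = 0.005)
print(round(probs, 3))
```

Under that assumed rate, a perfectly clean cache of 1,000 ballots would have well under a 1% chance of occurring; a smaller assumed error rate pushes the suspicious cache size higher.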

Barring that kind of easy evidence, we can still look at distributions of markings where positions other than president were marked.

Here is some unverified, unofficial data, collected by an anonymous contributor to illustrate how this can be done using the method of predictive statistics. This is for illustration only.

In Michigan we have the following counts of votes cast for President and, on the same physical ballots, for the Supreme Court:

2016
Supreme Court winner votes = 2,316,459
Presidential winner votes = 2,279,543

2020
Supreme Court winner votes = 2,369,012
Presidential winner votes = 2,790,648

In 2016, 98.4% of ballots had markings for both offices, showing a true civic spirit. But in 2020, only 84.9% of the ballots had markings for both offices, and there was also a distinct increase in the number of votes. In Michigan, because of the coronadoom, a gargantuan number of mail-in ballots was sent out. We await the final numbers; the Secretary of State's site does not yet have 2020 data posted.
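The percentages above follow from the Michigan counts if, as the text does, we take the smaller of the two counts in each year divided by the larger as the share of ballots with both offices marked. This R snippet reproduces them, and also shows the size of the jump in presidential-winner votes.

```r
# Michigan counts as given above (unverified, for illustration)
sc   <- c(`2016` = 2316459, `2020` = 2369012)  # Supreme Court winner votes
pres <- c(`2016` = 2279543, `2020` = 2790648)  # Presidential winner votes

# Share of ballots treated as having both offices marked: smaller / larger
both_share <- pmin(sc, pres) / pmax(sc, pres)
print(round(100 * both_share, 1))

# Percent increase in presidential-winner votes from 2016 to 2020
pres_growth <- 100 * (pres[["2020"]] / pres[["2016"]] - 1)
print(round(pres_growth, 1))
```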

We eschew all statistical tests! P-values are lousy evidence, as discussed endlessly here. Instead, conditional on the 2016 data, we will predict how many ballots would have both positions marked, given that we know there are 2,790,648 ballots in 2020.

In other words, anywhere from none to all of the 2,790,648 ballots in 2020 could have had both positions marked; it could even be that every one of them had only one position marked. We can calculate the probability of each of these possibilities, assuming nothing other than the 2016 data and that there are only two possibilities: the ballot has two markings or one. The math gives the prediction of both positions marked as a beta-binomial distribution over the numbers 0, 1, …, 2,790,648 (see this marvelous book or this page for why).
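For reference, this is the standard beta-binomial probability written out, using the same a and b as the newdbinom code at the end of this section. Here B is the beta function, k_old = 2,279,543 and n_old = 2,316,459 are the 2016 counts, n_new = 2,790,648 is the number of 2020 ballots, and x runs over 0, 1, …, n_new. This states only the formula the code evaluates, not the argument for it, which is in the linked book and page.

```latex
\Pr(X = x \mid n_{\text{new}}, k_{\text{old}}, n_{\text{old}})
  = \binom{n_{\text{new}}}{x}
    \frac{B\!\left(x + a,\; n_{\text{new}} - x + b\right)}{B(a,\, b)},
\qquad a = k_{\text{old}} + 1, \quad b = n_{\text{old}} - k_{\text{old}} + 1
```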

If we do that, we get this picture of the probability of every possibility, from 0 to 2,790,648.

The spike to the right shows the numbers of ballots with both positions filled out that we'd most likely see. The red vertical line shows the actual number of matching ballots.

Here, for fun and for people who haven’t seen these kinds of things, is a blow up of the same picture on the meat of the prediction.

The prediction was that we'd most likely see between 2.74 and 2.75 million matching ballots. We saw 2.37 million.
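To see why essentially no probability sits near the red line, one can use the closed-form mean and standard deviation of a beta-binomial distribution with the same inputs as the usage line of the code below. This is a descriptive measure of distance on the predictive distribution, not a test; the helper name `betabinom_moments` is made up for this sketch.

```r
# Closed-form mean and standard deviation of a beta-binomial(n, a, b)
betabinom_moments <- function(n, a, b) {
  s <- a + b
  mean <- n * a / s
  var  <- n * a * b * (s + n) / (s^2 * (s + 1))
  c(mean = mean, sd = sqrt(var))
}

n_new <- 2790648   # 2020 ballots
k_old <- 2279543   # 2016 ballots treated as matching
n_old <- 2316459   # 2016 total
m <- betabinom_moments(n_new, a = k_old + 1, b = n_old - k_old + 1)

observed <- 2369012
# How many predictive standard deviations the observed count lies from the mean
dist_sd <- (observed - m[["mean"]]) / m[["sd"]]
print(round(m))
print(round(dist_sd))
```

The predictive mean lands between 2.74 and 2.75 million with a standard deviation of only a few hundred ballots, so the observed 2.37 million lies more than a thousand standard deviations below it.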

The prediction was, of course, conditional on the old data, but it was also conditional on the assumption that people’s behavior would be the same. It clearly was not.

One possible change in behavior is an increase in cheating, i.e. ballot stuffing with Biden-only ballots (assuming the data is real!). There are, naturally, many other explanations besides this, such as marked disinterest in the Supreme Court in 2020. Plus, we only looked at two ballot positions, when we could look at others, too.

This example is only to show you that hard probabilities, and not flawed p-values, can be used in detecting potential fraud.

Since the code to do this is so simple (in R), I paste it here:

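newdbinom = function(x, n.new, k.old, n.old){
   # New observables for predictive distribution given old data and guess of how many new data points there will be
   # beta-binomial: probability of x matching ballots out of n.new, given k.old matching out of n.old
   a = k.old+1
   b = n.old-k.old+1
   # log scale avoids overflow with millions of ballots; subtracting lbeta(a, b) normalizes
   exp(lchoose(n.new, x) + lbeta(x+a, n.new-x+b) - lbeta(a, b))
}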

# used like this:

newdbinom(0:2790648, 2790648, 2279543, 2316459)

Technical update: There are a lot of newcomers, and so there has been some misunderstanding of the shorthand I used in the second section. The model is not “binomial”: it is a parameter-free deduced model using only the premises that there are two states, matching or not, and there are a known number of ballots. The beta-binomial function has no parameters in the usual model sense; all its inputs are known with certainty. You'll have to read the book or linked page to discover why.

Second, for excellent, unassailable reasons, I deprecate all “testing”, such as p-values and the like. Go to that page and read the articles on the subject, particularly “Everything wrong with p-values under one roof.” Or read the book (not light reading).

For the others who commented or emailed that this post was too technical, all I can say is that some things are not so easy. I'll try to explain things more simply next time.

