Beware Dubious Data Dredging in Florida

by Howard Fienberg and Iain Murray
November 17, 2000

Countless data dredgers have weighed in on the presidential vote in Florida. Unfortunately, many of their analyses were published and circulated online, where they quickly found their way into the spinning heads of sleep-deprived journalists and pundits. Numbers can be remarkably pliable in crucial times like these, and not all statistical analysis is created equal.

So let’s consider two statistical questions: Was the original vote count in Florida biased in favor of Republican candidate George Bush, and just how unlikely was the number of votes received by Reform Party candidate Pat Buchanan in Palm Beach?

In Online Journalism Review, Lynn Miller, a professor of communications at the University of Southern California, wrote that statistical analysis and probability theory show both candidates should come out of the recount process with equal adjustments (for example, Democratic candidate Al Gore gains 40 votes, but so does Bush). Since that was not the case, she said probability theory dictates that the original count must have been biased in favor of Bush. However, her argument rests on a few faulty assumptions that undermine the analysis.

Miller compares voting to the flipping of a coin, as if there were only two potential results. There are at least four outcomes in any given two-party election: Candidate A, Candidate B, Spoiled (none of the above) and Void (the voter has tried to vote, but failed) — not to mention the fact that the presidential ballot contained 11 options. Votes for Candidates A and B are more likely than spoiled and void, but not overwhelmingly so. Where the electoral system is more complicated (either in design — like the Single Transferable Vote system — or in mechanics — such as complicated punch ballot papers), void votes will be more common. Any statistical analysis has to bear these facts in mind.

For void votes, it must be asked what type of voter is more likely to cast such ballots. In a complicated system, the less educated voter is more likely to fail to record his or her vote according to the rules of the system, which is why electoral systems should be as simple as possible. In a machine ballot, elderly or injured voters may be less likely to properly punch out their ballots (which calls into question the fairness of machine ballots to begin with).

If any candidate draws support disproportionately from either of these categories, as it seems Gore did, then that candidate is more likely than his opponent to benefit from special counts of void votes. It would be wrong to conclude that the initial count was unfair merely because the special count revealed more votes for one candidate than for another.
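The arithmetic behind this point can be sketched in a few lines of Python. Every number below is hypothetical and chosen only for illustration; none comes from the Florida returns. The sketch assumes two candidates with the same number of intended votes, an unbiased recount that recovers the same fraction of void ballots for each, and a higher void rate among one candidate's supporters.

```python
# Hypothetical illustration: all figures are assumptions, not real election data.

def expected_recount_gain(intended_votes, void_per_1000, recovered_per_100):
    """Votes a recount would be expected to add for one candidate.

    intended_votes    -- ballots cast by voters who meant to pick this candidate
    void_per_1000     -- how many of every 1,000 such ballots were voided
    recovered_per_100 -- how many of every 100 void ballots a recount recovers
    """
    void_ballots = intended_votes * void_per_1000 // 1000
    return void_ballots * recovered_per_100 // 100

# Same support for both candidates, same recovery rate in the recount...
intended = {"Candidate A": 100_000, "Candidate B": 100_000}
recovered_per_100 = 10

# ...but Candidate B's supporters void their ballots more often (assumed).
void_per_1000 = {"Candidate A": 20, "Candidate B": 30}

gains = {
    name: expected_recount_gain(intended[name], void_per_1000[name], recovered_per_100)
    for name in intended
}

for name, gain in gains.items():
    print(f"{name}: expected recount gain of {gain} votes")
```

Under these assumed figures the recount treats both candidates identically, yet one gains 200 votes and the other 300. Unequal gains, by themselves, say nothing about bias in the original count.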

Of course, these issues should never become post-election issues. They should be sorted out well in advance of any election. Hopefully, we will learn and do better next time.

As for the seeming “outbreak” of the Buchanan vote in Palm Beach, many have decided that there was something fishy happening on election night. Most fingers point to the now infamous “butterfly ballot.”

How did they decide the vote looked so strange? A chart of Buchanan’s vote totals in each Florida county shows his numbers dramatically higher in Palm Beach than anywhere else. So something obviously must be wrong, right?

Perhaps not. Comparing Buchanan’s vote as a percentage of each county’s vote reveals that Palm Beach is definitely not an aberrant community: as many as eight other counties saw Buchanan capture a larger percentage of their vote. His vote looked so big in Palm Beach because that county is one of Florida’s most populous.
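The difference between ranking by raw totals and ranking by share of the vote is easy to see in a small Python sketch. The two counties and their figures are invented for illustration and are not actual Florida results.

```python
# Hypothetical illustration: county names and vote figures are made up.

counties = {
    "Large County": {"candidate_votes": 3_000, "total_votes": 400_000},
    "Small County": {"candidate_votes": 150, "total_votes": 10_000},
}

def vote_share(county):
    """Candidate's votes as a fraction of all votes cast in the county."""
    return county["candidate_votes"] / county["total_votes"]

# Ranking by raw count favors the populous county...
top_by_count = max(counties, key=lambda name: counties[name]["candidate_votes"])

# ...while ranking by share of the vote can point somewhere else entirely.
top_by_share = max(counties, key=lambda name: vote_share(counties[name]))

for name, county in counties.items():
    print(f"{name}: {county['candidate_votes']} votes, "
          f"{vote_share(county):.2%} of the total")
print("Highest raw count:", top_by_count)
print("Highest share:", top_by_share)
```

In this made-up case the large county delivers twenty times as many votes, but the candidate's share of the vote there is only half what it is in the small county.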

Scale is important. Looking at Buchanan’s vote alone in each county yields a set of small numbers that look big on a chart. But as Patrick Anderson of the Anderson Economic Group in Lansing explains, when we include the votes for Bush and Gore on that chart, the scale increases dramatically, and Buchanan’s vote looks like a simple trend of no consequence.

The Buchanan vote in Palm Beach was treated as if it were the voting equivalent of a disease “cluster” — a village with a high number of cases of a specific cancer. This typically prompts cries of outrage and a hunt for a single cause — in Palm Beach, instead of some obscure chemical contaminant, the butterfly ballots took the heat. Later, a broader perspective usually demonstrates that the cluster is a coincidence, a bit of statistical noise. As we have seen, the Buchanan “cluster” is neither a cluster nor a coincidence. It is a matter of perspective.

Of course, we leave it to the partisans to decide whether or not Pat Buchanan’s votes are truly comparable to a disease outbreak. But the adage about “lies, damned lies and statistics” has proved its worth again.

Howard Fienberg is a research analyst and Iain Murray is a senior analyst with the Statistical Assessment Service (STATS), a nonprofit, nonpartisan organization in Washington, D.C., dedicated to improving public understanding of scientific and quantitative information.

