When it comes to writing about an election that hasn't happened yet, we have a LOT of analysis of polling and what it's "telling us."
Except, what if it's lying?
Brian Klaas wrote a penetrating piece at The Atlantic that should be required reading for the poll obsessives. Here are some excerpts:
He begins by noting what should be obvious, but I don't think is:
The widespread perception that polls and models are raw snapshots of public opinion is simply false. In fact, the data are significantly massaged based on possibly reasonable, but unavoidably idiosyncratic, judgments made by pollsters and forecasting sages, who interpret and adjust the numbers before presenting them to the public. They do this because random sampling has become very difficult in the digital age, for reasons I’ll get into; the numbers would not be representative without these corrections, but every one of them also introduces a margin for human error.
We think of polling as a quantitative measurement of the electorate. It simply is not. It's a set of qualitative judgments laid over the numbers, then presented as quantitative truth. Yet polls have been a staple of political coverage for a long time. Early polling was crap, so pollsters introduced ways to collect a random sample and then model from there. He writes:
The basic logic of the new, more scientific method was straightforward: If you can generate a truly random sample from the broader population you are studying—in which every person has an equally likely chance of being included in the poll—then you can derive astonishingly accurate results from a reasonably small number of people. When those assumptions are correct and the poll is based on a truly random sample, pollsters need only about 1,000 people to produce a result with a margin of error of plus or minus three percentage points.
The caveat here is that you need a good, random sample of 1,000 people. That was easier when we had landlines and people actually answered their phones.
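For the curious, that plus-or-minus-three-point figure isn't magic: it falls straight out of the textbook margin-of-error formula for a simple random sample. A quick back-of-the-envelope check (assuming the usual 95 percent confidence level and a worst-case 50/50 split):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(1000):.3f}")  # -> 0.031, i.e. roughly +/- 3 points
```

The whole thing hinges on the sample actually being random, which is exactly the assumption that has collapsed.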
These shifts in technology and social behavior have created an enormous problem known as nonresponse bias. Some pollsters release not just findings but total numbers of attempted contacts. Take, for example, this 2018 New York Times poll within Michigan’s Eighth Congressional District. The Times reports that it called 53,590 people in order to get 501 responses. That’s a response rate lower than 1 percent, meaning that the Times pollsters had to call roughly 107 people just to get one person to answer their questions. What are the odds that those rare few who answered the phone are an unskewed, representative sample of likely voters? Zilch. As I often ask my undergraduate students: How often do you answer when you see an unknown number? Now, how often do you think a lonely elderly person in rural America answers their landline? If there’s any systematic difference in behavior, that creates a potential polling bias.
To cope, pollsters have adopted new methodologies. As the Pew Research Center notes, 61 percent of major national pollsters used different approaches in 2022 than they did in 2016. This means that when Americans talk about “the polls” being off in past years, we’re not comparing apples with apples.
I suppose there's a bit of bias in asking young people whether they answer unknown numbers, but...does anyone answer unknown numbers?
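To make the nonresponse problem concrete, here's a toy simulation with made-up numbers: if the people who actually pick up differ systematically from the people who don't, a sub-1-percent response rate can push the raw numbers a long way from reality.

```python
import random

random.seed(42)

# Toy electorate (all numbers invented): 52% support Candidate A overall,
# but A's supporters are assumed to be half as likely to answer an unknown number.
N = 50_000
voters = []
for _ in range(N):
    supports_a = random.random() < 0.52
    answer_rate = 0.007 if supports_a else 0.014
    voters.append((supports_a, answer_rate))

respondents = [s for s, rate in voters if random.random() < rate]

true_share = sum(s for s, _ in voters) / N
polled_share = sum(respondents) / len(respondents)
print(f"true support: {true_share:.1%}, response rate: {len(respondents)/N:.2%}, "
      f"raw poll says: {polled_share:.1%}")
# Roughly: true ~52%, response rate ~1%, raw poll ~35%.
```

That gap is why pollsters massage the data in the first place.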
Then you get the "weighting" of various demographics.
No matter the method, a pure, random sample is now an unattainable ideal—even the aspiration is a relic of the past. To compensate, some pollsters try to design samples representative of known demographics. One common approach, stratification, is to divide the electorate into subgroups by gender, race, age, etc., and ensure that the sample includes enough of each “type” of voter. Another involves weighting some categories of respondents differently from others, to match presumptions about the broader electorate. For example, if a polling sample had 56 percent women, but the pollster believed that the eventual electorate would be 52 percent women, they might weigh male respondents slightly more heavily in the adjusted results.
Again, pollsters are guessing as to who will show up and actually vote. They make these guesses with an eye not towards accuracy but towards not repeating past mistakes.
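Mechanically, the weighting in Klaas's 56-versus-52-percent example is just rescaling each respondent by how over- or under-represented the pollster thinks their group is. A minimal sketch with invented support numbers (not any actual pollster's scheme):

```python
# Toy demographic weighting, following Klaas's 56%-vs-52% illustration.
sample = {"women": 560, "men": 440}                # raw respondents
assumed_electorate = {"women": 0.52, "men": 0.48}  # pollster's turnout guess

total = sum(sample.values())
weights = {g: assumed_electorate[g] / (sample[g] / total) for g in sample}
print(weights)  # women ~0.93, men ~1.09

# Invented support rates, just to show the topline moving with the guess.
support = {"women": 0.58, "men": 0.44}
raw = sum(sample[g] * support[g] for g in sample) / total
weighted = (sum(sample[g] * weights[g] * support[g] for g in sample)
            / sum(sample[g] * weights[g] for g in sample))
print(f"raw: {raw:.1%}, weighted: {weighted:.1%}")  # ~51.8% vs ~51.3%
```

Change the turnout guess and the topline moves with it. That's the human judgment baked into every published number.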
Let's turn to the NYTimes polling guru, Nate Cohn.
...pollsters have made major methodological changes with the potential to address what went wrong four years ago. Many of the worst-performing pollsters of 2020 have either adopted wholesale methodological changes or dropped off the map. Some have employed a technique called “weighting on past vote,” with the potential to shift many otherwise Democratic-leaning samples neatly in line with the closer result of the 2020 election.
Basically, pollsters do not want a repeat of 2020, when they dramatically underestimated Trump's strength - EVEN THOUGH HE WOUND UP LOSING.
Also, there was a pandemic going on. It was in all the papers.
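For what it's worth, the "weighting on past vote" trick Cohn mentions is the same rescaling move as the demographic weighting above, only keyed to how respondents say they voted in 2020. A rough sketch with invented numbers, showing how it pulls a Harris-leaning sample toward a closer race:

```python
# Toy sketch of "weighting on past vote" (numbers invented): rescale respondents
# so self-reported 2020 vote matches the actual 2020 two-party result.
recalled_2020 = {"Biden": 0.55, "Trump": 0.45}   # what the sample says it did
actual_2020 = {"Biden": 0.523, "Trump": 0.477}   # rough two-party shares

weights = {k: actual_2020[k] / recalled_2020[k] for k in recalled_2020}

harris_support = {"Biden": 0.92, "Trump": 0.06}  # invented crossover rates
raw = sum(recalled_2020[k] * harris_support[k] for k in recalled_2020)
adjusted = (sum(recalled_2020[k] * weights[k] * harris_support[k] for k in recalled_2020)
            / sum(recalled_2020[k] * weights[k] for k in recalled_2020))
print(f"raw: {raw:.1%}, after past-vote weighting: {adjusted:.1%}")  # ~53.3% -> ~51.0%
```

Whether that adjustment is a correction or an overcorrection depends entirely on whether people accurately recall (or admit) how they voted four years ago.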
Then there is this bit which...yeah.
It’s hard to overstate how traumatic the 2016 and 2020 elections were for many pollsters. For some, another underestimate of Mr. Trump could be a major threat to their business and their livelihood. For the rest, their status and reputations are on the line. If they underestimate Mr. Trump a third straight time, how can their polls be trusted again? It is much safer, whether in terms of literal self-interest or purely psychologically, to find a close race than to gamble on a clear Harris victory.
At the same time, the 2016 and 2020 polling misfires shattered many pollsters’ confidence in their own methods and data. When their results come in very blue, they don’t believe it. And frankly, I share that same feeling: If our final Pennsylvania poll comes in at Harris +7, why would I believe it? As a result, pollsters are more willing to take steps to produce more Republican-leaning results. (We don’t take such steps.)
Basically, he admits that other pollsters weight their samples weirdly, but the Times would never do so.
None of this addresses Klaas's points about the unreliability of the underlying data in the first place, but it does suggest why we have seen an unbelievable amount of "herding" toward the same results across prestige polls.
The central conceit of a tied race has been the default of the political horse-race press for months now. It goes back to before the primaries. What it leaves out are some basic facts drawn from an actually robust sample: real votes.
- Trump lost the 2020 election as an incumbent in the midst of a national emergency. You don't often see that. He has never been popular, yet polls show him more popular than ever, because....?
- In the GOP primaries, even after she dropped out, Nikki Haley was getting between 10 and 20 percent of the Republican vote. That's a remarkable protest vote.
At the same time, since Trump last lost an election, he has:
- launched a coup against electoral democracy
- been impeached a second time
- seen his judges overturn Roe
- been convicted of 34 felonies
- ducked additional debates
- ducked the 60 Minutes interview
- grown more and more erratic in his speech
- turned in noticeably low-energy campaign appearances
- threatened various forms of vengeance on his enemies
- had numerous members of the Republican Party come out against him
- held a hate rally at Madison Square Garden
Yet, according to Nate Cohn and other pollsters, we are supposed to believe that he has enlarged his electoral coalition?
The Harris Campaign made a decision in July to wed a campaign of joy to a fundraising and GOTV campaign based on fear. They are actually pretty OK with public polls saying that the swing states are tied. They have their own polling, which is traditionally more rigorous than public polls. Plouffe says that the late-breaking deciders (who the fuck are these people) are breaking overwhelmingly towards Harris. Early voting sure seems to favor Harris, if we account for gender dynamics.
It is still possible that Trump can win, because... he did so before.
Still, the novelty seems to have worn off. He still has Cult 45, but I just don't see how he's expanded beyond that.