I am not a pollster or a statistician, but I have been thinking about some factors that may cause problems with the polling of this year’s presidential election. I welcome input from anyone with expertise in this area.
1. Cell-phone-only voters.
I gather from this piece on the New York Times' Caucus blog that several prominent pollsters now routinely include cell-phone samples in their surveys, in light of the growing number of Americans who use only or mostly cell-phones.
I also know that a Pew Research Center survey taken in July suggested that Americans who use cell-phones are not that different politically from the population at large. To be more precise, people who use cell-phones most of the time are very much like the electorate at large, while people who exclusively use cell-phones are a bit different, but are also less likely to vote:
The cell-only and cell-mostly respondents in the Pew poll are different demographically from others. Compared with all respondents reached on a landline, both groups are significantly younger, more likely to be male, and less likely to be white. But the cell-only and cell-mostly also are different from one another on many characteristics. Compared with the cell-only, the cell-mostly group is more affluent, better educated, and more likely to be married, to have children, and to own a home.
We know from many years of polling that married people, people with children and home-owners are all groups more likely to vote Republican than the population at large. The Pew study also found that
In the current poll, cell-only respondents are significantly more likely than either the landline respondents or the cell-mostly respondents to support Barack Obama and Democratic candidates for Congress this fall. They also are substantially less likely to be registered to vote and – among registered voters – somewhat less likely to say they are absolutely certain they will vote. Despite their demographic differences with the landline respondents, the cell-mostly group is not significantly different from the landline respondents politically.
Yet as Pew has found in the past, when data from landline and cell phone samples are combined and weighted to match the U.S. population on key demographic measures, the results are similar to those from the landline survey alone.
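To make the weighting idea concrete, here is a minimal sketch of how post-stratification weighting works. All numbers are invented for illustration; real pollsters weight on several demographics at once, not just age:

```python
# Post-stratification weighting sketch (invented numbers).
# Suppose a combined landline + cell sample over-represents older
# respondents relative to the population.

# Share of each age group in the raw sample vs. the target population.
sample_share = {"18-29": 0.10, "30-49": 0.35, "50+": 0.55}
population_share = {"18-29": 0.22, "30-49": 0.38, "50+": 0.40}

# Each respondent in a group gets weight = population share / sample share,
# so under-sampled groups count for more and over-sampled groups for less.
weights = {g: population_share[g] / sample_share[g] for g in sample_share}

# Invented candidate support by age group within the sample.
support_obama = {"18-29": 0.60, "30-49": 0.50, "50+": 0.45}

# The unweighted estimate just reflects the raw sample composition...
unweighted = sum(sample_share[g] * support_obama[g] for g in sample_share)

# ...while the weighted estimate re-balances each group to its
# population share before averaging.
weighted = sum(sample_share[g] * weights[g] * support_obama[g]
               for g in sample_share)

print(f"unweighted: {unweighted:.3f}")  # lower, since 50+ dominates the raw sample
print(f"weighted:   {weighted:.3f}")
```

This is why Pew can find that a properly weighted landline-only sample lands close to the combined sample: the weighting, not the sampling frame alone, is doing much of the work of matching the population.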
My takeaway is that the cell-phone-only phenomenon is probably not introducing large errors into poll findings.
My question is, does the proportion of cell-phone-only Americans differ substantially from state to state, or is it a fairly uniform phenomenon across the country? To put it another way, are certain regions of the country, or states with a higher percentage of urban residents, more likely to have larger than average numbers of people who use cell-phones exclusively?
If any swing states have a particularly large number of cell-phone-only residents, that would be interesting to know. It could affect the accuracy of polling in that state (depending on the methodology of the polling firm and whether it includes cell-phone samples).
2. Weekend samples for tracking polls.
I was unable to find the archive of Rasmussen’s 2004 presidential tracking poll results, but my memory is that there was a clear pattern whereby Kerry did a little better in the samples taken on weekdays, and Bush gained ground in the samples taken on weekends.
That created the appearance of small movement toward and away from each candidate, with the pattern repeating almost every week in the late summer and fall. I remember reading some speculation that Bush was consistently doing better on the weekends because Democratic-leaning demographic groups are more likely not to be at home on Fridays and Saturdays.
I would like to know whether that is true, and if so whether the major tracking polls (Gallup and Rasmussen) are doing anything to account for this problem.
When we see shifts in tracking polls, we assume voters are reacting to the news of the last few days, but perhaps this is just an illusion created by changes in the pool of people who answer the phone on certain days of the week.
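The worry can be illustrated with a toy simulation (all numbers invented): if one candidate's supporters are systematically a bit harder to reach on Fridays and Saturdays, a three-day rolling average will oscillate every week even though no one's opinion has changed.

```python
# Toy simulation (invented numbers): true support for a candidate is a
# constant 50%, but his supporters are slightly less reachable on
# Fridays and Saturdays, so weekend samples under-represent them.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
weekend_bias = {"Fri": -0.03, "Sat": -0.03}  # reachability shortfall

# Two weeks of nightly samples with no real opinion change at all.
nightly = [0.50 + weekend_bias.get(d, 0.0) for d in days * 2]

# A 3-day rolling average, as tracking polls typically report.
rolling = [sum(nightly[i - 2:i + 1]) / 3 for i in range(2, len(nightly))]

# The reported number drifts down into each weekend and recovers after,
# even though underlying support never moved off 50%.
print([round(x, 3) for x in rolling])
```

In this toy setup the topline swings by two points on a weekly cycle, which a casual reader would naturally interpret as voters reacting to the news.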
3. Weighting for party ID, race or other factors.
What is considered the best practice in terms of weighting poll results if the sample differs from the demographics of those who voted in the 2004 presidential election?
A SurveyUSA poll of Virginia recently found McCain leading Obama by 48 percent to 47 percent. Commenting on the finding, fladem pointed out that the poll
had 19% of the electorate made up of African Americans. In 2004 it was 21%. I have got to believe that African American participation will be higher than 2004.
I share fladem’s belief, not only because Obama is black, but also because Obama has at least 35 field offices in Virginia, a state Kerry wrote off.
(UPDATE: fladem tells me that there is some evidence that 2004 exit polls overstated the share of the black vote in Virginia.)
We know that registering new voters in groups likely to favor Obama is a crucial part of his campaign strategy. Speaking to David Broder, campaign manager David Plouffe
said that “turnout is the big variable,” and the campaign is devoting an unusually large budget to register scads of new voters and bring them to the polls. “That’s how we win the Floridas and Ohios,” he said, mentioning two states that went narrowly for George W. Bush. “And that’s how we get competitive in the Indianas and Virginias,” two of six or seven states that long have been Republican — but are targets this year.
“That’s why I pay more attention to the registration figures than to the polls I see at this time of year,” Plouffe said. “The polls will change, but we know we need 200,000 new voters to be competitive in Georgia, and now is when we have to get them.”
Should pollsters adjust state poll findings to reflect the Obama campaign’s massive ground game and voter registration drives? How would they do that?
If a polling firm routinely weights for race, should the pollsters assume that the racial breakdown of the electorate will be roughly the same in a given state as it was in 2004? If not, what should they assume?
I have a similar question with respect to party ID. We've seen in state after state that the Democratic Party has gained significant ground on the Republican Party in voter registration. In Iowa, there were about 8,000 more registered Republicans than Democrats in the summer of 2004, but as of June 2008, there were more than 90,000 more Democrats than Republicans. That's a huge shift in a state where about 1.5 million people voted in November 2004.
Are pollsters weighting for party ID, and if so, are they accounting for the big gains in Democratic voter registration since the 2004 or 2006 elections?
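Mechanically, weighting to a party-ID target works the same way as demographic weighting; the judgment call is which target to use. A sketch with invented numbers shows how much that choice alone can move the topline:

```python
# Party-ID weighting sketch (all numbers invented for illustration).
sample = {"Dem": 0.33, "Rep": 0.33, "Ind": 0.34}   # raw sample party mix
support = {"Dem": 0.88, "Rep": 0.08, "Ind": 0.50}  # Obama support by party

def weighted_support(target):
    """Estimate Obama support after weighting the sample to a party-ID target.

    Each respondent's weight is target share / sample share, so the
    product sample[p] * (target[p] / sample[p]) * support[p] collapses
    to target[p] * support[p].
    """
    return sum(target[p] * support[p] for p in target)

# Hypothetical targets: one frozen at a 2004-style party balance,
# one reflecting recent Democratic registration gains.
target_2004 = {"Dem": 0.37, "Rep": 0.37, "Ind": 0.26}
target_2008 = {"Dem": 0.41, "Rep": 0.33, "Ind": 0.26}

print(f"2004-style target: {weighted_support(target_2004):.3f}")
print(f"updated target:    {weighted_support(target_2008):.3f}")
```

With these invented numbers, moving the target from the old balance to the updated one shifts the topline by more than three points, without a single respondent changing his mind. That is why the choice of party-ID target matters so much.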
4. The disparity in the two campaigns’ ground games.
I know that different pollsters use different screens to separate likely voters from the rest of the sample. One indicator sometimes used is whether the respondent voted in the last presidential election.
But the Obama campaign turned out incredible numbers of first-time voters during the Democratic caucuses and primaries. I laughed at the Des Moines Register’s final pre-caucus poll projecting that 60 percent of caucus-goers would be first-timers, but that turned out to be almost exactly right.
For the general election, the Obama campaign is building a field operation on a scale never seen before.
To further complicate matters, the Obama field operation is enormous in many states where Democrats have not competed in recent presidential races. I mentioned the 35 field offices in Virginia already. Soon there will be 22 field offices open in North Carolina. There are at least 26 Obama offices up and running in Indiana. Even North Dakota has four Obama field offices. Al Gore and John Kerry bypassed all of those states.
Obama’s ground game is going to be much bigger than Kerry’s ground game was even in the swing states Kerry targeted. Nevertheless, it seems reasonable to assume that the increased turnout of groups that skew toward Obama (e.g. blacks, voters under 30) will affect the demographic composition of the electorate more in states where Democrats had nothing going in 2004.
Should pollsters do anything to account for this factor? Could they take this into account even if they wanted to?
I know that some of my questions are unanswerable, but I appreciate any insight readers can provide.
(SECOND UPDATE: Thanks to the reader who wrote me to point out that the factors I mention, while not necessarily reflected in polling, may be reflected in online prediction markets that currently show Obama with a 20 percent greater chance of winning the election than McCain.)