The DPP finally settled on its presidential nomination procedures last week. Among the most controversial of the decisions was the question of whether to incorporate cell phones into the polling primary sample. At first glance, this might seem like an extremely arcane and technical matter, hardly the stuff of political controversy, much less the type of thing that could swing a presidential election. However, just as in tax laws and Google user agreements, the fine print matters more than you might expect. In this post, I want to look at why this has become such an important question.
A good starting place is a recent TISR survey. The topic of this survey was satisfaction with President Tsai after three years in office, but we are not really concerned with that. Roughly half of this survey's sample came from landlines and half from cell phones. At the bottom of the report, TISR presents a breakdown of the two samples by age and education.
Age | Population (%) | Landlines (%) | Cell phones (%) |
20-29 | 16.3 | 4.7 | 21.5 |
30-39 | 18.9 | 12.5 | 16.7 |
40-49 | 19.3 | 15.5 | 22.1 |
50-59 | 18.9 | 21.1 | 18.2 |
60-69 | 15.4 | 27.2 | 15.7 |
70&up | 11.3 | 18.9 | 5.8 |

Education | Population (%) | Landlines (%) | Cell phones (%) |
Primary school | 13.1 | 17.2 | 4.4 |
Middle school | 12.2 | 13.5 | 6.3 |
High school | 27.7 | 30.4 | 31.5 |
Technical college | 12.0 | 11.1 | 11.5 |
University | 27.3 | 21.5 | 36.6 |
Graduate school | 7.7 | 6.3 | 9.7 |
As you can see, the two types of samples are quite different from each other and from the population. Landlines drastically underrepresent younger voters and voters with higher education levels. Cell phones are much closer to the population on age; the biggest gaps are an underrepresentation of the oldest category and an overrepresentation of the youngest category. On education, however, cell phones significantly underrepresent people with lower education levels and significantly overrepresent people with higher education levels.
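To put a rough number on how far each sample sits from the population, one option (my choice of measure, not something TISR reports) is the index of dissimilarity: half the sum of the absolute differences between two percentage distributions. A minimal Python sketch using the figures from the table above:

```python
# Percentage distributions from the TISR table above (each list sums to ~100).
AGE = {
    "population": [16.3, 18.9, 19.3, 18.9, 15.4, 11.3],
    "landlines": [4.7, 12.5, 15.5, 21.1, 27.2, 18.9],
    "cell phones": [21.5, 16.7, 22.1, 18.2, 15.7, 5.8],
}
EDUCATION = {
    "population": [13.1, 12.2, 27.7, 12.0, 27.3, 7.7],
    "landlines": [17.2, 13.5, 30.4, 11.1, 21.5, 6.3],
    "cell phones": [4.4, 6.3, 31.5, 11.5, 36.6, 9.7],
}


def dissimilarity(population, sample):
    """Index of dissimilarity (in points): roughly the share of the sample
    that would have to switch categories to match the population."""
    return sum(abs(p - s) for p, s in zip(population, sample)) / 2


if __name__ == "__main__":
    for name, table in (("Age", AGE), ("Education", EDUCATION)):
        for mode in ("landlines", "cell phones"):
            d = dissimilarity(table["population"], table[mode])
            print(f"{name:9} {mode:11} {d:5.2f}")
```

On these figures the landline sample is much further from the population on age (21.7 vs. 8.35), while the cell phone sample is further off on education (15.1 vs. 8.1).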
Almost no one simply presents the raw data as an estimate of the population. Instead, the respondents are weighted according to their share of the population. Typically, they will be weighted by variables that we have authoritative data on, such as age, sex, and region. Some analysts will also weight on education level, but this is much riskier since we don’t have great statistics for the population. (Government stats are based on household registration data, and not everyone’s education level is accurate in that database.) I don’t know exactly how the DPP weights its results, but I assume they use age, sex, and perhaps city/county. I don’t think they ask about education levels in their polling primary questionnaire.
Assume we only had the landline sample from above with 1000 responses. The 47 respondents aged 20-29 would be weighted up by multiplying each response by some number, on average 16.3/4.7=3.47, though that number would also be adjusted according to their sex and region. The estimate of the population would thus have 163 weighted responses from the 20-29 age group, not 47.
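The arithmetic in the paragraph above can be sketched in a few lines of Python. This simplified version weights on age alone, using the group shares from the TISR table; the real procedure would also adjust for sex and region, which these tables don't let us reproduce. The function names are illustrative.

```python
# Age distributions (%) from the TISR table: population vs. landline sample.
AGE_GROUPS = ["20-29", "30-39", "40-49", "50-59", "60-69", "70&up"]
POPULATION = [16.3, 18.9, 19.3, 18.9, 15.4, 11.3]
LANDLINE = [4.7, 12.5, 15.5, 21.1, 27.2, 18.9]


def age_weights(population, sample):
    """Weight per respondent in each age group: population share / sample share."""
    return [p / s for p, s in zip(population, sample)]


def weighted_counts(population, sample, n):
    """(raw count, weighted count) per age group for a sample of n respondents."""
    weights = age_weights(population, sample)
    rows = []
    for share, w in zip(sample, weights):
        raw = n * share / 100
        rows.append((raw, raw * w))
    return rows


if __name__ == "__main__":
    weights = age_weights(POPULATION, LANDLINE)
    counts = weighted_counts(POPULATION, LANDLINE, 1000)
    for group, w, (raw, weighted) in zip(AGE_GROUPS, weights, counts):
        print(f"{group:>6}: weight {w:4.2f}, {raw:5.1f} raw -> {weighted:5.1f} weighted")
```

For the 20-29 group this gives a weight of about 3.47 and turns 47 raw responses into 163 weighted ones, as described above.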
What this means is that, if those 47 people accurately reflected the 20-29 age group as a whole, the weighted estimate would be a pretty good estimation of the population. Think about what this means. If the only things skewing the sample are age, sex, and region, then weighting should solve that problem. Landlines should give a good estimate of the population. Of course, exactly the same logic applies to cell phones. Thus, landlines and cell phones should provide exactly the same estimate. It shouldn’t matter whether cell phones are included in the polling primary, and it shouldn’t matter what percentage of the responses are collected from cell phones.
Of course, you have probably already spotted the flaw in this logic. Age, sex, and region are NOT the only things skewing the samples. We can see quite clearly that education is also different in the two samples. The 20-29 year-olds who answer landline calls are not like the 20-29 year-olds who answer cell phone calls. What kinds of young people answer landline calls? My guess is that the overwhelming majority live with their parents, who still have landlines. One might imagine that people living with their parents have different socialization experiences, can be mobilized by different social networks, and get information from different sources.
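The logic of the last two paragraphs can be checked with a toy example. The age shares below come from the TISR table, but the within-age support rates are invented purely for illustration. When both samples share the same support within each age group, weighting on age makes them agree; when young landline respondents lean differently from young cell phone respondents of the same age, weighting on age cannot close the gap.

```python
# Age distributions (%) from the TISR table above, youngest to oldest.
POPULATION = [16.3, 18.9, 19.3, 18.9, 15.4, 11.3]
LANDLINE = [4.7, 12.5, 15.5, 21.1, 27.2, 18.9]
CELL = [21.5, 16.7, 22.1, 18.2, 15.7, 5.8]

# HYPOTHETICAL support (%) for some candidate within each age group.
# Invented for illustration; not survey data.
SUPPORT = [40, 35, 30, 25, 20, 15]
# HYPOTHETICAL: young landline respondents (say, living with their parents)
# lean differently from young cell phone respondents of the same age.
SUPPORT_LANDLINE_YOUNG_DIFFER = [20, 25, 30, 25, 20, 15]


def raw_support(sample, support):
    """Unweighted support: each age group counts by its share of the sample."""
    return sum(s * r for s, r in zip(sample, support)) / sum(sample)


def weighted_support(population, sample, support):
    """Age-weighted support: each respondent gets weight population/sample share."""
    weights = [p / s for p, s in zip(population, sample)]
    total = sum(s * w for s, w in zip(sample, weights))
    return sum(s * w * r for s, w, r in zip(sample, weights, support)) / total


if __name__ == "__main__":
    print("Same within-age support in both modes:")
    print(f"  raw:      landline {raw_support(LANDLINE, SUPPORT):.1f}, "
          f"cell {raw_support(CELL, SUPPORT):.1f}")
    print(f"  weighted: landline {weighted_support(POPULATION, LANDLINE, SUPPORT):.1f}, "
          f"cell {weighted_support(POPULATION, CELL, SUPPORT):.1f}")
    print("Young landline respondents differ:")
    print(f"  weighted: landline "
          f"{weighted_support(POPULATION, LANDLINE, SUPPORT_LANDLINE_YOUNG_DIFFER):.1f}, "
          f"cell {weighted_support(POPULATION, CELL, SUPPORT):.1f}")
```

With these made-up numbers the raw estimates differ by about five points while the age-weighted ones match; in the second case the weighted estimates still differ by about five points, because the difference lies inside the age groups, where age weighting cannot reach it.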
TISR also asked whether respondents had only a cell phone, only a landline, or both. I don’t have much to comment about this; I just think it is neat.
Age | Population (%) | Cell only (%) | Both (%) | Landline only (%) |
20-29 | 16.3 | 28.7 | 10.4 | 1.9 |
30-39 | 18.9 | 23.8 | 13.7 | 2.8 |
40-49 | 19.3 | 19.3 | 20.3 | 6.5 |
50-59 | 18.9 | 13.5 | 22.8 | 8.3 |
60-69 | 15.4 | 9.9 | 23.2 | 31.5 |
70&up | 11.3 | 4.9 | 9.7 | 49.1 |

Education | Population (%) | Cell only (%) | Both (%) | Landline only (%) |
Primary school | 13.1 | 4.4 | 8.4 | 43.5 |
Middle school | 12.2 | 4.4 | 9.4 | 25.0 |
High school | 27.7 | 30.2 | 32.4 | 21.3 |
Technical college | 12.0 | 9.3 | 12.7 | 4.6 |
University | 27.3 | 39.1 | 29.3 | 5.6 |
Graduate school | 7.7 | 12.4 | 7.8 | 0.0 |
So if the people who answer cell phone and landline surveys are different in important ways (even when they are weighted to make them look demographically similar), what does this mean for the DPP’s polling primary? Conveniently, a recent TVBS poll report illustrates the importance of the DPP’s polling choices quite nicely. This poll is a few weeks old (conducted April 29-May 8), and used half cell phones and half landlines. TVBS weights their results by sex, age, region, and education, so the results presented below are all weighted. Most people probably only paid attention to the horse-race results. When you look at these, remember that TVBS usually has the KMT candidates several points stronger than most other polling organizations. Anyway, we aren’t really concerned about the KMT or Ko in this post; this is a post about Lai and Tsai. But just for fun, here is the big table:
KMT | DPP | Independent | KMT (%) | DPP (%) | Independent (%) |
Han | Tsai | Ko | 39 | 25 | 26 |
Han | Lai | Ko | 39 | 24 | 27 |
Kou | Tsai | Ko | 31 | 24 | 30 |
Kou | Lai | Ko | 31 | 24 | 30 |
Chu | Tsai | Ko | 26 | 24 | 33 |
Chu | Lai | Ko | 27 | 25 | 33 |
Wang | Tsai | Ko | 15 | 23 | 38 |
Wang | Lai | Ko | 13 | 24 | 37 |
Han | Tsai | | 50 | 38 | |
Han | Lai | | 48 | 40 | |
Kou | Tsai | | 43 | 36 | |
Kou | Lai | | 42 | 40 | |
Chu | Tsai | | 40 | 40 | |
Chu | Lai | | 37 | 43 | |
Wang | Tsai | | 27 | 39 | |
Wang | Lai | | 25 | 44 | |
A couple of points are interesting. The overall results change much more when the KMT candidates are rotated than when the DPP candidates are. In the three-way races, support for the DPP is remarkably stable no matter which DPP candidate is included. However, Ko takes quite a bit more support from some KMT candidates than from others. In the two-way matchups Lai runs 2 to 5 points ahead of Tsai, while in the three-way matchups they are essentially tied. You can see that having Ko included in the DPP polling primary question is beneficial to Tsai. Moreover, in the two-way matchups, Tsai is closest to Lai against Han. And the only time that Tsai actually beats Lai is in the three-way matchup with Han. This finding is not unique to this survey. Han and Ko soak up a lot of disillusioned voters who might otherwise turn to Lai. It is not a coincidence that the question the DPP will use in the polling primary is the three-way race with Han and Ko. This is Tsai's best chance to win. She is by no means guaranteed victory, but using this question helps her odds immensely.
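The Lai-versus-Tsai comparisons in the paragraph above are just differences between cells of the TVBS table. A small Python sketch that computes them (the dictionary layout and function name are my own):

```python
# DPP candidate's support (%) in each TVBS matchup, from the table above.
OPPONENTS = ["Han", "Kou", "Chu", "Wang"]
THREE_WAY = {  # KMT candidate vs. DPP candidate vs. Ko
    ("Han", "Tsai"): 25, ("Han", "Lai"): 24,
    ("Kou", "Tsai"): 24, ("Kou", "Lai"): 24,
    ("Chu", "Tsai"): 24, ("Chu", "Lai"): 25,
    ("Wang", "Tsai"): 23, ("Wang", "Lai"): 24,
}
TWO_WAY = {  # KMT candidate vs. DPP candidate
    ("Han", "Tsai"): 38, ("Han", "Lai"): 40,
    ("Kou", "Tsai"): 36, ("Kou", "Lai"): 40,
    ("Chu", "Tsai"): 40, ("Chu", "Lai"): 43,
    ("Wang", "Tsai"): 39, ("Wang", "Lai"): 44,
}


def lai_minus_tsai(results):
    """Lai's support minus Tsai's support against each KMT opponent (points)."""
    return {kmt: results[(kmt, "Lai")] - results[(kmt, "Tsai")] for kmt in OPPONENTS}


if __name__ == "__main__":
    print("Two-way:  ", lai_minus_tsai(TWO_WAY))
    print("Three-way:", lai_minus_tsai(THREE_WAY))
```

Against Han, Kou, Chu, and Wang, Lai's two-way lead is 2, 4, 3, and 5 points, while the three-way differences run only from -1 (against Han) to +1.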
OK, back to cell phones and landlines. The reason that this TVBS poll is so useful is that their report broke down the results by cell phones and landlines. Here is the first question:
Candidate | All (100%) | Landlines (47%) | Cell phones (53%) |
Han | 39 | 41 | 38 |
Tsai | 25 | 27 | 23 |
Ko | 26 | 21 | 30 |
None | 7 | 7 | 7 |
Undecided | 3 | 4 | 2 |
Both Han and Tsai do slightly better in the landline group, while Ko does quite a bit better in the cell phone group. Yes, you read that right: Tsai is 4 points stronger in landlines than in cell phones. Here is the second question:
Candidate | All (100%) | Landlines (47%) | Cell phones (53%) |
Han | 39 | 41 | 38 |
Lai | 24 | 31 | 19 |
Ko | 27 | 17 | 35 |
None | 7 | 6 | 7 |
Undecided | 3 | 5 | 2 |
Now you can see the difference. Lai is a LOT stronger in landlines than in cell phones; the gap is 12 points. When you only ask landlines, Lai beats Tsai by 4 points. However, if you only ask cell phones, Tsai is 4 points better than Lai. When you put them together, Tsai comes out slightly ahead.
(By the way, also note that Han's numbers are exactly the same in both questions, and Ko is much stronger among cell phone respondents.)
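To see how the cell phone share drives the Tsai-Lai comparison, here is a rough sketch that mixes the landline and cell phone columns in different proportions. It assumes the combined result is a simple linear blend of the two sub-samples, which ignores TVBS's weighting of the combined sample, so treat the output as approximate. As in the text, it compares Tsai's number from the Han/Tsai/Ko question with Lai's from the Han/Lai/Ko question; the function names are hypothetical.

```python
# Support (%) by sampling frame from the two TVBS tables: Tsai from the
# Han/Tsai/Ko question, Lai from the Han/Lai/Ko question.
LANDLINE = {"Tsai": 27, "Lai": 31}
CELL = {"Tsai": 23, "Lai": 19}


def blended(candidate, cell_share):
    """Support if a fraction cell_share of the sample came from cell phones.

    A simple linear mix of the two sub-samples. It ignores the weighting
    TVBS applies to the combined sample, so it is only an approximation.
    """
    return (1 - cell_share) * LANDLINE[candidate] + cell_share * CELL[candidate]


def lai_lead(cell_share):
    """Lai's support minus Tsai's at a given cell phone share (points)."""
    return blended("Lai", cell_share) - blended("Tsai", cell_share)


if __name__ == "__main__":
    for share in (0.0, 0.25, 0.5, 0.53, 0.75, 1.0):
        print(f"cell share {share:4.0%}: Lai lead {lai_lead(share):+5.2f}")
```

Because Lai leads by 4 points among landline respondents and trails by 4 among cell phone respondents, this simple blend puts his lead at 4 - 8s for a cell share s: 4 points with no cell phones, zero at a 50-50 split, and a Tsai edge of about a quarter point at the 53% cell share TVBS actually used.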
Lai is screaming that the polling primary has been rigged against him. It is true that they chose the best question for Tsai. It is also true that Tsai does better with half the sample taken from cell phones than if all responses are from landlines. However, what the stats listed above show is that an all-landline sample is not representative of the whole population. That is, the method that Lai considers to be the default was skewing the estimate dramatically in his favor. If the DPP had adopted a 100% cell phone sample, he would have had a good argument that it was biasing the estimate unfairly toward Tsai (though the tables above indicate that cell phones are not quite as skewed as landlines). However, the two sources balance each other relatively well. A 50-50 split (plus weighting for age, sex, and region) is actually not a bad balance. It is certainly more representative of the overall population than either a pure landline or a pure cell phone sample. I'm inclined to argue that the DPP's decision to use a 50-50 sample should be seen more as undoing the previous bias toward Lai than as creating a new, unfair bias toward Tsai.