Everyone wants to know what the latest poll says, but the side doing badly always finds reasons to doubt the poll results. In Taiwan, these doubts are often legitimate. There is often a significant gap between polling results and election results. In this post, I’m going to explore some of the reasons for those gaps.
This doesn’t mean that I think polls are useless. Quite the contrary. Polls are often the best objective evidence we have for how a race is shaping up. The point of this article is merely to point out that polls are not perfect, and they need to be interpreted with a critical eye.
Sampling Error
The goal of a survey is to estimate the population mean. Theoretically, Ma Ying-jeou has an actual approval rating in the general population. If you knew how every person in Taiwan felt about Ma, you could simply count up the percentage who approve of him. In reality, we never quite know this number. Instead, we take a sample of the population, and infer what the population mean is from the sample mean. Sampling depends on randomness and the law of large numbers to produce accurate estimates. However, it is important to remember that what you get is always an estimate, and it is almost certainly a little different from the true population mean.
Reputable polling organizations always report their results with a margin of error. For example, they might report that Ma Ying-jeou’s approval rating is 35%, with a margin of error of 3%. In other words, if you took the same exact poll an infinite number of times, you would get different estimates of Ma’s approval rating, but 95% of those estimates would fall between 32% and 38%. This 3% error is sampling error. It is based on the assumption of a true random sample, in which each person in the population has the exact same probability of being sampled. Most surveys are not actually true random samples, but even if everything were perfectly random, we would still have this sampling error.
The sampling error is quite simple to estimate. If n is the sample size (the number of people interviewed), then the sampling error at the conventional 95% confidence level is approximately 1/√n. If n=1600, then the sampling error is 1/40=2.5%. The larger n is, the more precise your estimate is.
Look at the hypothetical example of Ma’s approval rating. A range from 32% to 38% is fairly wide, and a 32% approval rating has different political implications than a 38% approval rating. A 3% error would imply a sample size of about 1067. Now think about polls that estimate candidates’ support in various areas of the electorate. If they have a sample size of 1000 and they estimate the support in five different areas, each area has a sample size of roughly 200. This yields a sampling error of about 7%. So if the point estimate were 35%, the possible range would be from 28% to 42%. This is basically useless.
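If you want to check these numbers yourself, here is a minimal sketch of the arithmetic, using the textbook formula 1.96 × √(p(1−p)/n) for a proportion near 50%, which is where the 1/√n rule of thumb comes from:

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Approximate 95% margin of error for a proportion, given sample size n."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (1600, 1067, 200):
    moe = margin_of_error(n)
    low, high = 0.35 - moe, 0.35 + moe
    print(f"n = {n:4d}: margin of error {moe:.1%}, so a 35% estimate could really be {low:.0%} to {high:.0%}")
```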
That’s the theory. In practical experience, the estimate usually stabilizes by about 400 cases. Since most media surveys poll somewhere from 800 to 1000 people, if you divide the sample into two equally sized groups (eg: men and women), the estimates are probably ok. If there are more groups (eg: KMT supporters, DPP supporters, and independents), you’ll need a correspondingly bigger sample. If the groups are different sizes (eg: Minnan, Hakka, Mainlanders, and aborigines), some of the estimates might be ok, and some might be useless.
Just remember, even if everything is done perfectly, you can’t avoid sampling error.
Nonsampling Error
Pollsters want you to focus on sampling error; it makes them look scientific and objective. However, I think sampling error is probably the least of our worries. I believe (without much evidence) that most of the deviations between poll results and the true population mean are the result of nonsampling errors. There are lots of different sources of nonsampling errors. Here, I’ll discuss a few that are common (or commonly suspected) in Taiwan.
Coverage is almost certainly a problem. The theoretical population (eg: all eligible voters in Xinbei City) and the pool from which people are sampled are usually different. Most surveys are done by telephone. Almost everyone has a telephone these days, but that doesn’t mean you can reach everyone. Some people only have cell phones, and most polls don’t call cell phones. Some households have many members, and while every household telephone might have an equal probability of being called, members of large households have a smaller probability of being interviewed. Some people are never home to answer the phone. Most surveys are done in the evening. Some people are never at home in the evening. Lots of voters in Xinbei City don’t actually live in Xinbei City. Their household registration might be there, but they actually live (and answer telephones) in another county or city where they work, study, or whatever. It is not difficult to imagine that some people are systematically undersampled. If these people have different political stances than people who are in the actual (not theoretical) population, then the estimate might be off.
Every polling organization has its own methodology, and these different practices make a difference. One of the easiest effects to see is in the percentage of non-responses. There is always a percentage of respondents who will answer, “I don’t know.” Most organizations will try to get these people to express an opinion, but eventually you have to accept some level of non-responses. Different organizations will have different practices. Do you ask twice? Three times? Do you allow the interviewer to rephrase the question (in a standardized way)? Do you allow the interviewer to explain difficult ideas (eg: “ECFA is a trade agreement with the mainland”)? Some organizations (usually academic centers) hire only students as interviewers, others hire only women (believed to be less threatening), and others use anyone they can find. None of these practices are objectively right or wrong. What you want is consistency. For example, TVBS consistently shows low levels of non-responses, and United Daily News consistently shows high levels of non-responses. Both of these indicate consistent methodological practices. (I once saw a Korean survey with zero non-responses. That scared me.) If you look at my page with survey results for this year’s elections, you will see that the I-tel results are all over the place. Not coincidentally, they don’t bother to report their non-responses or even their sample size. I don’t trust their results at all.
Many people believe that the political affiliations of the polling organization are important. If someone calls up and says, “I’m from TVBS” (generally thought to be a pro-KMT organization), they might answer differently than if the interviewer claims to be from the DPP. I haven’t seen much data on this, but the little I have seen suggests that this effect is not very large. I have to believe that if it were large, we’d have noticed by now and the various organizations would have stopped using their own names. However, it is hard for many people to imagine that the identity of the polling center is irrelevant.
In a related vein, many people believe that some respondents will not answer sincerely. Often, this is tied to lingering fears (from the martial law era) about the consequences of expressing support for the wrong side. In the early 1990s, this was clearly a problem. Expressions of support for the DPP were clearly too low. However, I think fear is a minor issue now. Most people have figured out that the secret police don’t listen in to TVBS interviews and haul away DPP supporters in the middle of the night.
On the other hand, some people just don’t want to answer questions. They might value privacy, or they might just find surveys annoying. One common assumption is that supporters of one party are eager to express their opinion, while supporters of the other party are more reserved. Unfortunately, there doesn’t seem to be a consensus on which party is which; everyone who discusses this effect seems to believe that supporters of the party they prefer are more reserved.
The order of the questions and the question wording can affect the results. “Are you satisfied with Ma’s performance,” “Do you approve of Ma’s performance,” and “Do you trust Ma” will all produce slightly different results. If you ask that question at the beginning of an interview, you often get different results than if it is the last question. General attitudes (Overall, do you like Ma?) are more affected by specific attitudes (How do you think Ma is doing on health care reform?) than vice versa. For this reason, most polls will ask who you intend to vote for at the beginning of the questionnaire, before they ask your opinion on traffic, pollution, welfare, and so on.
So far, I’ve assumed that the polling organization is sincere and wants to produce a good estimate of the population. This is not always the case. Sometimes they want to produce a result that paints their side in a positive light for advertising purposes (or perhaps sycophantic purposes). I recently saw a survey commissioned by one of the township mayors in Taipei County. (Since he was nice enough to give me the report, I won’t tell you who it is.) It included questions like these:
1) Our township has six libraries with twelve reading rooms and 450,000 books. This is the most in Taipei County. Are you satisfied with the township government’s efforts in this area?
2) The township government has made improvements in transportation, such as the bridge connecting us to XX, which now takes only 10 minutes to cross, and securing a promise for an exit ramp on the new expressway, which will save our residents lots of time. Are you satisfied with the government’s efforts in this area?
3) Since XX became mayor, he has paid special attention to social welfare, making extra efforts to take care of those in need. For example, there are programs to give meals to seniors living alone, education subsidies, scholarships, and disaster relief. Are you satisfied with Mayor XX’s efforts in social welfare?
Question #13 was, “If Mayor XX runs for another elected office so that he can continue to serve the people of XX, will you support him?” You might not be surprised that 79% said they would support him. Of course, after all of those leading questions, this tells us almost nothing about his actual support in the population. (I’m dying to see if he runs for legislator next year!)
I don’t think that most of the polls we see publicly reported are of this nature. This kind of poll can destroy a polling organization’s credibility pretty quickly.
Going from Polls to Elections
Theoretically, a poll is not supposed to predict election results. A poll is supposed to be a snapshot of public opinion at the time it is taken, not a month in the future when people cast their votes. Of course, everyone uses polls to predict election results, but you must remember that this assumes no changes in public opinion between the poll and the election. Sometimes things happen. The president is shot the day before the election. Lu Hsiu-yi kneels down begging people to vote for Su Tseng-chang the night before the election, and this is played over and over on TV. All the vote-buying is usually done in the last two or three days.
Strategic voting often causes a disconnect between poll results and election results. I’m not sure why people don’t tell pollsters about their intentions to vote strategically, but they never seem to do so. They seem to express their sincere intentions right until the last minute, when they shift their support to another candidate who they think needs their vote more. This is especially common in multi-member districts.
Turnout matters. A higher percentage of people express an intention to vote in polls than actually turn out to vote. Every poll includes some people who express intent to vote but actually don’t and a smaller number who say they won’t vote but actually do. Elections are decided by actual voters, not eligible voters (the theoretical population) or respondents who express an intention to vote (the group actually reported). One practice in American surveys is to ask respondents the likelihood that they will vote on a scale of 1 to 10. Then, assuming that turnout will be 40% in a typical midterm election, they only look at the top 40% on that question. I don’t think anyone does anything like this in Taiwan (thankfully).
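Just to illustrate how that American-style likely voter screen works, here is a toy sketch with entirely made-up data (as far as I know, no Taiwanese pollster does anything like this):

```python
# Hypothetical respondents: (self-reported likelihood of voting on a 1-10 scale, vote intention).
respondents = [
    (10, "A"), (9, "A"), (9, "A"), (8, "B"), (7, "B"),
    (6, "B"), (5, "B"), (3, "B"), (2, "A"), (1, "A"),
]

def share(sample, candidate):
    """Candidate's share of stated vote intentions within a sample."""
    return sum(1 for _, vote in sample if vote == candidate) / len(sample)

# Keep only the most likely voters, enough to match an assumed 40% turnout.
assumed_turnout = 0.40
by_likelihood = sorted(respondents, key=lambda r: r[0], reverse=True)
likely_voters = by_likelihood[: int(len(by_likelihood) * assumed_turnout)]

print(f"Candidate A among all respondents:    {share(respondents, 'A'):.0%}")
print(f"Candidate A among likely voters only: {share(likely_voters, 'A'):.0%}")
```

In this toy example the likely voter screen changes the picture considerably; whether it makes the estimate better or worse depends entirely on whether the turnout assumption and the self-reports are any good.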
Prediction Models
So we know that the polls aren’t exactly right, but we still want to know what is going to happen in the election. Most people have a prediction model that allows them to translate survey results into a prediction.
Most of these prediction models are informal. For example, some people throw out the non-responses and just assume that the voters will act exactly as the respondents who express opinions. Others assume that DPP voters are under-represented in the polls, and they have a variety of methods for adding the right number. Some multiply the DPP candidate’s total by 1.1, others add a straight 5%, and so on. Some people try to think about the turnout, and what different turnout levels will do to the survey results. They might think that if turnout is low, then the DPP candidate will do better than in the survey, while if turnout is high, the KMT candidate might do better. Most of us have these kinds of informal rules to help us make sense of survey results. Of course, most of us realize that our informal models are really just subjective guesses.
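Purely for illustration, here is what a few of those informal rules look like as arithmetic. The poll numbers are hypothetical, and the adjustments are just the examples mentioned above, not rules I endorse:

```python
poll = {"KMT": 0.47, "DPP": 0.40, "undecided": 0.13}

# Rule 1: throw out the non-responses and assume voters behave like the
# respondents who expressed an opinion (i.e., renormalize the decided shares).
decided = poll["KMT"] + poll["DPP"]
rule1 = {party: s / decided for party, s in poll.items() if party != "undecided"}

# Rule 2: assume DPP supporters are undersampled and inflate their number by 10%.
rule2_dpp = poll["DPP"] * 1.1

# Rule 3: add a flat 5 points to the DPP candidate.
rule3_dpp = poll["DPP"] + 0.05

print(f"Rule 1 (renormalize): KMT {rule1['KMT']:.1%}, DPP {rule1['DPP']:.1%}")
print(f"Rule 2 (x1.1):        DPP {rule2_dpp:.1%}")
print(f"Rule 3 (+5 points):   DPP {rule3_dpp:.1%}")
```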
Sometimes there are formal prediction models. A few weeks ago, both TVBS and UDN produced formal prediction models from their survey results using statistical models. UDN splashed these results all over the front page, screaming that Su would beat Hau by 51-49, even though their actual poll showed Hau leading 47-40. Statistical models all share a common assumption: people who don’t express an opinion are not different from people who do. By looking at how people who express an opinion answer other questions (party ID, various issues, demographics), the model can figure out how much each answer affects the probability that the respondent prefers Su or Hau. Then the model applies those probabilities to the non-respondents’ answers to predict how they would vote. There are a couple of problems here. One, these models will give you different results when you ask different questions. If you just use demographic categories, you will get different results than if you ask a question about corruption in the Xinsheng Elevated Expressway case. So what are the right questions to ask? No one quite knows. The other problem is more fundamental. People who don’t answer the vote intention question are probably different from people who do. If a person identifies strongly with the KMT, thinks the Flora Expo is great, is very satisfied with the road repair program, and still says he isn’t sure who he will vote for, you might not want to just assume that he is a Hau voter. There might be a good reason that he is undecided (and you will almost certainly not be able to understand this from the available survey data). In sum, these statistical prediction models may appear to be more “scientific,” but they are just as subjective as any other prediction model, formal or informal.
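To make the mechanics concrete, here is a toy version of the imputation idea using a single question (party ID) and made-up data. The real TVBS and UDN models presumably use regression-type models with many more questions, but the core assumption is the same: non-respondents are assumed to behave like respondents who gave similar answers to the other questions.

```python
from collections import Counter, defaultdict

# Made-up data: (party ID, stated vote intention, or None for "don't know").
respondents = [
    ("KMT", "Hau"), ("KMT", "Hau"), ("KMT", "Su"),
    ("DPP", "Su"), ("DPP", "Su"), ("DPP", "Hau"),
    ("independent", "Hau"), ("independent", "Su"),
    ("KMT", None), ("DPP", None), ("independent", None), ("independent", None),
]

# Step 1: among people who DID express a preference, estimate P(Su) within each party-ID group.
counts = defaultdict(Counter)
for party, vote in respondents:
    if vote is not None:
        counts[party][vote] += 1
p_su = {party: c["Su"] / sum(c.values()) for party, c in counts.items()}

# Step 2: count each non-respondent as a fractional Su voter, using the group probability.
expected_su_votes = sum(
    p_su[party] if vote is None else (1.0 if vote == "Su" else 0.0)
    for party, vote in respondents
)
print(f"Imputed Su share: {expected_su_votes / len(respondents):.1%}")
```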
So what are we to do? (What? Do you really expect direct answers from this blog? Go watch a cable TV talk show!)
I like to see results repeated in several polls from several polling organizations before I get too excited about them. I also pay close attention to sample sizes before looking at any subset of the data. Most importantly, I try to remind myself that polls convey important information, but they leave out some things too. The polls in Taipei City say that the race is close. Su will almost certainly do better than any DPP candidate has ever done in Taipei City. They don’t tell me what turnout will be like or whether there are a bunch of KMT supporters who are disgusted with Hau but will eventually vote for him (while crying or holding their noses). The polls in Xinbei City suggest that Chu is more likely than Tsai to win, but the poll results are also close enough that I wouldn’t be shocked if Tsai won. Again, translating poll results into election results is a tricky business, and sometimes it is as important to remind ourselves how much we don’t know as how much we do.
Tags: polls