The Big Reveal goes awry

This morning was the big reveal. The experts would announce the results of the surveys and tell us who was going to head the ticket and who was going to take the second spot, and then we would have a great photo op with the two candidates raising their hands and promising to lead the joint ticket to victory.

It didn’t quite work out that way.

The rules of the game were not spelled out clearly in the agreement three days ago, so as you might expect, the two sides did not agree on what to look at, much less what the results were. The KMT said that there were nine surveys, of which Hou won 8, and the TPP said that there were only 6 valid surveys and both sides won three each. That is, the KMT thought that Hou was clearly victorious, while the TPP argued that the situation is still unresolved. At present, both sides say that they are still working towards cooperation, but all signs point to a deadlock and potential breakdown.

Before we get into that, let’s look at the results.

There were nine polls proposed for consideration. Each poll asked two questions: “Do you support a Hou-Ko ticket or a Lai-Hsiao ticket?” and “Do you support a Ko-Hou ticket or a Lai-Hsiao ticket?” Each survey had a different margin of error based on the sample size. Everyone agrees on this set of facts. They don’t agree on much else.

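| # | Pollster | Q1: Hou-Ko | Q1: Lai-Hsiao | Q2: Ko-Hou | Q2: Lai-Hsiao | Margin of error |
|---|---|---|---|---|---|---|
| 1 | UDN | 42.00 | 36.00 | 41.00 | 36.00 | ±2.90 |
| 2 | KMT | 38.20 | 30.60 | 38.80 | 29.30 | ±2.55 |
| 3 | 東森 | 39.93 | 34.53 | 39.30 | 34.80 | ±2.94 |
| 4 | 匯流 | 46.10 | 41.60 | 48.30 | 39.20 | ±2.17 |
| 5 | ETToday | 41.60 | 37.10 | 39.60 | 36.20 | ±2.83 |
| 6 | 鏡電視 | 46.50 | 34.90 | 46.60 | 33.10 | ±2.94 |
| 7 | 世新 | 40.82 | 35.86 | 46.01 | 32.22 | ±2.94 |
| 8 | 好好聽 | 44.60 | 39.50 | 44.00 | 37.20 | ±2.97 |
| 9 | TPP | 39.70 | 33.00 | 44.00 | 32.00 | ±2.98 |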

The first argument was about which polls should be used. The TPP representative* argued that three of them should not be considered. #5 was conducted using text messages rather than by calling telephones. Respondents had to actively opt in rather than passively be sampled. #3 and #8 were conducted purely by landlines; no cell phones were called. The TPP argued that this meant they could not represent the entire population. The KMT rejected these arguments, arguing that all nine surveys should be considered as valid. The two sides agreed to set this question aside and look first at the other six surveys. However, neither side yielded on this question.

(* The TPP did not appoint Ko’s high school classmate as its expert, as he had suggested he planned to do. Instead, they appointed the head of a polling company. Perhaps one of his staff members persuaded him they needed a professional for this job.)

Personally, I think this was an entirely predictable argument. The TPP should have insisted on only using surveys that included cell phones during the original negotiations. They made a mistake when they didn’t specify what kinds of surveys would be acceptable to them.

A second, and the most intractable, argument was over how to interpret the margin of error. I’m afraid we need to do a very short lesson in statistics to understand each side’s position. So bear with me.

When you do a survey, you always get a point estimate. That is, you get a number such as “35.2% of people support Ko.” This number is almost always wrong. The actual percentage in the full population is almost always a little bit different. The margin of error helps us to understand how much different the actual number might be from our point estimate.

Imagine flipping a coin. The actual probability of getting heads is 50%. If you flip that coin twice, there’s a good chance you won’t get heads exactly one time. Fortunately, large numbers are our friends. If you flip the coin 1000 times, you probably won’t get exactly 500 heads, but you will get pretty close to that. In fact, the margin of error for that estimate at 95% confidence is approximately 1/√n, where n is the sample size. Since 1/√1000=0.032, we can be 95% sure that we will get between 468 and 532 heads. To put it another way, if we flip the coin 1000 times and get exactly 487 heads, we can be 95% confident that the expected number of heads is somewhere in the interval of 487 plus or minus 32. Lo and behold, 500 is inside the 95% confidence interval of 455 to 519. If you repeat this exercise an infinite number of times, 95% of the confidence intervals around your point estimates will contain 500.
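The coin example can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, `normal_moe` uses the textbook normal approximation 1.96·√(p(1−p)/n) as a comparison point, and the simulation settings (2000 repetitions, seed 0) are arbitrary choices.

```python
import math
import random

def approx_moe(n: int) -> float:
    """Rule-of-thumb 95% margin of error from the text: 1/sqrt(n)."""
    return 1 / math.sqrt(n)

def normal_moe(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Normal-approximation 95% margin of error for a proportion p."""
    return z * math.sqrt(p * (1 - p) / n)

def coverage(n: int = 1000, trials: int = 2000, seed: int = 0) -> float:
    """Share of simulated fair-coin experiments whose interval contains 0.5."""
    rng = random.Random(seed)
    moe = approx_moe(n)
    hits = 0
    for _ in range(trials):
        # n random bits stand in for n fair coin flips; count the ones as heads.
        heads = bin(rng.getrandbits(n)).count("1")
        if abs(heads / n - 0.5) <= moe:
            hits += 1
    return hits / trials

print(f"1/sqrt(1000)           = {approx_moe(1000):.4f}")
print(f"1.96*sqrt(0.25/1000)   = {normal_moe(1000):.4f}")
print(f"simulated coverage     = {coverage():.3f}")
```

The two margins come out very close to each other for p = 0.5, which is why 1/√n works as a shorthand, and the simulated share of intervals containing 0.5 should land near 95%.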

So let’s imagine a survey with sample size 1068 that shows Ko at 34% and Hou at 30%. Is 34% higher than 30%? Should Ko win this point? 1/√1068=0.030, so Ko’s confidence interval is from 31% to 37%. Ko is statistically significantly higher than 30%, so he wins, right? Not so fast. There is one problem with that, and this is the critical point. Hou’s 30% support is also taken from this survey, so it is also a point estimate and it also has a margin of error. Hou’s 95% confidence interval is between 27% and 33%. These two confidence intervals overlap a little, so we cannot be 95% confident that Ko’s support is, in fact, higher than Hou’s support. Even though Ko leads by 4% and the margin of error is only 3%, 34% and 30% are actually not statistically significantly different.

(This is a point we often ignore in common discourse, where we usually just look at a single number and the margin of error, but if you’ve ever tried to publish a paper in a serious academic journal, you will know that overlapping confidence intervals are the kiss of death, no matter how small the overlap is. Ko, as a distinguished academic, should be well aware of this.)
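The overlap check described above can be sketched as follows. It reproduces only the heuristic used in this example (each interval is the point estimate plus or minus 1/√n); it is not a formal significance test, and the helper names are hypothetical.

```python
import math

def interval(share: float, n: int) -> tuple[float, float]:
    """95% interval around a point estimate using the 1/sqrt(n) shorthand."""
    moe = 1 / math.sqrt(n)
    return share - moe, share + moe

def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

n = 1068
ko = interval(0.34, n)   # roughly 31% to 37%
hou = interval(0.30, n)  # roughly 27% to 33%

print(f"Ko:  {ko[0]:.3f} to {ko[1]:.3f}")
print(f"Hou: {hou[0]:.3f} to {hou[1]:.3f}")
print("intervals overlap:", overlaps(ko, hou))
```

With a much larger sample the same four-point gap would clear the bar: at n = 10000 the shorthand margin is one point, and the two intervals no longer touch.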

So, let’s look at how the KMT wants to interpret the results.

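| # | Pollster | Hou − Ko | MoE | Who is higher? | Who gets a point? |
|---|---|---|---|---|---|
| 1 | UDN | 1.00 | ±2.90 | neither | Hou |
| 2 | KMT | -0.60 | ±2.55 | neither | Hou |
| 3* | 東森 | 0.63 | ±2.94 | neither | Hou |
| 4 | 匯流 | -2.20 | ±2.17 | neither | Hou |
| 5* | ETToday | 2.00 | ±2.83 | neither | Hou |
| 6 | 鏡電視 | -0.10 | ±2.94 | neither | Hou |
| 7 | 世新 | -5.19 | ±2.94 | Ko | Ko |
| 8* | 好好聽 | 0.60 | ±2.97 | neither | Hou |
| 9 | TPP | -4.30 | ±2.98 | neither | Hou |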

Only one of these results is statistically significantly different, so the score is 8 to 1. Even if you throw away the three surveys that the TPP disputes, the score is still 5 to 1 in favor of Hou.

(Actually, it isn’t clear why Ko is deemed to be significantly higher in survey #7. It looks like that should also be within the margin of error. Just after Mrs. Garlic pointed that out to me, a reporter asked Eric Chu about this at the news conference. He did not give a clear answer. We’ll come back to this in a minute.)

(Let’s step back and look at the point estimates and ignore the margin of error for a minute. Ko actually does better in five of the nine surveys. And if you throw away the three disputed surveys, Ko wins 5 out of 6. Remember, Ko was the one who introduced margin of error into this discussion in the first place. Oops.)

OK, now let’s look at how the TPP wants to interpret the results.

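| # | Pollster | (Ko − Lai) − (Hou − Lai) | MoE | Who is higher? | Who gets a point? |
|---|---|---|---|---|---|
| 1 | UDN | 0.00 | ±2.90 | neither | Hou |
| 2 | KMT | 1.90 | ±2.55 | neither | Hou |
| 3* | 東森 | | | | |
| 4 | 匯流 | 4.60 | ±2.17 | Ko | Ko |
| 5* | ETToday | | | | |
| 6 | 鏡電視 | 1.90 | ±2.94 | neither | Hou |
| 7 | 世新 | 8.83 | ±2.94 | Ko | Ko |
| 8* | 好好聽 | | | | |
| 9 | TPP | 5.30 | ±2.98 | Ko | Ko |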

The TPP insists that there are only 6 valid polls, and they think that Ko was significantly higher in three of the polls and no one was significantly higher in the other three. Therefore, each side gets three points.

You can see what they’re doing here. They are using margin of error in the intuitive way, treating it as the standard to differentiate between two point estimates. If the two point estimates differ by more than the margin of error, they consider that to be a statistically significant difference, period.
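That intuitive rule is easy to write down. The sketch below takes the six gaps and margins of error exactly as listed in the TPP table and awards Ko the point whenever his gap exceeds the reported margin; otherwise the point goes to Hou. The function and variable names are invented for illustration.

```python
# (poll number, Ko's gap in points as listed in the TPP table, reported MoE)
TPP_TABLE = [
    (1, 0.00, 2.90),
    (2, 1.90, 2.55),
    (4, 4.60, 2.17),
    (6, 1.90, 2.94),
    (7, 8.83, 2.94),
    (9, 5.30, 2.98),
]

def award_points(rows):
    """Ko gets the point when his gap exceeds the reported MoE, else Hou does."""
    score = {"Ko": 0, "Hou": 0}
    for _poll, gap, moe in rows:
        winner = "Ko" if gap > moe else "Hou"
        score[winner] += 1
    return score

print(award_points(TPP_TABLE))
```

Under this reading polls 4, 7 and 9 go to Ko and the other three go to Hou, which is the three-to-three split the TPP claims.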

That’s not good statistics, but it is an intuitive argument that they can make to the public. This is why they have yelled loudly that they were comfortable yielding 3%, but the KMT is unfairly insisting on a 6% advantage.

Once again, I have to conclude that Ko did a lousy job of negotiating the terms of this contest. He didn’t stipulate he would yield 3%. He said the result would have to be outside the margin of error. I suppose he might have thought he was yielding 3%; after all, what kind of idiot would yield 6%?

You might have noticed that the point estimates in the two tables are different. This is the third obvious point of contention. In the KMT tables, the numbers are support for Hou minus support for Ko. Lai’s results are ignored. In the TPP tables, they compare Ko’s advantage over Lai to Hou’s advantage over Lai. This doesn’t make a whole lot of difference, but it’s just one more thing that they should have clarified in advance. This whole process was extremely sloppy.

I think these different definitions of the dependent variable might be behind the KMT’s judgment that Ko was the winner of survey #7. The numbers in the KMT table suggest that Ko was not, in fact, significantly higher than Hou on survey #7. (5.19 < 2*2.94) However, in the TPP table, even using the KMT definition of margin of error, Ko was in fact higher than Hou. (8.83 > 2*2.94) I’m guessing the KMT did not want to press the issue and insist on its definition of the point estimate. Since it was not absolutely clear that there was no significant difference, I’m guessing they decided to yield this point. After all, they don’t want to humiliate Ko. They want his support eventually, or at least his supporters’ votes. There’s no harm in giving him a little face.
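The two definitions can be worked through for survey #7 using its four numbers from the results table. This is a sketch only: the constant names are made up, and the doubled margin of error is the threshold assumed in the parentheticals above, not a confirmed KMT formula.

```python
# Survey #7 figures from the results table (percent support).
HOU_KO_TICKET = 40.82  # Q1: Hou-Ko ticket
LAI_VS_HOU_KO = 35.86  # Q1: Lai-Hsiao ticket
KO_HOU_TICKET = 46.01  # Q2: Ko-Hou ticket
LAI_VS_KO_HOU = 32.22  # Q2: Lai-Hsiao ticket
MOE = 2.94

# KMT-style gap: support for one joint ticket minus support for the other.
kmt_gap = round(KO_HOU_TICKET - HOU_KO_TICKET, 2)

# TPP-style gap: Ko's lead over Lai minus Hou's lead over Lai.
tpp_gap = round(
    (KO_HOU_TICKET - LAI_VS_KO_HOU) - (HOU_KO_TICKET - LAI_VS_HOU_KO), 2
)

# Assumed threshold: twice the reported margin of error.
threshold = round(2 * MOE, 2)

print(f"KMT-style gap {kmt_gap} vs threshold {threshold}: {kmt_gap > threshold}")
print(f"TPP-style gap {tpp_gap} vs threshold {threshold}: {tpp_gap > threshold}")
```

The same poll falls short of the doubled margin under one definition and clears it under the other, which is why the choice of dependent variable matters here.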

If this was, in fact, an olive branch, my instincts tell me it’s not nearly enough. The fact that Ko has decided to dispute these results suggests to me that he is still planning on running for the presidency. Everyone is still talking about finding a way to cooperate, but the KMT thinks it has won while the TPP thinks it’s still negotiating. Ko suggested this morning that there were still five days before the registration deadline, and that’s enough time to hold a traditional polling primary by doing a new survey. (Well, why didn’t they think about that three days ago when there were still eight days left!?) I can’t imagine the KMT agreeing to this. They have to think that the process is already over.

There are three plausible outcomes to this controversy. First, Ko could completely surrender and agree to run as Hou’s vice presidential candidate. I think this is looking less and less likely all the time. Second, he could decide not to run, but also not to serve as Hou’s running mate. He would just drop out and focus on the legislative candidates. The KMT would be on its own, and the TPP would go back to being an opposition party that distrusts both major parties. Third, he could declare that this process was a flat-out fiasco that, if anything, showed him to be the stronger candidate. Then he would righteously announce he was running for the presidency. If I had to guess right now, this is what I would say is most likely.

In either of the first two scenarios, I suspect the two candidates have done themselves no favors. They both want to get votes from each other’s supporters, but gluing together an uneasy coalition will be more difficult now because of the higher level of acrimony between the two sides. Likewise in the third scenario, strategic voting will be more difficult because both sides will be more likely to stick with their first choice no matter what and less likely to think of the other side as their clear second favorite. I suspect both sides would have been better off if they had never started this attempted cooperation, which is quickly turning into a calamitous debacle.

update: As several of the commenters pointed out, the margin of error is a little more complicated than I made it out to be. Sorry, I’m not a great methodologist. The precise formula is quite complicated and you need to know the covariance, which I don’t know how to calculate. They were discussing this in my Line group, and someone gave an example of n=1000 and covariance is 3% so the two variables would have a margin of error of 4.243%. In retrospect this is probably why survey #7 was deemed to show Ko leading, and survey #9 was deemed to show no significant difference.

So I’m wrong about the exact calculation, but the larger point I was trying to make is that the margin of error that you see reported in the media is not actually the standard for determining whether two variables are statistically significantly different.
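For what it is worth, the 4.243% figure from the Line group equals 3% × √2, which is what you get for the difference of two estimates that each carry a 3% margin of error and are treated as independent. The sketch below is a general illustration, not the formula either party used: `corr` is a hypothetical correlation between the two estimates, and its real value for these polls is unknown.

```python
import math

def moe_of_difference(moe_a: float, moe_b: float, corr: float = 0.0) -> float:
    """Margin of error of (a - b), given each estimate's margin of error
    (at the same confidence level) and an assumed correlation between them."""
    return math.sqrt(moe_a**2 + moe_b**2 - 2 * corr * moe_a * moe_b)

print(f"independent (corr=0):            {moe_of_difference(3, 3):.3f}")
print(f"perfectly negative (corr=-1):    {moe_of_difference(3, 3, corr=-1):.3f}")
print(f"moderately positive (corr=0.5):  {moe_of_difference(3, 3, corr=0.5):.3f}")
```

Simply adding the two margins to get 6% corresponds to the extreme case of perfect negative correlation, while any positive correlation pulls the margin for the difference below 4.243%.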

25 Responses to “The Big Reveal goes awry”

  1. Shelley Rigger Says:

    Dr. Garlic, I believe you saw this coming from the start.

  2. forestation Says:

    Ko certainly doesn’t understand statistics, even though he thinks he does. But he might have stumbled upon a valid argument by accident.

    Like you said, the confidence interval of the estimated gap between two candidates is wider than the estimated share of each candidate (about double).

    However, how confident we are in the vote share gap is different from how confident we are that Ko is leading, because for the latter we only care about the tail errors on one side of the distribution.

    Furthermore there’s the issue of the DK/None of the Above votes, and how you assume they would vote. This is especially true if you are not directly comparing Ko to Hou, but “Ko vs Lai” to “Hou vs Lai.”

    If for each of the 6 polls we make some reasonable assumptions and simulate the probability that Ko would do better vs Lai than Hou would, it is quite possible that three of them would indeed indicate a probability greater than 95%.

    I strongly doubt either side would’ve gone through that analysis – there hasn’t been enough time. And let’s be honest, not sure their selected experts would have had the statistical wherewithal either.

    In the end, it’s on Ko for signing on to ill defined terms that he didn’t actually understand.

    • forestation Says:

      I should add that the agreement doesn’t mention win probability or confidence levels, only margins of error. But the margin of error of what? That’s not defined at all, and I don’t think they know the answer, or that it’s the right thing to measure.

  3. Red Says:

    If there is any silver lining to this, it is that it will hopefully piss off the TPP voters and make them see what an untrustworthy snake the KMT is, and make them vote Green in January

    • Coolpot Says:

      Don’t think many will change their votes if TPP doesn’t drop out of the race. Even if TPP dropped out of the race, I don’t think many, if any, TPP supporters will vote green either; it’s most likely that some will rather not cast any votes as a form of protest due to the current state of politics.

  4. Pascal Says:

    If both sides had been serious about resolving the question with polling – instead of trying to shape an agreement that each believed would favor them – then it could’ve been done with a simple average over an intuitive time window. Anything else was bound to produce disagreements over sampling, comparability and other technicalities that are much less straightforward than the MoE. Hard to see that there’s any way of restarting and amicably concluding this process in the next few days. If I were the KMT, I might gamble on ceding Ko’s interpretation of the MoE and a tie in the sample up to now, but using the upcoming polls as tiebreakers while insisting on full inclusion of all polls regardless of method. That should still give them a reasonable shot.

    • Pascal Says:

      Also, I wonder if there’s been any talk of an agreement where both candidates run for now, but agree to drop out and endorse the other if polling later in the campaign shows the latter with a clearly better chance? That’d leave more time to hash out a robust process for measuring support, and I suppose the VP spot isn’t that big of a prize.

      • Coolpot Says:

        Not likely for KMT to do so and there is only a slight chance that TPP does so in such a situation. It will be a nightmare for DPP if TPP actually goes ahead to endorse KMT. Based on online reactions, DPP is worried about any coalition between the opposition parties and I do hope they wake up and actually demonstrate how their policies in next 4 years will change for Taiwan for the better. Interesting days ahead.

  5. Mike Says:

    Perhaps, one of my biggest surprises was that as a clinician, Ko did not go for one of the more advantageous methods of aggregating polling results – performing a meta-analysis that would shrink down the margin of error, by applying either a random effects or fixed effects model.

    “Ko, as a distinguished academic, should be well aware of this” – the issue is that there are many academics (particularly clinicians) that don’t pay attention to statistics. 😅

  6. Shen-yi Liao Says:

    “So let’s imagine a survey with sample size 1068 that shows Ko at 34% and Hou at 30%. Is 34% higher than 30%? Should Ko win this point? 1/√1068=0.030, so Ko’s confidence interval is from 31% to 37%. Ko is statistically significantly higher than 30%, so he wins, right? Not so fast. There is one problem with that, and this is the critical point. Hou’s 30% support is also taken from this survey, so it is also a point estimate and it also has a margin of error. Hou’s 95% confidence interval is between 27% and 33%. These two confidence intervals overlap a little, so we cannot be 95% confident that Ko’s support is, in fact, higher than Hou’s support. Even though Ko leads by 4% and the margin of error is only 3%, 34% and 30% are actually not statistically significantly different.

    (This is a point we often ignore in common discourse, where we usually just look at a single number and the margin of error, but if you’ve ever tried to publish a paper in a serious academic journal, you will know that overlapping confidence intervals are the kiss of death, no matter how small the overlap is. Ko, as a distinguished academic, should be well aware of this.)”

    I think this is the point that Eric Chu was trying to make, and I think it is statistically incorrect. While one can infer statistical significance (at alpha = 0.05) from non-overlapping 95% CIs, one CANNOT infer statistical nonsignificance from overlapping 95% CIs.

    As this paper ( https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4877414/ ) says:

    “If two confidence intervals overlap, the difference between two estimates or studies is not significant. No! The 95 % confidence intervals from two subgroups or studies may overlap substantially and yet the test for difference between them may still produce P < 0.05. Suppose for example, two 95 % confidence intervals for means from normal populations with known variances are (1.04, 4.96) and (4.16, 19.84); these intervals overlap, yet the test of the hypothesis of no difference in effect across studies gives P = 0.03. As with P values, comparison between groups requires statistics that directly test and estimate the differences across groups. It can, however, be noted that if the two 95 % confidence intervals fail to overlap, then when using the same assumptions used to compute the confidence intervals we will find P < 0.05 for the difference.”

    It’s true that this is a “kiss of death” at some academic journals, but that is just because the inferential mistake is unfortunately an extremely common one, as this other paper ( http://www.edmeasurement.net/5245/Belia-2005-CIs-SEs.pdf ) demonstrates about visual inference (see figure 2, especially A).

    To be clear, I am not claiming that the difference *is* statistically significant. I have not done the statistical test. I am only objecting to the inference from the overlapping boundaries of 95% CIs to statistical nonsignificance. Moreover, I think there is an open question with respect to the original agreement what is meant by “within the margin of error”. But, of course, the (strategic?) ambiguity was the whole point.

    • alanwatson1958 Says:

      Indeed you cannot just add two margins of error as the KMT are doing. a) If the two samples are independent, you need to add the variances. This leads to a combined margin of error smaller than adding the two margins together. b) If the two samples are correlated, combining the two samples will lead to a new margin of error less than you would get by adding the variances. In any situation like an opinion poll where a decrease in one % leads to an increase in another the two values are inevitably correlated. There is a standard formula for looking at this, the covariance matrix of a multinomial distribution. The overall result is bound to be between what the KMT and the TPP say, but probably rather closer to the TPP.

  7. CHING-CHIH LU Says:

    I’m talking from a purely technical perspective. Is it possible that the margin of error is biased?

    The numerator of P(1-P)/N uses P = 0.5, which gives the largest possible estimation. However, if we look at the blue-white vote share as a binomially distributed random variable, and treat all other outcomes as the flip side, P = 0.4 at best. This lowers the margin of error.

    Or we should reduce the valid sample to blue-white and Lai votes only to make it a true binomial distribution. The sample size on the denominator is lowered by much more than 10%, which makes the margin of error larger.

    I haven’t dug deep into the sampling yet. Two sets of tickets are drawn from the same poll, so they are not exactly independent. This makes things even more complicated.

    I’m an economist and I’m used to using econometric models with causality assumptions by design. I thought there should be statistical tests better than comparing confidence intervals, but that’s beyond my knowledge. We can at least run a bootstrap to give a proper probability though.

    • frozengarlic Says:

      As I remember it, we use p=0.5 because that gives the largest margin of error, and we generally want to be conservative when we claim to have found a significant difference.
      We absolutely do NOT want to discard all the undecideds and other non-responses. Those are important parts of the population.
      Anyway, the actual MoE is much larger than 1/sqrt(n) because none of these are truly random samples (in which every member of the population has an equal probability of being sampled). No poll in the world today is actually a perfect random sample, but we politely agree to ignore that.

      • Ching-Chih Lu Says:

        Thanks for your reply. This is not my field, it is great to get insight from experts.

        I understand the conservative view to take a larger MoE for normal use. Nevertheless, Mr. Ko should have negotiated a narrower confidence interval on 11/15. Say, a 90% confidence interval rather than the traditional 95%. It is even better for him to secure a smaller MoE. Well, he didn’t do any of those. That’s his mistake, not my problem. An agreement is an agreement, he should not walk back from what he had signed.

        However, these two sets of tickets (Hou-Ko vs. Lai and Ko-Hou vs. Lai) in one poll are not independent, which causes a bigger problem than the imperfect random sampling problem you mentioned.

        Using your example:

        “So let’s imagine a survey with sample size 1068 that shows Ko at 34% and Hou at 30%. Is 34% higher than 30%? Should Ko win this point? 1/√1068=0.030, so Ko’s confidence interval is from 31% to 37%……. Hou’s 95% confidence interval is between 27% and 33%.”

        Yes, there is a 2% overlap between them, which suggests the difference is statistically insignificant. Yet, each poll is a random sample meant to represent the larger population. If we do the same poll at the same time with the same number of people, we might get a different result.

        Should Ko’s support fall to the lower end of the interval, at 31%, it could indicate a sample with a higher concentration of Lai supporters. These green leaning voters are unlikely to vote for Hou either, suggesting Hou’s support might be even lower than 30% in this sample.

        The difference may be significant in your example if we account for the flawed assumption of independence.

        Some of my economist friends tried very hard yesterday to find a proper test for this situation, but I didn’t see anything plausible.

  8. Red Says:

    The good news is that suddenly, for a week at least, statistics is sexy, and all 23 million people in Taiwan are getting a refresher education on margin-of-error, chi-square, ANOVA and all that good stuff. Nerds rejoice

  9. danspottw Says:

    Did we get clarity on why KMT was oddly specific about looking back to Nov. 7 for polls?

    • frozengarlic Says:

      Good question. I don’t know. The earliest poll in the data set is the United Daily News poll which was released on the 11th. Perhaps they weren’t sure what question they would be using as the standard for comparison and there was another poll with a different question that they were looking at.

  10. Tom Says:

    To summarize, there are multiple issues with the margin of error used in comparing the polls between Hou and KP, so actually neither side is correct.

    1. The first question, still unclear, is whether they are comparing the percentage support of Hou/KP (in the Hou/KP vs Lai poll, poll 1) with KP/Hou (in the KP/Hou vs Lai poll, poll 2), or the difference in poll 1 versus the difference in poll 2. Seems like KMT says the first, TPP says the second. Of course, the margin of error for the latter is even higher, since it involves estimates of 3 instead of 2 proportions, which is to the detriment of the TPP, something they don’t realize.

    2. The poll numbers are estimating a proportion (whole numbers numerator/denominator) not a mean, so the 95%CI boundaries follow the binomial distribution of p +/- 1.96*√(p(1-p)/n), where p is percentage support, n is sample size. Therefore the margin of error is dependent not just on n, but also p, and the largest error when p is 0.5 or 50%, and all other poll numbers have smaller error and narrower 95%CI than what they have used (to the detriment of KMT). Then you can calculate the 95%CI of both candidate’s poll numbers and see if they overlap to determine statistical significance, however……

    3. The numbers reported in each poll are not raw proportions. They have been processed to some extent, e.g. to fit the age distribution of the population. Unless the formula for such standardization is known, the actual margin of error cannot be calculated, as errors start to accumulate when processing data (to the detriment of TPP).

    4. Furthermore, they cannot simply add their errors in comparing them as a difference, because the two proportion estimates are not from independent samples. Also they cannot simply discard the other percentages in the poll supporting DPP or no response when analyzing.

    5. There should be some statistical test (whether binomial distribution comparisons, chi-squared test, McNemar test or something else) which can be calculated using stats software like SAS, R, STATA or SPSS to test if there is a significant difference between the two proportions, so that you get to the final output of P<0.05 or not to say if there is a significant difference between the two. Surely there are enough statistical experts to figure this out. Then the software makes the decision.

    6. I agree with Mike the best way to reduce the margin of error is to pool/meta-analyze all the eligible polls data (with subject level data available) to maximal sample size and provide the best estimate of each candidate's support and their difference, which would help TPP.

    In conclusion, lots of political expertise for party and personal gain from both sides, but a lack of statistical expertise and time spent on deciding how to interpret the polls. Or perhaps the greyness, rather than black and white, surrounding the poll rules was deliberate.

  11. erik Says:

    There’s really no reason to add in the MoE since they’re only comparing within the exact same datasets.

    Needless to say, KMT not only wanted the MoE added to their number only, but also wanted to take another 3% from Ko. That’s pretty shameless of them.

  12. Enlaces 20/11/2023 | capitalismo desnudo - Turismo Enla Manchuela Says:

    […] The Big Reveal goes awry, Frozen Garlic. On the breakup of Taiwan’s joint presidential ticket. […]

  13. Kilto Says:

    How much do you reckon the official announcement of Hsiao as Lai’s running mate is going to improve the DPP’s chances? I wish she were the Presidential candidate tbh, but maybe next time…

    • Red Says:

      I don’t think Hsiao is going to win the DPP more votes than it’s already got. The main kind of person she appeals to is a progressive, internationalist, Green, bilingual, feminist, Westernized voter – which is what the DPP’s already got as a base.

      • John Says:

        The type of voter you describe does not particularly like Lai and may still be resentful toward him for 2019. They would never vote for anybody else but they could very well have decided to stay home in January. Hsiao will be a huge motivating factor for them to turn out, which will also help the DPP in legislative races.

  14. Collin Says:

    “These two confidence intervals overlap a little, so we cannot be 95% confident that Ko’s support is, in fact, higher than Hou’s support. ” – This is incorrect. The difference between two independent normally distributed random variables is still normally distributed. The variance is the sum of the two variances, but the STD / MoE is not. Assuming the same sample size for those two, which gives 3% MoE, the combined MoE is sqrt(2) * 3% ~= 4.2%, not 6%.

    I do want to echo Mike’s response that using random-effect analysis is the best way to aggregate the data rather than the weird “point” rules. If it’s done, Ko can easily prove the statistical significance.

  15. Taiwan update | Fruits and Votes Says:

    […] 16 November I wrote about the plan by Taiwan’s opposition parties to present a joint ticket. It didn’t go well. The three-way contest continues, although the dynamic is becoming ever more two-way. The president […]
