the Taitung race and ecological inference

A few weeks ago, I promised to eventually get around to writing about Taitung. I don’t have a blow-by-blow account of what is happening there, but perhaps a closer look at some of the historical election results can shed some light on the race.

Taitung, in the southeastern part of the island, is traditionally a deep blue area. It is ethnically diverse, and places with fewer Min-nan residents have historically been more challenging for the DPP. Even today, the DPP has almost no presence at the county assembly or town mayor level. In the 2016 election, Taitung and Hualien on the east coast were the only places on the main island that produced more votes for Eric Chu than for Tsai Ing-wen. However, the two-term legislator from Taitung is Liu Chao-hao 劉櫂豪, from the DPP. Liu has run for county magistrate four times before and lost each time, and this year he is trying again. There isn’t a whole lot of info coming out of this relatively obscure race, but my impression is that observers think Liu has a good chance of winning. There have been several complaints from within the KMT camp that their candidate, Rao Ching-ling 饒慶玲, is extremely weak, and of course she responds that these are unfair attacks and that she is winning. Who can tell?


Instead of looking at the past two months, I’m going to look at the two main candidates’ past electoral performance. Here’s a summary of the county-wide races over the past two election cycles. The DPP candidates in the non-presidential races are all Liu Chao-hao. The KMT candidate in the 2012 legislative race was current KMT nominee Rao Ching-ling.


                      KMT     DPP  others
2009 magistrate     56354   50802
2012 prez           72823   33417    3313
2012 LY             22553   31658   21932
2014 magistrate     64272   53860
2016 prez           43581   37517   16565
2016 LY             23616   42317

                     KMT%    DPP% Others%
2009 magistrate      52.6    47.4
2012 prez            66.5    30.5     3.0
2012 LY              29.6    41.6    28.7
2014 magistrate      54.4    45.6
2016 prez            44.6    38.4    17.0
2016 LY              35.8    64.2

The DPP has done markedly better when Liu is on the ballot than in presidential races. While Tsai did not break 40% either time, Liu has broken 40% all four times. However, he only got a majority once. In the 2012 race against Rao Ching-ling, former county magistrate Wu Chun-li 吳俊立 split the blue vote, and Liu was able to win with only 41.6%. To put it another way, Rao was so weak that she couldn’t even soak up 42% of the votes, even though there were plenty of blue votes available.
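As a quick check on those percentages, here is a minimal Python sketch that recomputes vote shares from the raw counts in the summary table. The names `RESULTS`, `shares`, and `dpp_share` are made up for this illustration, and empty "others" cells are treated as zero.

```python
# Vote counts copied from the summary table: (KMT, DPP, others).
# Races with no "others" column are entered as 0 (an assumption of this sketch).
RESULTS = {
    "2009 magistrate": (56354, 50802, 0),
    "2012 prez": (72823, 33417, 3313),
    "2012 LY": (22553, 31658, 21932),
    "2014 magistrate": (64272, 53860, 0),
    "2016 prez": (43581, 37517, 16565),
    "2016 LY": (23616, 42317, 0),
}


def shares(counts):
    """Return each count as a percentage of the valid-vote total."""
    total = sum(counts)
    return tuple(100 * c / total for c in counts)


def dpp_share(race):
    """DPP percentage of the valid vote in the named race."""
    return shares(RESULTS[race])[1]


for race, counts in RESULTS.items():
    kmt, dpp, others = shares(counts)
    print(f"{race:16s} KMT {kmt:5.1f}  DPP {dpp:5.1f}  others {others:5.1f}")
```

Every race with Liu on the ballot comes out above 40% for the DPP, and only one of them is above 50%.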

You will note that there are a lot more votes in the presidential and county magistrate elections than in the LY elections. That is because 30,000-40,000 indigenous voters vote in the special indigenous districts in LY elections rather than in the normal district elections. Anecdotally, we know that indigenous voters overwhelmingly vote blue. Are they the difference between the DPP’s victories in the legislative races and losses in the presidential and magistrate races? Who knows. Surveys don’t give any precise answers because there are never enough cases to break out indigenous voters. If a survey has 1000 respondents and 2% of the population is indigenous, you expect 20 indigenous respondents. That’s simply nowhere near enough to produce even a bad estimate. And if you want to know about indigenous voters in Taitung (as opposed to those in some urban area in the north), you’re even more in the dark.
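To put a rough number on why 20 respondents is hopeless, the sketch below uses the textbook normal approximation for the margin of error of a proportion. It assumes simple random sampling and a true share near 50%, which is the worst case; real surveys with weighting would do no better. The function names are invented for this example.

```python
import math


def expected_subgroup_size(n_respondents, population_share):
    """Expected number of respondents from a subgroup in a random sample."""
    return n_respondents * population_share


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)


n_indigenous = expected_subgroup_size(1000, 0.02)
moe = margin_of_error(n_indigenous)
print(f"expected indigenous respondents: {n_indigenous:.0f}")
print(f"approximate margin of error: +/- {100 * moe:.0f} points")
```

With about 20 cases, the margin of error is on the order of 22 percentage points in either direction, so an estimate of "60% blue" would be compatible with anything from a narrow green majority to a blue landslide.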


Warning: Extremely Boring Methodology Section

I do have a potential solution, but it’s going to require a bit of explanation. What I’m doing goes by the name of ecological inference. In a nutshell, I am trying to infer individual level behavior from aggregate level data. More specifically, when there were two elections on the same day, I’m trying to figure out how people voted in both of them. Did they cast straight tickets or split tickets? The basic problem looks like this:

                  President
legislator         KMT     DPP     Total
KMT                  ?       ?     55000
DPP                  ?       ?     75000
total            70000   60000    130000

If there is a district with 130,000 voters, all of whom vote in both the presidential and legislative elections, we will know how many total votes each candidate got. What we want to know is the four missing cells. The table shows the aggregate totals, but we actually have a little more information since we know the totals for each precinct.

About 20 years ago, Gary King at Harvard proposed a solution to this problem. King’s solution looks at the bounds defined for each cell by each precinct. If the KMT gets a very high or very low percentage of presidential votes in a given precinct, it can be quite informative in defining the logical bounds for how the legislative vote breaks down. Likewise, if the KMT’s presidential and legislative vote totals are quite different in a given precinct, that implies there must have been at least a certain level of split-ticket voting. At any rate, these bounds and a few other parameters help to define a distribution, and then you start taking random draws from that distribution. The algorithm assumes that the underlying distribution is the same for all precincts, though the observed level of split-ticket voting in a given precinct is a random draw from that underlying distribution. With each simulation, the algorithm slightly tweaks the parameters of the distribution. After a large number of simulations, the results stabilize. Essentially, the algorithm will eventually settle on the solution with roughly the highest levels of straight-ticket voting that the data will support. Of course, these are simulations and you are drawing lots of random numbers, so the solution is slightly different each time.
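The deterministic bounds step can be sketched in a few lines of Python. This is only the logical-bounds part for a 2×2 table in which everyone votes in both elections; it is not King’s full method, which adds the distributional assumptions and the simulations. The function names and the lopsided precinct below are invented for illustration.

```python
def cell_bounds(row_total, col_total, grand_total):
    """Logical bounds on the count in the cell where a row and a column meet."""
    lower = max(0, row_total + col_total - grand_total)
    upper = min(row_total, col_total)
    return lower, upper


def min_split_tickets(kmt_president, kmt_legislator):
    """Fewest split tickets a 2x2 precinct can contain, given the two KMT totals."""
    return abs(kmt_president - kmt_legislator)


# Aggregate example from the table: 55000 KMT legislative votes,
# 70000 KMT presidential votes, 130000 voters in all.
print(cell_bounds(55000, 70000, 130000))  # (0, 55000)

# Hypothetical lopsided precinct (invented numbers): 1000 voters,
# 900 KMT presidential votes, 850 KMT legislative votes.
print(cell_bounds(850, 900, 1000))  # (750, 850)

# Hypothetical precinct where the two KMT totals differ a lot.
print(min_split_tickets(900, 600))  # 300
```

At the aggregate level the bounds on straight KMT tickets run from 0 to 55,000, which says nothing. In the lopsided precinct they narrow to 750 to 850, and that is the kind of information the precinct-level data adds.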

King’s method is controversial. Some studies using it have been published in top journals. However, the solution that the algorithm produces is not guaranteed to be correct, and it may be biased toward straight-ticket voting. Nonetheless, we don’t have any better solution. If you have something better, cough it up. Otherwise, let’s go forward with the understanding that this isn’t foolproof but it’s the best we have at the moment.

The matrix above has two rows and two columns. Unfortunately, life is rarely that simple. Taiwanese elections certainly do not fit into a 2×2 matrix. For one thing, the 2012 and 2016 presidential elections each had three candidates. More importantly, the presidential and legislative elections have different numbers of valid votes, and we need the same number of voters in each precinct. Instead of valid votes, I need to look at everyone who voted in the bigger election. That means adding another column and three more rows to my matrix. In the presidential vote, some people cast invalid votes. In the legislative election, we have invalid votes and indigenous voters. That still leaves a small group of people who are eligible to vote in the bigger election but not the smaller election. These are usually people who have recently moved into the district and so are not eligible to vote for the legislative candidate but are still eligible to vote for the presidential candidate. (About 1% of precincts actually have one or two more legislative votes than presidential votes. I made the numbers add up by creating the necessary number of invalid presidential votes.) That means I will have at least a 5×3 matrix, and I might have even more rows if there are more than two legislative candidates. However, I’m not very interested in invalid votes or people who just moved, so I combined these two categories, yielding at least a 4×3 matrix.
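The bookkeeping described above might look something like the following Python sketch. It is a guess at the logic, not the actual data-preparation code: the function name, argument names, and the example precinct are all hypothetical, and it assumes the count of indigenous-district ballots is available for each precinct.

```python
def harmonize_precinct(pres_valid, pres_invalid, leg_valid, leg_invalid, indigenous):
    """Build row and column margins that sum to the same total for one precinct.

    pres_valid and leg_valid map candidate names to valid votes.
    Rows are legislative categories; columns are presidential categories.
    """
    pres_ballots = sum(pres_valid.values()) + pres_invalid
    leg_accounted = sum(leg_valid.values()) + leg_invalid + indigenous

    # Presidential voters with no legislative ballot at all (recent movers).
    moved = pres_ballots - leg_accounted
    if moved < 0:
        # More legislative than presidential ballots: pad the presidential
        # side with invalid votes so the margins add up.
        pres_invalid += -moved
        moved = 0

    rows = dict(leg_valid)
    rows["invalid/move"] = leg_invalid + moved  # combined category
    rows["indigenous"] = indigenous
    cols = dict(pres_valid)
    cols["invalid"] = pres_invalid
    return rows, cols


# Hypothetical precinct (invented numbers).
rows, cols = harmonize_precinct(
    pres_valid={"Chu": 300, "Tsai": 250, "Soong": 100},
    pres_invalid=8,
    leg_valid={"Chen": 180, "Liu": 270},
    leg_invalid=6,
    indigenous=195,
)
print(rows)
print(cols)
print(sum(rows.values()) == sum(cols.values()))
```

The only requirement is that both sets of margins describe the same group of people in every precinct; which residual category absorbs the leftovers is a modelling choice.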

Running the model takes a lot of computer time. It also required me to learn rudimentary R. (R is the new statistical software that all the young technical wizards are using these days. I’m an SPSS and Stata dinosaur.) One of Gary King’s students, Olivia Lau, wrote a package (eiPack) to run the Ecological Inference algorithm on RxC matrices (the original solution was only for 2×2 matrices). As you might imagine, this solution involves a lot more parameters, random draws, and simulations, and it takes a lot more computing power. You simply can’t run all the data at once. You have to run it overnight or over a weekend, see what it has produced, and then set it off on the next round. Typically, I double the number of simulations each round, so each time I’m dissatisfied, the next round takes twice as much time. When I first started doing this, I used as few as 50,000 simulations. Then I realized I needed to add lines for invalid voters, and the number of simulations needed skyrocketed. In a few districts, I had to run nearly 40 million simulations before the algorithm produced a solution that looked reasonable to me. Each of those *&%#$#^& districts took my computer (with a 3.9 GHz CPU) nearly 10 hours. (There doesn’t seem to be any clear pattern for why some districts take longer than others. 2012 Taipei 8 seemed like a fairly straightforward blue vs green district, but the 19.2 million simulation model still showed nearly 50,000 votes split between the KMT and DPP. That clearly was not what actually happened. We would have heard something about Lai Shi-pao’s irresistible appeal to Tsai Ing-wen voters. In the next round, with 38.4 million simulations, everything popped into place, with only about 1600 split tickets. Modelling is an art as well as a science.)

I’ve been working on and off with this thing for the better part of a year, and I still haven’t done all the districts I want to do. In a few days, we will get a huge new trove of election results, and I’ll be even further away from finishing. Hooray!

This concludes the extremely boring methodology section.



So how did Liu win his two legislative elections? In the 2016 presidential race, Chu and Soong got about 60,000 votes while Tsai only got 37,000, and it was a straight DPP vs KMT legislative race. However, Liu crushed his KMT opponent, 42,000 to 24,000. What happened to all those blue votes?

Here is my estimate of how the votes broke down. Remember, this is only an estimate. It is not an actual reported result.

2016                         President
legislator       Chu (K)  Tsai (D)  Soong (P)  invalid   total
Chen (K)           15575       672       7179      190   23616
Liu (D)             8843     31645       1629      201   42318
Invalid/move         522       294        680      425    1921
Indigenous         18569      4932       7062      466   31029
Total              43509     37543      16550     1282   98884

The first thing to do is to subtract indigenous votes. According to this estimate, about 26,000 indigenous voters voted for Chu or Soong, while only about 5,000 voted for Tsai. That reduces the blue partisan advantage among Han voters to roughly 34,000 to 33,000. In other words, the 2016 district race was actually fought on almost neutral partisan turf. We generally think of Taitung as solidly blue territory, but in this race it was not.

However, while the underlying partisan structure was roughly neutral, Liu still won in a landslide. To do this, he had to win a significant number of blue presidential voters. The estimates show that he took over 10,000 blue presidential votes, while the KMT candidate was held to fewer than 1,000 of Tsai’s votes. Interestingly, most of Liu’s blue support came from Chu, not Soong. Liu clearly has crossover appeal. On a neutral playing field, this strong crossover appeal (and ability to absorb all the green vote) made him an easy winner.
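The arithmetic behind these statements can be reproduced directly from the estimated 2016 cells. The short Python sketch below does so; the variable and function names are invented for the example, and the inputs are estimates, not reported results.

```python
# Estimated 2016 cells: rows are legislative choices, columns are presidential choices.
COLS = ("Chu", "Tsai", "Soong", "invalid")
EST_2016 = {
    "Chen (K)": (15575, 672, 7179, 190),
    "Liu (D)": (8843, 31645, 1629, 201),
    "Invalid/move": (522, 294, 680, 425),
    "Indigenous": (18569, 4932, 7062, 466),
}


def column_total(table, col, exclude=()):
    """Sum one presidential column, optionally leaving out some rows."""
    i = COLS.index(col)
    return sum(row[i] for name, row in table.items() if name not in exclude)


# Presidential vote after removing the indigenous row.
blue_non_indigenous = sum(
    column_total(EST_2016, c, exclude=("Indigenous",)) for c in ("Chu", "Soong")
)
green_non_indigenous = column_total(EST_2016, "Tsai", exclude=("Indigenous",))

# Crossover: blue presidential voters who chose Liu, and Tsai voters who chose Chen.
liu_from_blue = EST_2016["Liu (D)"][0] + EST_2016["Liu (D)"][2]
chen_from_tsai = EST_2016["Chen (K)"][1]

print("blue vs green without indigenous voters:", blue_non_indigenous, green_non_indigenous)
print("Liu's votes from Chu and Soong voters:", liu_from_blue)
print("Chen's votes from Tsai voters:", chen_from_tsai)
```

Note that the non-indigenous totals still include the small invalid/move row, so they are a slight overcount of voters who actually had a say in the district race.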

Liu’s election in 2012 is also instructive. Remember, in 2012 the KMT candidate was Rao Ching-ling, who is also the KMT nominee in this year’s county magistrate election.

2012                         President
legislator        Ma (K)  Tsai (D)  Soong (P)  invalid   total
Rao (K)            21283       780        336      154   22553
Liu (D)             6129     25048        353      155   31685
Wu (I)             15969      2622        284      140   19015
(Green)              168       165        158       61     552
others              1054       541        700       69    2364
Invalid/move         527       476        476      163    1642
Indigenous         27605      3728       1071      362   32766
Total              72735     33360       3378     1104  110577

Taitung was bluer in 2012 than in 2016 (as was the entire country). The blue presidential candidates won Taitung 76,000 to 33,000. As in 2016, most of this margin came from indigenous voters. After subtracting them, the blue advantage was reduced to about 47,000 to 30,000, which is still a sizeable margin. So how did Liu win the 2012 election in this solidly blue territory?

As in 2016, Liu won a significant number of blue camp votes. He took about 6500 from Ma and Soong voters. In addition, the blue vote was split between Rao Ching-ling and former county magistrate Wu Chun-li. Rao was not even able to soak up half of the Ma voters who were eligible to vote in the district election.

So Liu won his two legislative races because he had significant crossover appeal, his opponents were weak, and indigenous voters did not vote in those elections. However, in this year’s county magistrate race, indigenous voters will vote. In the 2016 race, indigenous voters in Taitung favored blue presidential candidates 84%-16%, and in 2012 the gap was even wider, 88%-12%. Further, because turnout in indigenous villages is extremely high in local elections, nearly 10,000 more indigenous voters voted in 2014 than in 2012 or 2016. That seems like an insurmountable firewall for the KMT.
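The indigenous percentages come straight from the Indigenous rows of the two estimated tables. A small Python sketch of the calculation follows, with invented names; "blue" here means the KMT candidate plus Soong, and invalid ballots are left out.

```python
def blue_share(blue_votes, green_votes):
    """Blue percentage of the two-camp vote."""
    return 100 * blue_votes / (blue_votes + green_votes)


# Indigenous row of each estimated table: (KMT candidate, Tsai, Soong).
INDIGENOUS = {
    2012: (27605, 3728, 1071),
    2016: (18569, 4932, 7062),
}

indigenous_blue_share = {
    year: blue_share(kmt + soong, tsai)
    for year, (kmt, tsai, soong) in INDIGENOUS.items()
}

for year in sorted(indigenous_blue_share):
    blue = indigenous_blue_share[year]
    print(f"{year}: blue {blue:.0f}%, green {100 - blue:.0f}%")
```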

However, let’s look more closely at the 2014 race. I broke down the 2014 election by county assembly districts and then added the results together to get an overall picture for Taitung County. There really isn’t any meaningful party competition at the county assembly level, so the first few rows of this table don’t convey much useful information. We also can’t see much evidence of Liu’s crossover appeal since that is baked into the magistrate totals and the assembly figures are meaningless. We are mostly interested in the last row, for indigenous voters.

2014                   Magistrate
Assembly       Huang (K)   Liu (D)  invalid   total
KMT                24970     19449      523   44942
DPP                 3918      3290      249    7457
Others              7552     16691      368   24611
Invalid/move        1365      1027     1129    3521
Indigenous         26400     13387     1108   40895
Total              64205     53844     3377  121426

According to my estimates, Liu only lost the indigenous vote 2 to 1, not 7 to 1. Tsai was only able to get 4000-5000 indigenous votes, but Liu won 13,000. Liu may have some personal appeal to indigenous voters, or perhaps party labels simply don’t matter as much in a local election. Still, Huang Chien-ting’s 黃健庭 13,000 advantage among indigenous voters was the difference between winning and losing. Liu actually won by about 2,600 votes among Han voters, but indigenous voters put Huang over the top.
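For the 2014 figures, the same kind of subtraction applies: take the Indigenous row out of the Total row of the estimated table. A brief Python sketch, with names invented for the example:

```python
# From the estimated 2014 table: (Huang, Liu).
TOTAL_2014 = (64205, 53844)
INDIGENOUS_2014 = (26400, 13387)

huang_indigenous, liu_indigenous = INDIGENOUS_2014
indigenous_ratio = huang_indigenous / liu_indigenous
indigenous_margin = huang_indigenous - liu_indigenous

# Everyone else: totals minus the indigenous row.
huang_other = TOTAL_2014[0] - huang_indigenous
liu_other = TOTAL_2014[1] - liu_indigenous
liu_margin_other = liu_other - huang_other

print(f"indigenous ratio, Huang to Liu: {indigenous_ratio:.2f} to 1")
print(f"Huang's indigenous margin: {indigenous_margin}")
print(f"Liu's margin among everyone else: {liu_margin_other}")
```

Huang’s estimated indigenous margin is larger than Liu’s estimated margin among everyone else, which is the whole story of the 2014 result in one comparison.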

Taken together, you can see why Liu might have a chance to win this year, even with the presence of the indigenous votes. First, Liu has demonstrated a strong crossover appeal to people who normally vote blue. Second, Rao appears to be a weak KMT nominee. She was unable to defend even half of the pool of available blue votes in 2012, and anecdotal evidence suggests she is a clear step down in popularity from Huang Chien-ting (whom Liu beat among Han voters in 2014). Third, indigenous voters tilt the partisan balance in Taitung blue, but (based on one data point) indigenous voters are not nearly as overwhelmingly blue in county magistrate elections as in presidential elections. Maybe the fifth time is a charm.

One Response to “the Taitung race and ecological inference”

  1. jaichind Says:

    I did something like this right after the 2012 and 2016 elections for both Taidong and Hualien and came to similar conclusions. What triggered me to do this was noticing that the 2010 Hualien legislative by-election was much closer than I had expected. Then I realized, “Oh yeah, the Aborigines do not vote in the Hualien legislative elections. Of course that will cut into the KMT lead when it is clear anti-incumbency will start to hit the Ma/KMT administration.”

    Of course DPP does have a shot in Taidong this year for the reasons you gave as well as the fact that KMT rebel Kuang is running in the race. It is not clear to me, living in the USA and without any local contacts, but it seems Kuang and her husband Wu, who is in theory working on the Rao campaign, might be playing a good cop/bad cop routine with KMT high command. Kuang might end up getting some Wu faction votes, giving the DPP a chance.

    Of course this year will be an anti-DPP year. There will be a small to medium size wave against the DPP. So with that poorer macro environment I suspect DPP will fall short.
