One of the fun little quirks of Taiwanese elections is that the candidates draw numbers, and these registration numbers become an integral part of their campaign messages over the last few weeks. This is the only time of the year when people happily shout “#2!” or “#5” at their friends. When candidates draw the numbers, everyone cheers the lucky number 1 and everyone groans at the unlucky number 4. (“Four” sounds similar to “death” in Mandarin and is exactly the same in Taiwanese.) Do these numbers really matter?

Answer #1: Of course not! Don’t be silly. They are randomly drawn, so the effect should be random. Winning and losing depend on important characteristics, such as party affiliation and whether or not a candidate has had a bribery scandal, not stupid things like registration numbers. Why are you wasting time with this topic, FG?

Answer #2: Of course it matters! People are not machines. Some people are superstitious and vote for lucky numbers or avoid unlucky ones. More importantly, people make choices in different ways. Some might look down the list until they find a candidate they like. Some people think voting is a duty and care more about the act than the choice, so they might vote for the person at the top of the list because that’s the first place they look. Names might be easier to find at some places on the ballot than others. Scholars in the USA have found that there is an advantage to being at the top of the ballot, and pollsters often randomize questions and answers to avoid skewing the results.

Well, this sounds like a good old-fashioned empirical question. We have two competing theoretical perspectives, both of which make sense. Which one do the data support? Fortunately, I have a bunch of data perfectly formatted to answer this kind of question. I’m going to look at whether candidates with a particular registration number won at a higher rate than expected.

First, our hypotheses. We have the superstitious hypotheses:

H1: Reg #1 should have more winners (than random).

H2: Reg #4 should have fewer winners.

We also have the easy-to-find hypotheses, where the candidates at the top and bottom of the ballot are the favored ones. (Technically, I think the ballots still run right to left, but it’s the same idea.)

H3: Reg #1 should have more winners.

H4: The highest registration number should have more winners.

Second, methodology. I look at all candidates in all district elections (including indigenous but not including party list) since 1985. I exclude all candidates who ran uncontested. For each candidate, I have a registration number, the number of seats elected in the district, the number of candidates running in the district, and whether the candidate won or lost. If registration numbers don’t matter, each candidate’s probability of winning is simply the number of seats divided by the number of candidates. I sum up all these probabilities and all the actual wins by registration number. For example, there have been 286 legislative candidates who drew #1. These are expected to produce 107.94 winners, but in reality 118 won.
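The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration, not the code behind the tables: the `Candidate` record, its field names, and the toy districts are all made up, and treating a race as uncontested when the candidates do not outnumber the seats is an assumption.

```python
from collections import defaultdict
from typing import NamedTuple


class Candidate(NamedTuple):
    # Hypothetical record layout; the field names are illustrative only.
    reg: int       # registration number drawn by the candidate
    seats: int     # seats elected in the district
    n_cands: int   # candidates running in the district
    won: bool


def tally_by_reg(candidates):
    """Return {reg: (n, expected_wins, actual_wins)}."""
    totals = defaultdict(lambda: [0, 0.0, 0])
    for c in candidates:
        if c.n_cands <= c.seats:  # assumed definition of an uncontested race
            continue
        row = totals[c.reg]
        row[0] += 1
        row[1] += c.seats / c.n_cands  # chance of winning if numbers are irrelevant
        row[2] += int(c.won)
    return {reg: tuple(row) for reg, row in totals.items()}


# Toy data: a 1-seat race with 2 candidates and a 2-seat race with 4.
toy = [
    Candidate(1, 1, 2, True), Candidate(2, 1, 2, False),
    Candidate(1, 2, 4, True), Candidate(2, 2, 4, False),
    Candidate(3, 2, 4, True), Candidate(4, 2, 4, False),
    Candidate(1, 1, 1, True),  # uncontested, so skipped
]
table = tally_by_reg(toy)
for reg in sorted(table):
    print(reg, table[reg])
```

In the toy data, #1 has two candidates with an expected 1.0 wins and 2 actual wins, which is the same kind of comparison as 107.94 versus 118.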

Is 118 meaningfully higher than 107.94? Now we need some basic statistics. If you flip a coin 100 times, you would expect to get 50 heads. However, you usually will get some other number, just by random chance. If you get 53 heads, you probably shouldn’t announce that the coin is unfair. If you get 75 heads, you might want to inspect the coin. In a statistical sense, we can judge whether 53 or 75 is really higher than the expected 50 using a cumulative binomial distribution. Basically, this calculates the probability of getting 0 heads, 1 head, 2 heads, and so on up to the number you actually observed, and sums up all those probabilities. If the cumulative sum is very large or very small, your coin might not be fair. By convention, most scientists accept 95% confidence as a reasonable threshold. So if our cumulative probability falls under .025 or above .975, we’ve got an interesting result. Otherwise, we can’t reject the null hypothesis that this is just a random result.
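For anyone who wants to try the coin example, here is a minimal sketch using only the Python standard library. The function name `binom_cdf` and the wording of the verdicts are my own choices for illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): sum the chances of 0, 1, ..., k."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


for heads in (53, 75):
    cum = binom_cdf(heads, 100, 0.5)
    if cum < 0.025 or cum > 0.975:
        verdict = "inspect the coin"
    else:
        verdict = "consistent with a fair coin"
    print(f"{heads} heads: cumulative probability {cum:.3f} ({verdict})")
```

With 53 heads the cumulative probability is about 0.76, comfortably inside the .025 to .975 band, while 75 heads lands far above .975.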

In our example, the 286 candidates were expected to win 107.94 times, for an average winning probability of .377. [Some real statistician can probably tell me why this is not exactly correct, but I think it’s ok for a quick and dirty study like this one.] The cumulative probability of 118 or fewer wins is .901, which seems pretty high. It isn’t over .975, however, so we can’t reject the null hypothesis with 95% confidence. Legislative candidates who drew the lucky #1 won more than expected, but not so much more that there is a statistically significant difference. It could just be random.
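The #1 example can be rechecked with a short sketch. It uses the same shortcut as the text, a single averaged probability for all 286 candidates, so it is a quick check under that assumption rather than a more careful model. The variable names are mine.

```python
from math import comb

n, expected, wins = 286, 107.94, 118   # the #1 figures quoted above
p = expected / n                        # average win probability, about .377

# P(X <= wins): add up the chances of 0, 1, ..., 118 winners.
cum = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(wins + 1))

print(f"p = {p:.3f}, cumulative probability = {cum:.3f}")
print("significant" if cum < 0.025 or cum > 0.975 else "not significant")
```

The result should land close to the .901 reported above, inside the band where we cannot reject the null hypothesis.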

Here is the full table for legislative candidates since 1986:

| Reg # | n | Expected wins | Actual wins | Cumulative P |
|---|---|---|---|---|
| 1 | 286 | 107.94 | 118 | 0.901 |
| 2 | 286 | 107.94 | 104 | 0.339 |
| 3 | 259 | 94.51 | 98 | 0.698 |
| 4 | 217 | 80.27 | 82 | 0.625 |
| 5 | 187 | 70.77 | 71 | 0.546 |
| 6 | 167 | 64.37 | 59 | 0.220 |
| 7 | 140 | 56.54 | 54 | 0.365 |
| 8 | 125 | 51.68 | 48 | 0.283 |
| 9 | 116 | 47.50 | 44 | 0.287 |
| 10 | 105 | 43.61 | 43 | 0.494 |
| 11 | 97 | 40.71 | 49 | 0.964 |
| 12 | 79 | 33.53 | 29 | 0.180 |
| 13 | 71 | 30.20 | 28 | 0.344 |
| 14 | 60 | 25.35 | 24 | 0.415 |
| 15 | 56 | 23.57 | 27 | 0.856 |
| 16 | 54 | 22.50 | 26 | 0.865 |
| 17 | 45 | 18.19 | 20 | 0.760 |
| 18 | 40 | 16.01 | 16 | 0.567 |
| 19 | 37 | 14.68 | 16 | 0.732 |
| 20 | 30 | 11.31 | 12 | 0.677 |
| 21+ | 215 | 70.84 | 64 | 0.179 |
| last | 286 | 107.94 | 115 | 0.822 |
| all | 2672 | 1032.02 | 1032 | 0.508 |

So how do the hypotheses look? As previously pointed out, #1 is high but not statistically significant. We thought #4 might be unlucky, but those candidates actually won slightly more than expected. The superstitious hypotheses don’t look so good. H4 says that the last number should be useful for finding the candidate on the ballot. In fact, candidates with the last number did slightly better than expected, though again, this doesn’t reach statistical significance.

There is another number that is interesting. I combined all the numbers above 20, and candidates with these numbers didn’t do so well. It’s not statistically significant, but it is suggestive. If the easy-to-find hypothesis is correct, maybe voters won’t look as hard for candidates way down on a long ballot. We thus have three suggestive but not conclusive results. If only we had more data.

Wait a minute. The rest of the world looks at legislative data and thinks they have done an exhaustive study. I’m Frozen Garlic. Of course I have more data! That table was only for legislators. I have data on every election except neighborhood chiefs and some earlier township council elections.

| Reg # | n | Expected wins | Actual wins | Cumulative P |
|---|---|---|---|---|
| 1 | 7520 | 4041.93 | 4070 | 0.746 |
| 2 | 7519 | 4041.49 | 4073 | 0.770 |
| 3 | 6786 | 3675.21 | 3655 | 0.315 |
| 4 | 5781 | 3123.67 | 3171 | 0.897 |
| 5 | 4675 | 2499.74 | 2443 | 0.050 |
| 6 | 3800 | 2011.96 | 1984 | 0.186 |
| 7 | 2982 | 1562.59 | 1560 | 0.469 |
| 8 | 2321 | 1195.10 | 1221 | 0.864 |
| 9 | 1828 | 924.77 | 939 | 0.755 |
| 10 | 1431 | 712.30 | 700 | 0.266 |
| 11 | 1169 | 576.86 | 564 | 0.235 |
| 12 | 939 | 457.45 | 471 | 0.821 |
| 13 | 788 | 379.97 | 377 | 0.430 |
| 14 | 653 | 311.81 | 309 | 0.428 |
| 15 | 530 | 246.88 | 253 | 0.718 |
| 16 | 451 | 205.61 | 212 | 0.743 |
| 17 | 373 | 166.89 | 163 | 0.363 |
| 18 | 315 | 138.59 | 139 | 0.542 |
| 19 | 257 | 110.98 | 106 | 0.287 |
| 20 | 218 | 92.51 | 95 | 0.660 |
| 21+ | 1003 | 382.43 | 357 | 0.052 |
| last | 7508 | 4035.59 | 4053 | 0.661 |
| all | 51339 | 26858.74 | 26862 | 0.513 |

In this data set of over 50,000 candidates, #1 doesn’t seem to have any great advantage. #4 looks borderline lucky, which is completely unexpected. #5 is the unluckiest number, though these still don’t reach conventional levels of statistical significance. I certainly don’t have any theoretical reason why #4 should be good or #5 should be bad, so I’m going to go with the random blip idea. I don’t see much hope for the superstition hypotheses.

However, the other set of hypotheses might still be plausible. #1 doesn’t look that promising, and neither does the last registration number. However, candidates down at the end of long ballots are noticeably less successful. Those with registration numbers of 21 and higher were expected to win 382.43 times but only won 357 times, a borderline significant result.
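As a rough cross-check on borderline rows like these, a normal approximation with a continuity correction gets close to the tabled values. To be clear, this is a swapped-in shortcut for illustration, not the exact binomial sum described earlier, and the function name is hypothetical.

```python
import math


def normal_approx_cum(n, expected, wins):
    """Approximate P(X <= wins) for a binomial with mean `expected`."""
    p = expected / n
    sd = math.sqrt(n * p * (1 - p))
    z = (wins + 0.5 - expected) / sd          # continuity correction
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# (n, expected wins, actual wins) copied from the full-data table
print(f"21+: {normal_approx_cum(1003, 382.43, 357):.3f}")
print(f"#5:  {normal_approx_cum(4675, 2499.74, 2443):.3f}")
```

Both values should come out within a few thousandths of the 0.052 and 0.050 in the table, just above the .025 cutoff.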

If the easy-to-find idea is right, it should only matter when some candidates are hard to find. That is, when there are only a few candidates, everyone is easy to find. Let’s look at races with only a few candidates and races with lots of candidates. First, here are the elections with 9 or fewer candidates.

| Reg # | n | Expected wins | Actual wins | Cumulative P |
|---|---|---|---|---|
| 1 | 6086 | 3328.53 | 3320 | 0.418 |
| 2 | 6086 | 3328.49 | 3368 | 0.849 |
| 3 | 5353 | 2962.21 | 2912 | 0.086 |
| 4 | 4348 | 2410.67 | 2448 | 0.876 |
| 5 | 3242 | 1786.74 | 1768 | 0.260 |
| 6 | 2367 | 1298.96 | 1290 | 0.363 |
| 7 | 1550 | 850.22 | 842 | 0.347 |
| 8 | 889 | 482.70 | 495 | 0.806 |
| 9 | 394 | 211.22 | 218 | 0.769 |
| last | 6078 | 3323.84 | 3330 | 0.568 |
| all | 30315 | 16659.74 | 16661 | 0.508 |

There isn’t very much going on here. #3 tends to be less successful, but we don’t have any theoretical explanation for that result. None of the other registration numbers differ very much from their expected values. With only a few candidates, numbers don’t seem to matter very much.

What about the elections with 10 or more candidates?

| Reg # | n | Expected wins | Actual wins | Cumulative P |
|---|---|---|---|---|
| 1 | 1434 | 713.40 | 750 | 0.975 |
| 2 | 1433 | 713.00 | 705 | 0.346 |
| 3 | 1433 | 713.00 | 743 | 0.946 |
| 4 | 1433 | 713.00 | 723 | 0.710 |
| 5 | 1433 | 713.00 | 675 | 0.024 |
| 6 | 1433 | 713.00 | 694 | 0.164 |
| 7 | 1432 | 712.37 | 718 | 0.627 |
| 8 | 1432 | 712.40 | 726 | 0.772 |
| 9 | 1434 | 713.55 | 721 | 0.663 |
| 10 | 1431 | 712.30 | 700 | 0.266 |
| 11 | 1169 | 576.86 | 564 | 0.235 |
| 12 | 939 | 457.45 | 471 | 0.821 |
| 13 | 788 | 379.97 | 377 | 0.430 |
| 14 | 653 | 311.81 | 309 | 0.428 |
| 15 | 530 | 246.88 | 253 | 0.718 |
| 16 | 451 | 205.61 | 212 | 0.743 |
| 17 | 373 | 166.89 | 163 | 0.363 |
| 18 | 315 | 138.59 | 139 | 0.542 |
| 19 | 257 | 110.98 | 106 | 0.287 |
| 20 | 218 | 92.51 | 95 | 0.660 |
| 21+ | 1003 | 382.43 | 357 | 0.052 |
| last | 1430 | 711.75 | 723 | 0.733 |
| 21+, not last | 832 | 311.50 | 286 | 0.036 |
| 21+, last | 171 | 70.93 | 71 | 0.537 |
| all | 21024 | 10199.00 | 10201 | 0.514 |

So now #3 does better than expected but #5 is the terrible number. If you have a theoretical reason why #3 is great in large fields but a disaster in small ones, I’d love to hear it. I’m going to chalk this up to the joys of random numbers. Anyway, those are not the droids we are looking for.
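A little arithmetic supports the random-numbers reading. The sketch below asks how many rows would cross the .025/.975 lines by luck alone if registration numbers did nothing. It counts the 24 non-total rows in the table above and treats them as independent tests, which is an assumption (the "last" rows overlap with the numbered ones), so read it as a ballpark figure.

```python
alpha = 0.05   # two-sided threshold used in this post
rows = 24      # non-total rows in the large-field table

expected_false_alarms = rows * alpha
p_at_least_one = 1 - (1 - alpha) ** rows   # assumes the rows are independent

print(f"expected false alarms: {expected_false_alarms:.1f}")
print(f"chance of at least one: {p_at_least_one:.2f}")
```

Under those assumptions, about one "significant" row is expected by chance, and the odds of seeing at least one are roughly 70%. A stray result with no theory behind it, like #5, is not surprising.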

Look at #1! #1 was expected to win 713.40 times, but it actually won 750 times. The cumulative probability of 750 or fewer wins is .975, which puts it right at our threshold. With 95% confidence, we can reject the null hypothesis that the difference between 750 and 713.40 is just due to random chance! Result!

Moreover, look at the bottom of the table. I have added two lines. One looks at candidates with registration numbers of 21 and higher but who were not the last candidate on the (long) ballot. These candidates were expected to win 311.50 times, but they actually only won 286 times, for a cumulative probability of 0.036. That isn’t less than 0.025, but it’s close. Moreover, when you consider that candidates with numbers of 21 and higher who were the last person on the ballot won at almost exactly the expected rate, it seems more convincing.
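The three headline figures for large fields can be rechecked with the sketch below. It sums the binomial probabilities in log space, because the binomial coefficients for n = 1434 are too large to convert to a Python float. As before, it assumes one averaged win probability per row, and the function and dictionary names are my own.

```python
from math import exp, lgamma, log


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with each term built in log space."""
    def log_pmf(i):
        log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return log_choose + i * log(p) + (n - i) * log(1 - p)
    return sum(exp(log_pmf(i)) for i in range(k + 1))


rows = {  # label: (n, expected wins, actual wins), copied from the table above
    "#1": (1434, 713.40, 750),
    "21+, not last": (832, 311.50, 286),
    "21+, last": (171, 70.93, 71),
}
results = {
    label: binom_cdf(wins, n, expected / n)
    for label, (n, expected, wins) in rows.items()
}
for label, cum in results.items():
    print(f"{label}: {cum:.3f}")
```

The output should sit close to the tabled 0.975, 0.036, and 0.537. Note how near #1 is to the .975 line: rounding in the inputs is enough to move it a hair to either side.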

[Eagle-eyed readers might wonder why the number of cases and expected wins aren’t all the same for #1-#10. The minor differences are due to cases in which a candidate was ruled ineligible or died after drawing numbers but before the election. In these rare cases, a district might have had a #9 but no #3, for example.]

So let’s sum up. Registration numbers seem to matter, but only in very large fields of candidates. When there are lots of candidates, it is an advantage to be the #1 candidate. It is also a disadvantage to be down at the bottom of a long, long ballot. However, if you are the last person on a long ballot, voters can find you just as easily as they can find people near the top. In large fields, it’s helpful to be in an easy-to-find place on the ballot.

There is no evidence that #1 is intrinsically lucky, since it doesn’t help at all in small fields. #4 isn’t unlucky at all. Really, there is not much evidence that any numbers are particularly lucky or unlucky.

Most importantly, any effects are very small. We had to look at an enormous set of data to tease out any results. All the other stuff, such as charisma, party affiliation, and local networks, is far, far more important.
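To put "very small" into numbers, here is a quick calculation for the two large-field results. The function name and the choice of two summary measures (extra wins per 100 candidates, and the relative change in wins) are my own framing, not something from the original analysis.

```python
def effect_size(n, expected, wins):
    """Return (extra wins per 100 candidates, relative change in wins)."""
    return 100 * (wins - expected) / n, wins / expected - 1


first = effect_size(1434, 713.40, 750)      # #1 in fields of 10 or more
buried = effect_size(832, 311.50, 286)      # 21+, not last

print(f"#1:            {first[0]:+.1f} wins per 100 candidates ({first[1]:+.1%})")
print(f"21+, not last: {buried[0]:+.1f} wins per 100 candidates ({buried[1]:+.1%})")
```

That works out to roughly 2.6 extra winners per 100 candidates who drew #1, and about 3 fewer per 100 for candidates buried deep in a long ballot.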

December 30, 2015 at 1:01 am

This post made me very happy. Thanks for writing it! As an American it was a little difficult to wrap my mind around the 50+ candidate Taipei County ballots from Hung Hsiu-chu’s early days.

December 30, 2015 at 2:23 am

I didn’t check this time, but if I recall correctly, there was one local election that had even more than the 50 candidates in the 1995 Taipei County legislative ballot. I wonder how many candidates there were on the Afghan ballot for the Kabul district with 35 seats or the Japanese upper house ballot with all 48(?) seats in one district. I’m told the Japanese district was nicknamed the “sad” district.

December 30, 2015 at 11:54 pm

I like three things about this post. First, the logic is clear and the analysis well written. Second, it’s an elegant example of a moderation analysis. Ballot order can matter, but only on longer ballots. Add in the interaction statistics, and I think this could be published as a research report. Third, Nathan concludes with modesty about the size of the effects he has found, which is important in an academic context that is overly obsessed with significance testing and tends to ignore effect sizes and the meaningfulness of analyses.

Bravo!

December 31, 2015 at 3:13 am

Thanks. I was just noodling around, and I didn’t really expect to find anything. If I wanted to publish this, I’d have to do it a lot more carefully. At the very least, I’d have to control for incumbency and party affiliation. I’d probably also want to demonstrate that the effect also shows up in vote share, which is much harder to operationalize.

January 5, 2016 at 5:27 pm

The party list vote this year features 18 parties. The ballot will be 73cm long. The CEC says that, if in the future there are more than 18 parties, they will consider making a ballot with two rows. I didn’t consider the physical structure of the ballot, but I wonder if 73cm makes some parties hard to find.

http://www.storm.mg/category/k24996

January 14, 2016 at 3:42 pm

I suspect that the KMT is going to lose a few votes to the MKT on the party vote because the latter appears much higher on the list and some people with dyslexia or lots of carelessness may confuse 民國黨 and 國民黨.

See, for example, this post on a similar issue from 2005:

http://pinyin.info/news/2005/names-of-political-parties/

But I don’t know any way to test for that.

January 14, 2016 at 4:36 pm

I don’t know any way to test for that either, but I’m less worried about it than you are. For one thing, the KMT’s name on the ballot is five characters, 中國國民黨, while the MKT’s is only three characters 民國黨. For another, they have a party logo on the ballot. For yet another, that’s why they spend so much time and energy telling you which number to vote for. I can’t remember exactly where, but somewhere in the back of my head I remember a case of an election with a minor candidate with exactly the same name as a major candidate. I thought maybe it was a dirty trick by the other party, but the major candidate ended up with something like 80,000 votes and the minor candidate got something like 200. In other words, the voters can usually figure out who they want to vote for.

January 14, 2016 at 5:38 pm

They have party logos on the ballots now? That’s welcome. But I think that in this particular case it will only muddy the waters further. The MKT logo and the KMT’s are very similar, esp. if rendered in just black and white.

January 14, 2016 at 7:51 pm

This reminds me of the early 90s Eddie Murphy movie, The Distinguished Gentleman. But I agree, people are rarely so confused.