What’s your lucky number?

One of the fun little quirks of Taiwanese elections is that the candidates draw numbers, and these registration numbers become an integral part of their campaign messages over the last few weeks. This is the only time of the year when people happily shout “#2!” or “#5” at their friends. When candidates draw the numbers, everyone cheers the lucky number 1 and everyone groans at the unlucky number 4. (“Four” sounds similar to “death” in Mandarin and is exactly the same in Taiwanese.) Do these numbers really matter?

Answer #1: Of course not! Don’t be silly. They are randomly drawn, so the effect should be random. Winning and losing depends on important characteristics, such as party affiliation and whether or not a candidate has had a bribery scandal, not stupid things like registration numbers. Why are you wasting time with this topic, FG?

Answer #2: Of course it matters! People are not machines. Some people are superstitious and vote for lucky numbers or avoid unlucky ones. More importantly, people make choices in different ways. Some might look down the list until they find a candidate they like. Some people think voting is a duty and care more about the act than the choice, so they might vote for the person at the top of the list because that’s the first place they look. Names might be easier to find at some places on the ballot than others. Scholars in the USA have found that there is an advantage to being at the top of the ballot, and pollsters often randomize questions and answers to avoid skewing the results.

Well, this sounds like a good old-fashioned empirical question. We have two competing theoretical perspectives, both of which make sense. Which one do the data support? Fortunately, I have a bunch of data perfectly formatted to answer this kind of question. I’m going to look at whether candidates with a particular registration number won at a higher rate than expected.

 

First, our hypotheses. We have the superstitious hypotheses:

H1: Reg #1 should have more winners (than random).

H2: Reg #4 should have fewer winners.

We also have the easy-to-find hypotheses, where the candidates at the top and bottom of the ballot are the favored ones. (Technically, I think the ballots still run right to left, but it’s the same idea.)

H3: Reg #1 should have more winners.

H4: The highest registration number should have more winners.

 

Second, methodology. I look at all candidates in all district elections (including indigenous but not including party list) since 1985. I exclude all candidates who ran uncontested. For each candidate, I have a registration number, the number of seats elected in the district, the number of candidates running in the district, and whether the candidate won or lost. Each candidate’s probability of winning is calculated as the number of seats divided by the number of candidates. I sum up all the probabilities and actual wins by registration numbers. For example, there have been 286 legislative candidates who drew #1. These are expected to produce 107.94 winners, but in reality 118 won.
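For readers who want to see the bookkeeping, here is a minimal sketch of the tally described above. The records are made-up toy data, not my actual data set, and the tuple layout and the name `tally_by_reg` are just assumptions for illustration. I also assume "uncontested" means a district with no more candidates than seats.

```python
from collections import defaultdict

# Hypothetical toy records: (district, seats, n_candidates, reg_number, won)
candidates = [
    ("A", 1, 2, 1, True),
    ("A", 1, 2, 2, False),
    ("B", 2, 4, 1, False),
    ("B", 2, 4, 2, True),
    ("B", 2, 4, 3, True),
    ("B", 2, 4, 4, False),
]

def tally_by_reg(records):
    """Return {reg_number: (n, expected_wins, actual_wins)}."""
    totals = defaultdict(lambda: [0, 0.0, 0])
    for _district, seats, n_cands, reg, won in records:
        if n_cands <= seats:
            continue  # assumption: uncontested races are skipped
        row = totals[reg]
        row[0] += 1                 # candidates who drew this number
        row[1] += seats / n_cands   # each candidate's chance under pure luck
        row[2] += int(won)          # actual wins
    return {reg: tuple(row) for reg, row in sorted(totals.items())}

table = tally_by_reg(candidates)
for reg, (n, expected, wins) in table.items():
    print(f"{reg:>3} {n:>5} {expected:>8.2f} {wins:>5}")
```

Each candidate contributes seats divided by candidates to the expected column, so the expected wins across a whole district always add up to the number of seats.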

Is 118 higher than 107.94? Now we need some basic statistics. If you flip a coin 100 times, you would expect to get 50 heads. However, you usually will get some other number, just by random chance. If you get 53 heads, you probably shouldn’t announce that the coin is unfair. If you get 75 heads, you might want to inspect the coin. In a statistical sense, we can judge whether 53 or 75 are really higher than the expected 50 using a cumulative binomial distribution. Basically, this calculates the probability of getting 0 heads, 1 head, 2 heads, … , up to the number of heads you actually observed, and sums up all those probabilities. If the cumulative sum is very large or very small, your coin might not be fair. By convention, most scientists agree that 95% is a reasonable threshold. So if our cumulative probability falls under .025 or above .975, we’ve got an interesting result. Otherwise, we can’t reject the null hypothesis that this is just a random result.
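The coin example can be sketched in a few lines of Python. This is only an illustration of the cumulative binomial idea; the helper name `binom_cdf` is my own, and it uses nothing beyond the standard library.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): sum the chances of 0, 1, ..., k successes."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

for heads in (53, 75):
    cum = binom_cdf(heads, 100, 0.5)
    # Two-sided 95% convention: flag anything under .025 or over .975
    verdict = "inspect the coin" if cum < 0.025 or cum > 0.975 else "nothing unusual"
    print(f"{heads} heads out of 100: cumulative probability {cum:.3f} ({verdict})")
```

The direct formula is fine for 100 flips. For groups with well over a thousand candidates the binomial coefficients get too large to convert to floating point, so a log-space version is the safer choice there.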

In our example, the 286 candidates were expected to win 107.94 times, for an average win probability of .377. [Some real statistician can probably tell me why this is not exactly correct, but I think it’s ok for a quick and dirty study like this one.] The cumulative probability of 118 or fewer wins is .901, which seems pretty high. It isn’t over .975, so we can’t reject the null hypothesis with 95% confidence. Legislative candidates who drew the lucky #1 won more than expected, but not so much more that there is a statistically significant difference. It could just be random.
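Here is a hedged sketch of that calculation for the #1 row, treating every one of the 286 candidates as having the same average win probability (the shortcut admitted to above). The helper `binom_cdf` is an illustrative name, and working in log space with `lgamma` is my choice here, not necessarily how the original numbers were produced. Under these assumptions the .901 figure appears to correspond to the probability of 118 or fewer wins.

```python
from math import exp, lgamma, log

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    return sum(exp(log_pmf(i)) for i in range(k + 1))

n, expected, wins = 286, 107.94, 118   # the #1 row of the legislative table
p = expected / n                       # average win probability under pure luck

at_most = binom_cdf(wins, n, p)            # P(118 or fewer wins)
at_least = 1 - binom_cdf(wins - 1, n, p)   # P(118 or more wins)

print(f"average win probability: {p:.3f}")
print(f"P(at most {wins} wins):  {at_most:.3f}")
print(f"P(at least {wins} wins): {at_least:.3f}")
```

The two tails are printed side by side because they are easy to mix up: a cumulative value near 1 means unusually many wins, and a value near 0 means unusually few.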

Here is the full table for legislative candidates since 1986:

reg n Exp (win) win P (cum)
1 286 107.94 118 0.901
2 286 107.94 104 0.339
3 259 94.51 98 0.698
4 217 80.27 82 0.625
5 187 70.77 71 0.546
6 167 64.37 59 0.220
7 140 56.54 54 0.365
8 125 51.68 48 0.283
9 116 47.50 44 0.287
10 105 43.61 43 0.494
11 97 40.71 49 0.964
12 79 33.53 29 0.180
13 71 30.20 28 0.344
14 60 25.35 24 0.415
15 56 23.57 27 0.856
16 54 22.50 26 0.865
17 45 18.19 20 0.760
18 40 16.01 16 0.567
19 37 14.68 16 0.732
20 30 11.31 12 0.677
21+ 215 70.84 64 0.179
.
last 286 107.94 115 0.822
.
all 2672 1032.02 1032 0.508

So how do the hypotheses look? As previously pointed out, #1 is high but not statistically significant. We thought #4 might be unlucky, but those candidates actually won slightly more than expected. The superstitious hypotheses don’t look so good. H4 says that the last number should be useful for finding the candidate on the ballot. In fact, candidates with the last number did slightly better than expected, though again, this doesn’t reach statistical significance.

There is another number that is interesting. I combined all the numbers above 20, and candidates with these numbers didn’t do so well. It’s not statistically significant, but it is suggestive. If the easy-to-find hypothesis is correct, maybe voters won’t look as hard for candidates way down on a long ballot. We thus have three suggestive but not conclusive results. If only we had more data.

Wait a minute. The rest of the world looks at legislative data and thinks they have done an exhaustive study. I’m Frozen Garlic. Of course I have more data! That table was only for legislators. I have data on every election except neighborhood chiefs and some earlier township council elections.

 

reg n Exp (win) win P (cum)
1 7520 4041.93 4070 0.746
2 7519 4041.49 4073 0.770
3 6786 3675.21 3655 0.315
4 5781 3123.67 3171 0.897
5 4675 2499.74 2443 0.050
6 3800 2011.96 1984 0.186
7 2982 1562.59 1560 0.469
8 2321 1195.10 1221 0.864
9 1828 924.77 939 0.755
10 1431 712.30 700 0.266
11 1169 576.86 564 0.235
12 939 457.45 471 0.821
13 788 379.97 377 0.430
14 653 311.81 309 0.428
15 530 246.88 253 0.718
16 451 205.61 212 0.743
17 373 166.89 163 0.363
18 315 138.59 139 0.542
19 257 110.98 106 0.287
20 218 92.51 95 0.660
21+ 1003 382.43 357 0.052
.
last 7508 4035.59 4053 0.661
.
all 51339 26858.74 26862 0.513

In this data set of over 50,000 candidates, #1 doesn’t seem to have any great advantage. #4 looks borderline lucky, which is completely unexpected. #5 is the unluckiest number, though these still don’t reach conventional levels of statistical significance. I certainly don’t have any theoretical reason why #4 should be good or #5 should be bad, so I’m going to go with the random blip idea. I don’t see much hope for the superstition hypotheses.

However, the other set of hypotheses might still be plausible. #1 doesn’t look that promising, and neither does the last registration number. However, candidates down at the end of long ballots are noticeably less successful. Those with registration numbers of 21 and higher were expected to win 382.43 times but only won 357 times, a borderline significant result.

If the easy-to-find idea is right, it should only matter when some candidates are hard to find. That is, when there are only a few candidates, everyone is easy to find. Let’s look at races with only a few candidates and races with lots of candidates. First, here are the elections with 9 or fewer candidates.

reg n Exp (win) win P (cum)
1 6086 3328.53 3320 0.418
2 6086 3328.49 3368 0.849
3 5353 2962.21 2912 0.086
4 4348 2410.67 2448 0.876
5 3242 1786.74 1768 0.260
6 2367 1298.96 1290 0.363
7 1550 850.22 842 0.347
8 889 482.70 495 0.806
9 394 211.22 218 0.769
.
last 6078 3323.84 3330 0.568
.
all 30315 16659.74 16661 0.508

There isn’t very much going on here. #3 tends to be less successful, but we don’t have any theoretical explanation for that result. None of the other registration numbers differ very much from their expected values. With only a few candidates, numbers don’t seem to matter very much.

What about the elections with 10 or more candidates?

reg n Exp (win) win P (cum)
1 1434 713.40 750 0.975
2 1433 713.00 705 0.346
3 1433 713.00 743 0.946
4 1433 713.00 723 0.710
5 1433 713.00 675 0.024
6 1433 713.00 694 0.164
7 1432 712.37 718 0.627
8 1432 712.40 726 0.772
9 1434 713.55 721 0.663
10 1431 712.30 700 0.266
11 1169 576.86 564 0.235
12 939 457.45 471 0.821
13 788 379.97 377 0.430
14 653 311.81 309 0.428
15 530 246.88 253 0.718
16 451 205.61 212 0.743
17 373 166.89 163 0.363
18 315 138.59 139 0.542
19 257 110.98 106 0.287
20 218 92.51 95 0.660
21+ 1003 382.43 357 0.052
.
last 1430 711.75 723 0.733
21+, not last 832 311.50 286 0.036
21+, last 171 70.93 71 0.537
.
all 21024 10199.00 10201 0.514

So now #3 does better than expected but #5 is the terrible number. If you have a theoretical reason why #3 is great in large fields but a disaster in small ones, I’d love to hear it. I’m going to chalk this up to the joys of random numbers. Anyway, those are not the droids we are looking for.

Look at #1! #1 was expected to win 713.40 times, but it actually won 750 times. The cumulative probability of 750 or fewer wins is .975. With 95% confidence, we can reject the null hypothesis that the difference between 750 and 713.40 is just due to random chance! Result!

Moreover, look at the bottom of the table. I have added two lines. One looks at candidates with registration numbers of 21 and higher but who were not the last candidate on the (long) ballot. These candidates were expected to win 311.50 times, but they actually only won 286 times, for a cumulative probability of 0.036. That isn’t less than 0.025, but it’s close. Moreover, when you consider that candidates with numbers of 21 and higher who were the last person on the ballot won at almost exactly the expected rate, it seems more convincing.
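As a rough cross-check, the three rows discussed here can be pushed through the same cumulative binomial calculation. This is a sketch under the same simplifying assumption as before (one average win probability per row), with the n, expected, and win figures copied from the large-field table. The helper name `binom_cdf` is illustrative.

```python
from math import exp, lgamma, log

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space so large n is safe."""
    def log_pmf(i):
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    return sum(exp(log_pmf(i)) for i in range(k + 1))

# (n, expected wins, actual wins), copied from the 10-or-more-candidates table
rows = {
    "#1": (1434, 713.40, 750),
    "21+, not last": (832, 311.50, 286),
    "21+, last": (171, 70.93, 71),
}

results = {}
for label, (n, expected, wins) in rows.items():
    results[label] = binom_cdf(wins, n, expected / n)
    print(f"{label:<14} n={n:>5} exp={expected:>7.2f} win={wins:>4} "
          f"P(cum)={results[label]:.3f}")
```

The #1 row lands right at the .975 cutoff, so a small change in how the average probability is rounded could tip it to either side.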

[Eagle-eyed readers might wonder why the number of cases and expected wins aren’t all the same for #1-#10. The minor differences are due to cases in which a candidate was ruled ineligible or died after drawing numbers but before the election. In these rare cases, a district might have had a #9 but no #3, for example.]
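The groupings behind these tables (small versus large fields, and the 21+ rows split by whether the candidate was last on the ballot) could be assigned roughly like this. The record layout, the function names, and the toy data are all hypothetical. Note that "last" is taken from the highest registration number actually on the ballot rather than from the candidate count, since the two can differ when someone drops out after the draw.

```python
from collections import defaultdict

def field_size(n_cands, cutoff=10):
    """'large' for 10 or more candidates, otherwise 'small'."""
    return "large" if n_cands >= cutoff else "small"

def position_bucket(reg, max_reg):
    """Keep numbers 1-20 separate; split 21+ by whether it is the last number."""
    if reg <= 20:
        return str(reg)
    return "21+, last" if reg == max_reg else "21+, not last"

# Hypothetical toy records: (seats, n_candidates, max_reg_on_ballot, reg, won)
records = [
    (1, 3, 3, 1, True),
    (1, 3, 3, 3, False),
    (10, 25, 25, 1, True),
    (10, 25, 25, 22, False),
    (10, 25, 25, 25, True),
]

groups = defaultdict(lambda: [0, 0.0, 0])   # [n, expected wins, actual wins]
for seats, n_cands, max_reg, reg, won in records:
    key = (field_size(n_cands), position_bucket(reg, max_reg))
    groups[key][0] += 1
    groups[key][1] += seats / n_cands
    groups[key][2] += int(won)

for key, (n, expected, wins) in sorted(groups.items()):
    print(key, n, round(expected, 2), wins)
```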

 

So let’s sum up. Registration numbers seem to matter, but only in very large fields of candidates. When there are lots of candidates, it is an advantage to be the #1 candidate. It is also a disadvantage to be down at the bottom of a long, long ballot. However, if you are the last person on a long ballot, voters can find you just as easily as they can find people near the top. In large fields, it’s helpful to be in an easy-to-find place on the ballot.

There is no evidence that #1 is intrinsically lucky, since it doesn’t help at all in small fields. #4 isn’t unlucky at all. Really, there is not much evidence that any numbers are particularly lucky or unlucky.

Most importantly, any effects are very small. We had to look at an enormous set of data to tease out any results. All the other stuff, such as charisma, party affiliation, and local networks, is far, far more important.

9 Responses to “What’s your lucky number?”

  1. ジェームス (@jmstwn) Says:

    This post made me very happy. Thanks for writing it! As an American it was a little difficult to wrap my mind around the 50+ candidate Taipei County ballots from Hung Hsiu-chu’s early days.

    • frozengarlic Says:

      I didn’t check this time, but if I recall correctly, there was one local election that had even more than the 50 candidates in the 1995 Taipei County legislative ballot. I wonder how many candidates there were on the Afghan ballot for the Kabul district with 35 seats or the Japanese upper house ballot with all 48(?) seats in one district. I’m told the Japanese district was nicknamed the “sad” district.

  2. petergries88 Says:

    I like three things about this post. First, the logic is clear and the analysis well written. Second, it’s an elegant example of a moderation analysis. Ballot order can matter, but only on longer ballots. Add in the interaction statistics, and I think this could be published as a research report. Third, Nathan concludes with modesty about the size of the effects he has found–important in an academic context that is overly obsessed with significance testing and tends to ignore effect sizes and meaningfulness of analyses.

    Bravo!

    • frozengarlic Says:

      Thanks. I was just noodling around, and I didn’t really expect to find anything. If I wanted to publish this, I’d have to do it a lot more carefully. At the very least, I’d have to control for incumbency and party affiliation. I’d probably also want to demonstrate that the effect also shows up in vote share, which is much harder to operationalize.

  3. frozengarlic Says:

    The party list vote this year features 18 parties. The ballot will be 73cm long. The CEC says that, if in the future there are more than 18 parties, they will consider making a ballot with two rows. I didn’t consider the physical structure of the ballot, but I wonder if 73cm makes some parties hard to find.

    http://www.storm.mg/category/k24996

  4. pinyinnews Says:

    I suspect that the KMT is going to lose a few votes to the MKT on the party vote because the latter appears much higher on the list and some people with dyslexia or lots of carelessness may confuse 民國黨 and 國民黨.

    See, for example, this post on a similar issue from 2005:
    http://pinyin.info/news/2005/names-of-political-parties/

    But I don’t know any way to test for that.

    • frozengarlic Says:

      I don’t know any way to test for that either, but I’m less worried about it than you are. For one thing, the KMT’s name on the ballot is five characters, 中國國民黨, while the MKT’s is only three characters 民國黨. For another, they have a party logo on the ballot. For yet another, that’s why they spend so much time and energy telling you which number to vote for. I can’t remember exactly where, but somewhere in the back of my head I remember a case of an election with a minor candidate with exactly the same name as a major candidate. I thought maybe it was a dirty trick by the other party, but the major candidate ended up with something like 80,000 votes and the minor candidate got something like 200. In other words, the voters can usually figure out who they want to vote for.

      • pinyinnews Says:

        They have party logos on the ballots now? That’s welcome. But I think that in this particular case it will only muddy the waters further. The MKT logo and the KMT’s are very similar, esp. if rendered in just black and white.

      • Greg Says:

        This reminds me of the early 90s Eddie Murphy movie, The Distinguished Gentleman. But I agree, people are rarely so confused.
