Just sharing stuff…
Of math-slayin’, the inverted V, and linearity
Summarizing our group chat discussion on the VP race
by Erika, Clarissa David, Jose Ernie Lope, and Chris Monterola
Erika wants to thank Mariel Rae Kho Fang and Alyson Lao Yap for their insights on the GIF. She also wants to thank Chris Alis and Rouelli Sabino for the datasets.
In this blog post, we take a peek at the data on the controversial Philippine VP race. Allegations of systematic cheating by one of the camps surfaced after two local professionals performed and presented some (flawed) statistical analyses of the election numbers, which prompted many Pinoy academics to download, visualise, and analyse the 2016 election results. Many of our colleagues have already responded to the weak and erroneous assertions of the two. The most notable responses, in our opinion, are the following.
Others have also done more in-depth analyses of different aspects of the data and the issue.
Here, we aim only to provide a descriptive analysis of the time series data in question, since we are of the opinion that the issue has already been sufficiently addressed by our peers elsewhere (check the links above).
TL;DR. The order of transmission of votes (from one candidate’s bailiwicks to the other’s) gave rise to the inverted V pattern; there was no “magic” involved.
It all started with the “inverted V” plot. Many questioned the sudden shift in the trend of the difference between the Marcos and Robredo votes (M minus R) when plotted against the percentage of votes transmitted (%T). In Figure 1 below, we reproduce this plot; the x-axis is %T and the y-axis is (M-R). At around 63% transmission, we see the onset of a trend reversal. The trend in this second regime continued until Robredo (Leni) finally overtook Marcos (BBM). This came as a surprise to many, and some even insinuated that there was systematic cheating in the VP race.
The trend would indeed be suspect IF the votes had been transmitted in a (uniformly) random manner, geographically speaking. But was the transmission of votes really random? In the GIF below, we provide a “replay” of what happened. In the viz, we can clearly see that the votes were not randomly transmitted; the transmission was actually skewed toward Marcos’s bailiwicks [Region I, CAR, and Region III] and the NCR, where BBM was more popular than Leni. “Skewed” here is used loosely to mean that the votes from certain regions were processed first.
In the figure below, we show two snapshots taken from the GIF above. The one on the left is frozen at ~50% transmission while the one on the right at ~76%.
Looking at the varying transmission rates of each region and taking into consideration the preferences of voters from each region, it now makes more sense why we see two trends when we plot the BBM-Leni vote difference as a function of % transmission. BBM’s lead was widening in the first regime (left leg) because it was mostly the regions where he was most favoured that were initially transmitting vote counts. Then votes from Leni’s bailiwicks started to come in, which caused the BBM-Leni gap to narrow; this rush of votes from Leni’s bailiwicks eventually outpaced the inflow of votes from Marcos’s. At 96.14% of precincts reporting (as of May 18th 2016, 4:56:15 pm), Leni had 14,023,093 votes and BBM had 13,803,966.
Below, we present results obtained from statistical models (to show the validity of our assumptions). In all the models, voter preferences are held fixed; i.e. voters from Ilocos are mainly for BBM, those from Bicol are mainly for Leni, etc. In the first model (red filled circles), we reconstruct what actually happened following the voter preferences and the order of transmission; the result is an “inverted V” pattern when we plot (M-R) against %T. In the second model (green unfilled triangles), the plot shows the resulting trend if the order of transmission were reversed (i.e. starting with Leni’s bailiwicks instead of BBM’s). As expected, we see a “V” plot instead of an “inverted V”. Finally, for the sake of completeness, we also looked at what happens if the transmission were completely random; a linear trend is produced (blue squares).
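For readers who want to tinker, here is a minimal Python sketch of the same idea. It is not the model behind the plot above: the precinct counts, the vote shares, and the helper names `make_precincts` and `running_margin` are all hypothetical, chosen only so that M's bailiwicks make up 60% of the precincts and R still wins narrowly overall.

```python
import random

def make_precincts(count, m_share, rng, voters=100):
    """Hypothetical precincts: each voter picks M with probability m_share."""
    precincts = []
    for _ in range(count):
        m_votes = sum(rng.random() < m_share for _ in range(voters))
        precincts.append((m_votes, voters - m_votes))
    return precincts

def running_margin(precincts):
    """Cumulative (M - R) after each precinct, in the order given."""
    total, curve = 0, []
    for m_votes, r_votes in precincts:
        total += m_votes - r_votes
        curve.append(total)
    return curve

rng = random.Random(2016)  # fixed seed so the sketch is repeatable

# Assumed preferences: 600 precincts lean M (60% M), 400 lean R (30% M).
m_bailiwicks = make_precincts(600, 0.60, rng)
r_bailiwicks = make_precincts(400, 0.30, rng)

# Same ballots, three different transmission orders.
actual_curve = running_margin(m_bailiwicks + r_bailiwicks)    # M's bailiwicks first
reversed_curve = running_margin(r_bailiwicks + m_bailiwicks)  # R's bailiwicks first
mixed = m_bailiwicks + r_bailiwicks
rng.shuffle(mixed)
shuffled_curve = running_margin(mixed)                        # random order

n = len(actual_curve)
final_margin = actual_curve[-1]
peak_at = actual_curve.index(max(actual_curve)) + 1
trough_at = reversed_curve.index(min(reversed_curve)) + 1

# How far the shuffled curve strays from a straight line to the final margin.
max_deviation = max(
    abs(value - final_margin * (i + 1) / n)
    for i, value in enumerate(shuffled_curve)
)

print("final margin (same for all orders):", final_margin)
print("M-first order peaks after precinct", peak_at)
print("R-first order bottoms out after precinct", trough_at)
print("largest gap between shuffled curve and straight line:", round(max_deviation))
```

Under these made-up numbers, the M-first order should rise and then fall (the inverted V), the R-first order should fall and then rise (the V), and the shuffled order should stay close to a straight line. All three end at exactly the same final margin, because only the order of counting changes, not the ballots.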
In layman’s terms, please.
If the explanation above is a bit tedious and overwhelming, consider the following analogy (Table 1). Imagine 5 sections (in a school). Classes A to C are for JaDine while Classes D and E are for KathNiel.
When the school Principal asked for a count of how many students were for KathNiel and how many were for JaDine, students in classes A and B (yellow-filled boxes) were still having their recess; hence, the teachers were only able to collect tallies for classes C, D, and E (blue-filled boxes). The “partial and unofficial” results indicate that 98 of the students were for KathNiel and only 65 were for JaDine. After recess, the teachers were finally able to survey classes A & B. Results then showed that 140 students were actually for JaDine, while only 113 students were for KathNiel. There is no “magic” there. It’s just that the order in which the classrooms were surveyed was skewed toward the KathNiel-fans-dominated classrooms.
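The tallies in this story can be checked with a few lines of Python. Only the partial and final totals come from the story itself; the combined tally for classes A and B is derived by subtraction, and the variable names are ours.

```python
# Tallies are written as (KathNiel, JaDine).
partial = (98, 65)    # classes C, D, and E, counted before recess ended
final = (113, 140)    # all five classes

# What classes A and B must have added once they were surveyed.
late_batch = (final[0] - partial[0], final[1] - partial[1])

def leader(tally):
    """Name whoever is ahead in a (KathNiel, JaDine) tally."""
    kathniel, jadine = tally
    return "KathNiel" if kathniel > jadine else "JaDine"

print("classes A and B added:", late_batch)
print("leader in partial count:", leader(partial))
print("leader in final count:", leader(final))
```

The late batch is heavily for JaDine, which is all it takes to flip the lead between the partial and the final count.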
Another issue that was raised was that the trends were “too linear”. Some analysts incorrectly suggest that the linearity is evidence of cheating. Wrong. It’s just a matter of ratio and proportion.
TL;DR. The linearity is expected. It’s just a matter of ratio and proportion. If in a given administrative region the ratio of BBM to Leni votes is 2:3, for example, this ratio is expected to be maintained as more votes from that region come in. This is the same concept behind surveys, albeit with a much larger “sample” that is made up of early returns rather than a drawn sample.
Take, for example, a school of 1000 students; 200 of the students are KathNiel fans, while 800 are JaDine fans. If only 100 of the students are present (or sampled), and if these (random) 100 are asked who they are for, KathNiel or JaDine, there’s a high likelihood that about 20 would be for KathNiel and about 80 for JaDine. The ratio would be roughly the same; that is, 1 KathNiel fan for every 4 JaDine fans. This is also why surveys (like SWS and Pulse Asia) work.
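The school example is easy to try out. The sketch below draws 100 students at random from the school of 1000, repeats the draw many times, and looks at how many KathNiel fans turn up. The number of repetitions, the seed, and the "within 8 of 20" band are arbitrary choices of ours.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# The school from the example: 200 KathNiel fans and 800 JaDine fans.
school = ["KathNiel"] * 200 + ["JaDine"] * 800

# Draw 100 students without replacement, many times over.
trials = 2000
kathniel_counts = [
    rng.sample(school, 100).count("KathNiel") for _ in range(trials)
]

average = sum(kathniel_counts) / trials
near_20 = sum(abs(count - 20) <= 8 for count in kathniel_counts) / trials

print("average KathNiel fans per 100 sampled:", average)
print("share of samples with 12 to 28 KathNiel fans:", near_20)
```

Individual samples wobble around 20 rather than hitting it exactly, but the average stays close to the school-wide 1-in-5 share, which is the ratio-and-proportion point.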
The linearity that arose in the plots can be attributed to simple ratio and proportion, the same concept behind surveys. The “inverted V” pattern can be attributed to the order of (regional/provincial) transmission and voter preferences (bailiwicks). An obvious inflection is observed somewhere between 60% and 65% transmission; this is when the votes from Leni’s bailiwicks started to come in. In the first regime, the transmission was mostly coming from Marcos’s bailiwicks, and we observe a linear trend for that regime. After the transition, the trend changes direction, but yes, it is still expected to be linear.
When looking at the % transmission of each region, it is important to have an idea of the number of registered voters coming from that region. Note that a 50% transmission for NCR translates to many more votes than a 50% transmission for, say, ARMM. Below, we plot the total number of registered voters for each region. Thanks to Christian Alis for providing us with the data.
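To see why a fixed ratio gives a straight line, here is a small sketch that reuses the 2:3 BBM-to-Leni ratio from the example earlier. The batch size of 10,000 votes and the function name `expected_margin` are hypothetical; the point is only that every equal batch moves the margin by the same amount.

```python
def expected_margin(votes_counted, bbm_parts=2, leni_parts=3):
    """Expected (BBM - Leni) margin if the BBM:Leni ratio stays fixed."""
    total_parts = bbm_parts + leni_parts
    bbm_votes = votes_counted * bbm_parts // total_parts
    leni_votes = votes_counted * leni_parts // total_parts
    return bbm_votes - leni_votes

# Margin after 0, 10,000, 20,000, ... 50,000 votes from the same region.
batches = range(0, 50_001, 10_000)
margins = [expected_margin(n) for n in batches]

# Change in margin from one batch to the next.
steps = [later - earlier for earlier, later in zip(margins, margins[1:])]

print("margins:", margins)
print("change per batch:", steps)
```

Because the change per batch is constant, the margin plotted against votes counted is a straight line. A different ratio gives a different slope, and a switch from one set of regions to another gives a kink between two straight segments.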
Taking this into consideration, we recommend that the reader also take a look at the map below to fully appreciate Figure 1 (GIF) above. In the figure below, the filled circles are scaled using the number of registered voters in each region. The yellow and red bars show the normalized number of votes for Leni and BBM, respectively.
Finally, if you want more stats and pretty viz, check out this story by the Thinking Machines.