*Of math-slayin’, the inverted V, and linearity*

Summarizing our group chat discussion on the VP race

by Erika, Clarissa David, Jose Ernie Lope, and Chris Monterola

*Erika wants to thank Mariel Rae Kho Fang and Alyson Lao Yap for their insights on the GIF. She also wants to thank Chris Alis and Rouelli Sabino for the datasets.*

In this blog post, we take a peek at the data on the controversial Philippine VP race. Allegations of systematic cheating by one of the camps followed after two local professionals performed and presented some (flawed) statistical analyses of the 2016 election numbers, and these allegations prompted many Pinoy academics to download, visualise, and analyse the results themselves. Many of our colleagues have already responded to the weak and erroneous assertions of the two. The most notable responses, in our opinion, are the following.

- Lost in Transmission: A Gentle Note by Mig Barreto García
- Weighing in on the Leni-Bongbong data debate: a growth rate perspective by Jan Carlo Punongbayan
- A Facebook post by Mariel Rae Kho Fang

Others have also done more in-depth analyses of different aspects of the data and the issue.

- A Facebook post by Chris Monterola on the linearity of the observed trends
- A Facebook post by Chris Alis on fraud detection in the election data set
- Bongbong, Leni, and Benford: The Philippine Vice Presidential Race by Gendith and Aris Sardane-Calamba

Here, our aim is simply to provide a descriptive analysis of the time series data in question, since we are of the opinion that the issue has already been sufficiently addressed by peers elsewhere (*check links above*).

TL;DR. The order in which votes were transmitted (first from one candidate’s bailiwicks, then from the other’s) gave rise to the inverted V pattern; there was no “magic” involved.

It all started with the “inverted V” plot. Many questioned the sudden shift in the trend of the difference between the Marcos and Robredo votes (M minus R) when plotted against the percentage of transmitted votes (%T). In Figure 1 below, we reproduce this %T vs (M-R) plot; the *x*-axis is %T and the *y*-axis is M-R. At around 63%, we see the onset of a trend reversal. The trend in this second regime continued until Robredo (Leni) finally overtook Marcos (BBM). This came as a surprise to many, and some even insinuated that there had been systematic cheating in the VP race.

The trend is indeed suspect **IF** the votes were transmitted in a **(uniformly) random** manner, geographically speaking. But was the transmission of votes really random? In the GIF below, we provide a “replay” of what happened. In the viz, we can clearly see that the votes were not randomly transmitted; the transmission was skewed toward Marcos’s bailiwicks [Region I, CAR, and Region III] and the NCR, where BBM was more popular than Leni. “Skewed” here is used loosely to mean that the votes from certain regions were processed first.

In the figure below, we show two snapshots taken from the GIF above. The one on the left is frozen at ~50% transmission while the one on the right at ~76%.

Looking at the varying transmission rates of each region, and taking into account the preferences of voters in each region, it makes more sense why we see two trends when we plot the BBM-Leni vote difference as a function of % transmission. BBM’s lead widened in the first regime (left leg) because it was mostly the regions where he is most favoured that were transmitting vote counts early on. Then votes from Leni’s bailiwicks started to come in, which narrowed the BBM-Leni gap; this rush of votes from Leni’s *balwarte* (strongholds) eventually surpassed the inflow of votes from Marcos’s. And, at 96.14% of precincts reporting (as of May 18th 2016, 4:56:15 pm), Leni already had 14,023,093 votes and BBM, 13,803,966.

Below, we present results obtained from statistical models (to check whether our assumptions hold). In all the models, the regional voter preferences are held fixed; i.e. voters from Ilocos are mainly for BBM, those from Bicol for Leni, etc. In the first model (red filled circles), we reconstruct what actually happened, following the voter preferences and the order of transmission; plotting %T vs (M-R) gives an “inverted V” pattern. In the second model (green unfilled triangles), we show the resulting trend if the order of transmission were reversed (i.e. starting with Leni’s bailiwicks instead of BBM’s). As expected, we see a “V” instead of an “inverted V”. Finally, for completeness, we also looked at what happens if the transmission were completely random; this produces a linear trend (*blue squares*).
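The three scenarios above can be mimicked with a toy simulation. This is a minimal sketch, not the model behind the figure: the precinct counts and vote splits below (600 BBM-leaning precincts at 130 to 70, 400 Leni-leaning precincts at 50 to 150) are hypothetical, chosen only so that Leni ends up narrowly ahead, and %T is measured here as the share of precincts transmitted. The only thing that changes between scenarios is the transmission order.

```python
import random

# Hypothetical precinct-level returns as (bbm_votes, leni_votes).
# Illustrative assumptions only, NOT the actual 2016 counts.
BBM_BAILIWICK = [(130, 70)] * 600   # each precinct adds +60 to M - R
LENI_BAILIWICK = [(50, 150)] * 400  # each precinct adds -100 to M - R

def margin_curve(precincts):
    """Cumulative (M - R) after each precinct, paired with %T."""
    n = len(precincts)
    gap, curve = 0, []
    for i, (m, r) in enumerate(precincts, start=1):
        gap += m - r
        curve.append((100 * i / n, gap))
    return curve

def scenario(name, seed=2016):
    """Transmission order: 'ordered', 'reversed', or 'random'."""
    if name == "ordered":      # BBM's bailiwicks transmit first
        return BBM_BAILIWICK + LENI_BAILIWICK
    if name == "reversed":     # Leni's bailiwicks transmit first
        return LENI_BAILIWICK + BBM_BAILIWICK
    shuffled = BBM_BAILIWICK + LENI_BAILIWICK
    random.Random(seed).shuffle(shuffled)  # uniformly random order
    return shuffled

for name in ("ordered", "reversed", "random"):
    curve = margin_curve(scenario(name))
    peak = max(curve, key=lambda p: p[1])
    trough = min(curve, key=lambda p: p[1])
    print(f"{name:9s} peak={peak} trough={trough} final={curve[-1][1]}")
```

With these assumed numbers, the BBM-first order peaks at 60% transmission and then falls (an inverted V), the Leni-first order bottoms out at 40% and then rises (a V), and the shuffled order stays close to the straight line joining zero to the final margin. All three end at the same final tally, since only the order differs.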

**In layman’s terms, please.**

If the explanation above is a bit tedious and overwhelming, consider the following analogy (Table 1). Imagine a school with 5 sections. Classes A to C are for **JaDine**, while Classes D and E are for **KathNiel**.

When the school Principal asked for a count of the students who were for KathNiel and for JaDine, students in classes A and B (yellow-filled boxes) were still on *recess*; hence, the teachers were only able to collect counts for classes C, D, and E (blue-filled boxes). The “partial and unofficial” results indicated that 98 of the students were for KathNiel and only 65 were for JaDine. After recess, the teachers were finally able to survey classes A and B. The results then showed that 140 students were actually for JaDine, while only 113 were for KathNiel. There is no “magic” there. It’s just that the order in which the classrooms were surveyed was skewed toward the classrooms dominated by KathNiel fans.
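The classroom arithmetic can be checked in a few lines. This sketch pools the classes and uses only the totals quoted in the analogy (Table 1's per-class counts are not reproduced here); the late votes from classes A and B are simply the final totals minus the partial ones.

```python
# Totals quoted in the analogy; classes are pooled, not broken down.
first_count = {"KathNiel": 98, "JaDine": 65}    # classes C, D, E
final_count = {"KathNiel": 113, "JaDine": 140}  # all five classes

# Votes added once classes A and B were surveyed after recess.
late_count = {k: final_count[k] - first_count[k] for k in final_count}

def lead(tally):
    """KathNiel minus JaDine: positive means KathNiel is ahead."""
    return tally["KathNiel"] - tally["JaDine"]

print("A & B added:", late_count)          # {'KathNiel': 15, 'JaDine': 75}
print("partial lead:", lead(first_count))  # 33
print("final lead:", lead(final_count))    # -27
```

Classes A and B alone contributed 75 JaDine and 15 KathNiel votes, enough to turn KathNiel's 33-vote lead into a 27-vote deficit.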

Another issue that was raised was that the trends were “too linear”. Some analysts incorrectly suggest that the linearity is evidence of cheating. Wrong. It’s just a matter of ratio and proportion.

TL;DR. The linearity is expected. It’s just a matter of ratio and proportion. If in a given administrative region the ratio of BBM to Leni votes is 2:3, for example, this ratio is expected to be maintained as more precincts from that region report. It is the same concept behind surveys, except that the “sample” here is far larger and consists of early returns rather than a deliberately drawn sample.
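To see why a fixed ratio gives a straight line, here is a minimal sketch of one region where every precinct splits exactly 2:3 for BBM and Leni. The precinct sizes are hypothetical. Each precinct adds -1/5 of its votes to M-R, so the cumulative (votes, M-R) points all lie on one line through the origin.

```python
from fractions import Fraction

# Hypothetical precinct sizes (votes cast) in one region where the
# BBM:Leni ratio is exactly 2:3 everywhere; illustrative numbers only.
precinct_sizes = [200, 350, 150, 500, 300]
BBM_SHARE, LENI_SHARE = Fraction(2, 5), Fraction(3, 5)

cum_votes, gap, points = 0, 0, []
for size in precinct_sizes:
    cum_votes += size
    gap += BBM_SHARE * size - LENI_SHARE * size  # this precinct's M - R
    points.append((cum_votes, gap))

# Every point has the same ratio gap / cum_votes, i.e. one straight line.
slopes = {gap / votes for votes, gap in points}
print(points)
print(slopes)  # {Fraction(-1, 5)}
```

In real returns the split varies from precinct to precinct, so the points scatter around such a line rather than sitting exactly on it; the more precincts are pooled, the tighter the scatter.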

Take, for example, a school of 1000 students; 200 of the students are KathNiel fans, while 800 are JaDine fans. If only 100 of the students in that school are present (or sampled), and these (*random*) 100 are asked whom they are for, KathNiel or JaDine, we would expect roughly 20 to be for KathNiel and 80 for JaDine. The ratio would stay about the same; that is, 1 KathNiel fan for every 4 JaDine fans. This is also why surveys (like SWS and Pulse Asia) work.
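The school example can be simulated directly. This sketch assumes simple random sampling without replacement (Python's `random.Random.sample`) and repeats the draw many times; the seed and the number of repetitions are arbitrary choices.

```python
import random

# School from the example: 200 KathNiel fans, 800 JaDine fans.
school = ["KathNiel"] * 200 + ["JaDine"] * 800

def kathniel_in_sample(rng, n=100):
    """Count KathNiel fans in a simple random sample of n students."""
    return rng.sample(school, n).count("KathNiel")

rng = random.Random(42)
counts = [kathniel_in_sample(rng) for _ in range(5_000)]
mean = sum(counts) / len(counts)
within_5 = sum(15 <= c <= 25 for c in counts) / len(counts)

print(f"average KathNiel fans per sample: {mean:.2f}")  # close to 20
print(f"share of samples with 15-25 fans: {within_5:.0%}")
```

On average each sample contains about 20 KathNiel fans, and most samples land within a few students of that, which is the sense in which the 1:4 ratio is preserved. Any single sample can still be off by a handful.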

The linearity in the plots comes down to simple ratio and proportion, the same concept behind surveys. The “inverted V” pattern comes from the order of (regional/provincial) transmission combined with voter preferences (bailiwicks). In the first regime, the transmitted votes came mostly from Marcos’s bailiwicks, and the trend is linear. An obvious inflection appears somewhere between 60% and 65% transmission; this is when votes from Leni’s bailiwicks started to come in. Yes, the second regime is still expected to be linear.
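One way to place the inflection without eyeballing is a two-segment ("broken-stick") least-squares fit: try every split point, fit a straight line to each side, and keep the split with the smallest combined squared error. The series below is synthetic, with an assumed turning point at 62.5% and assumed slopes and noise; it is not the actual transmission data.

```python
import random

def line_sse(xs, ys):
    """Sum of squared residuals of an ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return sum((y - my - slope * (x - mx)) ** 2 for x, y in zip(xs, ys))

def best_breakpoint(xs, ys, min_pts=3):
    """Return the last x of the first leg for the best two-line split."""
    best_sse, best_x = float("inf"), None
    for k in range(min_pts, len(xs) - min_pts + 1):
        sse = line_sse(xs[:k], ys[:k]) + line_sse(xs[k:], ys[k:])
        if sse < best_sse:
            best_sse, best_x = sse, xs[k - 1]
    return best_x

# Hypothetical synthetic series: %T from 0 to 100, M-R rising then falling.
rng = random.Random(7)
KINK = 62.5          # assumed turning point (% transmitted)
UP, DOWN = 600, 700  # assumed slopes of the two legs (votes per %T)
xs = list(range(101))
ys = [(UP * x if x <= KINK else UP * KINK - DOWN * (x - KINK)) + rng.gauss(0, 200)
      for x in xs]

print("first leg ends at %T =", best_breakpoint(xs, ys))
```

Because both legs are close to linear, the fitted split lands next to the assumed turning point. On real data, the same search would only say where a two-line description fits best, not why the trend changed.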

When looking at the % transmission of each region, it is important to have an idea of the number of registered voters in that region. Note that 50% transmission for NCR translates to many more votes than 50% transmission for, say, ARMM. Below, we plot the total number of registered voters for each region. Thanks to Christian Alis for providing us with the data.
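A quick illustration of why the regional percentages need weighting: this sketch uses three hypothetical regions (not actual COMELEC figures) and assumes that the share of precincts transmitted roughly equals the share of registered voters covered.

```python
# Hypothetical regions: (registered voters, fraction transmitted).
# Illustrative numbers only, NOT the actual registered-voter data.
regions = {
    "big_region":   (6_000_000, 0.50),
    "small_region": (1_500_000, 0.50),
    "mid_region":   (3_000_000, 0.90),
}

# Assumption: share of precincts transmitted ~ share of voters covered.
covered = {name: reg * pct for name, (reg, pct) in regions.items()}
total_registered = sum(reg for reg, _ in regions.values())

weighted = sum(covered.values()) / total_registered
unweighted = sum(pct for _, pct in regions.values()) / len(regions)

for name, voters in covered.items():
    print(f"{name}: about {voters:,.0f} registered voters covered")
print(f"voter-weighted %T: {weighted:.1%}; plain average: {unweighted:.1%}")
```

At the same 50% transmission, the large region covers four times as many voters as the small one, and the voter-weighted overall %T (about 61.4%) differs from a plain average of the regional percentages (about 63.3%).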

Taking this into consideration, we recommend that the reader also take a look at the map below to fully appreciate Figure 1 (GIF) above. In the figure below, the filled circles are scaled using the number of registered voters in each region. The yellow and red bars show the normalized number of votes for Leni and BBM, respectively.

Finally, if you want more stats and pretty viz, check out this story by Thinking Machines.


Is this the same presentation that Congress will see? How come there are allegations of cheating from the BBM camp?


Hello, Baybs. I don’t suppose anyone will use this, since this is an unofficial (and personal) post. There are allegations because some professionals made wrong assumptions about the time series (the order in which the votes were transmitted). The assumptions were flawed, and therefore the conclusions that followed were flawed as well.

Thanks.


Looks to me like this analysis came from hard work, but it’s not necessarily encouraging a realization of what integrity is all about. It would be safe to discuss what the basis of these numbers is. If you happened to vote, you know for a fact that it is very easy to manipulate the results using the unused ballots. Did you happen to stand guard until 5pm to check that those volunteers didn’t take part in manufacturing unused ballots in favor of the Liberal Party? The idea of bailiwicks (won by Leni) is already a mind-conditioning tactic in the first place. Many people claim, for instance, that Leni won in ARMM, Maguindanao, etc., because the data transmitted to the server appear to have zero votes for the rest of the candidates. How sure are we that it is real? Did we even bother to find out whether the numbers tallied by those machines came from real votes or from shaded unused ballots?


Thanks for the insight, Jann, and for appreciating the effort put into this post! Note that we are only analysing the more tangible datasets that we have at hand. In this post, we only provide a descriptive analysis and visualization of the election data. In the process, we also challenge the claim that a sudden change in trend (the inverted V) and linear trends (%T vs M-R) are evidence of systematic cheating.

Unfortunately, on your questions we can only speculate. These are beyond the scope of this write-up, which zeroes in on objectivity and reproducibility.

Thank you for dropping by, and sharing your thoughts. Cheers!


To add to the fray, here is a tally of reported incidents during the elections:

http://kontradaya.org/2016/05/kontra-daya-summary-report-as-of-7pm/

I think what these works show is that we have to be careful in claiming, on the basis of statistics, that cheating happened. However, as my first sentence seems to show, the other side also has to be careful in claiming that it did not happen. After all, what we can gather are the transmitted votes. Our analyses do not include what’s behind the votes.


Thanks, Anne, for sharing the link. I definitely agree.


I like the JaDine and KathNiel analysis… explaining the process in layman’s terms… it was just that the BBM camp cannot accept they lost
