Thursday, February 29, 2024

Dissecting the data


“What happened to the votes for the eight other candidates?”

The Commission on Elections, the Parish Pastoral Council for Responsible Voting and even ABS-CBN Data Analytics have all vouched for the accuracy of the results of the May 9 presidential elections. Skeptics, however, remain unconvinced.

We sought the opinion of an expert based in France who has done extensive research on Artificial Intelligence, Machine Learning and Robotics but who prefers to remain anonymous. Here’s what she said.

The consistent 68:32 relationship between the votes for Marcos Jr. and Robredo gives an almost perfect linear equation that is absolutely incredible in any electoral context.

Since election results come in successively from various geographical regions, these should reflect variable voting bias (e.g., Marcos Jr. would have much higher results than Robredo in Ilocos and vice-versa in Robredo’s home province) and therefore vary noticeably in both directions.

She cited the recent French elections as an example. There, vote proportions zigzagged visibly until they settled at the final Macron-Le Pen scores of 58.55 percent to 41.45 percent, and that was the run-off, where there were only two candidates.


The situation is more complex in the Philippines because of the presence of many other candidates in a single round, so variation should be higher.

She disputed the view that the partial election results consistently mirrored the national vote because the transmitted results were received in a random pattern. It is true, she said, that sample statistics closely mirror population proportions in highly controlled studies where polling institutes or scientific researchers exert utmost effort to ensure truly random samples (in the mathematical sense of “independent and identically distributed” samples).

The key point is this: a random pattern of receipt is not the same as random sampling in the mathematical sense. On election night, when the paramount goal is reporting speed, even the most advanced countries must sacrifice random sampling in the mathematical sense and live with a random pattern of receipt. That is the reason successive partial results are never constant.

Recall that in the last US presidential election, where results were also reported as they came in, the Biden:Trump ratio oscillated depending on whether the results came in from red or blue states. Ditto for the Macron:Le Pen ratio in France this month.
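The distinction between the two kinds of "random" can be tried out with a small simulation. The sketch below uses entirely made-up numbers, not real election returns: ten hypothetical regions whose support for a candidate A ranges from 30 to 70 percent, each reporting 200 batches of 200 votes. The function names and parameters are illustrative assumptions. It compares the running share of A when batches arrive region by region with the running share when the same batches are shuffled into a mathematically random order.

```python
import random

def running_shares(batches):
    """Cumulative share of candidate A after each batch of (votes_a, votes_b)."""
    shares = []
    total_a = total_b = 0
    for a, b in batches:
        total_a += a
        total_b += b
        shares.append(total_a / (total_a + total_b))
    return shares

def make_batches(rng, n_regions=10, per_region=200, batch_size=200):
    """Hypothetical batches: each region leans toward A by a different amount."""
    batches = []
    for r in range(n_regions):
        lean = 0.30 + 0.40 * r / (n_regions - 1)  # assumed lean, 0.30 up to 0.70
        for _ in range(per_region):
            a = sum(rng.random() < lean for _ in range(batch_size))
            batches.append((a, batch_size - a))
    return batches

def spread(shares, skip=100):
    """Gap between the highest and lowest running share after the first `skip` batches."""
    tail = shares[skip:]
    return max(tail) - min(tail)

rng = random.Random(0)            # fixed seed so the run is repeatable
by_region = make_batches(rng)     # batches arrive one region after another
shuffled = by_region[:]
rng.shuffle(shuffled)             # same batches in a truly random order

grouped = running_shares(by_region)
mixed = running_shares(shuffled)

print(f"final share of A (both orders): {grouped[-1]:.3f}")
print(f"swing when grouped by region:   {spread(grouped):.3f}")
print(f"swing when fully shuffled:      {spread(mixed):.3f}")
```

Both orders end at the same final share, since the totals are identical; only the path differs. How closely a real count resembles either arrival order depends on how results are actually transmitted, which this toy model does not attempt to capture.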

The law of large numbers says that as sample size approaches that of the population it is deemed to represent, the target value calculated from the sample approaches that of the underlying population.

The law of large numbers explains the importance of big data in information technology and the incredible power wielded by giants like Google and Amazon, which have the logistics to collect and store massive datasets. Less data-rich analysts must rely on their expertise to carefully randomize small samples so that their underlying distribution approximates that of the population. This is the standard procedure in carefully controlled scientific studies but impossible in an election context where sample statistics are reported as they come.

The earliest delivered samples are of necessity extremely small relative to the population (the total number of votes) and have little chance of mirroring the final result.

In the 2020 US elections, Trump led the race in the early counting stages. But as succeeding samples came in, they were integrated with previous samples, and as the aggregate sample size grew closer to the total population size, the running tally came closer to the final result. When Biden won in the end, Trump’s ignorance of the law of large numbers led him to cry foul and claim that his victory had been confiscated.

But that’s also the concrete proof that, contrary to the PPCRV’s claims, the random pattern of receipt of results leads to highly variable intermediate predictions.

And contrary to the PPCRV’s claims, the law of large numbers does not justify but rather casts doubt on Comelec’s 68:32 ratio which never budged as the aggregate sample size grew throughout the counting process.

Her conclusion: “The Comelec could have rigged the election results with a bit more subtlety, as this bulldozer approach is an insult to the electorate’s intelligence.”

Be that as it may, Congress convenes on May 24 to canvass the results of the election for president and vice-president. The lawmakers are expected to declare what we all know by now, that Ferdinand Marcos Jr. won as president and Sara Duterte as vice-president by a landslide, confirming opinion survey results since late last year.

With doubts raised, however, about the poll results, including the ratio of the votes counted for the two top presidential bets (68:32, which adds up to 100 percent), someone asked, correctly, if we may add: Whatever happened to the votes for the eight other candidates? Thrown into the wastebasket?

