Gemini 3.1 Pro is surprisingly good at classifying banking transactions

I gave three state-of-the-art (SOTA) LLMs the following prompt, along with a PDF of my transaction history from 30 January to 25 February:

Please try to accurately classify each of the transactions into categories and get the total per category. Try to do research on the ones you don’t know, and if you don’t know and can’t find out what one is, classify it under “other”.

The models I used were Gemini 3.1 Pro, GPT 5.2 Thinking, and Claude Opus 4.6, each with the web search tool enabled. Gemini did best by a clear margin: it scored 59/60, compared with 52/60 for GPT and 51/60 for Claude.

What surprised me was Gemini’s ability to identify transactions that had very vague identifiers and came from services specific to South Africa. For example, it noted:

AE ON OKAVANGO / AE MARLBOROUGH ST: “AE” stands for Astron Energy, a major fuel station network.

This is correct, even though the Astron Energy name has only been in use since 2021/22, and many South Africans probably still know the brand by its previous name, Caltex. The name was almost certainly in the training data, but correctly expanding the two letters “AE” to Astron Energy is still not easy. Neither of the other models got the name right. GPT put the transaction in the correct category, but it did not identify it as Astron Energy until I asked, at which point it found the correct name.

Some other difficult transactions the models had to classify were:

BYC (FNB’s Bank Your Change initiative): Gemini ✓, Claude ✓, GPT ✗
MOMMEDSCH DB (Momentum medical insurance): Gemini ✓, Claude ✗, GPT ✓
MOMGAP (Momentum gap cover insurance): Gemini ✓, Claude ✗, GPT ✗
PayFast*Melon Mobil (subscription to a relatively new SA mobile network operator): Gemini ✓, Claude ✗, GPT ✗
SweepSouth (a home cleaning service): Gemini ✓, Claude ✓, GPT ✗

It amazes me how effortlessly Gemini seemed to identify and explain each of these.