Claude 3.5 vs. GPT-4o vs. Gemini - Interesting Results on Simple Math

I'm running an event for some students and collecting payments, and I wanted to calculate the total amount collected quickly. There were 38 entries. In the payment column, most people paid $120 for their student, some paid $210 for two students (a small discount), and others used a free code. I thought this was pretty straightforward, but my results were mixed.

All of them miscalculated the total on the first try. I just copied and pasted the fields into GPT, then Claude, then Gemini, expecting this to be a simple task. I then did the calculation manually using the sum feature to get the correct total, which is how I knew they were all off. I also tried GPT-4 and GPT-3.5. Sadly, the worst performer was Sonnet 3.5. I wish I had recorded the results; maybe I'll take screenshots next time so I can compare the outputs.

However, when I exported the data to a spreadsheet CSV and ran the same tests, all of them except Sonnet got it right. I was surprised by these results. I'm going to upgrade to the paid version of Claude, but this makes me nervous about relying on GPT for math calculations. I'll have to double-check more often after this.
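For anyone doing the same thing, the total from a CSV export can also be computed deterministically with a few lines of code instead of asking a chatbot. This is a minimal sketch, assuming the CSV has a column named `payment` holding values like `$120` or `$210`, and that free-code entries are anything non-numeric (for example `FREE` or a blank cell). The column name and the sample rows below are hypothetical, not my actual data.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def total_payments(csv_text: str, column: str = "payment") -> Decimal:
    """Sum a payment column. Assumes non-numeric cells (free codes, blanks) count as $0."""
    total = Decimal("0")
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Strip the dollar sign and whitespace so "$120" parses as 120.
        raw = (row.get(column) or "").strip().replace("$", "")
        try:
            total += Decimal(raw)
        except InvalidOperation:
            # Free codes or empty cells contribute nothing to the total.
            continue
    return total


# Hypothetical sample rows, for illustration only.
sample = """name,payment
Student A,$120
Students B and C,$210
Student D,FREE
Student E,$120
"""

print(total_payments(sample))  # 450
```

Using `Decimal` rather than `float` keeps currency sums exact, and the result can be checked against the spreadsheet's own SUM.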
