### Model

google/gemma-4-31B-it

### Public score sources

- GPQA Diamond: 84.3 (source: https://huggingface.co/google/gemma-4-31B-it)
- HLE (Humanity's Last Exam): 19.5 (source: https://huggingface.co/google/gemma-4-31B-it)
- AIME 2026: 89.2 (source: https://huggingface.co/google/gemma-4-31B-it)
- MMLU-Pro: 85.2 (source: https://huggingface.co/google/gemma-4-31B-it)
- MMMLU: 85 (source: https://huggingface.co/google/gemma-4-31B-it)
- BigBench Hard (BBH): 91.5 (source: https://huggingface.co/google/gemma-4-31B-it)
- MathVision: 85.6 (source: https://huggingface.co/google/gemma-4-31B-it)
- OmniDocBench 1.5: 91 (source: https://huggingface.co/google/gemma-4-31B-it)
- OpenAI MRCR v2 (8-needle): 66.4 (source: https://huggingface.co/google/gemma-4-31B-it)
- τ³-Bench: 76.9 (source: https://huggingface.co/google/gemma-4-31B-it)
- LiveCodeBench: 80 (source: https://huggingface.co/google/gemma-4-31B-it)

### BenchPress output

- SWE-Lancer IC SWE Diamond Freelance ($): 64363.3
- Vending-Bench 2: 6236.9
- Codeforces Rating: 2315.1
- GDPval (Artificial Analysis ELO): 1776
- Chatbot Arena Elo: 1414.5
- MATH-500: 98.5
- GSM8K: 97.5
- COLLIE: 97.1