Total Tests: 4
Wrong Tests: 0
Avg Score: 10.00
| Category | Tests | Wrong | Avg Score |
|---|---|---|---|
| Anti-AI Tricks | 2 | 0 | 10.00 |
| Domain specific | 1 | 0 | 10.00 |
| Puzzle Solving | 1 | 0 | 10.00 |
Aibenchy
Benchmarks generated from Aibenchy test suites at 2026-02-16T00:55:25.158Z
Models Evaluated: 10
Total Runs: 40
Total Wrong: 21
| Rank | Model Name | Company | Avg Score | Value Score | Tests Correct |
|---|---|---|---|---|---|
| #1 | Z.ai: GLM 5 Reasoning (medium) | Z.ai | 10.00 | 625.72 | 4/4 |
| #2 | StepFun: Step 3.5 Flash No Reasoning Free Available | StepFun | 8.00 | 0.00 | 3/4 |
| #3 | Z.ai: GLM 5 No Reasoning | Z.ai | 7.75 | 628.03 | 3/4 |
| #4 | MiniMax: MiniMax M2.5 No Reasoning | MiniMax | 7.75 | 545.23 | 3/4 |
| #5 | Z.ai: GLM 4.7 Flash No Reasoning | Z.ai | 5.50 | 6,306.47 | 2/4 |
| #6 | Qwen: Qwen3 Coder Next Reasoning (medium) | Qwen | 3.25 | 88,219.33 | 1/4 |
| #7 | Qwen: Qwen3 Coder Next No Reasoning | Qwen | 3.25 | 83,461.74 | 1/4 |
| #8 | Z.ai: GLM 4.7 Flash Reasoning (medium) | Z.ai | 3.25 | 442.97 | 1/4 |
| #9 | MiniMax: MiniMax M2.5 Reasoning (medium) | MiniMax | 3.25 | 143.91 | 1/4 |
| #10 | OpenAI: GPT-4o-mini No Reasoning | OpenAI | 1.00 | 16,920.47 | 0/4 |
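The Total Runs and Total Wrong figures can be cross-checked against the Tests Correct column. The following is a minimal sketch, assuming every model ran the same 4 tests; the variable names are illustrative and not part of Aibenchy.

```python
# Tests correct per model, copied from the "Tests Correct" column in rank order.
# Assumption: every model was run on the same 4 tests.
TESTS_PER_MODEL = 4
correct = [4, 3, 3, 3, 2, 1, 1, 1, 1, 0]

models_evaluated = len(correct)
total_runs = TESTS_PER_MODEL * models_evaluated
total_wrong = total_runs - sum(correct)

print(models_evaluated, total_runs, total_wrong)
```

Under that assumption the computed values agree with the reported Models Evaluated, Total Runs, and Total Wrong.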