| Model | Final score | Trajectory (per try) | Total time | Total tokens | Notes | Files |
|---|---|---|---|---|---|---|
| anthropic/claude-haiku-4-5 skeleton · anthropic_claude-haiku-4-5__skeleton | 13/13 | first-try 13/13 | 13.2s | 2.9k | | play · raw · plan · score · meta |
| google/gemini-2.5-flash skeleton · google_gemini-2.5-flash__skeleton | 13/13 | first-try 13/13 | 16.2s | 3.7k | | play · raw · plan · score · meta |
| google/gemma-4-26b-a4b-it skeleton · google_gemma-4-26b-a4b-it__skeleton | 13/13 | first-try 13/13 | 52.3s | 3.0k | | play · raw · plan · score · meta |
| x-ai/grok-code-fast-1 skeleton · x-ai_grok-code-fast-1__skeleton | 13/13 | first-try 13/13 | 1m01s | 4.9k | | play · raw · plan · score · meta |
| openai/gpt-5-mini skeleton · openai_gpt-5-mini__skeleton | 13/13 | first-try 13/13 | 1m04s | 5.4k | | play · raw · plan · score · meta |
| meta-llama/llama-4-scout skeleton · meta-llama_llama-4-scout__skeleton | 13/13 | base 0/2→r1 13/13 | 17.8s (sum of 2 calls) | 5.0k | | play · raw · plan · score · meta |
| x-ai/grok-4.3 skeleton · x-ai_grok-4.3__skeleton | 13/13 | base 0/2→r1 13/13 | 1m33s (sum of 2 calls) | 5.6k | | play · raw · plan · score · meta |
| qwen3:8b skeleton · qwen3_8b__skeleton | 13/13 | base 0/2→r1 13/13 | 13m32s (sum of 2 calls) | 22.4k | | play · raw · plan · score · meta |
| qwen2.5-coder:7b skeleton · qwen2.5-coder_7b__skeleton | 6/7 | base 6/7→r1 6/7→r2 6/7→r3 6/7 | 3m58s (sum of 4 calls) | 8.4k | failed: ts_compiles | play · raw · plan · score · meta |
| gemma4:e2b skeleton · gemma4_e2b__skeleton | 6/7 | base 6/7→r1 6/7→r2 6/7→r3 6/7 | 4m59s (sum of 4 calls) | 17.2k | failed: ts_compiles | play · raw · plan · score · meta |