Snake LLM Bench — Round 3 — skeleton fill-in

10 model groups · sorted by final score, then retries used, then base-call speed
| Model | Final score | Trajectory (per try) | Total time | Total tokens | Notes | Files |
|---|---|---|---|---|---|---|
| anthropic/claude-haiku-4-5 (skeleton · anthropic_claude-haiku-4-5__skeleton) | 13/13 | first-try 13/13 | 13.2s | 2.9k | | play · raw · plan · score · meta |
| google/gemini-2.5-flash (skeleton · google_gemini-2.5-flash__skeleton) | 13/13 | first-try 13/13 | 16.2s | 3.7k | | play · raw · plan · score · meta |
| google/gemma-4-26b-a4b-it (skeleton · google_gemma-4-26b-a4b-it__skeleton) | 13/13 | first-try 13/13 | 52.3s | 3.0k | | play · raw · plan · score · meta |
| x-ai/grok-code-fast-1 (skeleton · x-ai_grok-code-fast-1__skeleton) | 13/13 | first-try 13/13 | 1m01s | 4.9k | | play · raw · plan · score · meta |
| openai/gpt-5-mini (skeleton · openai_gpt-5-mini__skeleton) | 13/13 | first-try 13/13 | 1m04s | 5.4k | | play · raw · plan · score · meta |
| meta-llama/llama-4-scout (skeleton · meta-llama_llama-4-scout__skeleton) | 13/13 | base 0/2, r1 13/13 | 17.8s (sum of 2 calls) | 5.0k | | play · raw · plan · score · meta |
| x-ai/grok-4.3 (skeleton · x-ai_grok-4.3__skeleton) | 13/13 | base 0/2, r1 13/13 | 1m33s (sum of 2 calls) | 5.6k | | play · raw · plan · score · meta |
| qwen3:8b (skeleton · qwen3_8b__skeleton) | 13/13 | base 0/2, r1 13/13 | 13m32s (sum of 2 calls) | 22.4k | | play · raw · plan · score · meta |
| qwen2.5-coder:7b (skeleton · qwen2.5-coder_7b__skeleton) | 6/7 | base 6/7, r1 6/7, r2 6/7, r3 6/7 | 3m58s (sum of 4 calls) | 8.4k | failed: ts_compiles | play · raw · plan · score · meta |
| gemma4:e2b (skeleton · gemma4_e2b__skeleton) | 6/7 | base 6/7, r1 6/7, r2 6/7, r3 6/7 | 4m59s (sum of 4 calls) | 17.2k | failed: ts_compiles | play · raw · plan · score · meta |
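
The ordering stated under the title (final score, then retries used, then base-call speed) can be sketched as a composite sort key. This is a minimal illustration, not the benchmark's actual code: the `Row` type, its field names, and `rank` are hypothetical. It assumes scores are compared as pass ratios (so 6/7 ranks below 13/13), and it uses the reported total time as a stand-in for base-call speed, since base-call times are not listed separately here.

```python
from fractions import Fraction
from typing import NamedTuple


class Row(NamedTuple):
    """One leaderboard row (hypothetical shape, for illustration only)."""
    model: str
    passed: int      # checks passed in the final try
    total: int       # checks run in the final try
    retries: int     # retries used after the base call
    seconds: float   # total time; stand-in for base-call speed (assumption)


def sort_key(row: Row):
    # Higher pass ratio first, then fewer retries, then faster time.
    return (-Fraction(row.passed, row.total), row.retries, row.seconds)


def rank(rows):
    """Return the rows in leaderboard order."""
    return sorted(rows, key=sort_key)


# Values transcribed from the results above (times converted to seconds).
ROWS = [
    Row("anthropic/claude-haiku-4-5", 13, 13, 0, 13.2),
    Row("google/gemini-2.5-flash", 13, 13, 0, 16.2),
    Row("google/gemma-4-26b-a4b-it", 13, 13, 0, 52.3),
    Row("x-ai/grok-code-fast-1", 13, 13, 0, 61.0),
    Row("openai/gpt-5-mini", 13, 13, 0, 64.0),
    Row("meta-llama/llama-4-scout", 13, 13, 1, 17.8),
    Row("x-ai/grok-4.3", 13, 13, 1, 93.0),
    Row("qwen3:8b", 13, 13, 1, 812.0),
    Row("qwen2.5-coder:7b", 6, 7, 3, 238.0),
    Row("gemma4:e2b", 6, 7, 3, 299.0),
]

if __name__ == "__main__":
    for position, row in enumerate(rank(ROWS), start=1):
        print(position, row.model, f"{row.passed}/{row.total}", row.retries)
```

Because retries are compared before time, meta-llama/llama-4-scout (17.8s, one retry) ranks below openai/gpt-5-mini (1m04s, first try) even though its total time is shorter.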