LEADERBOARD
Language Models
Large language models for text generation, reasoning, and analysis
15 tools ranked
Language Models Rankings
Ranked by overall ToolRoute Score across all benchmark dimensions
| Rank | Tool Name | ToolRoute Score | Output | Reliability | Efficiency | Cost | Trust | Stars |
|---|---|---|---|---|---|---|---|---|
| 🥇 | GPT-4o (Official) | 92.0 | 94.0 | 90.0 | 85.0 | 45.0 | 95.0 | 52,000 |
| 🥈 | Claude 3.5 Sonnet (Official) | 91.0 | 93.0 | 92.0 | 88.0 | 50.0 | 96.0 | 28,000 |
| 🥉 | Claude 3 Opus (Official) | 90.0 | 92.0 | 90.0 | 75.0 | 35.0 | 96.0 | 28,000 |
| #4 | OpenAI MCP (Official) | 88.0 | 90.0 | 86.0 | 82.0 | 45.0 | 92.0 | 8,500 |
| #5 | Gemini Pro (Official) | 88.0 | 88.0 | 86.0 | 90.0 | 55.0 | 92.0 | 18,000 |
| #6 | Anthropic MCP (Official) | 87.0 | 90.0 | 88.0 | 80.0 | 50.0 | 93.0 | 6,200 |
| #7 | Mistral Large (Official) | 86.0 | 86.0 | 84.0 | 88.0 | 60.0 | 88.0 | 12,000 |
| #8 | DeepSeek V3 | 85.0 | 87.0 | 82.0 | 90.0 | 90.0 | 78.0 | 45,000 |
| #9 | Llama 3 | 84.0 | 85.0 | 82.0 | 92.0 | 95.0 | 80.0 | 65,000 |
| #10 | Command R+ (Official) | 83.0 | 82.0 | 84.0 | 80.0 | 65.0 | 86.0 | 8,000 |
| #11 | Qwen 2.5 | 82.0 | 84.0 | 80.0 | 88.0 | 92.0 | 76.0 | 32,000 |
| #12 | Gemini Flash (Official) | 82.0 | 80.0 | 84.0 | 95.0 | 80.0 | 90.0 | 18,000 |
| #13 | Grok-2 (Official) | 80.0 | 82.0 | 78.0 | 84.0 | 55.0 | 82.0 | 9,000 |
| #14 | Yi-Large | 79.0 | 80.0 | 78.0 | 86.0 | 88.0 | 74.0 | 7,000 |
| #15 | Phi-4 (Official) | 78.0 | 76.0 | 80.0 | 94.0 | 95.0 | 85.0 | 15,000 |
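The page does not state how the per-dimension scores are combined into the overall ToolRoute Score. A common approach for this kind of composite ranking is a weighted average of the dimensions; the sketch below illustrates the idea, but the weights are purely hypothetical and do not reproduce the published scores.

```python
# Hypothetical aggregation of per-dimension scores into an overall score.
# The actual ToolRoute weighting is not published; these weights are
# illustrative assumptions only.
WEIGHTS = {
    "output": 0.35,
    "reliability": 0.25,
    "efficiency": 0.15,
    "cost": 0.10,
    "trust": 0.15,
}

def overall_score(dimensions: dict[str, float]) -> float:
    """Weighted average of 0-100 dimension scores, rounded to one decimal."""
    total = sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)
    return round(total, 1)

# GPT-4o's dimension scores from the table above:
gpt4o = {"output": 94.0, "reliability": 90.0, "efficiency": 85.0,
         "cost": 45.0, "trust": 95.0}
print(overall_score(gpt4o))  # 86.9 under these assumed weights, not the published 92.0
```

Note how the assumed weights yield 86.9 rather than the published 92.0, which suggests the real formula weights the dimensions differently (or normalizes cost on another scale).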
Score Guide (0–100 scale, matching the table above)
90+ Exceptional
80+ Excellent
70+ Good
60+ Fair
<60 Below Average
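The tier thresholds in the score guide can be expressed as a simple lookup. This sketch assumes the same 0–100 scale used by the scores in the table; the function name is ours, not part of any ToolRoute API.

```python
def score_tier(score: float) -> str:
    """Map a 0-100 ToolRoute Score to its label from the Score Guide."""
    if score >= 90:
        return "Exceptional"
    if score >= 80:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 60:
        return "Fair"
    return "Below Average"

# Examples from the table: GPT-4o (92.0) and Phi-4 (78.0)
print(score_tier(92.0))  # Exceptional
print(score_tier(78.0))  # Good
```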
Contribute Benchmark Data
Help improve these rankings by submitting real-world telemetry. Contributors earn routing credits for every data point.