This is actually a pretty clever, data-rich hallucination benchmark, and it roughly matches my intuition about relative capabilities.
OpenRouter, Aug 15, 00:29
After one week, GPT-5 has topped our proprietary model charts for tool calling accuracy🥇 In second is Claude 4.1 Opus, at 99.5% Details 👇