There are benchmarks on the Hugging Face page. The larger model is close to GPT-4o performance, which puts it below DeepSeek-R1. But it is a smaller model and not a reasoning model (it doesn’t use up extra tokens to “think”), so it's still very impressive and important for open source.