llm-colosseum
Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM (by OpenGenerativeAI)
human-learn
Natural Intelligence is still a pretty good idea. (by koaning)
Metric | llm-colosseum | human-learn
---|---|---
Mentions | 4 | 1
Stars | 942 | 780
Growth | 74.6% | -
Activity | 9.4 | 5.3
Last commit | 8 days ago | 4 months ago
Language | Jupyter Notebook | Jupyter Notebook
License | MIT License | MIT License
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llm-colosseum
Posts with mentions or reviews of llm-colosseum.
We have used some of these posts to build our list of alternatives
and similar projects. The most recent post was on 2024-03-27.
- LLM Colosseum
- Evaluate LLMs in Real Time with Street Fighter III
- LLM Colosseum: Make LLMs fight in SFIII
Hello guys,
Tired of boring LLM benchmarks? I'm sharing a fun project built during the Mistral AI SF hackathon.
Using an RL framework, we made LLMs fight each other in real time in Street Fighter III. You can find the repo here: https://github.com/OpenGenerativeAI/llm-colosseum.
Aside from the fact that it's very funny to see Mistral and others performing Hadouken, we found that it is a great way to benchmark language models. They need to quickly understand their environment and take actions accordingly.
With >400 fights, check out the ELO ranking on the HF space here: https://huggingface.co/spaces/junior-labs/llm-colosseum
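The post does not say how the ranking on the HF space is computed. For readers unfamiliar with Elo, here is a minimal Python sketch of the standard Elo update after a single fight. It assumes the conventional 400-point scale and a K-factor of 32; the function names and both constants are illustrative and are not taken from the llm-colosseum repo.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one fight.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    k is an assumed K-factor; the real leaderboard may use a different value.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


# Two models start at the same rating; the first one wins the fight.
winner, loser = update_elo(1500.0, 1500.0, score_a=1.0)
print(winner, loser)
```

Because the two players gain and lose the same number of points, the total rating in the pool stays constant, which is why rankings only become meaningful after many fights.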
human-learn
Posts with mentions or reviews of human-learn.
We have used some of these posts to build our list of alternatives
and similar projects.