Metric | llm-colosseum | instinct.cpp
---|---|---
Mentions | 4 | 3
Stars | 942 | 10
Growth | 74.6% | -
Activity | 9.4 | 9.5
Last commit | 9 days ago | 2 days ago
Language | Jupyter Notebook | Jupyter Notebook
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llm-colosseum
- LLM Colosseum
- Evaluate LLMs in Real Time with Street Fighter III
- LLM Colosseum: Make LLMs fight in SFIII
Hello guys,
Tired of boring LLM benchmarks? I'm sharing a fun project we built during the Mistral AI SF hackathon.
Using an RL framework, we made LLMs fight each other in real time in Street Fighter III. You can find the repo here: https://github.com/OpenGenerativeAI/llm-colosseum.
Aside from the fact that it's very funny to see Mistral and others performing Hadouken, we found that it is a great way to benchmark language models. They need to quickly understand their environment and take actions accordingly.
With over 400 fights, check out the Elo ranking on the HF Space here: https://huggingface.co/spaces/junior-labs/llm-colosseum
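For anyone unfamiliar with Elo, here is a minimal sketch of the textbook rating update after a single fight. It is not taken from the llm-colosseum repo: the function names, the K-factor of 32, and the 1500 starting rating are all assumptions chosen for illustration, and the actual leaderboard may compute its ranking differently.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one fight.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    k (the K-factor) is an assumed value, not one from the project.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


# Two hypothetical models start at 1500; the first one wins the fight.
winner, loser = update_elo(1500.0, 1500.0, score_a=1.0)
print(winner, loser)
```

Beating a higher-rated opponent moves the ratings more than beating a lower-rated one, which is why a ranking can settle after a few hundred fights.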
instinct.cpp
What are some alternatives?
parsee-datasets - Datasets, case studies and benchmarks for extracting structured information from PDFs, HTML files or images, created by the Parsee.ai team. Datasets also on Hugging Face: https://huggingface.co/parsee-ai