llm-colosseum
SKAB
Metric | llm-colosseum | SKAB |
---|---|---|
Mentions | 4 | 9 |
Stars | 942 | 295 |
Growth | 74.6% | - |
Activity | 9.4 | 4.8 |
Last commit | 9 days ago | 8 months ago |
Language | Jupyter Notebook | Jupyter Notebook |
License | MIT License | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llm-colosseum
- LLM Colosseum
- Evaluate LLMs in Real Time with Street Fighter III
LLM Colosseum: Make LLMs fight in SFIII
Hello guys,
Tired of the current boring LLM benchmarks? I'm sharing a fun project with you, built during the Mistral AI SF hackathon.
Using an RL framework, we made LLMs fight each other in real time in Street Fighter III. You can find the repo here: https://github.com/OpenGenerativeAI/llm-colosseum.
Aside from the fact that it's very funny to see Mistral and others performing Hadouken, we found that it is a great way to benchmark language models: they need to quickly understand their environment and take actions accordingly.
With >400 fights played so far, check out the ELO ranking on the HF space here: https://huggingface.co/spaces/junior-labs/llm-colosseum
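For readers unfamiliar with how a ranking like the one mentioned above is usually built, here is a minimal sketch of the textbook Elo update applied to a single fight. It is an illustration only: the function names, the K-factor of 32, and the starting rating of 1500 are assumptions, and the project may compute its leaderboard differently.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new ratings of A and B after one fight.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    k (assumed to be 32 here) controls how fast ratings move.
    """
    exp_a = expected_score(rating_a, rating_b)
    exp_b = 1.0 - exp_a
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - exp_b)
    return new_a, new_b


if __name__ == "__main__":
    # Two hypothetical models start at an assumed rating of 1500; model A wins.
    print(update_elo(1500.0, 1500.0, score_a=1.0))
```

Because the winner gains exactly what the loser drops, the total rating pool stays constant, and an upset against a higher-rated opponent moves both ratings more than an expected win does.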
SKAB
What are some alternatives?
Tegridy-MIDI-Dataset - Tegridy MIDI Dataset for precise and effective Music AI models creation.
raccoon_dataset - The dataset is used to train my own raccoon detector and I blogged about it on Medium
fma - FMA: A Dataset For Music Analysis
indonlu - The first-ever vast natural language processing benchmark for the Indonesian language. We provide multiple downstream tasks, pre-trained IndoBERT models, and starter code! (AACL-IJCNLP 2020)
COVID-CT - COVID-CT-Dataset: A CT Scan Dataset about COVID-19
medmcqa - A large-scale (194k) Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions.
openfema-samples - Code, dataset, and analysis samples that utilize the OpenFEMA API.