llm-colosseum
Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM (by OpenGenerativeAI)
enterprise-h2ogpte
Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform (by h2oai)
| | llm-colosseum | enterprise-h2ogpte |
|---|---|---|
| Mentions | 4 | 1 |
| Stars | 942 | 66 |
| Growth | 74.6% | - |
| Activity | 9.4 | 7.8 |
| Last commit | 8 days ago | 6 days ago |
| Language | Jupyter Notebook | Python |
| License | MIT License | Apache License 2.0 |
The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
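The exact formula behind the activity number is not given here. As a rough sketch only, assuming an exponential decay for commit age (the 30-day half-life is a made-up parameter) and a plain percentile rank across tracked projects, the idea could look like this in Python. The function names are hypothetical and are not part of any real API.

```python
from datetime import date


def weighted_commit_score(commit_dates, today, half_life_days=30.0):
    """Sum commit weights; a commit's weight halves every half_life_days.

    The decay shape and the half-life are assumptions for illustration.
    """
    return sum(
        0.5 ** ((today - d).days / half_life_days) for d in commit_dates
    )


def activity_rating(score, all_scores):
    """Map a score to a 0-10 scale by percentile rank among tracked projects.

    A rating of 9.0 means 90% of tracked projects score lower,
    i.e. the project sits in the top 10%.
    """
    below = sum(1 for s in all_scores if s < score)
    return round(10.0 * below / len(all_scores), 1)


if __name__ == "__main__":
    today = date(2024, 4, 1)
    commits = [date(2024, 4, 1), date(2024, 3, 2)]
    print(weighted_commit_score(commits, today))
    print(activity_rating(90, list(range(100))))
```

Under these assumptions a project that scores higher than 90 of 100 tracked projects gets a rating of 9.0, which matches the "top 10%" reading above.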
llm-colosseum
Posts with mentions or reviews of llm-colosseum.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2024-03-27.
- LLM Colosseum
- Evaluate LLMs in Real Time with Street Fighter III
- LLM Colosseum: Make LLMs fight in SFIII
Hello guys,
Tired of the current boring LLM benchmarks? I'm sharing a fun project with you that we built during the Mistral AI SF hackathon.
Using an RL framework, we made LLMs fight against each other in real time in Street Fighter III. You can find the repo here: https://github.com/OpenGenerativeAI/llm-colosseum.
Aside from the fact that it's very fun to watch Mistral and others perform Hadouken, we found that it is a great way to benchmark language models: they need to quickly understand their environment and act accordingly.
With more than 400 fights played, check out the ELO ranking on the HF space here: https://huggingface.co/spaces/junior-labs/llm-colosseum
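The post does not say how the "ELO ranking" is computed. For readers unfamiliar with it, here is a minimal sketch of the standard Elo update after a single fight, in Python. The K-factor of 32 and the 1500 starting rating are common defaults assumed here, not values taken from the project.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return the new (rating_a, rating_b) after one fight.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    k is the assumed K-factor controlling how fast ratings move.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two models start level; the first one wins the fight.
    print(update_elo(1500.0, 1500.0, 1.0))
```

With equal starting ratings the winner gains exactly what the loser gives up, so the total rating in the pool stays constant across fights.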
enterprise-h2ogpte
Posts with mentions or reviews of enterprise-h2ogpte.
We have used some of these posts to build our list of alternatives
and similar projects. The last one was on 2024-04-01.