LLM Colosseum: Make LLMs fight in SFIII

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • llm-colosseum

    Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM

  • Hello guys,

    Tired of current boring LLMs benchmark ? I'm sharing with you a fun project built during the Mistral AI SF hackathon.

    Using a RL framework, we made LLMs fight against each other in real time in Street Fighter III. You can find the repo here : https://github.com/OpenGenerativeAI/llm-colosseum.

    Aside from the fact that it's very funny to see Mistral and others performing Hadouken, we found that it is a great way to benchmark language models. They need to quickly understand their environment and take actions accordingly.

    With >400 fights, check out the ELO ranking on the HF space here : https://huggingface.co/spaces/junior-labs/llm-colosseum

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts