Recent releases of exllamav2 bring working FP8 cache support, which I've been very excited to test. This feature doubles the maximum context length you can run with your model, without any visible downsides.
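The "doubled context" claim follows from simple arithmetic: the KV cache grows linearly with sequence length and with bytes per element, so dropping from FP16 (2 bytes) to FP8 (1 byte) lets twice the context fit in the same memory budget. A minimal sketch of that arithmetic, using hypothetical model dimensions (this is not exllamav2 code):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem):
    """Approximate KV-cache size: keys and values (the factor of 2)
    stored per layer, per KV head, per position. Dimensions here are
    illustrative, not tied to any specific model."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Fix a memory budget: an FP16 cache at 4096 tokens of context.
budget = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128,
                        seq_len=4096, bytes_per_elem=2)

# The same budget holds an FP8 cache at exactly twice the context.
fp8_len = 8192
assert kv_cache_bytes(32, 8, 128, fp8_len, bytes_per_elem=1) == budget
```

The same reasoning applies to any cache quantization scheme: context capacity at a fixed budget scales inversely with bytes per cached element.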