Tested: ExLlamaV2's max context on 24 GB with 70B low-bpw & speculative sampling performance

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • exllamav2

    A fast inference library for running LLMs locally on modern consumer-class GPUs

  • Recent releases of exllamav2 bring working fp8 cache support, which I've been very excited to test. This feature doubles the maximum context length you can run with your model, with no visible downsides. (Two short sketches follow the project list below: one loading a 70B quant with the FP8 cache, and one pairing it with a draft model for speculative sampling.)

  • Medusa

    Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads (by FasterDecoding)

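For reference, here is a minimal sketch of what enabling the FP8 cache looks like, modeled on exllamav2's example scripts. The model path, quant level, and context length are placeholders; `ExLlamaV2Cache_8bit` is substituted for the default FP16 `ExLlamaV2Cache`.

```python
# Minimal sketch: loading a low-bpw 70B EXL2 quant with the 8-bit (FP8) KV cache.
# Paths, quant level, and max_seq_len are placeholders; class/method names follow
# exllamav2's example scripts.
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_8bit,   # FP8 cache: roughly half the VRAM of the FP16 cache
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/llama2-70b-2.4bpw-exl2"   # hypothetical low-bpw quant
config.prepare()
config.max_seq_len = 16384   # pick the context that fits in the VRAM left after weights

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)   # FP8 cache instead of ExLlamaV2Cache
model.load_autosplit(cache)                     # split weights + cache across available VRAM

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("The FP8 cache lets you fit", settings, 64))
```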
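The post's title also covers speculative sampling. In exllamav2 that means pairing the 70B target model with a small draft model whose proposed tokens the target verifies. The sketch below assumes the streaming generator's draft-model parameters as they appear in the library's examples; the draft model path is a placeholder, and any small model sharing the target's tokenizer would do.

```python
# Minimal sketch: speculative sampling in exllamav2 by pairing the 70B target model
# with a small draft model. Model paths and num_speculative_tokens are placeholders;
# the generator arguments follow the library's streaming-generator examples.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_8bit, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2StreamingGenerator, ExLlamaV2Sampler

def load(model_dir, max_seq_len):
    config = ExLlamaV2Config()
    config.model_dir = model_dir
    config.prepare()
    config.max_seq_len = max_seq_len
    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache_8bit(model, lazy=True)   # FP8 cache for both models
    model.load_autosplit(cache)
    return model, cache, config

model, cache, config = load("/models/llama2-70b-2.4bpw-exl2", 8192)      # target model
draft_model, draft_cache, _ = load("/models/tinyllama-1.1b-exl2", 8192)  # draft model (must share the target's vocabulary)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2StreamingGenerator(
    model, cache, tokenizer,
    draft_model=draft_model,
    draft_cache=draft_cache,
    num_speculative_tokens=5,   # draft tokens proposed per verification step
)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.6      # lower temperature tends to raise draft acceptance

generator.begin_stream(tokenizer.encode("Speculative sampling works by"), settings)
out = ""
for _ in range(200):
    chunk, eos, _ = generator.stream()
    out += chunk
    if eos:
        break
print(out)
```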


Related posts

  • Open Sustainable Technology

    1 project | news.ycombinator.com | 30 May 2024
  • Explaining in Style: Training a GAN to Explain a Classifier in StyleSpace

    1 project | news.ycombinator.com | 30 May 2024
  • Benchmarking foundation models for time series

    1 project | news.ycombinator.com | 29 May 2024
  • LLM Fine-tuning on RTX 4090: 90% Performance at 55% Power

    1 project | dev.to | 29 May 2024
  • Ipyblender: Blender Engine in an IPython Notebook

    1 project | news.ycombinator.com | 29 May 2024