Tested: ExLlamaV2's max context on 24 GB with 70B low-bpw & speculative sampling performance

This page summarizes the projects mentioned and recommended in the original post on /r/LocalLLaMA

  • exllamav2

    A fast inference library for running LLMs locally on modern consumer-class GPUs

  • Recent releases of exllamav2 bring working fp8 cache support, which I've been very excited to test. This feature doubles the maximum context length you can run with your model, with no visible downsides. (Two short sketches follow the project list below: one loading a 70B quant with the FP8 cache, and one pairing it with a draft model for speculative sampling.)

  • Medusa

    Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads (by FasterDecoding)

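For reference, here is a minimal sketch of what enabling the FP8 cache looks like, modeled on exllamav2's example scripts. The model path, quant level, and context length are placeholders; `ExLlamaV2Cache_8bit` is substituted for the default FP16 `ExLlamaV2Cache`.

```python
# Minimal sketch: loading a low-bpw 70B EXL2 quant with the 8-bit (FP8) KV cache.
# Paths, quant level, and max_seq_len are placeholders; class/method names follow
# exllamav2's example scripts.
from exllamav2 import (
    ExLlamaV2,
    ExLlamaV2Config,
    ExLlamaV2Cache_8bit,   # FP8 cache: roughly half the VRAM of the FP16 cache
    ExLlamaV2Tokenizer,
)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/llama2-70b-2.4bpw-exl2"   # hypothetical low-bpw quant
config.prepare()
config.max_seq_len = 16384   # pick the context that fits in the VRAM left after weights

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_8bit(model, lazy=True)   # FP8 cache instead of ExLlamaV2Cache
model.load_autosplit(cache)                     # split weights + cache across available VRAM

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("The FP8 cache lets you fit", settings, 64))
```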
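The post's title also covers speculative sampling. In exllamav2 that means pairing the 70B target model with a small draft model whose proposed tokens the target verifies. The sketch below assumes the streaming generator's draft-model parameters as they appear in the library's examples; the draft model path is a placeholder, and any small model sharing the target's tokenizer would do.

```python
# Minimal sketch: speculative sampling in exllamav2 by pairing the 70B target model
# with a small draft model. Model paths and num_speculative_tokens are placeholders;
# the generator arguments follow the library's streaming-generator examples.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_8bit, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2StreamingGenerator, ExLlamaV2Sampler

def load(model_dir, max_seq_len):
    config = ExLlamaV2Config()
    config.model_dir = model_dir
    config.prepare()
    config.max_seq_len = max_seq_len
    model = ExLlamaV2(config)
    cache = ExLlamaV2Cache_8bit(model, lazy=True)   # FP8 cache for both models
    model.load_autosplit(cache)
    return model, cache, config

model, cache, config = load("/models/llama2-70b-2.4bpw-exl2", 8192)      # target model
draft_model, draft_cache, _ = load("/models/tinyllama-1.1b-exl2", 8192)  # draft model (must share the target's vocabulary)

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2StreamingGenerator(
    model, cache, tokenizer,
    draft_model=draft_model,
    draft_cache=draft_cache,
    num_speculative_tokens=5,   # draft tokens proposed per verification step
)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.6      # lower temperature tends to raise draft acceptance

generator.begin_stream(tokenizer.encode("Speculative sampling works by"), settings)
out = ""
for _ in range(200):
    chunk, eos, _ = generator.stream()
    out += chunk
    if eos:
        break
print(out)
```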


Related posts

  • Open Sustainable Technology

    1 project | news.ycombinator.com | 30 May 2024
  • Explaining in Style: Training a GAN to Explain a Classifier in StyleSpace

    1 project | news.ycombinator.com | 30 May 2024
  • Benchmarking foundation models for time series

    1 project | news.ycombinator.com | 29 May 2024
  • LLM Fine-tuning on RTX 4090: 90% Performance at 55% Power

    1 project | dev.to | 29 May 2024
  • Ipyblender: Blender Engine in an IPython Notebook

    1 project | news.ycombinator.com | 29 May 2024