Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding

llama.cpp

788 59,389 10.0 C++

LLM inference in C/C++

The speedup would not be that high in practice for folks already using speculative sampling[1]. ANPD appears to be similar but uses a simpler, faster, and less accurate drafting approach. These two enhancements can't be meaningfully stacked.
[1] https://github.com/ggerganov/llama.cpp/pull/2926
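The "lossless" part of both approaches comes from the verification step: drafted tokens are only kept if the target model would have produced them anyway. Below is a minimal sketch of that draft-then-verify loop under greedy decoding. It is not the llama.cpp or ANPD implementation. `toy_target`, `verify_draft`, `generate` and `greedy` are hypothetical names made up for illustration, and the target model is called once per token here, whereas a real system would score the whole draft in a single batched forward pass.

```python
from typing import Callable, List, Sequence

Token = int
NextTokenFn = Callable[[Sequence[Token]], Token]    # greedy next-token function
DraftFn = Callable[[Sequence[Token]], List[Token]]  # proposes zero or more tokens


def verify_draft(target_next: NextTokenFn, context: Sequence[Token],
                 draft: Sequence[Token]) -> List[Token]:
    """Keep the longest draft prefix the target agrees with, plus one target token."""
    ctx = list(context)
    accepted: List[Token] = []
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            # First mismatch: discard the rest of the draft, keep the target's token.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    # Whole draft accepted: the target still contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted


def generate(target_next: NextTokenFn, drafter: DraftFn,
             prompt: Sequence[Token], max_new_tokens: int) -> List[Token]:
    """Draft-then-verify decoding; every step yields at least one token."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(verify_draft(target_next, out, drafter(out)))
    return out[:len(prompt) + max_new_tokens]


def greedy(target_next: NextTokenFn, prompt: Sequence[Token],
           max_new_tokens: int) -> List[Token]:
    """Plain one-token-at-a-time greedy decoding, used as the reference."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


def toy_target(ctx: Sequence[Token]) -> Token:
    """Hypothetical stand-in for an LLM: counts upward modulo 4."""
    return (ctx[-1] + 1) % 4
```

A bad drafter only costs speed, never correctness: `generate(toy_target, lambda ctx: [0, 0, 0], [0], 10)` returns the same tokens as `greedy(toy_target, [0], 10)`. This is also why stacking two drafters gains little: only one draft can be verified per target pass, so a second drafter mostly competes with the first.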

transformers

181 126,915 10.0 Python

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

The HuggingFace transformers library already has support for a similar method called prompt lookup decoding that uses the existing context to generate an ngram model: https://github.com/huggingface/transformers/issues/27722
I don't think it would be that hard to switch it out for a pretrained ngram model.
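To make the comparison concrete, here is a rough sketch of the two drafting strategies mentioned above: looking up the trailing n-gram in the existing context, and looking it up in a prebuilt table instead. This is not the transformers implementation. `prompt_lookup_draft`, `table_draft` and the `table` format (a dict from an n-gram tuple to its most likely next token) are assumptions made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Token = int


def prompt_lookup_draft(tokens: Sequence[Token], ngram_size: int = 2,
                        num_draft: int = 3) -> List[Token]:
    """Draft by copying what followed the most recent earlier occurrence
    of the trailing n-gram in the context itself."""
    tokens = list(tokens)
    if len(tokens) <= ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Scan backwards over earlier start positions, skipping the tail itself.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            follow = start + ngram_size
            return tokens[follow:follow + num_draft]
    return []


def table_draft(tokens: Sequence[Token],
                table: Dict[Tuple[Token, ...], Token],
                ngram_size: int = 2, num_draft: int = 3) -> List[Token]:
    """Draft by chaining lookups in a prebuilt n-gram table (hypothetical format)."""
    ctx = list(tokens)
    draft: List[Token] = []
    for _ in range(num_draft):
        nxt = table.get(tuple(ctx[-ngram_size:]))
        if nxt is None:
            break  # unseen n-gram: stop drafting early
        draft.append(nxt)
        ctx.append(nxt)
    return draft
```

Both functions have the same shape (token context in, draft tokens out), which is why swapping one for the other looks straightforward. For example, `prompt_lookup_draft([5, 6, 7, 8, 9, 5, 6])` proposes `[7, 8, 9]`. In transformers itself, prompt lookup decoding is enabled by passing `prompt_lookup_num_tokens` to `generate`.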

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Llama3.np: pure NumPy implementation of Llama3
10 projects | news.ycombinator.com | 16 May 2024

How to count tokens in frontend for Popular LLM Models: GPT, Claude, and Llama
2 projects | dev.to | 21 May 2024

Reading list to join AI field from Hugging Face cofounder
1 project | news.ycombinator.com | 18 May 2024

XLSTM: Extended Long Short-Term Memory
2 projects | news.ycombinator.com | 8 May 2024

AI enthusiasm #6 - Finetune any LLM you want💡
2 projects | dev.to | 16 Apr 2024

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.
Tags: NLP, llama, Natural Language Processing, llm, Pytorch
Post date: 21 Apr 2024
