Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llama.cpp

    LLM inference in C/C++

  • The speedup would not be that high in practice for folks already using speculative sampling[1]. ANPD appears to be similar but uses a simpler, faster, and less accurate drafting approach. These two enhancements can't be meaningfully stacked.

    [1] https://github.com/ggerganov/llama.cpp/pull/2926
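Both techniques share the same draft-and-verify loop: a cheap drafter proposes several tokens, and the target model verifies them, so the output matches plain decoding exactly. The sketch below illustrates the greedy version of that loop; `target_next_token` is a stand-in callable, not a real LLM API, and real implementations verify all drafted tokens in one batched forward pass.

```python
def greedy_speculative_step(target_next_token, draft_tokens, context):
    """Verify cheaply drafted tokens against the target model.

    target_next_token(context) -> the target model's greedy next token
    draft_tokens               -> candidate continuation from the drafter
                                  (a small draft model, or an n-gram lookup)

    Returns the tokens actually emitted; the result is identical to
    plain greedy decoding, which is why the speedup is "lossless".
    """
    accepted = []
    for tok in draft_tokens:
        # Accept a drafted token only if the target model agrees with it.
        if target_next_token(context + accepted) == tok:
            accepted.append(tok)
        else:
            break
    # Always emit at least one token from the target model itself.
    accepted.append(target_next_token(context + accepted))
    return accepted
```

The speedup comes from the drafter being much cheaper per token than the target model: when drafts are accepted, several tokens are produced per target-model step instead of one.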

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

  • The HuggingFace transformers library already supports a similar method called prompt lookup decoding, which builds an n-gram model from the existing context: https://github.com/huggingface/transformers/issues/27722

    I don't think it would be that hard to swap that out for a pretrained n-gram model.
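The core of prompt lookup decoding is a string match: take the last few tokens of the context, find an earlier occurrence of that n-gram, and propose the tokens that followed it as the draft. A minimal sketch of that lookup (parameter names here are illustrative, not the transformers API):

```python
def prompt_lookup_draft(tokens, ngram_size=2, max_draft=5):
    """Draft a continuation by matching the trailing n-gram of the
    context against its earlier occurrences (the idea behind prompt
    lookup decoding). Returns [] when no earlier match exists."""
    if len(tokens) < ngram_size:
        return []
    tail = tokens[-ngram_size:]
    # Scan earlier positions for the same n-gram, most recent first.
    for start in range(len(tokens) - ngram_size - 1, -1, -1):
        if tokens[start:start + ngram_size] == tail:
            continuation = tokens[start + ngram_size:
                                  start + ngram_size + max_draft]
            if continuation:
                return continuation
    return []
```

The drafted tokens still go through the target model for verification, so decoding output is unchanged; the lookup just makes drafting nearly free for repetitive contexts such as code or retrieved documents.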

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Llama3.np: pure NumPy implementation of Llama3

    10 projects | news.ycombinator.com | 16 May 2024
  • How to count tokens in frontend for Popular LLM Models: GPT, Claude, and Llama

    2 projects | dev.to | 21 May 2024
  • Reading list to join AI field from Hugging Face cofounder

    1 project | news.ycombinator.com | 18 May 2024
  • XLSTM: Extended Long Short-Term Memory

    2 projects | news.ycombinator.com | 8 May 2024
  • AI enthusiasm #6 - Finetune any LLM you want💡

    2 projects | dev.to | 16 Apr 2024