Llama3.np: pure NumPy implementation of Llama3

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • llama3

    The official Meta Llama 3 GitHub site

  • From the readme [0]:

    > All models support sequence length up to 8192 tokens, but we pre-allocate the cache according to max_seq_len and max_batch_size values. So set those according to your hardware.

    [0] https://github.com/meta-llama/llama3/tree/14aab0428d3ec3a959...
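    A rough NumPy sketch of what that pre-allocation means in practice (all sizes below are hypothetical, chosen only for illustration; the real values depend on your model and hardware):

    ```python
    import numpy as np

    # Hypothetical sizes for illustration only.
    max_batch_size, max_seq_len = 2, 8192
    n_kv_heads, head_dim = 8, 128

    # The key/value caches are sized up front from max_seq_len and
    # max_batch_size, rather than grown token by token during decoding.
    cache_k = np.zeros((max_batch_size, max_seq_len, n_kv_heads, head_dim),
                       dtype=np.float32)
    cache_v = np.zeros_like(cache_k)

    # Each decoding step then writes its keys/values into the next slot,
    # in place, with no reallocation:
    def write_step(cache, batch_idx, pos, kv):
        cache[batch_idx, pos] = kv

    write_step(cache_k, 0, 0, np.ones((n_kv_heads, head_dim), dtype=np.float32))
    ```

    At these example sizes each float32 cache tensor is 2 × 8192 × 8 × 128 × 4 bytes, about 64 MiB, which is why the readme tells you to set those values to fit your hardware.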

  • llama3.np

    llama3.np is a pure NumPy implementation of the Llama 3 model.

  • We changed the URL from https://github.com/likejazz/llama3.np to the article it points to, which gives more background.

  • llama.np

    Inference of Llama/Llama2 models in NumPy

  • What is the difference from the llama.np repository credited in the README? https://github.com/hscspring/llama.np

  • cria

    Tiny inference-only implementation of LLaMA (by recmo)

  • tensorli

    Absolute minimalistic implementation of a GPT-like transformer using only numpy (<650 lines).

  • Trainable Llama-like transformer (with backpropagation) in numpy only (~600 lines)

    https://github.com/joennlae/tensorli

  • fabrique

    Research-friendly implementations of LLMs in JAX

  • JAX requires a bit more work to maintain fixed-size buffers as required by XLA, especially in case of caching and rotary embeddings. But yeah, overall the code can be pretty similar [1].

    [1]: https://github.com/dfdx/fabrique/blob/main/fabrique/llama/mo...

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

  • Sure, knowing the basics of LLM math is necessary. But it's also _enough_ to know this math to fully grasp the code. There are only four concepts - attention, feed-forward net, RMS normalization, and rotary embeddings - organized into a clear structure.

    Now compare it to the Hugging Face implementation [1]. In addition to the aforementioned concepts, you need to understand the hierarchy of `PreTrainedModel`s, three types of attention, three types of rotary embeddings, HF's definition of the attention mask (which is not the same as the mask you read about in transformer tutorials), several cache classes, dozens of flags to control things like output format or serialization, etc.

    It's not that Meta's implementation is good and HF's implementation is bad - they pursue different goals in their own optimal way. But if you just want to learn how the model works, Meta's code base is great.

    [1]: https://github.com/huggingface/transformers/blob/main/src/tr...
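    Two of those four concepts are small enough to sketch in a few lines of NumPy. The functions below are illustrative only, not taken from either code base; `rope` uses the rotate-half convention, which differs from Llama's interleaved complex-pair layout but applies the same per-position rotations:

    ```python
    import numpy as np

    def rms_norm(x, weight, eps=1e-5):
        # RMSNorm: scale by the root-mean-square of the last axis,
        # then apply a learned per-feature gain (no mean subtraction).
        rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
        return x / rms * weight

    def rope(x, base=10000.0):
        # Rotary embeddings (rotate-half variant) for x of shape
        # (seq_len, head_dim): rotate each feature pair by a
        # position- and frequency-dependent angle.
        seq_len, head_dim = x.shape
        half = head_dim // 2
        freqs = 1.0 / (base ** (np.arange(half) / half))
        angles = np.outer(np.arange(seq_len), freqs)   # (seq_len, half)
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:, :half], x[:, half:]
        return np.concatenate([x1 * cos - x2 * sin,
                               x1 * sin + x2 * cos], axis=-1)
    ```

    Position 0 has angle zero everywhere, so `rope` leaves the first row unchanged - a handy sanity check when comparing implementations.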

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding

    3 projects | news.ycombinator.com | 21 Apr 2024
  • How to count tokens in frontend for Popular LLM Models: GPT, Claude, and Llama

    2 projects | dev.to | 21 May 2024
  • Reading list to join AI field from Hugging Face cofounder

    1 project | news.ycombinator.com | 18 May 2024
  • XLSTM: Extended Long Short-Term Memory

    2 projects | news.ycombinator.com | 8 May 2024
  • AI enthusiasm #6 - Finetune any LLM you want💡

    2 projects | dev.to | 16 Apr 2024