From the readme [0]:
> All models support sequence length up to 8192 tokens, but we pre-allocate the cache according to max_seq_len and max_batch_size values. So set those according to your hardware.
[0] https://github.com/meta-llama/llama3/tree/14aab0428d3ec3a959...
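For a rough feel of what that pre-allocation means in practice, here is a minimal numpy sketch of a KV cache sized up front from max_batch_size and max_seq_len (the buffer names and shapes are illustrative, not taken from the repo):

    import numpy as np

    max_batch_size, max_seq_len = 1, 2048       # tune these to your hardware
    n_layers, n_kv_heads, head_dim = 32, 8, 128

    # One pair of buffers per layer, allocated once and reused for every decoding step.
    cache_k = np.zeros((n_layers, max_batch_size, max_seq_len, n_kv_heads, head_dim), dtype=np.float32)
    cache_v = np.zeros_like(cache_k)

    def update_cache(layer, start_pos, k, v):
        """Write this step's keys/values into the fixed-size cache and return the filled prefix."""
        seq_len = k.shape[1]
        cache_k[layer, :, start_pos:start_pos + seq_len] = k
        cache_v[layer, :, start_pos:start_pos + seq_len] = v
        return cache_k[layer, :, :start_pos + seq_len], cache_v[layer, :, :start_pos + seq_len]

The memory cost scales linearly with both max_seq_len and max_batch_size, which is why the readme tells you to set them to what your machine can actually hold.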
We changed the URL from https://github.com/likejazz/llama3.np to the article it points to, which gives more background.
What is the difference from the llama.np repository credited in the README? https://github.com/hscspring/llama.np
Trainable Llama-like transformer (with backpropagation) in numpy only (~600 lines)
https://github.com/joennlae/tensorli
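As a toy illustration of what "backpropagation in numpy only" involves (not code from tensorli itself), a layer just caches its forward inputs and hand-computes its gradients:

    import numpy as np

    class Linear:
        def __init__(self, d_in, d_out):
            self.w = np.random.randn(d_in, d_out) / np.sqrt(d_in)
            self.b = np.zeros(d_out)

        def forward(self, x):
            self.x = x                       # cache input for the backward pass
            return x @ self.w + self.b

        def backward(self, grad_out):
            self.dw = self.x.T @ grad_out    # gradient w.r.t. weights
            self.db = grad_out.sum(axis=0)   # gradient w.r.t. bias
            return grad_out @ self.w.T       # gradient w.r.t. input, passed to the previous layer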
JAX requires a bit more work to maintain fixed-size buffers as required by XLA, especially in case of caching and rotary embeddings. But yeah, overall the code can be pretty similar [1].
[1]: https://github.com/dfdx/fabrique/blob/main/fabrique/llama/mo...
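The "fixed-size buffers" point boils down to something like this sketch (shapes and names are made up for illustration): under jit, XLA needs the cache to keep a static shape, so instead of growing an array as you might in plain numpy, you allocate it once and write into it with a dynamic slice update.

    import jax
    import jax.numpy as jnp

    max_seq_len, n_heads, head_dim = 2048, 8, 64
    cache_k = jnp.zeros((max_seq_len, n_heads, head_dim))

    @jax.jit
    def write_kv(cache_k, new_k, start_pos):
        # dynamic_update_slice keeps the output shape identical to cache_k,
        # so XLA compiles the update once and reuses it at every step.
        return jax.lax.dynamic_update_slice(cache_k, new_k, (start_pos, 0, 0))

    new_k = jnp.ones((1, n_heads, head_dim))   # keys for the current token
    cache_k = write_kv(cache_k, new_k, 5)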
Sure, knowing the basics of LLM math is necessary. But it's also _enough_ to know this math to fully grasp the code. There are only 4 concepts - attention, feed-forward net, RMS-normalization and rotary embeddings - organized into a clear structure.
Now compare it to the Hugging Face implementation [1]. In addition to the aforementioned concepts, you need to understand the hierarchy of `PreTrainedModel`s, 3 types of attention, 3 types of rotary embeddings, HF's definition of the attention mask (which is not the same as the mask you read about in transformer tutorials), several cache classes, dozens of flags to control things like output format or serialization, etc.
It's not that Meta's implementation is good and HF's implementation is bad - they pursue different goals in their own optimal way. But if you just want to learn how the model works, Meta's code base is great.
[1]: https://github.com/huggingface/transformers/blob/main/src/tr...
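To give a sense of how small each of those four pieces is, here are two of them, RMS-normalization and scaled dot-product attention, sketched in plain numpy (illustrative only, not copied from either repo):

    import numpy as np

    def rms_norm(x, weight, eps=1e-5):
        # normalize by the root-mean-square over the feature dimension, then scale
        rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
        return x / rms * weight

    def attention(q, k, v, mask=None):
        # standard scaled dot-product attention with an optional causal mask
        scores = q @ k.swapaxes(-2, -1) / np.sqrt(q.shape[-1])
        if mask is not None:
            scores = np.where(mask, scores, -1e9)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

Add a feed-forward net and rotary embeddings and you have essentially the whole forward pass; the rest of Meta's code is weight loading and sampling.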