From the readme [0]:
> All models support sequence length up to 8192 tokens, but we pre-allocate the cache according to max_seq_len and max_batch_size values. So set those according to your hardware.
[0] https://github.com/meta-llama/llama3/tree/14aab0428d3ec3a959...
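For a rough feel of what that pre-allocation means in practice, here is a minimal numpy sketch of a KV cache sized up front from max_batch_size and max_seq_len (the buffer names and shapes are illustrative, not taken from the repo):

    import numpy as np

    max_batch_size, max_seq_len = 1, 2048       # tune these to your hardware
    n_layers, n_kv_heads, head_dim = 32, 8, 128

    # One pair of buffers per layer, allocated once and reused for every decoding step.
    cache_k = np.zeros((n_layers, max_batch_size, max_seq_len, n_kv_heads, head_dim), dtype=np.float32)
    cache_v = np.zeros_like(cache_k)

    def update_cache(layer, start_pos, k, v):
        """Write this step's keys/values into the fixed-size cache and return the filled prefix."""
        seq_len = k.shape[1]
        cache_k[layer, :, start_pos:start_pos + seq_len] = k
        cache_v[layer, :, start_pos:start_pos + seq_len] = v
        return cache_k[layer, :, :start_pos + seq_len], cache_v[layer, :, :start_pos + seq_len]

The memory cost scales linearly with both max_seq_len and max_batch_size, which is why the readme tells you to set them to what your machine can actually hold.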
We changed the URL from https://github.com/likejazz/llama3.np to the article it points to, which gives more background.
What is the difference from the llama.np repository credited in the README? https://github.com/hscspring/llama.np
Trainable Llama-like transformer (with backpropagation) in numpy only (~600 lines)
https://github.com/joennlae/tensorli
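As a toy illustration of what "backpropagation in numpy only" involves (not code from tensorli itself), a layer just caches its forward inputs and hand-computes its gradients:

    import numpy as np

    class Linear:
        def __init__(self, d_in, d_out):
            self.w = np.random.randn(d_in, d_out) / np.sqrt(d_in)
            self.b = np.zeros(d_out)

        def forward(self, x):
            self.x = x                       # cache input for the backward pass
            return x @ self.w + self.b

        def backward(self, grad_out):
            self.dw = self.x.T @ grad_out    # gradient w.r.t. weights
            self.db = grad_out.sum(axis=0)   # gradient w.r.t. bias
            return grad_out @ self.w.T       # gradient w.r.t. input, passed to the previous layer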
JAX requires a bit more work to maintain fixed-size buffers as required by XLA, especially in case of caching and rotary embeddings. But yeah, overall the code can be pretty similar [1].
[1]: https://github.com/dfdx/fabrique/blob/main/fabrique/llama/mo...
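The "fixed-size buffers" point boils down to something like this sketch (shapes and names are made up for illustration): under jit, XLA needs the cache to keep a static shape, so instead of growing an array as you might in plain numpy, you allocate it once and write into it with a dynamic slice update.

    import jax
    import jax.numpy as jnp

    max_seq_len, n_heads, head_dim = 2048, 8, 64
    cache_k = jnp.zeros((max_seq_len, n_heads, head_dim))

    @jax.jit
    def write_kv(cache_k, new_k, start_pos):
        # dynamic_update_slice keeps the output shape identical to cache_k,
        # so XLA compiles the update once and reuses it at every step.
        return jax.lax.dynamic_update_slice(cache_k, new_k, (start_pos, 0, 0))

    new_k = jnp.ones((1, n_heads, head_dim))   # keys for the current token
    cache_k = write_kv(cache_k, new_k, 5)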
Sure, knowing the basics of LLM math is necessary. But it's also _enough_ to know this math to fully grasp the code. There are only 4 concepts - attention, feed-forward net, RMS-normalization and rotary embeddings - organized into a clear structure.
Now compare it to the Hugging Face implementation [1]. In addition to the aforementioned concepts, you need to understand the hierarchy of `PreTrainedModel`s, 3 types of attention, 3 types of rotary embeddings, HF's definition of the attention mask (which is not the same as the mask you read about in transformer tutorials), several cache classes, dozens of flags to control things like output format or serialization, etc.
It's not that Meta's implementation is good and HF's implementation is bad - they pursue different goals in their own optimal way. But if you just want to learn how the model works, Meta's code base is great.
[1]: https://github.com/huggingface/transformers/blob/main/src/tr...
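To give a sense of how small each of those four pieces is, here are two of them, RMS-normalization and scaled dot-product attention, sketched in plain numpy (illustrative only, not copied from either repo):

    import numpy as np

    def rms_norm(x, weight, eps=1e-5):
        # normalize by the root-mean-square over the feature dimension, then scale
        rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
        return x / rms * weight

    def attention(q, k, v, mask=None):
        # standard scaled dot-product attention with an optional causal mask
        scores = q @ k.swapaxes(-2, -1) / np.sqrt(q.shape[-1])
        if mask is not None:
            scores = np.where(mask, scores, -1e9)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v

Add a feed-forward net and rotary embeddings and you have essentially the whole forward pass; the rest of Meta's code is weight loading and sampling.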