Llama3 Implemented from Scratch

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • llama3-from-scratch

    llama3 implementation one matrix multiplication at a time
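
    A minimal sketch of what "one matrix multiplication at a time" looks like in practice: a single causal self-attention head written as explicit NumPy matmuls. The shapes here are toy-sized stand-ins; the repo itself walks through the same steps with Llama 3's real weights.

        # Toy single-head causal self-attention, spelled out as matmuls.
        import numpy as np

        T, d_model, d_head = 4, 8, 8           # sequence length, model dim, head dim
        x  = np.random.randn(T, d_model)       # token embeddings
        Wq = np.random.randn(d_model, d_head)  # query projection
        Wk = np.random.randn(d_model, d_head)  # key projection
        Wv = np.random.randn(d_model, d_head)  # value projection

        q, k, v = x @ Wq, x @ Wk, x @ Wv       # three matmuls
        scores = q @ k.T / np.sqrt(d_head)     # (T, T) attention logits
        mask = np.triu(np.ones((T, T)), 1)     # causal mask: no peeking ahead
        scores = np.where(mask == 1, -1e9, scores)

        w = np.exp(scores - scores.max(-1, keepdims=True))
        w /= w.sum(-1, keepdims=True)          # row-wise softmax
        out = w @ v                            # (T, d_head) attended values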

  • https://github.com/naklecha/llama3-from-scratch/blob/main/im...

    There is also a major difference between choosing a cute logo and covering the actual content during a presentation.

  • llama2.c

    Inference Llama 2 in one file of pure C
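
    The "one file of pure C" boils down to a loop like the one below: run the transformer forward pass on the latest token, sample the next one, repeat. A hedged Python sketch of that structure; `forward` is a hypothetical stand-in for llama2.c's transformer call, not its actual API.

        # Sketch of the token-by-token inference loop at the heart of llama2.c.
        import numpy as np

        def generate(forward, prompt_tokens, steps=16, temperature=1.0):
            tokens = list(prompt_tokens)
            for pos in range(len(tokens), len(tokens) + steps):
                logits = forward(tokens[-1], pos)     # one forward pass per token
                if temperature == 0.0:
                    nxt = int(np.argmax(logits))      # greedy decoding
                else:
                    p = np.exp(logits / temperature)
                    p /= p.sum()                      # temperature sampling
                    nxt = int(np.random.choice(len(p), p=p))
                tokens.append(nxt)
            return tokens

        # Demo with a dummy model so the sketch runs end to end.
        dummy = lambda tok, pos: np.random.randn(32)  # fake 32-token vocab logits
        print(generate(dummy, [1], steps=8))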

  • Docker Swarm

    Source repo for Docker's Documentation (by docker)

  • Docker Compose

    Define and run multi-container applications with Docker

  • https://github.com/docker/compose

    This really just seems to be "old man yelling at clouds" syndrome.

    I, for one, welcome anime girls in READMEs and hope to see more of them in the future, if only because it seems to bother some of the old fogies of the world for some reason.

  • mikupad

    LLM Frontend in a single html file

  • > Creativity - one of the very few applications generative AI can truly excel at - is currently impossible. It could revolutionize entertainment, but it isn't allowed to. The models are only allowed to produce inoffensive, positivity-biased, sterile slop that no human being finds attractive.

    Have you played around with base models? If you haven't yet, I highly recommend trying a base model like davinci-002[1] in OpenAI's "legacy" Completions API playground. That's probably the most accessible option, but if you're technically inclined, you can pair a base model like Llama3-70B[2] with an interface like Mikupad[3] and do some brilliant creative writing. Llama3 models can be run locally with something like Ollama[4], or, if you don't have the compute for that, via an LLM-as-a-service platform like OpenRouter[5]; see the sketch after the links below.

    I'm sure you'll be delighted to find that base models are thoroughly unslopped and uncensored.

    [1] https://platform.openai.com/docs/models/gpt-base

    [2] https://huggingface.co/meta-llama/Meta-Llama-3-70B

    [3] https://github.com/lmg-anon/mikupad

    [4] https://ollama.com/library/llama3:70b-text

    [5] https://openrouter.ai/models/meta-llama/llama-3-70b
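
    For the local route, a minimal sketch of a raw text completion against the base model, assuming Ollama's default REST endpoint on its standard port and that `llama3:70b-text` has already been pulled:

        # Raw completion against a local base model through Ollama's API.
        import json
        import urllib.request

        payload = {
            "model": "llama3:70b-text",  # the base (text) model, not the chat tune
            "prompt": "The rain had been falling for three days when",
            "raw": True,                 # skip chat templating: pure continuation
            "stream": False,
        }
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            print(json.loads(resp.read())["response"])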

  • llama-from-scratch

    Llama from scratch, or How to implement a paper without crying

  • I recommend reading https://github.com/bkitano/llama-from-scratch over the article the OP linked.

    It actually teaches you how to build Llama iteratively, and how to test, debug, and interpret the training loss, rather than just describing the code.
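
    The workflow that guide teaches is roughly "build one piece, test it, move on". A small sketch of that habit, using RMSNorm (Llama's normalization layer) as the piece under test; the tolerances are illustrative:

        # Build a single component, then sanity-check it before wiring it in.
        import torch

        def rmsnorm(x, weight, eps=1e-6):
            # normalize by root-mean-square (no mean-centering, unlike LayerNorm)
            return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight

        x = torch.randn(2, 5, 16)
        out = rmsnorm(x, torch.ones(16))

        assert out.shape == x.shape
        # after RMSNorm, the mean square over the last dim should be ~1
        assert torch.allclose(out.pow(2).mean(-1), torch.ones(2, 5), atol=1e-3)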

  • Whisper

    High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model (by Const-me)

  • > you could probably implement training and inference for a single model architecture, from scratch, on a single kind of GPU, with reasonable performance… with a year or so

    I have implemented inference of Whisper https://github.com/Const-me/Whisper and Mistral https://github.com/Const-me/Cgml/tree/master/Mistral/Mistral... models on all GPUs that support the Direct3D 11.0 API. The performance is, IMO, very reasonable.

    A year might be required when the only input is the research articles. In practice, we also have reference Python implementations of these models, so it's possible to test individual functions or compute shaders against the corresponding pieces of the reference, by comparing saved output tensors between the reference and the newly built implementation. Thanks to that simple trick, I think I spent less than a month of part-time work on each of these two projects.
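
    A sketch of that tensor-comparison trick, assuming both implementations dump intermediate tensors to .npy files (the file names here are hypothetical):

        # Compare a layer's output between the Python reference and the new port.
        import numpy as np

        # In the reference implementation, after the layer of interest:
        #   np.save("layer7_attn_out.npy", tensor.detach().cpu().numpy())

        reference = np.load("layer7_attn_out.npy")      # from the Python reference
        candidate = np.load("layer7_attn_out_gpu.npy")  # dumped by the new GPU code

        abs_err = np.abs(reference - candidate).max()
        rel_err = abs_err / (np.abs(reference).max() + 1e-12)
        print(f"max abs err {abs_err:.3e}, rel err {rel_err:.3e}")
        assert rel_err < 1e-2, "GPU output diverges from the reference"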

  • Cgml

    GPU-targeted vendor-agnostic AI library for Windows, and Mistral model implementation.


NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • FLaNK-AIM Weekly 13 May 2024

    34 projects | dev.to | 13 May 2024
  • MongoDB on Your Local Machine Using Docker: A Step-by-Step Guide

    2 projects | dev.to | 1 Jan 2024
  • Docker - Setup a local JS and Python Development environment

    3 projects | dev.to | 2 Dec 2023
  • New computer? Install THIS first... 💻

    2 projects | dev.to | 23 Nov 2023
  • FLaNK Stack Weekly 18 September 2023

    11 projects | dev.to | 18 Sep 2023