-
Whisper
High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model (by Const-me)
https://github.com/naklecha/llama3-from-scratch/blob/main/im...
There is also a major difference between choosing a cute logo and covering the actual content during a presentation.
https://github.com/docker/compose
This seems to really just be "old-man-yelling-at-clouds syndrome".
I for one welcome anime girls in READMEs and hope to see more of them in the future, if only because it seems to bother some of the old fogies in the world for some reason.
> creativity - one of the very few applications generative AI can truly excel at - is currently impossible. it could revolutionize entertainment, but it isn't allowed to. the models are only allowed to produce inoffensive, positivity-biased, sterile slop that no human being finds attractive.
Have you played around with base models? If you haven't yet, I highly recommend trying a base model like davinci-002[1] in OpenAI's "legacy" Completions API playground. That's probably the most accessible, but if you're technically inclined, you can pair a base model like Llama3-70B[2] with an interface like Mikupad[3] and do some brilliant creative writing. Llama3 models can be run locally with something like Ollama[4] or if you don't have the compute for it, via an LLM-as-a-service platform like OpenRouter[5].
I'm sure you'll be delighted to find that base models are thoroughly unslopped and uncensored.
[1] https://platform.openai.com/docs/models/gpt-base
[2] https://huggingface.co/meta-llama/Meta-Llama-3-70B
[3] https://github.com/lmg-anon/mikupad
[4] https://ollama.com/library/llama3:70b-text
[5] https://openrouter.ai/models/meta-llama/llama-3-70b
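For the local route, here is a minimal sketch of sending a raw prompt to a base model through Ollama's HTTP API. It assumes Ollama is running on its default port and that the `llama3:70b-text` tag from [4] has already been pulled. The helper names and the default sampling values are my own picks for illustration, not anything prescribed by Ollama.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a stock install listening on port 11434).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="llama3:70b-text", max_tokens=200, temperature=1.0):
    """Build the JSON payload for a plain text completion (hypothetical helper)."""
    return {
        "model": model,
        "prompt": prompt,  # a base model simply continues this text
        "stream": False,   # ask for one JSON object instead of a token stream
        "options": {"num_predict": max_tokens, "temperature": temperature},
    }


def complete(prompt, **kwargs):
    """POST the prompt to the local Ollama server and return the continuation."""
    body = json.dumps(build_generate_request(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling something like `complete("The lighthouse keeper's last journal entry read:")` should return the model's continuation. Since a base model has no chat template, the prompt should be the beginning of the text you want written rather than an instruction.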
I recommend reading https://github.com/bkitano/llama-from-scratch over the article OP linked.
It actually teaches you how to build Llama iteratively, and how to test, debug, and interpret the training loss, rather than just describing the code.
> you could probably implement training and inference for a single model architecture, from scratch, on a single kind of GPU, with reasonable performance… with a year or so
I have implemented inference of Whisper https://github.com/Const-me/Whisper and Mistral https://github.com/Const-me/Cgml/tree/master/Mistral/Mistral... models on all GPUs which support Direct3D 11.0 API. The performance is IMO very reasonable.
A year might be required when the only input is the research articles. In practice, we also have reference Python implementations of these models. It is possible to test individual functions or compute shaders against the corresponding pieces of the reference implementation by comparing saved output tensors between the reference and the newly built implementation. Thanks to that simple trick, I think I spent less than a month of part-time work on each of these two projects.
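The tensor comparison trick can be sketched roughly like this. This is not the code from either project, just an illustration that assumes both implementations dump each intermediate tensor as raw float32 values in native byte order. The function names and the tolerances are hypothetical and would need tuning per layer, since GPU arithmetic rarely matches a reference bit for bit.

```python
import math
from array import array
from pathlib import Path


def load_f32(path):
    """Read a tensor dumped as raw float32 values in native byte order (assumed format)."""
    data = array("f")
    data.frombytes(Path(path).read_bytes())
    return data


def compare_tensors(reference, candidate, rel_tol=1e-3, abs_tol=1e-5):
    """Compare two flat tensors element by element.

    Returns (ok, max_abs_diff, index_of_worst_element).
    """
    if len(reference) != len(candidate):
        raise ValueError(f"length mismatch: {len(reference)} vs {len(candidate)}")
    ok, worst_diff, worst_index = True, 0.0, -1
    for i, (r, c) in enumerate(zip(reference, candidate)):
        if math.isnan(r) or math.isnan(c):
            # A NaN on either side is reported as a failure at that index.
            return False, math.nan, i
        diff = abs(r - c)
        if diff > worst_diff:
            worst_diff, worst_index = diff, i
        if diff > abs_tol + rel_tol * abs(r):
            ok = False
    return ok, worst_diff, worst_index
```

Running this after every layer, on files written by the reference and by the new implementation, points at the first stage where the outputs diverge, which narrows a bug down to a single function or shader.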