-
exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
ExLlama, for example, allocates buffers on each card, which reduces the VRAM available for the model and context; see https://github.com/turboderp/exllama/issues/121
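The arithmetic behind that remark can be sketched as follows. This is only an illustration: the helper name `usable_vram_gb`, the card sizes, and the 1.5 GB per-card reserve are assumptions made up for the example, not figures taken from ExLlama or the linked issue.

```python
def usable_vram_gb(total_gb, reserve_gb):
    """Return the VRAM left on each card after subtracting a fixed per-card reserve.

    total_gb:   list of total VRAM per card, in GB
    reserve_gb: hypothetical buffer size reserved on every card, in GB
    """
    if reserve_gb < 0:
        raise ValueError("reserve_gb must be non-negative")
    # Clamp at zero so a card smaller than the reserve contributes nothing.
    return [max(0.0, total - reserve_gb) for total in total_gb]


if __name__ == "__main__":
    cards = [24.0, 24.0]   # assumed: two 24 GB cards
    reserve = 1.5          # assumed: per-card buffer overhead in GB
    budget = usable_vram_gb(cards, reserve)
    print("per-card budget (GB):", budget)
    print("total budget (GB):", sum(budget))
```

Because the reserve is paid once per card, splitting a model across more cards leaves a smaller fraction of the combined VRAM for weights and context than the raw total suggests.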
Related posts
-
AIM Weekly 03 June 2024
-
Llama3V is suspected to have been stolen from the MiniCPM-Llama3-v2.5 project
-
Lama3-V project from a Stanford team plagiarized a lot from MiniCPM-Llama3-v2.5
-
[2209.02842] ASR2K: Speech Recognition for Around 2000 Languages without Audio
-
Text-to-Speech with Speaker Diarization