It seems Sepp Hochreiter has been talking about this model since October 2023: https://github.com/huggingface/transformers/issues/27011
In the scaling-law comparison, I wonder whether it is reasonable to compare Llama, Mamba, RWKV, and xLSTM by number of parameters. Isn't compute time more relevant? For example, the scaling-law figure could use compute time in place of the number of parameters.
It seems neither the code nor the model parameters have been released. I wonder if that will follow.