lm-hackers vs OpenMoE
| | lm-hackers | OpenMoE |
|---|---|---|
| Mentions | 3 | 8 |
| Stars | 1,669 | 1,225 |
| Growth | 12.5% | - |
| Activity | 5.0 | 8.6 |
| Last commit | 4 months ago | 2 months ago |
| Language | Jupyter Notebook | Python |
| License | Apache License 2.0 | - |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
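The exact formula behind the Activity number isn't spelled out here. As a rough illustration of "recent commits have higher weight than older ones", a recency-weighted score with a hypothetical half-life could look like the sketch below; the function name and the 30-day half-life are made up for the example and are not the metric actually used on this page.

```python
import math
from datetime import datetime, timezone

def activity_score(commit_dates, half_life_days=30.0):
    """Toy recency-weighted commit score: each commit contributes a weight
    that halves every `half_life_days` days, so recent commits dominate.
    Illustrative only -- not the real Activity formula."""
    now = datetime.now(timezone.utc)
    score = 0.0
    for commit_date in commit_dates:
        age_days = (now - commit_date).total_seconds() / 86400
        score += 0.5 ** (age_days / half_life_days)
    return score

# Example: the two recent commits contribute far more than the old one.
dates = [datetime(2024, 1, 5, tzinfo=timezone.utc),
         datetime(2023, 12, 20, tzinfo=timezone.utc),
         datetime(2023, 6, 1, tzinfo=timezone.utc)]
print(round(activity_score(dates), 2))
```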
lm-hackers
- A hacker's guide to Language Models
- Show HN: Hackers Guide to Language Models
- A Hackers' Guide to Language Models [video]
This was excellent. Here's the notebook that accompanies the video: https://github.com/fastai/lm-hackers/blob/main/lm-hackers.ip...
I thought the selection of projects was great: some OpenAI API hacking, including a Code Interpreter imitation built with OpenAI functions, then running LLMs locally with Hugging Face models, and then a fine-tuning example that builds a text-to-SQL model, somehow crammed into just 10 minutes at the end!
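The Code Interpreter imitation that comment mentions comes down to letting the model request a local Python call via OpenAI's function-calling interface. A minimal sketch of that pattern, written against the current OpenAI Python SDK's `tools` API rather than whatever the notebook itself uses; the `run_python` tool, the model choice, and the prompt are all illustrative:

```python
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def run_python(code: str) -> str:
    """Evaluate a Python expression and return the result -- the core trick
    behind a Code Interpreter imitation. Never do this with untrusted input."""
    return repr(eval(code))

tools = [{
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Evaluate a Python expression and return the result",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python expression to evaluate"}
            },
            "required": ["code"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is 12345 * 6789?"}],
    tools=tools,
)

# If the model chose to call the tool, run it locally and inspect the result.
call = response.choices[0].message.tool_calls[0]
args = json.loads(call.function.arguments)
print(run_python(args["code"]))
```

In a full loop you would append the tool result as a `tool` message and call the model again so it can phrase the final answer.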
OpenMoE
- Mixtral: A Promising Model with Unforeseen Challenges
Switch Transformer aside, it's not very open and OpenMoE is months old.
- will the point meet in 2024?
- Ask HN: Why is GPT4 better than the other major LLMs?
What about this one? https://github.com/XueFuzhao/OpenMoE
- Partial Outage Across ChatGPT and API
https://github.com/XueFuzhao/OpenMoE
Check out this open source Mixture of Experts research. Could help a lot with performance of open source models.
- OpenAI is too cheap to beat
I think the weird thing about this is that it's completely true right now but in X months it may be totally outdated advice.
For example, efforts like OpenMOE https://github.com/XueFuzhao/OpenMoE or similar will probably eventually lead to very competitive performance and cost-effectiveness for open source models. At least in terms of competing with GPT-3.5 for many applications.
Also see https://laion.ai/
I also believe that within say 1-3 years there will be a different type of training approach that does not require such large datasets or manual human feedback.
- Mixtures of Experts
Google have released the models and code for the Switch Transformer from Fedus et al. (2021) under the Apache 2.0 licence. [0]
There's also OpenMoE - an open-source effort to train a mixture-of-experts model. So far they've released a model with 8 billion parameters (a toy sketch of the routing idea follows the links below). [1]
[0] https://github.com/google-research/t5x/blob/main/docs/models...
[1] https://github.com/XueFuzhao/OpenMoE
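Neither OpenMoE's nor the Switch Transformer's actual code is reproduced here, but the routing idea both build on is easy to sketch: a small router network picks one expert feed-forward block per token, so per-token compute stays roughly constant as experts are added. A toy top-1 (switch-style) MoE layer in PyTorch, with all sizes and names purely illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy top-1 switch-style MoE layer: a router picks one expert FFN per
    token, so only a fraction of the parameters run for any given token."""

    def __init__(self, d_model=64, d_hidden=256, n_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                            # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)    # (tokens, n_experts)
        weight, expert_idx = gates.max(dim=-1)       # top-1 routing decision
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale by the gate value so the router still gets gradients.
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(TinyMoE()(tokens).shape)  # torch.Size([10, 64])
```

Real switch-style implementations add an auxiliary load-balancing loss and per-expert capacity limits so tokens spread evenly across experts; the sketch omits both for brevity.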
- A Hackers' Guide to Language Models [video]
- OpenMoE – A family of open-sourced Mixture-of-Experts (MoE) LLMs
What are some alternatives?
st-moe-pytorch - Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in PyTorch
t5x
Laminoid - An ML instance manager