Top 23 Python language-model Projects
-
Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
-
haystack
:mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
-
RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
-
LMFlow
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
-
txtai
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
-
gpt-neox
An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
Project mention: How to count tokens in frontend for Popular LLM Models: GPT, Claude, and Llama | dev.to | 2024-05-21
Thanks to transformers.js, we can run the tokenizer and model locally in the browser. Transformers.js is designed to be functionally equivalent to Hugging Face's transformers python library, meaning you can run the same pretrained models using a very similar API.
Project mention: gpt4-openai-api VS gpt4free - a user suggested alternative | libhunt.com/r/gpt4-openai-api | 2024-01-04
I can't install
For open assistant, the code: https://github.com/LAION-AI/Open-Assistant/tree/main/inference
Alpaca is an instruction-oriented LLM derived from LLaMA, enhanced by Stanford researchers with a dataset of 52,000 examples of following instructions, sourced from OpenAI’s InstructGPT through the self-instruct method. The extensive self-instruct dataset, details of data generation, and the model refinement code were publicly disclosed. This model complies with the licensing requirements of its base model. Due to the utilization of InstructGPT for data generation, it also adheres to OpenAI’s usage terms, which prohibit the creation of models competing with OpenAI. This illustrates how dataset restrictions can indirectly affect the resulting fine-tuned model.
Project mention: Haystack DB – 10x faster than FAISS with binary embeddings by default | news.ycombinator.com | 2024-04-28
I was confused for a bit but there is no relation to https://haystack.deepset.ai/
https://github.com/BlinkDL/RWKV-LM#rwkv-discord-httpsdiscord... lists a number of implementations of various versions of RWKV.
https://github.com/BlinkDL/RWKV-LM#rwkv-parallelizable-rnn-w... :
> RWKV: Parallelizable RNN with Transformer-level LLM Performance (pronounced as "RwaKuv", from 4 major params: R W K V)
> RWKV is an RNN with Transformer-level LLM performance, which can also be directly trained like a GPT transformer (parallelizable). And it's 100% attention-free. You only need the hidden state at position t to compute the state at position t+1. You can use the "GPT" mode to quickly compute the hidden state for the "RNN" mode.
> So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding (using the final hidden state).
> "Our latest version is RWKV-6,*
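The quoted claim that the same hidden state can be reached either step by step ("RNN" mode) or for all positions at once ("GPT" mode) can be illustrated with a toy example. The sketch below is not RWKV's actual formulation: it assumes a simple scalar decay recurrence, and the function names `recurrent_states` and `parallel_states` are hypothetical, chosen only for illustration.

```python
import math

def recurrent_states(xs, decay):
    """RNN-style mode: the state at position t+1 needs only the state at t."""
    states = []
    state = 0.0
    for x in xs:
        state = decay * state + x  # toy recurrence, an assumption for illustration
        states.append(state)
    return states

def parallel_states(xs, decay):
    """Parallel mode: each position is computed directly from the inputs,
    with no dependence on a previously computed state."""
    return [
        sum(decay ** (t - i) * xs[i] for i in range(t + 1))
        for t in range(len(xs))
    ]

xs = [1.0, 2.0, 3.0, 4.0]
decay = 0.5
rnn = recurrent_states(xs, decay)
par = parallel_states(xs, decay)

# Both modes should agree on every position.
print(all(math.isclose(a, b) for a, b in zip(rnn, par)))
```

The recurrent form does constant work per new token, while the parallel form lets every position be evaluated independently, which is the property that makes this kind of recurrence trainable in a parallel fashion.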
Project mention: DECT NR+: A technical dive into non-cellular 5G | news.ycombinator.com | 2024-04-02
This seems to be an order of magnitude better than LoRa (https://lora-alliance.org/ not https://arxiv.org/abs/2106.09685). LoRa doesn't have all the features this one does like OFDM, TDM, FDM, and HARQ. I didn't know there's spectrum dedicated for DECT use.
Project mention: SpeechBrain 1.0: A free and open-source AI toolkit for all things speech | news.ycombinator.com | 2024-02-28
Project mention: Show HN: FileKitty – Combine and label text files for LLM prompt contexts | news.ycombinator.com | 2024-05-01
The easiest way is to use vllm (https://github.com/vllm-project/vllm) to run it on a couple of A100s, and you can benchmark it with this library (https://github.com/EleutherAI/lm-evaluation-harness).
CogVLM is very good in my (brief) testing: https://github.com/THUDM/CogVLM
The model weights seem to be under a non-commercial license, not true open source, but it is "open access" as you requested.
Project mention: New OS Python Framework "Agents" Introduced for Autonomous Language Agents | /r/deeplearning | 2023-09-21
(arXiv) (github)
Project mention: Are there any multimodal AI models I can use to provide a paired text *and* image input, to then generate an expanded descriptive text output? [D] | /r/MachineLearning | 2023-07-05
Maybe the recent OpenFlamingo gives you better results (they have a demo on HF).
Project mention: Show HN: Fructose, LLM calls as strongly typed functions | news.ycombinator.com | 2024-03-06
Python language-model related posts
-
Reading list to join AI field from Hugging Face cofounder
-
Show HN: Ellipsis – Automated PR reviews and bug fixes
-
XLSTM: Extended Long Short-Term Memory
-
CatLIP: Clip Vision Accuracy with 2.7x Faster Pre-Training on Web-Scale Data
-
Multimodal Embeddings for JavaScript, Swift, and Python
-
Mistral AI Launches New 8x22B Moe Model
-
Schedule-Free Learning – A New Way to Train
-
A note from our sponsor - InfluxDB
www.influxdata.com | 31 May 2024
Index
What are some of the best open-source language-model projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | transformers | 126,915 |
2 | gpt4free | 58,313 |
3 | Open-Assistant | 36,738 |
4 | stanford_alpaca | 28,967 |
5 | LLaMA-Factory | 22,989 |
6 | mlc-llm | 17,302 |
7 | haystack | 14,100 |
8 | RWKV-LM | 11,798 |
9 | ChatRWKV | 9,314 |
10 | LoRA | 9,407 |
11 | LMFlow | 8,062 |
12 | speechbrain | 8,013 |
13 | txtai | 7,158 |
14 | gpt-neox | 6,635 |
15 | OpenNMT-py | 6,619 |
16 | BERT-pytorch | 6,039 |
17 | lm-evaluation-harness | 5,359 |
18 | CogVLM | 5,355 |
19 | agents | 4,602 |
20 | self-instruct | 3,666 |
21 | OpenAgents | 3,635 |
22 | open_flamingo | 3,511 |
23 | lmql | 3,392 |