Top 23 NLP Open-Source Projects
-
HanLP
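Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing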
-
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
-
rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
-
500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code
500 AI Machine learning Deep learning Computer vision NLP Projects with code
-
Awesome-pytorch-list
A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
-
haystack
🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
-
DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
-
PaddleHub
Awesome pre-trained model toolkit based on PaddlePaddle (400+ models including image, text, audio, video and cross-modal, with easy inference & serving)
-
FinGPT
FinGPT: Open-Source Financial Large Language Models! Revolutionize 🔥 We release the trained model on HuggingFace.
Fascinating work, very promising.
Can you summarise how the model in your paper differs from this one?
https://github.com/huggingface/transformers/issues/27011
In 2018, a new language representation model called BERT (Bidirectional Encoder Representations from Transformers) was introduced. The main idea behind this paradigm is to first pre-train a language model on a massive amount of unlabeled data and then fine-tune all of its parameters using labeled data from the downstream tasks. This allows the model to generalize well to different NLP tasks. Moreover, it has been shown that this kind of language representation model can solve downstream tasks it was not explicitly trained on, e.g., classifying a text without a task-specific training phase.
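The zero-shot idea can be illustrated by scoring a text against a short description of each candidate label and picking the closest one, with no training step for the task. This is a minimal sketch under a loud assumption: the hypothetical `embed` function is a toy bag-of-words counter standing in for a pretrained encoder such as BERT, so it only captures exact word overlap, and the label descriptions are made up. Hugging Face's `zero-shot-classification` pipeline takes a different route, scoring labels with a natural-language-inference model.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a pretrained encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def zero_shot_classify(text: str, label_descriptions: dict[str, str]) -> str:
    """Return the label whose description is most similar to the text; nothing is trained."""
    text_vec = embed(text)
    scores = {label: cosine(text_vec, embed(desc)) for label, desc in label_descriptions.items()}
    return max(scores, key=scores.get)


# Made-up label descriptions for illustration.
labels = {
    "sports": "a text about sports, a game, a team or a match",
    "finance": "a text about finance, money, stocks or a market",
}
print(zero_shot_classify("The team won the match in the final minute", labels))  # → sports
```

Swapping `embed` for a real sentence encoder is what would let a text match a label it shares no words with.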
🔗 https://github.com/microsoft/AI-For-Beginners 🔗 https://microsoft.github.io/AI-For-Beginners/
Project mention: How I discovered Named Entity Recognition while trying to remove gibberish from a string. | dev.to | 2024-05-06
Project mention: 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑‍💻 🥇 | dev.to | 2023-10-19
Project mention: The Era of 1-Bit LLMs: Training Tips, Code and FAQ [pdf] | news.ycombinator.com | 2024-03-21
Project mention: 🔥🚀 Top 10 Open-Source Must-Have Tools for Crafting Your Own Chatbot 🤖💬 | dev.to | 2023-11-06
Support Rasa on GitHub ⭐
I'd like to share with you today the Chinese-Alpaca-Plus-13B-GPTQ model, a 4-bit GPTQ-quantised version of Yiming Cui's Chinese-LLaMA-Alpaca 13B for GPU inference.
Project mention: Haystack DB – 10x faster than FAISS with binary embeddings by default | news.ycombinator.com | 2024-04-28
I was confused for a bit but there is no relation to https://haystack.deepset.ai/
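To make "binary embeddings" concrete: one common scheme, assumed here and not necessarily what Haystack DB does, keeps only the sign of each embedding dimension, packs those bits into an integer, and ranks documents by Hamming distance, which reduces to an XOR plus a bit count. The helper names (`binarize`, `hamming`, `search`) are hypothetical, and the 4-dimensional vectors are made-up toy values.

```python
def binarize(vector: list[float]) -> int:
    """Pack the sign of each dimension into an int: bit i is 1 if vector[i] > 0."""
    bits = 0
    for i, value in enumerate(vector):
        if value > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes: XOR, then count the set bits."""
    return bin(a ^ b).count("1")


def search(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k documents whose binary codes are closest to the query's."""
    q = binarize(query)
    codes = {doc_id: binarize(vec) for doc_id, vec in corpus.items()}
    return sorted(codes, key=lambda doc_id: hamming(q, codes[doc_id]))[:k]


# Toy 4-dimensional "embeddings" (made-up values for illustration only).
corpus = {
    "doc_a": [0.9, -0.2, 0.4, -0.7],
    "doc_b": [-0.5, 0.3, -0.1, 0.8],
    "doc_c": [0.7, -0.6, 0.2, 0.1],
}
print(search([0.8, -0.1, 0.5, -0.3], corpus))  # → ['doc_a', 'doc_c']
```

Binarizing throws away each dimension's magnitude, so the ranking is coarser than full-precision cosine similarity; the payoff is compact codes and very cheap comparisons.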
Alternatively, could we not simply split on common characters such as newlines and periods to break the text into sentences? It would be fragile, though, with special handling required for numbers with decimal points and probably various other edge cases.
There are also Python libraries meant for natural language parsing[0] that could do that task for us. I even see examples on Stack Overflow[1] that simply split text into sentences.
[0]: https://www.nltk.org/
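The naive approach from the comment above can be sketched with a regular expression: split after `.`, `!` or `?` when whitespace follows, and at runs of newlines. A period inside a decimal like `3.14` is followed by a digit rather than whitespace, so it survives; abbreviations such as "Dr." are one of the edge cases it still gets wrong. `split_sentences` is a hypothetical helper for illustration only; NLTK's `sent_tokenize` is the library route the comment points to.

```python
import re

# Boundary = whitespace right after ., ! or ?, or a run of newlines.
# "3.14" is safe: the period there is followed by a digit, not whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|\n+")


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences, dropping empty pieces."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


print(split_sentences("Pi is roughly 3.14. It is irrational!\nNew line here"))
# → ['Pi is roughly 3.14.', 'It is irrational!', 'New line here']
print(split_sentences("Dr. Smith arrived."))
# → ['Dr.', 'Smith arrived.']  (known failure: splits after the abbreviation)
```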
Project mention: GPT-4, without specialized training, beat a GPT-3.5 class model that cost $10B | news.ycombinator.com | 2024-03-24
There is also the open-source FinGPT, which is claimed to beat GPT-4 in some benchmarks at a fine-tuning cost of $17.25.
https://github.com/AI4Finance-Foundation/FinGPT
NLP-related posts
-
XLSTM: Extended Long Short-Term Memory
-
Zero Shot Text Classification Under the hood
-
LangFun: Object oriented data programs using LLMs
-
AI enthusiasm #9 - A multilingual chatbot📣🈸
-
Quick tip: Using R, OpenAI and SingleStore Notebooks
-
Haystack DB – 10x faster than FAISS with binary embeddings by default
-
Rust Keyword Extraction: Creating the YAKE! algorithm from scratch
-
A note from our sponsor - SaaSHub
www.saashub.com | 12 May 2024
Index
What are some of the best open-source NLP projects? This list will help you:
Rank | Project | Stars
---|---|---
1 | transformers | 125,741
2 | bert | 37,077
3 | HanLP | 32,438
4 | AI-For-Beginners | 31,481
5 | spaCy | 28,849
6 | datasets | 18,480
7 | unilm | 18,407
8 | rasa | 18,012
9 | 500-AI-Machine-learning-Deep-learning-Computer-vision-NLP-Projects-with-code | 17,631
10 | Chinese-LLaMA-Alpaca | 17,466
11 | awesome-nlp | 16,058
12 | best-of-ml-python | 15,633
13 | gensim | 15,273
14 | Awesome-pytorch-list | 14,985
15 | ML-YouTube-Courses | 14,354
16 | nlp-tutorial | 13,735
17 | haystack | 13,784
18 | flair | 13,587
19 | NLTK | 13,054
20 | DeepLearningExamples | 12,660
21 | PaddleHub | 12,539
22 | botpress | 12,006
23 | FinGPT | 11,586