Python NLP

Open-source Python projects categorized as NLP

Top 23 Python NLP Projects

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

  • Project mention: AI enthusiasm #9 - A multilingual chatbot📣🈸 | dev.to | 2024-05-01

    transformers is a package by Hugging Face, that helps you interact with models on HF Hub (GitHub)

  • bert

    TensorFlow code and pre-trained models for BERT

  • Project mention: OpenAI – Application for US trademark "GPT" has failed | news.ycombinator.com | 2024-02-15

    task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pre-trained parameters.

    [0] https://arxiv.org/abs/1810.04805

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • HanLP

    中文分词 词性标注 命名实体识别 依存句法分析 成分句法分析 语义依存分析 语义角色标注 指代消解 风格转换 语义相似度 新词发现 关键词短语提取 自动摘要 文本分类聚类 拼音简繁转换 自然语言处理

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

  • Project mention: Step by step guide to create customized chatbot by using spaCy (Python NLP library) | dev.to | 2024-03-23

    Hi Community, In this article, I will demonstrate below steps to create your own chatbot by using spaCy (spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython):

  • datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

  • Project mention: 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑‍💻 🥇 | dev.to | 2023-10-19
  • unilm

    Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

  • Project mention: The Era of 1-Bit LLMs: Training_Tips, Code And_FAQ [pdf] | news.ycombinator.com | 2024-03-21
  • rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

  • Project mention: 🔥🚀 Top 10 Open-Source Must-Have Tools for Crafting Your Own Chatbot 🤖💬 | dev.to | 2023-11-06

    Support Rasa on GitHub ⭐

  • SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
  • Chinese-LLaMA-Alpaca

    中文LLaMA&Alpaca大语言模型+本地CPU/GPU训练部署 (Chinese LLaMA & Alpaca LLMs)

  • Project mention: Chinese-Alpaca-Plus-13B-GPTQ | /r/LocalLLaMA | 2023-05-30

    I'd like to share with you today the Chinese-Alpaca-Plus-13B-GPTQ model, which is the GPTQ format quantised 4bit models of Yiming Cui's Chinese-LLaMA-Alpaca 13B for GPU reference.

  • best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

  • gensim

    Topic Modelling for Humans

  • Project mention: Aggregating news from different sources | /r/learnprogramming | 2023-07-08
  • haystack

    :mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

  • Project mention: Haystack DB – 10x faster than FAISS with binary embeddings by default | news.ycombinator.com | 2024-04-28

    I was confused for a bit but there is no relation to https://haystack.deepset.ai/

  • flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

  • NLTK

    NLTK Source

  • Project mention: Building a local AI smart Home Assistant | news.ycombinator.com | 2024-01-13

    alternatively, could we not simply split by common characters such as newlines and periods, to split it within sentences? it would be fragile with special handling required for numbers with decimal points and probably various other edge cases, though.

    there are also Python libraries meant for natural language parsing[0] that could do that task for us. I even see examples on stack overflow[1] that simply split text into sentences.

    [0]: https://www.nltk.org/

  • PaddleHub

    Awesome pre-trained models toolkit based on PaddlePaddle. (400+ models including Image, Text, Audio, Video and Cross-Modal with Easy Inference & Serving)

  • PaddleNLP

    👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.

  • TextBlob

    Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

  • Project mention: Using EvaDB to build AI-enhanced apps | dev.to | 2024-01-10

    TextBlob is a Python toolkit for text processing. It offers some common NLP functionalities such as part-of-speech tagging and noun phrase extraction. We’ll use TextBlob in our project to perform some quick sentiment analysis on tweets.

  • petals

    🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

  • Project mention: Mistral Large | news.ycombinator.com | 2024-02-26

    So how long until we can do an open source Mistral Large?

    We could make a start on Petals or some other open source distributed training network cluster possibly?

    [0] https://petals.dev/

  • attention-is-all-you-need-pytorch

    A PyTorch implementation of the Transformer model in "Attention is All You Need".

  • Project mention: ElevenLabs Launches Voice Translation Tool to Break Down Language Barriers | news.ycombinator.com | 2023-10-10

    The transformer model was invented to attend to context over the entire sequence length. Look at how the original authors used the Transformer for NMT in the original Vaswani et al publication. https://github.com/jadore801120/attention-is-all-you-need-py...

  • text-generation-inference

    Large Language Model Text Generation Inference

  • Project mention: FLaNK AI-April 22, 2024 | dev.to | 2024-04-22
  • GPT2-Chinese

    Chinese version of GPT2 training code, using BERT tokenizer.

  • Stanza

    Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages

  • Project mention: Down and Out in the Magic Kingdom | news.ycombinator.com | 2023-07-23
  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

  • Project mention: Show HN: FileKitty – Combine and label text files for LLM prompt contexts | news.ycombinator.com | 2024-05-01
  • mycroft-core

    Mycroft Core, the Mycroft Artificial Intelligence platform.

  • Project mention: Rabbit R1, Designed by Teenage Engineering | news.ycombinator.com | 2024-01-09

    It's indeed suspicious. You're sending your voice samples, your various services accounts, your location and more private data to some proprietary black box in some public cloud. Sorry, but this is a privacy nightmare. It should be open source and self-hosted like Mycroft (https://mycroft.ai) or Leon (https://getleon.ai) to be trustworthy.

  • SaaSHub

    SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives

    SaaSHub logo
NOTE: The open source projects on this list are ordered by number of github stars. The number of mentions indicates repo mentiontions in the last 12 Months or since we started tracking (Dec 2020).

Python NLP related posts

  • LangFun: Object oriented data programs using LLMs

    1 project | news.ycombinator.com | 2 May 2024
  • AI enthusiasm #9 - A multilingual chatbot📣🈸

    6 projects | dev.to | 1 May 2024
  • What contributing to Open-source is, and what it isn't

    1 project | news.ycombinator.com | 27 Apr 2024
  • Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding

    3 projects | news.ycombinator.com | 21 Apr 2024
  • AI enthusiasm #6 - Finetune any LLM you want💡

    2 projects | dev.to | 16 Apr 2024
  • Fast and secure translation on your local machine with a GUI

    6 projects | news.ycombinator.com | 13 Apr 2024
  • Zephyr 141B, a Mixtral 8x22B fine-tune, is now available in Hugging Chat

    3 projects | news.ycombinator.com | 12 Apr 2024
  • A note from our sponsor - SaaSHub
    www.saashub.com | 4 May 2024
    SaaSHub helps you find the best software and product alternatives Learn more →

Index

What are some of the best open-source NLP projects in Python? This list will help you:

Project Stars
1 transformers 125,369
2 bert 37,036
3 HanLP 32,388
4 spaCy 28,751
5 datasets 18,443
6 unilm 18,358
7 rasa 17,984
8 Chinese-LLaMA-Alpaca 17,348
9 best-of-ml-python 15,364
10 gensim 15,256
11 haystack 13,711
12 flair 13,582
13 NLTK 13,035
14 PaddleHub 12,520
15 PaddleNLP 11,448
16 TextBlob 8,944
17 petals 8,684
18 attention-is-all-you-need-pytorch 8,469
19 text-generation-inference 7,881
20 GPT2-Chinese 7,358
21 Stanza 7,053
22 txtai 6,990
23 mycroft-core 6,456

Sponsored
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com