Top 23 Natural Language Processing Open-Source Projects
-
funNLP
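Chinese and English sensitive-word lists, language detection, Chinese and international mobile/landline number location and carrier lookup, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese personal-name databases, Chinese abbreviation database, character-decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English imitation of Chinese pronunciation, Wang Feng lyrics generator, job title lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, car parts lexicon, continuous English text segmentation, various Chinese word vectors, company name collection, classical poetry database, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, historical figures lexicon, poetry lexicon, medical lexicon, food and drink lexicon, legal lexicon, automotive lexicon, animal lexicon, Chinese chat corpus, Chinese rumor data, Baidu Chinese question-answering dataset, collection of sentence similarity matching algorithms, BERT resources, text generation and summarization tools, cocoNLP information extraction tool, regular expressions for matching Chinese domestic phone numbers, Tsinghua University XLORE: Chinese-English cross-lingual encyclopedic knowledge graph, Tsinghua University artificial intelligence technology report series, natural language generation, "NLU is too hard" series, automatic couplet data and bot, username blacklist, criminal charges and legal terms with classification models, WeChat official account corpus, cs224n deep learning for natural language processing course, Chinese handwritten character recognition, Chinese natural language processing corpora/datasets, variable naming tool, word segmentation corpus + code, task-oriented dialogue English datasets, ASR speech datasets + deep-learning-based Chinese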
-
HanLP
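Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, Pinyin and Simplified/Traditional Chinese conversion, natural language processing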
-
applied-ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
-
NLP-progress
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
-
d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
-
datasets
🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
-
rasa
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
-
Ciphey
⚡ Automatically decrypt encryptions without knowing the key or cipher, decode encodings, and crack hashes ⚡
-
Awesome-pytorch-list
A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
-
deep-learning-drizzle
Drench yourself in Deep Learning, Reinforcement Learning, Machine Learning, Computer Vision, and NLP by learning from these exciting lectures!!
Fascinating work, very promising.
Can you summarise how the model in your paper differs from this one?
https://github.com/huggingface/transformers/issues/27011
In 2019, a new language representation called BERT (Bidirectional Encoder Representations from Transformers) was introduced. The main idea behind this paradigm is to first pre-train a language model on a massive amount of unlabeled data, then fine-tune all of its parameters on labeled data from the downstream tasks. This allows the model to generalize well to different NLP tasks. Moreover, it has been shown that this language representation model can be used to solve downstream tasks it was never explicitly trained on, e.g., classifying a text without a training phase.
Project mention: [D] How do you keep up to date on Machine Learning? | /r/learnmachinelearning | 2023-08-13
Made With ML
Project mention: [OC] How Many Chinese Characters You Need to Learn to Read Chinese! | /r/dataisbeautiful | 2023-06-14
jieba to do Chinese word segmentation
Project mention: How I discovered Named Entity Recognition while trying to remove gibberish from a string. | dev.to | 2024-05-06
Project mention: 🐍🐍 23 issues to grow yourself as an exceptional open-source Python expert 🧑💻 🥇 | dev.to | 2023-10-19
Project mention: 🔥🚀 Top 10 Open-Source Must-Have Tools for Crafting Your Own Chatbot 🤖💬 | dev.to | 2023-11-06
Support Rasa on GitHub ⭐
Project mention: CyberChef from GCHQ: The Cyber Swiss Army Knife | news.ycombinator.com | 2024-02-01
I also discovered Ciphey. Neat little tool indeed, but it's being deprecated. It's mentioned in this issue[1] and being replaced with Ares[2]. Neither could decipher this strange encryption[3] I used it on :(
[1] https://github.com/Ciphey/Ciphey/issues/764
[2] https://github.com/bee-san/Ares
[3] "dEFLWWFKQWxRQW16RnkvbTZML0lsdz09" original text is "hacker"
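As a side note on the string in [3]: it appears to be layered rather than a single encoding. The sketch below uses only Python's standard `base64` module to peel off the outer layer. Reading the final 16 bytes as ciphertext (for example, one block of a block cipher) is an assumption on our part; the comment above does not say how the string was produced, and without the key and cipher the original text "hacker" cannot be recovered this way.

```python
import base64

# The string quoted in footnote [3] of the comment above.
outer = "dEFLWWFKQWxRQW16RnkvbTZML0lsdz09"

# First layer: plain Base64, which decodes to printable ASCII.
inner = base64.b64decode(outer)
print(inner)  # b'tAKYaJAlQAmzFy/m6L/Ilw=='

# That result is itself valid Base64; decoding again yields raw bytes.
raw = base64.b64decode(inner)
print(len(raw))  # 16

# Assumption: 16 opaque bytes look like ciphertext rather than an encoding,
# which would explain why an encoding-oriented tool stops here.
```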
alternatively, could we not simply split on common characters such as newlines and periods, to break the text into sentences? it would be fragile, with special handling required for numbers with decimal points and probably various other edge cases, though.
there are also Python libraries meant for natural language parsing[0] that could do that task for us. I even see examples on stack overflow[1] that simply split text into sentences.
[0]: https://www.nltk.org/
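The naive splitting idea from the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_sentences` and the rule "split after `.`, `!` or `?` only when whitespace follows" are ours, not taken from any of the projects listed here. Requiring whitespace after the period keeps decimals like `3.14` intact, but abbreviations still break it, which is the kind of edge case the comment warns about.

```python
import re

# Split on whitespace that follows ., ! or ?, and on bare newlines.
# A period followed directly by a digit (as in "3.14") is not a boundary.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|\n+")


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter (hypothetical helper, not a library API)."""
    parts = _BOUNDARY.split(text)
    # Drop empty pieces produced by leading/trailing boundaries.
    return [p.strip() for p in parts if p.strip()]


print(split_sentences("Pi is about 3.14. It is irrational!\nNext line"))
# ['Pi is about 3.14.', 'It is irrational!', 'Next line']

# Known weakness: abbreviations are treated as sentence ends.
print(split_sentences("Dr. Smith arrived."))
# ['Dr.', 'Smith arrived.']
```

A dedicated sentence tokenizer, such as the one shipped with the NLTK library linked above, is usually a better fit once abbreviations, quotations, and ellipses matter.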
Natural Language Processing related posts
-
XLSTM: Extended Long Short-Term Memory
-
How I discovered Named Entity Recognition while trying to remove gibberish from a string.
-
Zero Shot Text Classification Under the hood
-
Show HN: GoSBD v0.1.4 and updated playground for Intl.Segmenter option
-
Maxtext: A simple, performant and scalable Jax LLM
-
Ruby vs. Python comes down to the for loop (2021)
-
Lossless Acceleration of LLM via Adaptive N-Gram Parallel Decoding
-
Index
What are some of the best open-source Natural Language Processing projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | transformers | 125,741 |
2 | funNLP | 64,360 |
3 | bert | 37,077 |
4 | Made-With-ML | 35,801 |
5 | Jieba | 32,484 |
6 | HanLP | 32,438 |
7 | spaCy | 28,789 |
8 | applied-ml | 26,028 |
9 | NLP-progress | 22,350 |
10 | d2l-en | 21,858 |
11 | datasets | 18,480 |
12 | rasa | 18,012 |
13 | Ciphey | 17,092 |
14 | awesome-nlp | 16,058 |
15 | gensim | 15,273 |
16 | Awesome-pytorch-list | 14,985 |
17 | ML-YouTube-Courses | 14,354 |
18 | DocsGPT | 14,208 |
19 | nlp-tutorial | 13,735 |
20 | flair | 13,587 |
21 | NLTK | 13,054 |
22 | MOSS | 11,825 |
23 | deep-learning-drizzle | 11,834 |