Top 23 Python Bert Projects

Each entry below lists, in order: mentions, GitHub stars, activity score, and language.

transformers

180 126,170 10.0 Python

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Project mention: Reading list to join AI field from Hugging Face cofounder | news.ycombinator.com | 2024-05-18

Not sure what you are implying. Thomas Wolf has the second highest number of commits on HuggingFace/transformers. He is clearly competent & deeply technical
https://github.com/huggingface/transformers/

haystack

55 13,883 9.9 Python

🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

Project mention: Haystack DB – 10x faster than FAISS with binary embeddings by default | news.ycombinator.com | 2024-04-28

I was confused for a bit but there is no relation to https://haystack.deepset.ai/

clip-as-service

15 12,212 5.2 Python

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

Project mention: Search for anything ==> Immich fails to download textual.onnx | /r/immich | 2023-09-15

PaddleNLP

2 11,515 9.8 Python

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
bertviz

15 6,428 3.9 Python

BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)

Project mention: StreamingLLM: tiny tweak to KV LRU improves long conversations | news.ycombinator.com | 2024-02-13

This seems to work only because large GPTs have redundant, undercomplex attentions. See this issue in BertViz about attention in Llama: https://github.com/jessevig/bertviz/issues/128

ERNIE

4 6,165 2.7 Python

Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.
BERT-pytorch

1 6,032 0.0 Python

Google AI 2018 BERT PyTorch implementation
BERTopic

22 5,619 8.2 Python

Leveraging BERT and c-TF-IDF to create easily interpretable topics.

Project mention: how can a top2vec output be improved | /r/learnmachinelearning | 2023-07-04

Try experimenting with different hyperparameters, clustering algorithms and embedding representations. Try https://github.com/MaartenGr/BERTopic/tree/master/bertopic
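
To make the "c-TF-IDF" in the BERTopic description concrete, here is a minimal pure-Python sketch of one common formulation of class-based TF-IDF: all documents in a topic are merged into a single bag of words, and each term is weighted by its within-class frequency times log(1 + A / f), where A is the average number of words per class and f is the term's frequency across all classes. This is an illustration under those assumptions, not BERTopic's actual implementation; the names `c_tf_idf` and `top_terms` are hypothetical.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_class):
    """Score terms per class with a class-based TF-IDF variant (sketch).

    docs_per_class maps a class (topic) label to a list of tokenized documents.
    Returns {label: {term: score}}.
    """
    # Merge every document of a class into one bag of words.
    class_counts = {
        label: Counter(tok for doc in docs for tok in doc)
        for label, docs in docs_per_class.items()
    }

    # Frequency of each term across all classes.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # A: average number of words per class.
    avg_words = sum(total_counts.values()) / len(class_counts)

    scores = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        scores[label] = {
            # Normalized class frequency times the class-based IDF term.
            term: (freq / n_words) * math.log(1 + avg_words / total_counts[term])
            for term, freq in counts.items()
        }
    return scores


def top_terms(scores, label, k=3):
    """Return the k highest-scoring terms for a class (ties broken alphabetically)."""
    ranked = sorted(scores[label].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Terms that are frequent inside one topic but rare elsewhere score highest, which is what makes the resulting topic labels easy to interpret. In practice the classes would come from clustering document embeddings; here they are supplied directly.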

awesome-pretrained-chinese-nlp-models

1 4,279 8.9 Python

Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
KeyBERT

5 3,237 6.1 Python

Minimal keyword extraction with BERT

Project mention: I want to extract important keywords from large documents... | /r/LangChain | 2023-12-07

Use something else like KeyBERT or BERTopic: https://github.com/MaartenGr/KeyBERT It's much faster.

llmware

9 3,839 9.8 Python

Providing enterprise-grade LLM-based development framework, tools, and fine-tuned models.

Project mention: More Agents Is All You Need: LLMs performance scales with the number of agents | news.ycombinator.com | 2024-04-06

I couldn't agree more. You should check out LLMWare's SLIM agents (https://github.com/llmware-ai/llmware/tree/main/examples/SLI...). It's focusing on pretty much exactly this and chaining multiple local LLMs together.
A really good topic that ties in with this is the need for deterministic sampling (I may have the terminology a bit incorrect) depending on what the model is intended for. The LLMWare team did a good two-part video on this here as well (https://www.youtube.com/watch?v=7oMTGhSKuNY)
I think dedicated miniature LLMs are the way forward.
Disclaimer - Not affiliated with them in any way, just think it's a really cool project.

Top2Vec

13 2,844 6.2 Python

Top2Vec learns jointly embedded topic, document and word vectors.

Project mention: [D] Is it better to create a different set of Doc2Vec embeddings for each group in my dataset, rather than generating embeddings for the entire dataset? | /r/MachineLearning | 2023-10-28

I'm using Top2Vec with Doc2Vec embeddings to find topics in a dataset of ~4000 social media posts. This dataset has three groups:

ABSA-PyTorch

1 1,962 0.0 Python

Aspect Based Sentiment Analysis, PyTorch Implementations.
AliceMind

1 1,951 5.7 Python

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
DeBERTa

4 1,865 3.6 Python

The implementation of DeBERTa
FARM

3 1,728 0.0 Python

🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
jiant

2 1,610 0.0 Python

jiant is an NLP toolkit
finetuner

36 1,435 5.5 Python

🎯 Task-oriented embedding tuning for BERT, CLIP, etc.
scibert

2 1,440 0.0 Python

A BERT model for scientific text.
beir

8 1,407 4.2 Python

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.

Project mention: On building a semantic search engine | news.ycombinator.com | 2024-01-06

The BEIR project might be what you're looking for: https://github.com/beir-cellar/beir/wiki/Leaderboard

SparK

3 1,389 7.3 Python

[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; Pytorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling" (by keyu-tian)
BERT-NER

1 1,182 0.0 Python

Pytorch-Named-Entity-Recognition-with-BERT
contextualized-topic-models

7 1,166 5.0 Python

A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python Bert related posts

AI enthusiasm #6 - Finetune any LLM you want💡
2 projects | dev.to | 16 Apr 2024

Splade: Sparse Neural Search
1 project | news.ycombinator.com | 11 Mar 2024

Show HN: LLMWare – Small Specialized Function Calling 1B LLMs for Multi-Step RAG
2 projects | news.ycombinator.com | 11 Feb 2024

Better Call GPT, Comparing Large Language Models Against Lawyers (pdf)
1 project | news.ycombinator.com | 6 Feb 2024

Show HN: LLMWare – Integrated Solution for RAG in Finance and Legal
1 project | news.ycombinator.com | 21 Jan 2024

On building a semantic search engine
3 projects | news.ycombinator.com | 6 Jan 2024

Llmware.ai – AI Tools for Financial, Legal and Compliance
1 project | news.ycombinator.com | 12 Jan 2024

Index

What are some of the best open-source Bert projects in Python? This list will help you:

#	Project	Stars
1	transformers	126,170
2	haystack	13,883
3	clip-as-service	12,212
4	PaddleNLP	11,515
5	bertviz	6,428
6	ERNIE	6,165
7	BERT-pytorch	6,032
8	BERTopic	5,619
9	awesome-pretrained-chinese-nlp-models	4,279
10	KeyBERT	3,237
11	llmware	3,839
12	Top2Vec	2,844
13	ABSA-PyTorch	1,962
14	AliceMind	1,951
15	DeBERTa	1,865
16	FARM	1,728
17	jiant	1,610
18	finetuner	1,435
19	scibert	1,440
20	beir	1,407
21	SparK	1,389
22	BERT-NER	1,182
23	contextualized-topic-models	1,166
