Top 23 Python Bert Projects
-
haystack
:mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
-
PaddleNLP
👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
-
ERNIE
Official implementations for various pre-training models of ERNIE-family, covering topics of Language Understanding & Generation, Multimodal Understanding & Generation, and beyond.
-
awesome-pretrained-chinese-nlp-models
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pre-trained models, large models, multimodal models, and large language models
-
FARM
:house_with_garden: Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
-
beir
A Heterogeneous Benchmark for Information Retrieval. Easy to use; evaluate your models across 15+ diverse IR datasets.
-
SparK
[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch implementation of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling" (by keyu-tian)
-
contextualized-topic-models
A python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
Project mention: Reading list to join AI field from Hugging Face cofounder | news.ycombinator.com | 2024-05-18
Not sure what you are implying. Thomas Wolf has the second-highest number of commits on HuggingFace/transformers. He is clearly competent & deeply technical
https://github.com/huggingface/transformers/
Project mention: Haystack DB – 10x faster than FAISS with binary embeddings by default | news.ycombinator.com | 2024-04-28
I was confused for a bit but there is no relation to https://haystack.deepset.ai/
Project mention: Search for anything ==> Immich fails to download textual.onnx | /r/immich | 2023-09-15
Project mention: StreamingLLM: tiny tweak to KV LRU improves long conversations | news.ycombinator.com | 2024-02-13
This seems only to work because large GPTs have redundant, undercomplex attentions. See this issue in BertViz about attention in Llama: https://github.com/jessevig/bertviz/issues/128
Try experimenting with different hyperparameters, clustering algorithms and embedding representations. Try https://github.com/MaartenGr/BERTopic/tree/master/bertopic
Project mention: I want to extract important keywords from large documents... | /r/LangChain | 2023-12-07
Use something else like KeyBERT or BERTopic: https://github.com/MaartenGr/KeyBERT It's much faster.
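KeyBERT's README describes its core idea as embedding the document and candidate keywords/keyphrases with a BERT-style model, then ranking candidates by cosine similarity to the document. The sketch below shows only that ranking step, with no dependencies. It assumes the embeddings have already been computed. The toy vectors and the `rank_keywords` helper are hypothetical illustrations, not KeyBERT's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_keywords(doc_vec, candidates, top_n=3):
    """Hypothetical helper: return the top_n (keyword, score) pairs,
    sorted by cosine similarity to the document embedding."""
    scored = [(kw, cosine(doc_vec, vec)) for kw, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Made-up 3-d embeddings; a real pipeline would get these from a sentence-embedding model.
doc_vec = [0.9, 0.1, 0.3]
candidates = {
    "topic modeling": [0.8, 0.2, 0.3],
    "keyword extraction": [0.7, 0.0, 0.5],
    "weather": [0.0, 1.0, 0.0],
}

top = rank_keywords(doc_vec, candidates, top_n=2)
print(top)
```

Because ranking is a single similarity pass over precomputed vectors, it is cheap compared with prompting an LLM for every document. That is one plausible reason the commenter calls this approach faster.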
Project mention: More Agents Is All You Need: LLMs performance scales with the number of agents | news.ycombinator.com | 2024-04-06
I couldn't agree more. You should check out LLMWare's SLIM agents (https://github.com/llmware-ai/llmware/tree/main/examples/SLI...). It's focusing on pretty much exactly this and chaining multiple local LLMs together.
A really good topic that ties in with this is the need for deterministic sampling (I may have the terminology a bit incorrect) depending on what the model is intended for. The LLMWare team did a good two-part video on this as well (https://www.youtube.com/watch?v=7oMTGhSKuNY)
I think dedicated miniature LLMs are the way forward.
Disclaimer - Not affiliated with them in any way, just think it's a really cool project.
Project mention: [D] Is it better to create a different set of Doc2Vec embeddings for each group in my dataset, rather than generating embeddings for the entire dataset? | /r/MachineLearning | 2023-10-28
I'm using Top2Vec with Doc2Vec embeddings to find topics in a dataset of ~4000 social media posts. This dataset has three groups:
The BEIR project might be what you're looking for: https://github.com/beir-cellar/beir/wiki/Leaderboard
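Retrievers on benchmarks like BEIR are typically compared by nDCG@10. The sketch below shows roughly what that number measures for a single query. It uses the common linear-gain formulation, where gain is divided by log2(rank + 1) with 1-based ranks. It is not BEIR's own evaluation code. The relevance judgments (`qrels`) and the helper names are hypothetical.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain of the first k relevance grades.
    enumerate() is 0-based, so rank r gets discount log2(r + 2)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document IDs in the order the retriever returned them.
    qrels: dict mapping document ID -> graded relevance (unjudged docs count as 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(qrels.values(), reverse=True)  # best possible ordering
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Hypothetical judgments: d1 is highly relevant, d3 partially relevant.
qrels = {"d1": 2, "d3": 1}
print(ndcg_at_k(["d1", "d2", "d3"], qrels))  # good ranking
print(ndcg_at_k(["d3", "d2", "d1"], qrels))  # worse ranking scores lower
```

Averaging the per-query scores over a dataset's test queries gives a dataset-level nDCG@10.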
Python Bert related posts
-
AI enthusiasm #6 - Finetune any LLM you want💡
-
Splade: Sparse Neural Search
-
Show HN: LLMWare – Small Specialized Function Calling 1B LLMs for Multi-Step RAG
-
Better Call GPT, Comparing Large Language Models Against Lawyers (pdf)
-
Show HN: LLMWare – Integrated Solution for RAG in Finance and Legal
-
On building a semantic search engine
-
Llmware.ai – AI Tools for Financial, Legal and Compliance
-
Index
What are some of the best open-source Bert projects in Python? This list will help you:
# | Project | Stars
---|---|---
1 | transformers | 126,170
2 | haystack | 13,883
3 | clip-as-service | 12,212
4 | PaddleNLP | 11,515
5 | bertviz | 6,428
6 | ERNIE | 6,165
7 | BERT-pytorch | 6,032
8 | BERTopic | 5,619
9 | awesome-pretrained-chinese-nlp-models | 4,279
10 | KeyBERT | 3,237
11 | llmware | 3,839
12 | Top2Vec | 2,844
13 | ABSA-PyTorch | 1,962
14 | AliceMind | 1,951
15 | DeBERTa | 1,865
16 | FARM | 1,728
17 | jiant | 1,610
18 | finetuner | 1,435
19 | scibert | 1,440
20 | beir | 1,407
21 | SparK | 1,389
22 | BERT-NER | 1,182
23 | contextualized-topic-models | 1,166