| | LASER | electra |
|---|---|---|
| Mentions | 5 | 3 |
| Stars | 3,539 | 2,296 |
| Growth | 0.8% | 0.7% |
| Activity | 5.7 | 0.0 |
| Latest commit | 21 days ago | 2 months ago |
| Language | Jupyter Notebook | Python |
| License | GNU General Public License v3.0 or later | Apache License 2.0 |
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed, with recent commits weighted more heavily than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.
LASER
-
SentenceTransformers: Python framework for sentence, text and image embeddings
I'm curious how people are handling multi-lingual embeddings.
I've found LASER[1] which originally had the idea to embed all languages in the same vector space, though it's a bit harder to use than models available through SentenceTransformers. LASER2 stuck with this approach, but LASER3 switched to language-specific models. However, I haven't found benchmarks for these models, and they were released about 2 years ago.
Another alternative would be to translate everything before embedding, which would introduce some amount of error, though maybe it wouldn't be significant.
1. https://github.com/facebookresearch/LASER
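For the SentenceTransformers route the post mentions, here is a minimal sketch, assuming the sentence-transformers package and the paraphrase-multilingual-MiniLM-L12-v2 checkpoint (an illustrative pick, not named in the post): translations of the same sentence should land near each other in the shared vector space.

```python
# Minimal sketch of multilingual embedding via SentenceTransformers.
# Model name and sentences are illustrative, not from the post.
from sentence_transformers import SentenceTransformer, util

# This checkpoint maps 50+ languages into a single vector space.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

sentences = [
    "The weather is lovely today.",   # English
    "Il fait beau aujourd'hui.",      # French
    "Hoy hace buen tiempo.",          # Spanish
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cross-lingual cosine similarities; translations of the same sentence
# should score close to each other.
print(util.cos_sim(embeddings, embeddings))
```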
-
[D] Hey Reddit! We're a bunch of research scientists and software engineers and we just open sourced a new state-of-the-art AI model that can translate between 200 different languages. We're excited to hear your thoughts so we're hosting an AMA on 07/21/2022 @ 9:00AM PT. Ask Us Anything!
You can check out some of our materials and open sourced artifacts here:
- Our latest blog post: https://ai.facebook.com/blog/nllb-200-high-quality-machine-translation
- Project Overview: https://ai.facebook.com/research/no-language-left-behind/
- Product demo: https://nllb.metademolab.com/
- Research paper: https://research.facebook.com/publications/no-language-left-behind
- NLLB-200: https://github.com/facebookresearch/fairseq/tree/nllb
- FLORES-200: https://github.com/facebookresearch/flores
- LASER3: https://github.com/facebookresearch/LASER

Joining us today for the AMA are:
- Angela Fan (AF), Research Scientist
- Jean Maillard (JM), Research Scientist
- Maha Elbayad (ME), Research Scientist
- Philipp Koehn (PK), Research Scientist
- Shruti Bhosale (SB), Software Engineer

We'll be here from 07/21/2022 @ 9:00AM PT to 10:00AM PT. Thanks, and we're looking forward to answering your questions!
-
School project: sentiment analysis for my country's Arabic dialect
This may be helpful: https://github.com/facebookresearch/LASER
-
[P] Bilingual text alignment tools for NMT - help needed
Check FB's LASER: https://github.com/facebookresearch/LASER/tree/master/tasks/CCMatrix Also, Sentence-Transformers has a pretty neat model for cross-lingual sentence similarity: https://huggingface.co/sentence-transformers/stsb-xlm-r-multilingual
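As a rough sketch of how the suggested model could drive bilingual alignment (the sentences and greedy argmax matching are illustrative; production mining pipelines such as CCMatrix use margin-based scoring rather than raw cosine):

```python
# Hedged sketch: align sentences across two sides of a bilingual corpus
# with the cross-lingual model mentioned in the reply.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/stsb-xlm-r-multilingual")

english = ["The cat sat on the mat.", "It is raining heavily."]
german = ["Es regnet stark.", "Die Katze saß auf der Matte."]

emb_en = model.encode(english, convert_to_tensor=True)
emb_de = model.encode(german, convert_to_tensor=True)

# (len(english), len(german)) similarity matrix; pick the best match per row.
scores = util.cos_sim(emb_en, emb_de)
for i, row in enumerate(scores):
    j = int(row.argmax())
    print(f"{english[i]}  <->  {german[j]}  (cos={row[j].item():.2f})")
```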
-
Help with aligned word embeddings
You want LASER. It's a very large model trained on tons of languages; you can use it with sentence_transformers in Python to compute embeddings. Then you can use faiss or datasketch to find the top-k nearest matches.
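A minimal sketch of that embed-then-search workflow, with two assumptions worth flagging: LaBSE via sentence_transformers stands in as the multilingual encoder (LASER itself ships its own tooling), and faiss-cpu is installed.

```python
# Sketch of the comment's workflow: embed a multilingual corpus, index it,
# then retrieve top-k nearest neighbors for a query.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")
corpus = ["Bonjour le monde", "Hallo Welt", "Hola mundo"]
queries = ["Hello world"]

# Normalize so inner product equals cosine similarity.
corpus_emb = model.encode(corpus, normalize_embeddings=True)
query_emb = model.encode(queries, normalize_embeddings=True)

index = faiss.IndexFlatIP(corpus_emb.shape[1])  # exact inner-product index
index.add(corpus_emb.astype(np.float32))

k = 2
scores, ids = index.search(query_emb.astype(np.float32), k)
for q, score_row, id_row in zip(queries, scores, ids):
    for s, i in zip(score_row, id_row):
        print(f"{q} -> {corpus[i]} (score={s:.2f})")
```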
electra
-
Fine-tuned model consistently producing Precision and Recall scores of 0 from start of training, any suggestions on how to improve?
If this is your own implementation of ELECTRA, hopefully you have previous versions you've demonstrated working: revert to a working version, then re-apply your changes one by one. If you're using open-source code, such as this one, find a working example, run it yourself, keep a copy in a working (high-performance) state, and modify it piece by piece until it works on your problem.
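One quick check worth running before bisecting code (a hypothetical diagnostic, not from the thread): precision and recall of exactly 0 from the start of training usually mean the model is predicting a single class, which the prediction distribution reveals immediately.

```python
# Diagnostic sketch with toy labels: collapsed predictions yield P = R = 0
# for the positive class.
from collections import Counter
from sklearn.metrics import precision_score, recall_score

y_true = [0, 1, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0]  # degenerate: everything mapped to class 0

print(Counter(y_pred))  # Counter({0: 8}) -> predictions have collapsed
print(precision_score(y_true, y_pred, zero_division=0))  # 0.0
print(recall_score(y_true, y_pred, zero_division=0))     # 0.0
```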
-
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators Web Demo
github: https://github.com/google-research/electra
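For a feel of what the demo does, here is a hedged sketch using the Hugging Face transformers port of ELECTRA with the google/electra-small-discriminator checkpoint (an assumption; the linked repo is the original TensorFlow code): the discriminator scores each token as original or replaced, which is the pre-training objective in the title.

```python
# Sketch of ELECTRA's replaced-token-detection objective via transformers.
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

name = "google/electra-small-discriminator"
tokenizer = ElectraTokenizerFast.from_pretrained(name)
model = ElectraForPreTraining.from_pretrained(name)

sentence = "The quick brown fox fake over the lazy dog"  # "fake" replaces "jumps"
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0]

# Positive logits mark tokens the discriminator believes were replaced.
tokens = tokenizer.convert_ids_to_tokens(inputs.input_ids[0])
for tok, score in zip(tokens, logits):
    print(f"{tok:>10s}  {'REPLACED' if score > 0 else 'original'}")
```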
-
Help with aligned word embeddings
If you have at least a decent gaming GPU, or don't mind working in Colab, you could get a relevant dataset and use ELECTRA: https://github.com/google-research/electra
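A minimal sketch of that suggestion, assuming the transformers library and the google/electra-small-discriminator checkpoint (illustrative; the linked repo ships its own TensorFlow training code): pull contextual token embeddings from a pretrained ELECTRA encoder, which can then be fine-tuned or aligned on the relevant dataset.

```python
# Sketch: contextual token embeddings from a pretrained ELECTRA encoder,
# on GPU if one is available (e.g. in Colab).
import torch
from transformers import AutoModel, AutoTokenizer

name = "google/electra-small-discriminator"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

inputs = tokenizer(["aligned word embeddings"], return_tensors="pt").to(device)
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, tokens, hidden_size)
print(hidden.shape)
```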
What are some alternatives?
MUSE - A library for Multilingual Unsupervised or Supervised word Embeddings
clip-as-service - 🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
Arraymancer - A fast, ergonomic and portable tensor library in Nim with a deep learning focus for CPU, GPU and embedded devices via OpenMP, Cuda and OpenCL backends
stanford-tensorflow-tutorials - This repository contains code examples for Stanford's course TensorFlow for Deep Learning Research.
fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
flores - Facebook Low Resource (FLoRes) MT Benchmark
datasets - 🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
transformers - 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
iSarcasmEval - Datasets used for iSarcasmEval shared-task (Task 6 at SemEval 2022)