emubert-creator
The training code behind EmuBert, the largest open-source masked language model for Australian law.
⦁ Text embedding.
Not only that, but despite only being trained to guess missing words, EmuBert seems to know facts such as that Norfolk Island is an Australian territory (try the prompt, 'Norfolk Island is an Australian <mask>.'), that it is Section 51 of the Constitution that grants Parliament the power to make laws for the peace, order, and good government of the Commonwealth ('Section <mask> of the Constitution grants the Australian Parliament the power to make laws for the peace, order, and good government of the Commonwealth.'), and that the representative of the monarch of Australia is the Governor-General ('The representative of the monarch of Australia is the <mask>-General.').
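If you want to try those prompts yourself, here is a minimal sketch using the Hugging Face `transformers` fill-mask pipeline. It assumes `transformers` and a backend such as PyTorch are installed, and that the blank in each prompt is filled with the tokenizer's own mask token (read from the pipeline rather than hard-coded). The helper names (`build_prompt`, `top_prediction`, `run_prompts`) are illustrative and are not part of EmuBert or its training code.

```python
# Prompt templates taken from the text above; "{mask}" marks the word to guess.
PROMPT_TEMPLATES = [
    "Norfolk Island is an Australian {mask}.",
    "Section {mask} of the Constitution grants the Australian Parliament the power "
    "to make laws for the peace, order, and good government of the Commonwealth.",
    "The representative of the monarch of Australia is the {mask}-General.",
]


def build_prompt(template, mask_token):
    """Insert the tokenizer's mask token into a prompt template."""
    return template.format(mask=mask_token)


def top_prediction(fill_mask, prompt):
    """Return the highest-scoring guess for a prompt with a single mask."""
    predictions = fill_mask(prompt)  # list of dicts, sorted best first
    return predictions[0]["token_str"].strip()


def run_prompts(model_name="umarbutler/emubert"):
    """Download the model (network access needed) and print its top guesses."""
    from transformers import pipeline  # imported lazily so the helpers work without it

    fill_mask = pipeline("fill-mask", model=model_name)
    mask_token = fill_mask.tokenizer.mask_token
    for template in PROMPT_TEMPLATES:
        prompt = build_prompt(template, mask_token)
        print(prompt, "->", top_prediction(fill_mask, prompt))
```

Calling `run_prompts()` fetches the model from Hugging Face on first use and prints one guess per prompt. The results depend on the model version you download, so treat them as something to check rather than assume.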
Finally, EmuBert achieves a perplexity of 2.05 on Open Australian Legal QA, the first open dataset of Australian legal questions and answers, outperforming all known state-of-the-art masked language models, including RoBERTa, BERT and Legal-BERT.
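For readers unfamiliar with the metric, the sketch below shows one common way perplexity is defined for a masked language model: the exponential of the average negative log-likelihood (in nats) that the model assigns to the correct tokens at the masked positions. This is an assumption about the general convention, not a description of how the 2.05 figure was actually computed. The function name and its input are hypothetical.

```python
import math


def perplexity(token_nlls):
    """Perplexity as exp(mean negative log-likelihood) over masked tokens.

    token_nlls: per-token negative log-likelihoods in nats, one per masked
    position (hypothetical input; a real evaluation would take these from
    the model's cross-entropy loss).
    """
    if not token_nlls:
        raise ValueError("need at least one masked token")
    return math.exp(sum(token_nlls) / len(token_nlls))
```

Under this definition, lower is better: a model that always gives the right token probability 0.5 has a perplexity of 2, and a perplexity of 2.05 corresponds to an average loss of about 0.72 nats per masked token.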
You can check out EmuBert on Hugging Face here: https://huggingface.co/umarbutler/emubert
The code I used to create EmuBert is also openly available on GitHub: https://github.com/umarbutler/emubert-creator