Large Language Models: Comparing Gen2/Gen3 Models (GPT-3, GPT-J, MT5 and More)


mesh-transformer-jax

52 6,213 0.0 Python

Model parallel transformers in JAX and Haiku

GPT-J is an LLM case study with two goals: training an LLM on a data source containing unique material, and using the training framework Mesh Transformer JAX to achieve high training efficiency through parallelization. There is no research paper about GPT-J, but its GitHub pages provide the model, different checkpoints, and the complete source code for training.
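To make the idea of parallelization more concrete, here is a minimal sketch of column-wise model parallelism in plain Python. It is a toy illustration only and is not taken from Mesh Transformer JAX: the helper names `split_columns` and `parallel_matmul` are hypothetical, and the "shards" are ordinary lists standing in for the devices that would each hold one slice of a weight matrix.

```python
# Toy illustration of column-wise model (tensor) parallelism.
# Helper names are hypothetical; this is not the Mesh Transformer JAX API.

def matmul(x, w):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]

def split_columns(w, num_shards):
    """Split a weight matrix into num_shards equal column blocks (one per 'device')."""
    cols = len(w[0])
    assert cols % num_shards == 0, "columns must divide evenly across shards"
    step = cols // num_shards
    return [[row[i:i + step] for row in w] for i in range(0, cols, step)]

def parallel_matmul(x, w, num_shards):
    """Compute x @ w one shard at a time, then concatenate the partial outputs."""
    partials = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    # Each partial result holds a slice of the output columns; stitch each row back together.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
full = matmul(x, w)
sharded = parallel_matmul(x, w, num_shards=2)
print(full == sharded)
```

Because each shard only needs its own block of columns, the weight matrix never has to fit on a single device, which is the basic reason model parallelism helps with large models. Real frameworks additionally handle communication between devices, which this sketch leaves out.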

math-lm

2 994 8.4 Python

The training material is The Pile, an 800GB corpus drawn from 22 different sources, including scientific research papers from arXiv, legal documents from the FreeLaw Project, and eBooks from Project Gutenberg. According to its documentation, GPT-J performs on par with the GPT-3 6.7B model. The model can also be used for advanced theorem proving and natural language understanding.

Megatron-LM

19 8,914 9.9 Python

Ongoing research training transformer models at scale

This 20B model, GPT-NeoX, was trained on the same dataset as its predecessor: The Pile. The libraries Megatron and DeepSpeed were used to improve computing resource utilization, and GPT-NeoX eventually evolved into its own framework for training other LLMs. It served, for example, as the foundation for Llemma, an open-source model specializing in theorem proving.
