Large Language Models: Comparing Gen2/Gen3 Models (GPT-3, GPT-J, MT5 and More)

This page summarizes the projects mentioned and recommended in the original post on dev.to

  • mesh-transformer-jax

    Model parallel transformers in JAX and Haiku

GPT-J is an LLM case study with two goals: training an LLM on a data source containing unique material, and using the training framework Mesh Transformer JAX to achieve high training efficiency through parallelization. There is no research paper about GPT-J, but its GitHub pages provide the model, several checkpoints, and the complete source code for training. A minimal loading-and-generation sketch follows this list.

  • math-lm

The training material is named The Pile, an 800 GB corpus drawn from 22 different sources, including scientific papers from arXiv, legal documents from the FreeLaw Project, and eBooks from Project Gutenberg. As shown in its documentation, GPT-J's performance is on par with the GPT-3 6B model, and the model can be used for advanced theorem proving and natural language understanding. A sketch for inspecting The Pile's source composition also follows this list.

  • Megatron-LM

    Ongoing research training transformer models at scale

This 20B model, GPT-NeoX, was trained on the same dataset as its predecessor GPT-J: The Pile. Furthermore, the Megatron and DeepSpeed libraries were used to achieve better utilization of computing resources, and GPT-NeoX eventually evolved into its own framework for training other LLMs. It was used, for example, as the foundation for Llemma, an open-source model specializing in theorem proving. A loading sketch appears after this list.
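The following is a minimal sketch of loading GPT-J for inference using the Hugging Face transformers port of the model rather than the original Mesh Transformer JAX training code. The model id and float16 setting match the published checkpoint; running it needs roughly 12 GB of memory for the weights alone, so treat it as illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "EleutherAI/gpt-j-6B"  # released GPT-J checkpoint on the Hugging Face Hub

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# Load in float16 to roughly halve the ~24 GB float32 memory footprint.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
model.eval()

prompt = "Theorem: For every natural number n, n + 0 = n. Proof:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        temperature=0.8,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```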
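The Pile's raw distribution is a set of jsonlines shards in which each record carries its originating sub-corpus in a metadata field. The sketch below tallies that composition for one shard; the file path is hypothetical, and the `pile_set_name` field reflects the format described in the original Pile release.

```python
import json
from collections import Counter

# Hypothetical path to one locally downloaded Pile shard (jsonlines format).
SHARD_PATH = "pile/train/00.jsonl"

counts = Counter()
with open(SHARD_PATH, encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        # Each record names its originating sub-corpus (e.g. "ArXiv",
        # "FreeLaw", "Gutenberg (PG-19)") in the metadata.
        counts[record["meta"]["pile_set_name"]] += 1

for source, n in counts.most_common():
    print(f"{source}: {n}")
```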
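Finally, a sketch of loading GPT-NeoX-20B through the same transformers interface, assuming the published EleutherAI/gpt-neox-20b checkpoint and the accelerate library for device placement. At 20B parameters the weights take roughly 40 GB in float16, so this is a sketch of the API rather than something to run on a laptop; the Llemma checkpoints (e.g. "EleutherAI/llemma_7b") load the same way if their Hub ids are available.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "EleutherAI/gpt-neox-20b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# device_map="auto" (via the accelerate library) shards the ~20B parameters
# across whatever GPU and CPU memory is available.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "The derivative of x**3 with respect to x is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```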

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • I Created a Password Manager with AI: Powered by GPT-4

    1 project | dev.to | 2 Jun 2024
  • Scout: Scalable Cognitive Operations Unified Team

    1 project | news.ycombinator.com | 1 Jun 2024
  • Creating a Python Project That Is Easy to Maintain

    1 project | dev.to | 1 Jun 2024
  • Make Maintainable Python Project

    1 project | dev.to | 1 Jun 2024
  • Download Paul Graham essays in ePub format

    1 project | news.ycombinator.com | 1 Jun 2024