Non-determinism in GPT-4 is caused by Sparse MoE

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • petals

    🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

  • Could this work well with distributed solutions like petals?

    https://github.com/bigscience-workshop/petals

    I don't understand how petals can work though. I thought LLMs were typically quite monolithic.

  • tensorflow

    An Open Source Machine Learning Framework for Everyone

    Right, but that's not an inherent GPU determinism issue. It's a software issue.

    https://github.com/tensorflow/tensorflow/issues/3103#issueco... is correct that it's not necessary; it's a choice.

    Your line of reasoning appears to be "GPUs are inherently non-deterministic, so don't be quick to judge someone's code," which as far as I can tell is dead wrong.

    Admittedly, there are some cases and instructions where non-determinism is inherently necessary, but the author should think carefully before introducing it. There are many scenarios where it is irrelevant, but ultimately the issue we are discussing here isn't the GPU's fault. A minimal sketch of the order-dependence behind all this is below.
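    This sketch is not from the thread; it just shows, in plain NumPy, that floating-point addition is not associative, so if atomics or scheduling let the accumulation order vary between runs, the result varies too. Frameworks turn this into a choice by offering deterministic kernels, e.g. torch.use_deterministic_algorithms(True) in PyTorch or tf.config.experimental.enable_op_determinism() in recent TensorFlow releases.

        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.standard_normal(1_000_000).astype(np.float32)

        # Same values, two different accumulation orders.
        ordered_sum = np.sum(x)
        shuffled_sum = np.sum(rng.permutation(x))

        # The two sums typically disagree in the last bits even though the data
        # is identical -- only the order of the additions changed. A GPU kernel
        # built on atomic adds lets that order vary from run to run, which is
        # where the "GPUs are non-deterministic" reputation comes from, and
        # which deterministic kernels avoid at some cost in speed.
        print(ordered_sum, shuffled_sum, ordered_sum == shuffled_sum)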

  • curated-transformers

    🤖 A PyTorch library of curated Transformer models and their composable components

  • Yeah. In curated transformers [1] we are seeing completely deterministic output across multiple popular transformer architectures on a single GPU (there can be variance between GPUs due to different kernels).

    One source of non-determinism we do see at a temperature of 0 is that once the weights are quantized, many predicted pieces end up with the same probability, including multiple pieces tied for the highest probability. The sampler (if you are not using a greedy decoder) will then sample among those tied pieces; a small sketch of this appears after the link below.

    In other words, a temperature of 0 is a poor man’s greedy decoding. (It is totally possible that OpenAI’s implementation switches to a greedy decoder at a temperature of 0.)

    [1] https://github.com/explosion/curated-transformers
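    The sketch referenced above (toy logits, not curated-transformers or OpenAI code): a true temperature of 0 would divide by zero, so implementations either special-case it to greedy decoding or clamp it to a tiny value, and with tied top logits the two behave differently.

        import numpy as np

        rng = np.random.default_rng()

        # Hypothetical logits where two pieces are exactly tied for the top
        # score, e.g. because quantization collapsed nearby values.
        logits = np.array([2.5, 2.5, 1.0, -3.0], dtype=np.float32)

        def greedy(logits):
            # Greedy decoding: always the same argmax, fully deterministic.
            return int(np.argmax(logits))

        def sample_near_zero_temperature(logits, temperature=1e-6):
            # Near-zero temperature: softmax puts essentially all mass on the
            # tied top pieces, and the sampler picks among them at random.
            z = logits / temperature
            z = z - z.max()                    # numerical stability
            p = np.exp(z) / np.exp(z).sum()
            return int(rng.choice(len(logits), p=p))

        print(greedy(logits))                  # always 0
        print({sample_near_zero_temperature(logits) for _ in range(20)})
        # Typically {0, 1}: the sampler breaks the tie differently across calls.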

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • Side Quest Devblog #1: These Fakes are getting Deep

    3 projects | dev.to | 29 Apr 2024
  • TensorFlow-metal on Apple Mac is junk for training

    1 project | news.ycombinator.com | 16 Jan 2024
  • Is it even possible to design a ML model without using Python or MATLAB? Like using C++, C or Java?

    1 project | /r/learnprogramming | 19 Jun 2023
  • How to do deep learning with Caffe?

    4 projects | /r/SubSimulatorGPT2 | 6 Jun 2023
  • When the documentation has TODOs

    1 project | /r/programminghorror | 3 Jun 2023