Non-determinism in GPT-4 is caused by Sparse MoE

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • petals

    🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

  • Could this work well with distributed solutions like petals?

    https://github.com/bigscience-workshop/petals

    I don't understand how petals can work though. I thought LLMs were typically quite monolithic.

  • tensorflow

    An Open Source Machine Learning Framework for Everyone

    Right, but that's not an inherent GPU determinism issue. It's a software issue.

    https://github.com/tensorflow/tensorflow/issues/3103#issueco... is correct that it's not necessary; it's a choice.

    Your line of reasoning appears to be "GPUs are inherently non-deterministic, so don't be quick to judge someone's code," which as far as I can tell is dead wrong.

    Admittedly, there are some cases and instructions where non-determinism is inherently necessary, but the author should think carefully before introducing it. There are many scenarios where it is irrelevant, but ultimately the issue we are discussing here isn't the GPU's fault. A minimal sketch of the order-dependence behind all this is below.
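    This sketch is not from the thread; it just shows, in plain NumPy, that floating-point addition is not associative, so if atomics or scheduling let the accumulation order vary between runs, the result varies too. Frameworks turn this into a choice by offering deterministic kernels, e.g. torch.use_deterministic_algorithms(True) in PyTorch or tf.config.experimental.enable_op_determinism() in recent TensorFlow releases.

        import numpy as np

        rng = np.random.default_rng(0)
        x = rng.standard_normal(1_000_000).astype(np.float32)

        # Same values, two different accumulation orders.
        ordered_sum = np.sum(x)
        shuffled_sum = np.sum(rng.permutation(x))

        # The two sums typically disagree in the last bits even though the data
        # is identical -- only the order of the additions changed. A GPU kernel
        # built on atomic adds lets that order vary from run to run, which is
        # where the "GPUs are non-deterministic" reputation comes from, and
        # which deterministic kernels avoid at some cost in speed.
        print(ordered_sum, shuffled_sum, ordered_sum == shuffled_sum)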

  • curated-transformers

    🤖 A PyTorch library of curated Transformer models and their composable components

  • Yeah. In curated transformers [1] we are seeing completely deterministic output across multiple popular transformer architectures on a single GPU (there can be variance between GPUs due to different kernels).

    One source of non-determinism we do see at a temperature of 0 is that once the weights are quantized, many predicted pieces end up with the same probability, including multiple pieces tied for the highest probability. The sampler (if you are not using a greedy decoder) will then sample among those tied pieces; a small sketch of this appears after the link below.

    In other words, a temperature of 0 is a poor man’s greedy decoding. (It is totally possible that OpenAI’s implementation switches to a greedy decoder at a temperature of 0.)

    [1] https://github.com/explosion/curated-transformers
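    The sketch referenced above (toy logits, not curated-transformers or OpenAI code): a true temperature of 0 would divide by zero, so implementations either special-case it to greedy decoding or clamp it to a tiny value, and with tied top logits the two behave differently.

        import numpy as np

        rng = np.random.default_rng()

        # Hypothetical logits where two pieces are exactly tied for the top
        # score, e.g. because quantization collapsed nearby values.
        logits = np.array([2.5, 2.5, 1.0, -3.0], dtype=np.float32)

        def greedy(logits):
            # Greedy decoding: always the same argmax, fully deterministic.
            return int(np.argmax(logits))

        def sample_near_zero_temperature(logits, temperature=1e-6):
            # Near-zero temperature: softmax puts essentially all mass on the
            # tied top pieces, and the sampler picks among them at random.
            z = logits / temperature
            z = z - z.max()                    # numerical stability
            p = np.exp(z) / np.exp(z).sum()
            return int(rng.choice(len(logits), p=p))

        print(greedy(logits))                  # always 0
        print({sample_near_zero_temperature(logits) for _ in range(20)})
        # Typically {0, 1}: the sampler breaks the tie differently across calls.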

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • Side Quest Devblog #1: These Fakes are getting Deep

    3 projects | dev.to | 29 Apr 2024
  • TensorFlow-metal on Apple Mac is junk for training

    1 project | news.ycombinator.com | 16 Jan 2024
  • Is it even possible to design a ML model without using Python or MATLAB? Like using C++, C or Java?

    1 project | /r/learnprogramming | 19 Jun 2023
  • How to do deep learning with Caffe?

    4 projects | /r/SubSimulatorGPT2 | 6 Jun 2023
  • When the documentation has TODOs

    1 project | /r/programminghorror | 3 Jun 2023