Description
mx.random.normal(shape=..., key=mx.random.key(seed)) produces different float32 outputs on Apple M1 Max vs Apple M3 Ultra and Apple M5, given identical MLX version, identical Python, and identical inputs. mx.random.uniform and mx.random.split are unaffected — both produce bit-identical output across all three machines.
This breaks bit-exact reproducibility for any code that relies on mx.random.normal with a directly-constructed key, across Apple Silicon hardware generations.
Reproducer
import mlx.core as mx
import numpy as np
import hashlib
key = mx.random.key(0)
a = mx.random.normal(shape=(8,), key=key)
mx.eval(a)
print(hashlib.sha256(np.asarray(a, dtype=np.float32).tobytes()).hexdigest())
Observed output (MLX 0.31.1, Python 3.14.3 or 3.14.4, macOS 26)
| Chip | sha256 of normal(shape=(8,), key=mx.random.key(0)) |
| --- | --- |
| Apple M5 | a662607828c52cafea1f79d98f09bd81855784116acd23364a6130eb037d6a5e |
| Apple M3 Ultra | a662607828c52cafea1f79d98f09bd81855784116acd23364a6130eb037d6a5e |
| Apple M1 Max | 8a8c5a6737feea9eeee3c7056f4699250e6b8440db8930fd10495ada9117c05a |
Same pattern for shape=(8,) key=mx.random.key(42) and shape=(16, 4) key=mx.random.key(0) — M1 Max always differs from {M5, M3 Ultra}.
Adjacent observations — narrowing the cause
The table below compares the same probes on all three machines. The first four produce bit-identical output everywhere; in the last three, M1 Max differs from the other two:
| Probe | M5 | M3 Ultra | M1 Max | All match? |
| --- | --- | --- | --- | --- |
| mx.random.key(0) materialised | ✓ | ✓ | ✓ | yes |
| mx.random.split(mx.random.key(7)) (both halves) | ✓ | ✓ | ✓ | yes |
| mx.random.uniform(low=-1, high=1, shape=(8,), key=mx.random.key(0)) | ✓ | ✓ | ✓ | yes |
| mx.random.normal(shape=(4,), key=<split-child-of-key(7)>) | ✓ | ✓ | ✓ | yes |
| mx.random.normal(shape=(8,), key=mx.random.key(0)) | ✓ | ✓ | ❌ | no |
| mx.random.normal(shape=(8,), key=mx.random.key(42)) | ✓ | ✓ | ❌ | no |
| mx.random.normal(shape=(16, 4), key=mx.random.key(0)) | ✓ | ✓ | ❌ | no |
So the divergence is specifically in mx.random.normal on Apple M1 Max when the key is the raw output of mx.random.key(seed). The key tensor itself is identical; mx.random.split returns identical halves; mx.random.uniform produces identical output with the same raw key. Only mx.random.normal with the raw key diverges, only on M1 Max.
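The one split-derived probe above matched on all three machines, so it is plausible (but not confirmed) that mx.random.normal takes a different kernel path on M1 Max only when the key is constructed directly. That probe also used a smaller shape, (4,), than the failing ones, so shape has not been ruled out as the deciding factor.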
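One way to narrow this further is to look at how far apart the diverging values are, not just whether the hashes differ. The sketch below is not part of the original repro script. It assumes each machine prints the raw buffer as hex (for example np.asarray(a, dtype=np.float32).tobytes().hex()) instead of its sha256, and the helper name ulp_distances is hypothetical. It uses only the standard library and reports the per-element distance in float32 ULPs.

```python
import struct


def _ordered(bits: int) -> int:
    """Map a float32 bit pattern to an integer that sorts like the float value."""
    magnitude = bits & 0x7FFFFFFF
    return -magnitude if bits & 0x80000000 else magnitude


def ulp_distances(hex_a: str, hex_b: str) -> list[int]:
    """Per-element ULP distance between two little-endian float32 buffers.

    hex_a / hex_b are hex dumps of the raw array bytes from two machines
    (an assumed dump format, not something MLX emits by itself).
    """
    raw_a, raw_b = bytes.fromhex(hex_a), bytes.fromhex(hex_b)
    if len(raw_a) != len(raw_b) or len(raw_a) % 4 != 0:
        raise ValueError("buffers must have equal length, a multiple of 4 bytes")
    count = len(raw_a) // 4
    bits_a = struct.unpack(f"<{count}I", raw_a)
    bits_b = struct.unpack(f"<{count}I", raw_b)
    return [abs(_ordered(x) - _ordered(y)) for x, y in zip(bits_a, bits_b)]
```

Distances of 1 or 2 in a few elements would point toward a rounding difference in the transform applied to the random bits, while large distances in every element would suggest the underlying bit stream itself differs. Either reading is only a hint, not a diagnosis.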
Environment
All three machines:
- MLX 0.31.1
- Python 3.14.3 (M5) / 3.14.4 (M3 Ultra, M1 Max)
- macOS 26.x (Tahoe), arm64
- Same uv lockfile, single MLX wheel
Why this matters (downstream)
We hit this in a research project (dream-of-kiki) that runs an R1 bit-exact reproducibility suite. After shipping new ops that call mx.random.normal with seeded keys, the hashes diverge between M1 Max and M5/M3 Ultra even at the same commit, breaking cross-machine equivalence that previously held for ops that didn't touch normal. Full milestone: docs/milestones/r1-cross-machine-m5-vs-m1-2026-05-20.md.
We've put a golden_hashes.json per-hardware-family workaround in place locally, but it would be much cleaner if mx.random.normal produced the same bytes across Apple Silicon generations.
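For anyone hitting the same problem, the per-hardware-family workaround can be sketched roughly as follows. This is an illustration, not the project's actual code: the JSON layout, the family labels, the probe name, and the helper names are all assumptions. The only external call is sysctl -n machdep.cpu.brand_string, which reports the chip name on macOS.

```python
import subprocess

# Hypothetical mapping from chip brand string to a golden-hash family,
# grouped according to the observations in this issue.
FAMILY_BY_CHIP = {
    "Apple M1 Max": "m1",
    "Apple M3 Ultra": "m3-m5",
    "Apple M5": "m3-m5",
}

# Hypothetical golden_hashes.json content, using the hashes reported above.
EXAMPLE_GOLDEN = {
    "m1": {
        "normal_8_key0": "8a8c5a6737feea9eeee3c7056f4699250e6b8440db8930fd10495ada9117c05a",
    },
    "m3-m5": {
        "normal_8_key0": "a662607828c52cafea1f79d98f09bd81855784116acd23364a6130eb037d6a5e",
    },
}


def chip_name() -> str:
    """Return the CPU brand string on macOS, e.g. 'Apple M1 Max'."""
    result = subprocess.run(
        ["sysctl", "-n", "machdep.cpu.brand_string"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def expected_hash(golden: dict, probe: str, chip: str) -> str:
    """Look up the golden hash for a probe on the given chip's family."""
    family = FAMILY_BY_CHIP.get(chip)
    if family is None:
        raise KeyError(f"no golden-hash family registered for chip {chip!r}")
    return golden[family][probe]
```

A test would then compare its computed hash against expected_hash(golden, probe, chip_name()). The cost of this approach is that every new chip needs an entry and a fresh set of golden values, which is why a portable code path would be preferable.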
Full repro script
A self-contained script that emits a JSON dump of 8 different mx.random.* probes (including the matching ones above as controls): [/tmp/mlx_rng_repro.py — paste here on request].
Question
- Is this an intentional kernel divergence (different SIMD path / Philox internal variant per hardware family) or a regression worth fixing?
- If intentional, is there a recommended way to opt into a portable code path for mx.random.normal?