mx.random.normal: bit-exact divergence on Apple M1 Max vs M3 Ultra / M5

## Description

`mx.random.normal(shape=..., key=mx.random.key(seed))` produces **different float32 outputs** on Apple M1 Max than on Apple M3 Ultra and Apple M5, given the same MLX version, the same Python 3.14 series (3.14.3 or 3.14.4, see Environment), and identical inputs. `mx.random.uniform` and `mx.random.split` are unaffected: both produce bit-identical output across all three machines.

This breaks bit-exact reproducibility for any code that relies on `mx.random.normal` with a directly-constructed key, across Apple Silicon hardware generations.

## Reproducer

```python
import mlx.core as mx
import numpy as np
import hashlib

key = mx.random.key(0)
a = mx.random.normal(shape=(8,), key=key)
mx.eval(a)
print(hashlib.sha256(np.asarray(a, dtype=np.float32).tobytes()).hexdigest())
```

## Observed output (MLX 0.31.1, Python 3.14.3-4, macOS 26)

| Chip | `sha256` of `normal(shape=(8,), key=mx.random.key(0))` |
|------|--------------------------------------------------------|
| Apple M5         | `a662607828c52cafea1f79d98f09bd81855784116acd23364a6130eb037d6a5e` |
| Apple M3 Ultra   | `a662607828c52cafea1f79d98f09bd81855784116acd23364a6130eb037d6a5e` |
| **Apple M1 Max** | `8a8c5a6737feea9eeee3c7056f4699250e6b8440db8930fd10495ada9117c05a` |

The same pattern holds for `shape=(8,), key=mx.random.key(42)` and `shape=(16, 4), key=mx.random.key(0)`: M1 Max always differs from {M5, M3 Ultra}.
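A single sha256 only shows that the bytes differ, not which elements differ or by how much. The sketch below compares two value lists element by element and reports the distance in float32 ULPs. It assumes the values were obtained with something like `a.tolist()` on each machine and pasted in; the helper names (`f32_bits`, `ulp_distance`, `diff_report`) are illustrative and not part of MLX.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE-754 binary32 bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _ordered(bits: int) -> int:
    # Map the sign-magnitude bit pattern onto a monotonic integer line,
    # so that +0.0 and -0.0 both land on 0.
    magnitude = bits & 0x7FFFFFFF
    return -magnitude if bits & 0x80000000 else magnitude


def ulp_distance(a: float, b: float) -> int:
    """Number of float32 steps between a and b (0 means same value)."""
    return abs(_ordered(f32_bits(a)) - _ordered(f32_bits(b)))


def diff_report(xs, ys):
    """List (index, x, y, ulps) for every element whose bits differ."""
    return [
        (i, x, y, ulp_distance(x, y))
        for i, (x, y) in enumerate(zip(xs, ys))
        if f32_bits(x) != f32_bits(y)
    ]
```

If the report shows differences of one or two ULPs in a few elements, that would point at the floating-point transform applied after sampling rather than at the random bits themselves; large or pervasive differences would point the other way. Either reading is a hypothesis to check, not a conclusion.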

## Adjacent observations — narrowing the cause

The table below lists each probe and whether it produces **bit-identical** output across all three machines:

| Probe | M5 | M3 Ultra | M1 Max | All match? |
|-------|----|----------|--------|-------------|
| `mx.random.key(0)` materialised | ✓ | ✓ | ✓ | **yes** |
| `mx.random.split(mx.random.key(7))` (both halves) | ✓ | ✓ | ✓ | **yes** |
| `mx.random.uniform(low=-1, high=1, shape=(8,), key=mx.random.key(0))` | ✓ | ✓ | ✓ | **yes** |
| `mx.random.normal(shape=(4,), key=<split-child-of-key(7)>)` | ✓ | ✓ | ✓ | **yes** |
| `mx.random.normal(shape=(8,), key=mx.random.key(0))` | ✓ | ✓ | ❌ | **no** |
| `mx.random.normal(shape=(8,), key=mx.random.key(42))` | ✓ | ✓ | ❌ | **no** |
| `mx.random.normal(shape=(16, 4), key=mx.random.key(0))` | ✓ | ✓ | ❌ | **no** |

So the divergence is specifically in `mx.random.normal` **on Apple M1 Max** when the key is the raw output of `mx.random.key(seed)`. The key tensor itself is identical; `mx.random.split` returns identical halves; `mx.random.uniform` produces identical output with the same raw key. Only `mx.random.normal` with the raw key diverges, only on M1 Max.

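The table shows `mx.random.normal` with a split-derived key matching across all three machines, at least for `shape=(4,)`. It is plausible (but not confirmed) that `mx.random.normal` with a directly-constructed key takes a different kernel path on M1 Max. Note that the matching split-key probe also uses a smaller shape than the diverging raw-key probes, so key provenance and shape have not yet been separated.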
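In the table, the matching split-key probe uses `shape=(4,)` while the diverging raw-key probes use `(8,)` and `(16, 4)`. A small grid that crosses key provenance with shape could tell the two factors apart. The following is a minimal sketch of such a grid, not the script used for the results above: the labels, the choice of the first split child, and the default seed are assumptions. MLX is imported lazily so the pure helpers can be used without it.

```python
import hashlib
import struct

# Probe grid: key provenance crossed with shape (labels are illustrative).
SHAPES = [(4,), (8,), (16, 4)]
KEY_KINDS = ["raw", "split_child"]


def sha256_f32(flat_values):
    """Hash a flat list of floats as little-endian float32 bytes."""
    data = struct.pack(f"<{len(flat_values)}f", *flat_values)
    return hashlib.sha256(data).hexdigest()


def probe_grid():
    """Every (key kind, shape) combination to test."""
    return [(kind, shape) for kind in KEY_KINDS for shape in SHAPES]


def run_probes(seed=0):
    """Run the grid on this machine and return {label: sha256}."""
    import mlx.core as mx  # lazy import: only needed when actually probing

    results = {}
    for kind, shape in probe_grid():
        key = mx.random.key(seed)
        if kind == "split_child":
            # Assumption: use the first of the two children.
            key = mx.random.split(key)[0]
        a = mx.random.normal(shape=shape, key=key)
        mx.eval(a)
        results[f"{kind}:{shape}"] = sha256_f32(a.reshape(-1).tolist())
    return results
```

Running `run_probes()` on each machine and diffing the resulting dictionaries would show whether M1 Max diverges for a raw key at `shape=(4,)`, or for a split child at `shape=(8,)`, which the current table does not cover.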

## Environment

All three machines:

- MLX `0.31.1`
- Python 3.14.3 (M5) / 3.14.4 (M3 Ultra, M1 Max)
- macOS 26.x (Tahoe), arm64
- Same `uv` lockfile, single MLX wheel

## Why this matters (downstream)

We hit this in a research project (`dream-of-kiki`) that runs an R1 bit-exact reproducibility suite. After shipping new ops that call `mx.random.normal` with seeded keys, the hashes diverge between M1 Max and M5/M3 Ultra even at the same commit, breaking cross-machine equivalence that previously held for ops that didn't touch `normal`. Full milestone: [`docs/milestones/r1-cross-machine-m5-vs-m1-2026-05-20.md`](https://github.com/hypneum-lab/dream-of-kiki/blob/main/docs/milestones/r1-cross-machine-m5-vs-m1-2026-05-20.md).

We've put a `golden_hashes.json` per-hardware-family workaround in place locally, but it would be much cleaner if `mx.random.normal` produced the same bytes across Apple Silicon generations.
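For readers who need a similar stopgap, a per-hardware-family lookup could look roughly like the sketch below. The JSON layout, the probe name `normal_8_key0`, and the family-mapping rule are all hypothetical and are not the actual `golden_hashes.json` format from the project; only the two hashes are taken from the table above. The chip name is read with `sysctl -n machdep.cpu.brand_string`, which is macOS-specific.

```python
import json
import re
import subprocess


def chip_brand():
    """Return the CPU brand string on macOS, e.g. 'Apple M1 Max'."""
    out = subprocess.run(
        ["sysctl", "-n", "machdep.cpu.brand_string"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def hardware_family(brand):
    """Map a brand string to a family key such as 'M1' (hypothetical scheme)."""
    m = re.search(r"\bM(\d+)\b", brand)
    return f"M{m.group(1)}" if m else "unknown"


def expected_hash(golden, probe, brand):
    """Look up the hash for this family, falling back to 'default'."""
    by_family = golden[probe]
    return by_family.get(hardware_family(brand), by_family.get("default"))


# Hypothetical file content; the hashes are the ones observed above.
GOLDEN = json.loads("""
{
  "normal_8_key0": {
    "M1": "8a8c5a6737feea9eeee3c7056f4699250e6b8440db8930fd10495ada9117c05a",
    "default": "a662607828c52cafea1f79d98f09bd81855784116acd23364a6130eb037d6a5e"
  }
}
""")
```

A test would then compare its computed hash against `expected_hash(GOLDEN, "normal_8_key0", chip_brand())`. Keying on the family rather than the exact chip is itself an assumption, since only one M1-generation machine was tested.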

## Full repro script

A self-contained script that emits a JSON dump of 8 different `mx.random.*` probes (including the matching ones above as controls): [`/tmp/mlx_rng_repro.py` — paste here on request].

## Question

- Is this an intentional kernel divergence (different SIMD path / Philox internal variant per hardware family) or a regression worth fixing?
- If intentional, is there a recommended way to opt into a portable code path for `mx.random.normal`?

