
mx.random.normal: bit-exact divergence on Apple M1 Max vs M3 Ultra / M5 #3568

@electron-rare

Description

mx.random.normal(shape=..., key=mx.random.key(seed)) produces different float32 outputs on Apple M1 Max vs Apple M3 Ultra and Apple M5, given identical MLX version, identical Python, and identical inputs. mx.random.uniform and mx.random.split are unaffected — both produce bit-identical output across all three machines.

This breaks bit-exact reproducibility for any code that relies on mx.random.normal with a directly-constructed key, across Apple Silicon hardware generations.

Reproducer

import mlx.core as mx
import numpy as np
import hashlib

key = mx.random.key(0)
a = mx.random.normal(shape=(8,), key=key)
mx.eval(a)
print(hashlib.sha256(np.asarray(a, dtype=np.float32).tobytes()).hexdigest())

Observed output (MLX 0.31.1, Python 3.14.3 or 3.14.4, macOS 26)

| Chip | sha256 of normal(shape=(8,), key=mx.random.key(0)) |
| --- | --- |
| Apple M5 | a662607828c52cafea1f79d98f09bd81855784116acd23364a6130eb037d6a5e |
| Apple M3 Ultra | a662607828c52cafea1f79d98f09bd81855784116acd23364a6130eb037d6a5e |
| Apple M1 Max | 8a8c5a6737feea9eeee3c7056f4699250e6b8440db8930fd10495ada9117c05a |

Same pattern for shape=(8,) key=mx.random.key(42) and shape=(16, 4) key=mx.random.key(0) — M1 Max always differs from {M5, M3 Ultra}.

Adjacent observations — narrowing the cause

For each probe, the table records whether the output is bit-identical across all three machines (M5, M3 Ultra, M1 Max):

| Probe | All three match? |
| --- | --- |
| mx.random.key(0) materialised | yes |
| mx.random.split(mx.random.key(7)) (both halves) | yes |
| mx.random.uniform(low=-1, high=1, shape=(8,), key=mx.random.key(0)) | yes |
| mx.random.normal(shape=(4,), key=<split-child-of-key(7)>) | yes |
| mx.random.normal(shape=(8,), key=mx.random.key(0)) | no |
| mx.random.normal(shape=(8,), key=mx.random.key(42)) | no |
| mx.random.normal(shape=(16, 4), key=mx.random.key(0)) | no |

So the divergence is specifically in mx.random.normal on Apple M1 Max when the key is the raw output of mx.random.key(seed). The key tensor itself is identical; mx.random.split returns identical halves; mx.random.uniform produces identical output with the same raw key. Only mx.random.normal with the raw key diverges, only on M1 Max.

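The split-derived probe matched, but only for shape=(4,). It is plausible (but not confirmed) that mx.random.normal with a split-derived key matches across all three machines in general, while mx.random.normal with a directly constructed key takes a different kernel path on M1 Max. Note that the matching probe also used a smaller shape than every diverging probe, so these results do not yet separate key origin from output size.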
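The hashes in this report show only that the bytes differ, not by how much. Below is a minimal sketch, assuming raw float32 dumps (for example the `tobytes()` output from the reproducer) have been collected from two machines. It computes the per-element distance in units in the last place (ULPs). Small distances would point toward rounding differences in a math kernel, large ones toward a different random stream; that reading is a heuristic, not something confirmed here. `ulp_distances` and its helper are hypothetical names, and NaN values are not treated specially.

```python
import struct


def _ordered(bits: int) -> int:
    """Map a float32 bit pattern to an integer that sorts like the float value."""
    magnitude = bits & 0x7FFFFFFF
    # Sign-magnitude to signed integer; +0.0 and -0.0 both map to 0.
    return -magnitude if bits >> 31 else magnitude


def ulp_distances(dump_a: bytes, dump_b: bytes) -> list[int]:
    """Per-element ULP distance between two little-endian float32 byte dumps."""
    if len(dump_a) != len(dump_b) or len(dump_a) % 4:
        raise ValueError("dumps must have equal length, a multiple of 4 bytes")
    count = len(dump_a) // 4
    words_a = struct.unpack(f"<{count}I", dump_a)
    words_b = struct.unpack(f"<{count}I", dump_b)
    return [abs(_ordered(a) - _ordered(b)) for a, b in zip(words_a, words_b)]


if __name__ == "__main__":
    # Stand-in data: 1.0 versus the next representable float32 above 1.0.
    reference = struct.pack("<f", 1.0)
    neighbour = struct.pack("<I", 0x3F800001)
    print(ulp_distances(reference, neighbour))
```

With real data, each machine would write its dump to a file and the two files would be read back as `bytes` before calling `ulp_distances`.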

Environment

All three machines:

  • MLX 0.31.1
  • Python 3.14.3 (M5) / 3.14.4 (M3 Ultra, M1 Max)
  • macOS 26.x (Tahoe), arm64
  • Same uv lockfile, single MLX wheel

Why this matters (downstream)

We hit this in a research project (dream-of-kiki) that runs an R1 bit-exact reproducibility suite. After shipping new ops that call mx.random.normal with seeded keys, the hashes diverge between M1 Max and M5/M3 Ultra even at the same commit, breaking cross-machine equivalence that previously held for ops that didn't touch normal. Full milestone: docs/milestones/r1-cross-machine-m5-vs-m1-2026-05-20.md.

We've put a golden_hashes.json per-hardware-family workaround in place locally, but it would be much cleaner if mx.random.normal produced the same bytes across Apple Silicon generations.
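For readers who want a comparable stopgap, here is a rough sketch of what a per-hardware-family lookup could look like. The JSON layout, the family labels, the probe name, and `expected_hash` are all hypothetical and are not the project's actual golden_hashes.json schema; only the two digests are taken from the table in this report.

```python
import json

# Hypothetical layout: chip name -> family label, then family -> probe -> digest.
GOLDEN_JSON = """
{
  "families": {
    "Apple M1 Max": "family-a",
    "Apple M3 Ultra": "family-b",
    "Apple M5": "family-b"
  },
  "hashes": {
    "family-a": {
      "normal_shape8_key0": "8a8c5a6737feea9eeee3c7056f4699250e6b8440db8930fd10495ada9117c05a"
    },
    "family-b": {
      "normal_shape8_key0": "a662607828c52cafea1f79d98f09bd81855784116acd23364a6130eb037d6a5e"
    }
  }
}
"""


def expected_hash(golden: dict, chip: str, probe: str) -> str:
    """Return the golden digest recorded for this chip's family and probe."""
    try:
        family = golden["families"][chip]
    except KeyError:
        # Fail loudly on unknown hardware rather than guessing a family.
        raise LookupError(f"no golden hashes recorded for chip {chip!r}") from None
    return golden["hashes"][family][probe]


if __name__ == "__main__":
    golden = json.loads(GOLDEN_JSON)
    print(expected_hash(golden, "Apple M1 Max", "normal_shape8_key0"))
```

The chip name would have to come from the host, for instance from `sysctl -n machdep.cpu.brand_string` on macOS; that step is left out of the sketch. Unknown chips raise instead of falling back to a default, since the report gives no basis for assuming how other chips behave.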

Full repro script

A self-contained script that emits a JSON dump of 8 different mx.random.* probes (including the matching ones above as controls): [/tmp/mlx_rng_repro.py — paste here on request].
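As a companion to such a dump, the sketch below shows one way to compare dumps from several machines. It assumes each dump is a flat mapping of probe name to sha256 hex digest, which is a guess at the format and not the script's confirmed output. `diverging_probes` is a hypothetical helper, and the digests in the example are placeholders, not measured values.

```python
def diverging_probes(dumps: dict[str, dict[str, str]]) -> dict[str, dict[str, str]]:
    """Return the probes whose digests are not identical across all machines.

    `dumps` maps a machine label to that machine's {probe name: digest} mapping.
    A probe missing from one machine counts as diverging.
    """
    probe_names = set().union(*(dump.keys() for dump in dumps.values()))
    result = {}
    for probe in sorted(probe_names):
        digests = {
            machine: dump.get(probe, "<missing>") for machine, dump in dumps.items()
        }
        if len(set(digests.values())) > 1:
            result[probe] = digests
    return result


if __name__ == "__main__":
    # Placeholder digests for illustration only.
    example = {
        "M5": {"uniform_key0": "digest-u", "normal_key0": "digest-n1"},
        "M1 Max": {"uniform_key0": "digest-u", "normal_key0": "digest-n2"},
    }
    for probe, digests in diverging_probes(example).items():
        print(probe, digests)
```

In practice each machine's mapping would be loaded with `json.load` from the file it produced.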

Question

  • Is this an intentional kernel divergence (different SIMD path / Philox internal variant per hardware family) or a regression worth fixing?
  • If intentional, is there a recommended way to opt into a portable code path for mx.random.normal?
