Skip to content

[BUG] Metal GPU crash (SIGABRT) on Apple M4 Max with mlx_lm.server running Qwen3.5-27B-4bit #3564

@halubzbm-creator

Description

@halubzbm-creator

Environment

  • Hardware: Apple M4 Max (Mac16,9), 64GB unified memory
  • OS: macOS 26.4.1 (Sequoia)
  • Python: 3.11.15 (Homebrew)
  • mlx-lm: latest (installed via pip)
  • mlx: latest (bundled with mlx-lm)
  • Model: mlx-community/Qwen3.5-27B-4bit

Description

mlx_lm.server crashes intermittently during inference with a Metal GPU error. The crash occurs during quantized matrix multiplication (QuantizedMatmul::eval_gpu) when the Metal allocator attempts to allocate GPU memory.

The server runs successfully for several requests before crashing, suggesting a gradual memory pressure issue rather than an immediate failure.

Crash Signature

Triggered by Thread: com.Metal.CompletionQueueDispatch
Exception Type: EXC_CRASH (SIGABRT)

Thread (crashed):
libmlx.dylib mlx::core::metal::MetalAllocator::malloc(unsigned long)
libmlx.dylib mlx::core::QuantizedMatmul::eval_gpu(...)
libmlx.dylib mlx::core::gpu::eval(mlx::core::array&)
libmlx.dylib mlx::core::eval_impl(...)

Also observed:

libmlx.dylib mlx::core::gpu::check_error(MTL::CommandBuffer*)

Steps to Reproduce

  1. Start server: mlx_lm.server --model mlx-community/Qwen3.5-27B-4bit --port 8080
  2. Send multiple chat completion requests via OpenAI-compatible API
  3. Server crashes after several requests (timing varies, sometimes within minutes)

Additional Notes

  • Tried MLX_GPU_MEMORY_LIMIT=48 — does not prevent the crash
  • The crash happens during QuantizedMatmul::eval_gpu, specifically in Metal's resource residency set update (IOGPUResourceGroupUpdateResources)
  • Ollama running the same model on the same hardware does not crash (but is slower)
  • A watchdog script with auto-restart serves as a workaround

Expected Behavior

Server should handle sustained inference workloads without crashing.

Actual Behavior

Server crashes with SIGABRT triggered by Metal GPU command buffer error after variable number of requests.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions