[BUG] Metal GPU crash (SIGABRT) on Apple M4 Max with mlx_lm.server running Qwen3.5-27B-4bit


### Environment
- **Hardware**: Apple M4 Max (Mac16,9), 64GB unified memory
- **OS**: macOS 26.4.1 (Sequoia)
- **Python**: 3.11.15 (Homebrew)
- **mlx-lm**: latest (installed via pip)
- **mlx**: latest (bundled with mlx-lm)
- **Model**: mlx-community/Qwen3.5-27B-4bit

### Description

`mlx_lm.server` crashes intermittently during inference with a Metal GPU error. The crash occurs during quantized matrix multiplication (`QuantizedMatmul::eval_gpu`) when the Metal allocator attempts to allocate GPU memory.

The server runs successfully for several requests before crashing, suggesting a gradual memory pressure issue rather than an immediate failure.

### Crash Signature




Triggered by Thread: com.Metal.CompletionQueueDispatch
Exception Type: EXC_CRASH (SIGABRT)

Thread (crashed):
libmlx.dylib  mlx::core::metal::MetalAllocator::malloc(unsigned long)
libmlx.dylib  mlx::core::QuantizedMatmul::eval_gpu(...)
libmlx.dylib  mlx::core::gpu::eval(mlx::core::array&)
libmlx.dylib  mlx::core::eval_impl(...)

Also observed:



libmlx.dylib  mlx::core::gpu::check_error(MTL::CommandBuffer*)

### Steps to Reproduce
1. Start server: `mlx_lm.server --model mlx-community/Qwen3.5-27B-4bit --port 8080`
2. Send multiple chat completion requests via OpenAI-compatible API
3. Server crashes after several requests (timing varies, sometimes within minutes)
### Additional Notes
- Tried `MLX_GPU_MEMORY_LIMIT=48` — does not prevent the crash
- The crash happens during `QuantizedMatmul::eval_gpu`, specifically in Metal's resource residency set update (`IOGPUResourceGroupUpdateResources`)
- Ollama running the same model on the same hardware does not crash (but is slower)
- A watchdog script with auto-restart serves as a workaround
### Expected Behavior
Server should handle sustained inference workloads without crashing.
### Actual Behavior
Server crashes with SIGABRT triggered by Metal GPU command buffer error after variable number of requests.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Metal GPU crash (SIGABRT) on Apple M4 Max with mlx_lm.server running Qwen3.5-27B-4bit #3564

Environment

Description

Crash Signature

Steps to Reproduce

Additional Notes

Expected Behavior

Actual Behavior

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[BUG] Metal GPU crash (SIGABRT) on Apple M4 Max with mlx_lm.server running Qwen3.5-27B-4bit #3564

Description

Environment

Description

Crash Signature

Steps to Reproduce

Additional Notes

Expected Behavior

Actual Behavior

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions