talkie-MLX

A from-scratch reimplementation in MLX (Apple) of Talkie 13B, an LLM trained only on English text from before 1931. The original required ≥28 GB of CUDA VRAM; this port runs on the GPU of any modern Apple-Silicon Mac (M1/M2/M3/M4) via Metal. On an M4 Max, the q4 (4-bit quantized) model reaches ~26 tok/s with ~8 GB of resident memory and ~1.5 s first-token latency.
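As a rough sanity check on the memory figures above, the weight footprint can be estimated from the parameter count and the bits stored per weight. This is only a back-of-the-envelope sketch: the helper `weight_footprint_gb` is hypothetical, the parameter count is taken as exactly 13e9, and the q4 layout is assumed to be group-wise quantization with one 16-bit scale and one 16-bit bias per group of 64 weights. None of these details are confirmed by this project, and the estimate ignores the KV cache and activations.

```python
def weight_footprint_gb(n_params, bits, group_size=None, meta_bits=32):
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    group_size / meta_bits model the per-group overhead of group-wise
    quantization (assumed here: a 16-bit scale plus a 16-bit bias per group).
    """
    bits_per_weight = bits
    if group_size:
        bits_per_weight += meta_bits / group_size
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 13e9  # assumption: "13B" taken as exactly 13 billion parameters

fp16_gb = weight_footprint_gb(N_PARAMS, 16)               # unquantized 16-bit weights
q4_gb = weight_footprint_gb(N_PARAMS, 4, group_size=64)   # assumed q4 layout

print(f"16-bit weights: {fp16_gb:.1f} GB")  # 26.0 GB
print(f"q4 weights:     {q4_gb:.1f} GB")    # 7.3 GB
```

Under these assumptions the 16-bit weights alone come to about 26 GB, in line with the ≥28 GB requirement of the original, and the 4-bit weights come to about 7.3 GB, in line with the ~8 GB resident figure.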

https://github.com/joseluissaorin/talkie-mlx