Gemma 3 270m 4-bit DWQ is up. Same speed, same memory, much better quality:
Awni Hannun, 15 Aug, 02:01
Gemma 3 270m 4-bit generates text at over 650 (!) tok/sec on an M4 Max with mlx-lm and uses < 200MB: Not sped up:
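A rough back-of-the-envelope check on the "< 200MB" figure, as a sketch only. It assumes group-wise 4-bit quantization with 64 weights per group and a 16-bit scale plus a 16-bit bias stored per group (about 4.5 bits per weight), and that every one of the ~270M parameters is quantized. The helper name `quantized_size_mb` and its parameters are made up for illustration; real usage will also include unquantized tensors, the KV cache, and activations.

```python
def quantized_size_mb(n_params, bits=4, group_size=64, overhead_bits_per_group=32):
    """Estimate weight storage in MB for group-wise quantization.

    Assumption: each group of `group_size` weights carries
    `overhead_bits_per_group` extra bits (e.g. fp16 scale + fp16 bias).
    """
    bits_per_weight = bits + overhead_bits_per_group / group_size
    return n_params * bits_per_weight / 8 / 1e6


# ~270M parameters at an effective ~4.5 bits per weight
size = quantized_size_mb(270e6)
print(f"{size:.0f} MB")
```

Under these assumptions the weights alone come to roughly 150 MB, which is consistent with total usage staying under 200MB.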