Quantization Explained: Q4 vs Q8 vs FP16
Lower-bit quantization shrinks a model's memory footprint and often speeds up inference, since each weight is stored in fewer bits, but it may reduce quality on complex reasoning tasks.
For most local setups, Q4 or Q5 is a practical default, while Q8 or FP16 is better when memory headroom allows.
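The memory trade-off is easy to estimate from bits per weight. A minimal sketch, assuming approximate bits-per-weight figures for common quant formats (real formats add small per-block scale overhead, so these are rough values, not exact):

```python
# Rough weight-memory estimate at different quantization levels.
# The bits-per-weight numbers below are approximations for illustration,
# not exact figures for any specific format.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "Q8": 8.5,   # ~8 bits plus block-scale overhead (approximate)
    "Q5": 5.5,
    "Q4": 4.5,
}

def weight_gib(params_billion: float, quant: str) -> float:
    """Approximate weight memory in GiB for a model with the given parameter count."""
    bits = BITS_PER_WEIGHT[quant]
    total_bytes = params_billion * 1e9 * bits / 8
    return total_bytes / 2**30

for q in ("FP16", "Q8", "Q5", "Q4"):
    print(f"7B model at {q}: ~{weight_gib(7, q):.1f} GiB")
```

For a 7B-parameter model this puts Q4 at roughly a quarter of the FP16 footprint, which is why Q4/Q5 variants fit comfortably on consumer GPUs where FP16 does not.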
Try the optimizer's recommendations, then cross-check candidates in the model comparison chart.