What should I use: big model-small quant or small model-no quant?

Smorty [she/her]@lemmy.blahaj.zone · edited 2 months ago

Smorty [she/her]@lemmy.blahaj.zone · 2 months ago

Is this VPTQ similar to that 1.58-bit quant I’ve heard about? The one where they quantized Llama 8B down to just ~1.5 bits and it somehow was still fairly coherent?

brucethemoose@lemmy.world · 2 months ago

No, from what I’ve seen it still falls off below 4bpw (just more slowly than other quantization methods) and makes ~2.25-bit quants somewhat usable instead of totally impractical, much like AQLM.

You are thinking of BitNet, which (so far, though not many people have tried) requires models to be trained from scratch that way to be effective.
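
For a rough feel of the tradeoff in the title, here is a back-of-the-envelope sketch of weight size at a given bits-per-weight (bpw). The 70B and 8B parameter counts are assumptions picked only for illustration, `weight_gib` is a made-up helper name, and the estimate covers weights only: it ignores KV cache, activations, and any per-group quantization overhead. It also shows where the "1.58" figure comes from, assuming ternary weights.

```python
import math

def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB (no KV cache, no overhead)."""
    return n_params * bits_per_weight / 8 / 2**30

# Hypothetical model sizes, chosen only to illustrate the comparison.
big_low_bit = weight_gib(70e9, 2.25)  # large model at a ~2.25 bpw quant
small_fp16 = weight_gib(8e9, 16)      # small model left unquantized at fp16

# Ternary weights {-1, 0, 1} carry log2(3) bits each, hence "1.58-bit".
ternary_bits = math.log2(3)

print(f"70B @ 2.25 bpw: {big_low_bit:.1f} GiB")
print(f"8B  @ 16 bpw:   {small_fp16:.1f} GiB")
print(f"ternary weight: {ternary_bits:.2f} bits")
```

This only tells you what fits in memory, not which option gives better output. Quality at very low bpw still depends on the quantization method, as noted above.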