tehnomad@lemm.ee to Selfhosted@lemmy.world • Using Mac M2 Ultra 192GB to Self-Host LLMs?
The context (KV) cache doesn't take up much memory compared to the model weights themselves. The main benefit of having a lot of VRAM is that you can run larger models. I think you're better off buying a 24 GB Nvidia card from a cost and performance standpoint.
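To put rough numbers on that, here's a back-of-the-envelope sketch in Python. The model shape (40 layers, 40 KV heads, head dim 128, roughly Llama-2-13B) and the fp16 cache precision are illustrative assumptions, not measurements:

```python
# Rough VRAM split for an assumed Llama-2-13B-style model
# (40 layers, 40 KV heads, head_dim 128 -- hypothetical example numbers).

def weights_gib(params_billions: float, bits_per_weight: float) -> float:
    """GiB needed to hold the weights at a given precision/quantization."""
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """GiB for the KV (context) cache: 2 tensors (K and V) per layer,
    each kv_heads * head_dim * context_len elements (fp16 = 2 bytes)."""
    return 2 * layers * kv_heads * head_dim * context_len * bytes_per_elem / 2**30

print(f"weights @ fp16 : {weights_gib(13, 16):.1f} GiB")   # ~24.2 GiB
print(f"weights @ 4-bit: {weights_gib(13, 4):.1f} GiB")    # ~6.1 GiB
print(f"KV cache, 4k ctx @ fp16: {kv_cache_gib(40, 40, 128, 4096):.1f} GiB")  # ~3.1 GiB
```

Under these assumptions, a full 4k-token cache is a few GiB while the weights dominate, so extra VRAM mostly buys you a bigger model rather than a bigger cache.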