• 0 Posts
  • 2 Comments
Joined 10 months ago
Cake day: March 22nd, 2024

  • Late to this post, but shoot for an AMD Strix Halo or Nvidia Digits mini PC.

    Prompt processing is just too slow on Apple silicon, and the Nvidia/AMD backends are so much faster with long context.

    Otherwise, your only sane option for 128K context is a server with a bunch of big GPUs.

    Also… what model are you trying to use? You can fit Qwen coder 32B with like 70K context on a single 3090, but honestly it's not good above 32K tokens anyway.
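
    If you want to sanity-check why 70K can squeeze onto a 24 GB card, the main trick is quantizing the KV cache. Here's a rough back-of-the-envelope estimator in Python; the layer/head numbers are my assumptions for Qwen2.5-Coder-32B's config (64 layers, 8 GQA KV heads, head dim 128), so check the model's actual config.json before trusting the exact figures.

    ```python
    # Rough KV-cache VRAM estimator for long-context serving.
    # Model dimensions are ASSUMED values for Qwen2.5-Coder-32B; verify
    # against the model's config.json before relying on them.

    N_LAYERS = 64      # assumed transformer layer count
    N_KV_HEADS = 8     # assumed grouped-query KV heads
    HEAD_DIM = 128     # assumed per-head dimension

    def kv_cache_gib(context_len: int, bytes_per_elem: float) -> float:
        """GiB needed for keys + values across all layers at a given context length."""
        per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem
        return per_token * context_len / 1024**3

    for ctx in (32_000, 70_000, 128_000):
        fp16 = kv_cache_gib(ctx, 2.0)     # default f16 cache
        q8 = kv_cache_gib(ctx, 1.0)       # ~1 byte/elem with an 8-bit cache
        q4 = kv_cache_gib(ctx, 0.5625)    # ~4.5 bits/elem with a 4-bit cache
        print(f"{ctx:>7} tokens: f16 {fp16:5.1f} GiB | 8-bit {q8:5.1f} GiB | 4-bit {q4:5.1f} GiB")
    ```

    At f16 the cache alone is ~17 GiB for 70K tokens, which obviously doesn't fit next to an ~18 GiB 4-bit quant of the weights; with a quantized cache it drops to a few GiB, which is how that 70K figure becomes plausible on a single 3090.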