@lily33

lily33@lemm.ee · 6 days ago

I don’t think any kind of “poisoning” actually works. It’s well known by now that data quality is more important than data quantity, so nobody just feeds training data in indiscriminately. At best it would hamper some FOSS AI researchers that don’t have the resources to curate a dataset.

lily33@lemm.ee · edit-2 12 days ago

What makes these consumer-oriented models different is that that rather than being trained on raw data, they are trained on synthetic data from pre-existing models. That’s what the “Qwen” or “Llama” parts mean in the name. The 7B model is trained on synthetic data produced by Qwen, so it is effectively a compressed version of Qen. However, neither Qwen nor Llama can “reason,” they do not have an internal monologue.

You got that backwards. They’re other models - qwen or llama - fine-tuned on synthetic data generated by Deepseek-R1. Specifically, reasoning data, so that they can learn some of its reasoning ability.

But the base model - and so the base capability there - is that of the corresponding qwen or llama model. Calling them “Deepseek-R1-something” doesn’t change what they fundamentally are, it’s just marketing.

lily33@lemm.ee · 13 days ago

There are already other providers like Deepinfra offering DeepSeek. So while the the average person (like me) couldn’t run it themselves, they do have alternative options.

lily33@lemm.ee · 13 days ago

A server grade CPU with a lot of RAM and memory bandwidth would work reasonable well, and cost “only” ~$10k rather than 100k+…

lily33@lemm.ee · 13 days ago

To be fair, most people can’t actually self-host Deepseek, but there already are other providers offering API access to it.