$12,000 for the base model is insane. I have an Apple M3 Max with 128GB of RAM that can run 120B-parameter models on about 80 watts at 15-20 tokens/sec. That's not amazing for a 120B model, but it's also not 12 grand.
It's very comparable if you work out $/tok/s on inference. I did some napkin math comparing the Red v2 with a Mac Studio M3 Ultra 96GB, and it looks like you're getting roughly 3x the performance for 3x the cost.
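A minimal sketch of the $/tok/s comparison, assuming "performance" means sustained decode throughput and "cost" means purchase price only (no power, no resale value). The inputs are hypothetical normalized values, not real prices or benchmarks; only the 3x/3x ratio comes from the comment above.

```python
def dollars_per_tok_s(price_usd: float, tokens_per_sec: float) -> float:
    """Purchase price divided by decode throughput; lower is better."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return price_usd / tokens_per_sec


# Hypothetical normalized inputs: the Mac Studio is the 1.0 baseline for both
# price and throughput, and the Red v2 is 3x on each, per the estimate above.
mac_studio = dollars_per_tok_s(price_usd=1.0, tokens_per_sec=1.0)
red_v2 = dollars_per_tok_s(price_usd=3.0, tokens_per_sec=3.0)
print(red_v2 / mac_studio)  # prints 1.0: same dollars per unit of throughput
```

When cost and throughput scale by the same factor, the ratio stays at 1.0, which is what "very comparable" amounts to on this metric.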
By tokens per kWh, my math has the Mac Studio about 1.5x more efficient.
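Tokens per kWh is tokens per hour divided by average draw in kilowatts. This sketch plugs in the M3 Max figures from the first comment (about 80 W at 15-20 tok/s), assuming that draw holds for the whole run and ignoring idle power. It does not reproduce the 1.5x Mac Studio vs Red v2 figure, since those inputs weren't posted.

```python
def tokens_per_kwh(tokens_per_sec: float, watts: float) -> float:
    """Tokens generated per kilowatt-hour at a steady decode rate and power draw."""
    # tokens/hour divided by kW, rearranged to avoid inexact 1/1000 floats
    return tokens_per_sec * 3600 * 1000 / watts


# Figures from the M3 Max comment: ~80 W at 15-20 tok/s on a 120B model.
low = tokens_per_kwh(15, 80)
high = tokens_per_kwh(20, 80)
print(f"{low:,.0f} to {high:,.0f} tokens/kWh")  # prints 675,000 to 900,000 tokens/kWh
```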
The M3 has tolerable decode performance for the price, and that's what people care about most of the time. It underperforms severely on prefill, but prefill is a fraction of the workload: AI, even agentic AI, spends most of its time outputting tokens, not processing context in bulk.
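One way to check the prefill-vs-decode claim for a given workload is to split per-request time into prompt processing (prefill) and token generation (decode). A minimal sketch; the prompt size, reply size, and both rates below are hypothetical placeholders, not measurements of any M3 chip.

```python
def request_time_split(prompt_tokens: int, output_tokens: int,
                       prefill_tok_s: float, decode_tok_s: float) -> tuple[float, float]:
    """Seconds spent in prefill (processing the prompt) and decode (generating output)."""
    prefill_s = prompt_tokens / prefill_tok_s
    decode_s = output_tokens / decode_tok_s
    return prefill_s, decode_s


# Hypothetical request: 2,000-token prompt, 500-token reply,
# 250 tok/s prefill, 20 tok/s decode.
prefill_s, decode_s = request_time_split(2000, 500, 250.0, 20.0)
share = prefill_s / (prefill_s + decode_s)
print(f"prefill {prefill_s:.1f}s, decode {decode_s:.1f}s, prefill share {share:.0%}")
# prints: prefill 8.0s, decode 25.0s, prefill share 24%
```

The split depends on the prompt-to-output ratio, so plugging in your own typical request sizes and measured rates is what settles it for a given workload.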
It's for fools. I bought 160GB of VRAM for $1000 last year. 96GB of P40 VRAM can be had for under $1000, and it will run gpt-oss-120b Q8 at probably 30 tok/s.
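Per-GB cost from the numbers in this comment, assuming $1,000 as the price in both cases (the 96GB figure is "under $1000", so its result is an upper bound) and 24GB per Tesla P40. It counts GPU memory only, not host machines, PSUs, or cooling.

```python
def dollars_per_gb(total_usd: float, vram_gb: float) -> float:
    """Purchase price per GB of GPU memory."""
    return total_usd / vram_gb


P40_VRAM_GB = 24                 # each Tesla P40 carries 24 GB
cards = 96 // P40_VRAM_GB        # cards needed for 96 GB
print(f"{cards} x P40; ${dollars_per_gb(1000, 160):.2f}/GB "
      f"vs under ${dollars_per_gb(1000, 96):.2f}/GB")
# prints: 4 x P40; $6.25/GB vs under $10.42/GB
```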
The P40 is a Pascal-generation Tesla card that is no longer receiving driver or CUDA updates, and it's only available used. That's fine for hobbyists, startups, and home labs, but there is likely a growing market of businesses too large to depend on used gear from eBay yet too small for a full rack solution from Nvidia. That seems to be who they're targeting.
I suppose if I rent a cloud GPU and just let it sit there dark and do nothing, then I wouldn't have to move any data to it. Otherwise, I'm uploading some kind of work for it to do, and that usually involves some data to operate on, even if it's just prompts.
So you also believe when you rent a server you are sharing your data with the cloud? AWS and GCP are copying all private data on servers? Give me a break. There's a big difference between renting a server and using an API.
> So you also believe when you rent a server you are sharing your data with the cloud [hosting provider]?
Only if you upload your data to that cloud server you rented. Then, by definition, you are.
> AWS and GCP are copying all private data on servers?
Every computer copies data when moving it. Several times, in fact. Through network card buffers, switches, system memory, disk caches, and finally to some form of semi-permanent storage.
I don't have to think Amazon is stealing my data to be aware that Amazon S3 buckets containing privileged information are routinely found open. I don't have to think that Google is spying on me to know that operating equipment my business owns on prem and does not share requires me to trust fewer people and less complex systems than doing the same work from the cloud.
You are very quick to make foolish assumptions and assign them to others.