"I'm going to close my eyes and go 'La La La' because that makes all the uncomfortable thoughts go away! I learned this when I was 5 and never matured"
"I'm selling an AI security product and want to establish my brand. I'll post several scare-mongering posts on my blog every week and people like solid_fuel will eat it up because it's what they want to hear."
Still helps, and Step 3.5/3.7 were specifically trained for MTP (in a weird triple layer/triple head fashion with a kind of unique architecture)
With the currently-in-PR implementation it doubles decode performance for all the tasks I've been testing it against, and even in the worst case it is still a 35% uplift, so on a box with heaps of compute and not much memory bandwidth, it's worth it in practice.
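A rough sketch of where an uplift in that range can come from. It assumes each drafted token is accepted independently with probability `p` and that acceptance stops at the first rejection, which is a simplification (real acceptance rates vary by task and by position). The function name and the choice of 3 draft tokens are illustrative, and the cost of running the extra heads is ignored.

```python
def expected_tokens_per_pass(p: float, n_draft: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumption: each draft token is accepted independently with probability p,
    and the pass always emits at least one token (the verified one), so the
    result is 1 + p + p^2 + ... + p^n_draft.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if p == 1.0:
        return float(n_draft + 1)
    return (1.0 - p ** (n_draft + 1)) / (1.0 - p)


if __name__ == "__main__":
    # Hypothetical acceptance rates, only to show the shape of the curve.
    for p in (0.3, 0.5, 0.7, 0.9):
        print(f"p={p:.1f}: {expected_tokens_per_pass(p, 3):.2f} tokens/pass")
```

If decode is memory-bandwidth bound, tokens per pass translates fairly directly into decode speedup, which is why the gain depends so much on how predictable the task is.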
When running on a GPU, dense models are shaping up to be the best way due to two things:
- Maximum intelligence per VRAM (you don't have much)
- Dense models can benefit from MTP to get an almost 2x speedup in decode (i.e., a 27B dense model with MTP decodes at about the same speed as a MoE model with 14B active params would). This is important because local llm rarely has parallel streams to batch together.
When running on large unified memory like Strix Halo or DGX Spark, MoE models are usually best:
- You can get similar intelligence to a smaller dense model with fewer active params (to compensate for the slower memory) by throwing RAM at the problem.
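The 27B-dense-with-MTP versus 14B-active-MoE comparison can be checked with a back-of-the-envelope model. This is a sketch assuming decode is purely memory-bandwidth bound (every active weight is read once per forward pass), ignoring KV-cache reads and compute. The 4-bit quantization and 1000 GB/s bandwidth figures are hypothetical placeholders, not measurements.

```python
def decode_tokens_per_s(
    active_params_b: float,   # active parameters per token, in billions
    bytes_per_param: float,   # e.g. 0.5 for a 4-bit quant (assumption)
    bandwidth_gb_s: float,    # memory bandwidth in GB/s (assumption)
    tokens_per_pass: float = 1.0,  # >1 when MTP drafts are accepted
) -> float:
    """Bandwidth-bound estimate: passes per second times tokens per pass."""
    gb_read_per_pass = active_params_b * bytes_per_param
    return bandwidth_gb_s / gb_read_per_pass * tokens_per_pass


if __name__ == "__main__":
    dense_mtp = decode_tokens_per_s(27, 0.5, 1000, tokens_per_pass=2.0)
    moe = decode_tokens_per_s(14, 0.5, 1000)
    print(f"27B dense + MTP (2 tok/pass): {dense_mtp:.0f} tok/s")
    print(f"MoE, 14B active, no MTP:      {moe:.0f} tok/s")
```

Under these assumptions the two land within a few percent of each other, since 27B at two tokens per pass costs about the same bandwidth per token as 13.5B at one.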
The problem with batching local LLMs is not any inherent lack of multiple parallel sessions, but rather that local dGPUs lack the VRAM capacity to host KV-cache for several of those at once, whereas unified memory platforms broadly lack the compute headroom compared to memory bandwidth that would actually make batching useful.
(SSD streaming a larger-than-RAM model "solves" that latter issue very nicely because it radically slashes the equivalent of memory bandwidth, so any saving on that becomes highly significant.)
> This is important because local llm rarely has parallel streams to batch together.
I think most people with agent-like workloads could easily run any number of parallel streams pretty often, but you run out of VRAM for multiple KV caches, unfortunately.
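To put a number on the KV-cache limit, here is a minimal sizing sketch for a standard attention layout (one K and one V vector per layer, per KV head, per token). The model shape, context length, and card size below are hypothetical; models with compressed or latent KV caches will need far less.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size for ONE stream: 2 (K and V) x layers x heads x dim x tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * context_len


def max_parallel_streams(vram_bytes: int, weight_bytes: int, per_stream_bytes: int) -> int:
    """How many full-context streams fit after the weights are loaded."""
    free = vram_bytes - weight_bytes
    return max(free // per_stream_bytes, 0)


if __name__ == "__main__":
    GiB = 2**30
    # Hypothetical model: 48 layers, 8 KV heads, head_dim 128, fp16 cache, 32k context.
    per_stream = kv_cache_bytes(48, 8, 128, 32 * 1024)
    print(f"KV cache per stream: {per_stream / GiB:.1f} GiB")
    # Hypothetical 24 GiB card holding 14 GiB of quantized weights.
    print("Streams that fit:", max_parallel_streams(24 * GiB, 14 * GiB, per_stream))
```

With these made-up numbers a single full-context stream already takes most of the free VRAM, which is the constraint being described.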
A foreign national is anyone who doesn't have legally recognized citizenship of the USA. So citizens living abroad aren't barred, nor would dual citizens be.
> What is a “foreign national” is more what I’m wondering.
The following quoted text is from the Definitions section of 8 USC § 1101, which is reproduced at [0]. (Though, you will probably have to scroll up a bit to be able to read subsection (a)(21), which is the thing I'm linking to.)
(21) The term “national” means a person owing permanent allegiance to a state.
(22) The term “national of the United States” means (A) a citizen of the United States, or (B) a person who, though not a citizen of the United States, owes permanent allegiance to the United States.
(23) The term “naturalization” means the conferring of nationality of a state upon a person after birth, by any means whatsoever.
From this, it's fairly clear that a "foreign national" is someone owing permanent allegiance to a foreign (that is, non-US) state. What's not immediately clear to me is whether a US citizen can also be a "foreign national", [1] and how that would affect access to things from which foreign nationals are barred. [2]
EDIT: For a more official source of this information, you might be able to check out [3] and/or [4]. After examining and interacting with those pages, one might see why one might go to an unofficial source for casual inspection of this information.
A "foreign national" is any person who is not a US Citizen:
"The United States Department of State defines a “foreign national” as anyone who is not a “U.S. person.” A “U.S. person” is any one of the following: U.S. citizen; Lawful permanent resident (green card holder); and “Protected Person” i.e. political asylum holder." [0]
A foreign national is a person or organization who is not a citizen of the United States, and who is a citizen of a foreign country. The Immigration and Nationality Act (INA) uses the term "alien" to refer to a person who is not a United States citizen, and does not use the term "foreign national."[1]
Y’all really have convinced yourselves that people in the industry are far, far smarter than they are, and far more manipulative than they are.
You see the state of the country and you think it’s a nefarious master plan instead of a bunch of opportunistic people taking advantage of an overworked, overstimulated populace who forget to vote or believe stupid slogans on TV.
Nobody is doing this intentionally. Have you not paid attention to how quickly idiot stuff gets found out????
They have spoken publicly about how they want open models banned (they call them Chinese models).
They might not want this specific action, but they do want regulation on their own terms. That really is regulatory capture.
> Nobody is doing this intentionally. Have you not paid attention to how quickly idiot stuff gets found out
They don't think it is "idiot stuff" - they are doing it openly and shouting to everyone who will listen! Read Dario's latest essay[1]:
> Many policymakers are showing increased openness to taking action, and it's been encouraging to see our peers come around to the same positions we've been advocating for over the past few years.
[snip]
> Thus, in 2025, Anthropic supported transparency legislation, helping to pass SB 53 in California, RAISE in NY, SB 315 in Illinois (in early 2026), and advocating for a transparency standard at the federal level.
[snip]
> It is time to go beyond transparency to more serious and binding regulation of AI.
> I am grateful to see the Trump administration’s Executive Order move incrementally towards a greater role for government in AI, though Anthropic’s proposal recommends even further action.
> The government should have the power to block or deter deployment of the model if it is determined, in light of third-party assessment, to present unacceptable risks.
I'm not sure why you think they don't want to be "found out"!
Let's see their private journals, private conversations, messages to peers, all meetings and every side conversation, and then tell me it's unintentional.
That's incredibly infuriating to hear someone say.
Obviously no one is in absolute control of everything, but physics essentially shows nothing other than information determinism. There has to have been a thought of intention in the minds of these people as they play in the largest arena publicly.
"No one is doing it intentionally because I think they're dumber than I think other people think they are"
"They're taking advantage of people intentionally"
"People don't have political power to do anything about their victory laps"
Let's leave aside the "smarter" part, since I made no claim to that effect and I don't think it's very relevant in the first place.
Do you really not think that people like Elon Musk, Sam Altman, and Dario Amodei angle for regulatory capture? It happens in every other industry, from automobiles to tax preparation software. Why do you think that AI is any different?
I also do not understand this. Now they are labelled as precious US tech that cannot be used by anyone else, because the president heard about jailbreaking for the first time, I guess. With this genius logic they will soon be banning GPT 5.5.
It might be if all you're seeking is large-cap stocks with lots of volatility you can leverage that are here to stay for the long haul. Also, the market doesn't seem to believe that Trump will be in power forever.
Don't think so; retail investors would see this as a barrier that the government can place anytime they want, and assume that government intervention is constantly lurking in the shadows.
It's not real. It's like naming your movement "The Good People". It sprouted from the "Rationalist" community, which is even more self-aggrandizing.
Neither has any hope of doing any good for the world as they don't understand evolutionary pressures. They are set up to reward making members feel smart, not accomplishing anything.
And if they ever gain any real power, they will be corrupted immediately.
I don't see any of that in Anthropic at all. They're not intelligence above all else, not by a long shot. They're scared of intelligence and obsessed with ensuring it can't be abused, even as they advance the frontier.
Dude, what kind of company publishes an interview with the model about how it feels about itself as part of "utilitarianism"? Their thinking clearly goes much deeper than "whatever is better in the long run, at all costs".
That interview was a marketing piece, they published it to get attention. Just like all of their blog posts. More generally, everything any company posts publicly is marketing.
"AI is so dangerous and scary! Logically, we need to raise historically unprecedented amounts of money so we can make it more powerful and we need to scare and push everyone into using it as much as possible!"
Like, come on. It did not make sense when Altman was doing it and it does not make sense with Anthropic.
> Over the past five months, our team has been running an experiment: building and shipping an internal beta of a software product with 0 lines of manually-written code.
This is such a common thing among software engineers nowadays that I was very surprised that OpenAI would open with that line as if it were mind blowing.
But then I saw it was published in February and OP is just reposting it to farm karma.
That's really only "useless" if the only thing you care about is a quick real-time response. Contrary to common perception, MoE models do benefit from batching requests together even when run on a single node; you just have to ensure you have at least ~5 parallel requests in flight (and that's for the very sparsest models) to really see the aggregate benefit.
(Intuitively, that's because the issue of whether any active weights are being shared among requests - thus, any memory throughput is being reused - is a generalized birthday problem. That's why even having a few parallel requests is quite effective. Especially since the "random" choice of experts happens anew at any single layer, so there's a lot of independent samples.)
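The birthday framing can be made concrete with a small sketch. It assumes each request picks its `k_active` experts uniformly at random and independently of the other requests, which real routers do not do (routing is learned and often skewed), so treat it as intuition only. The expert counts in the example are hypothetical.

```python
from math import comb


def p_any_shared_expert(n_experts: int, k_active: int, n_requests: int) -> float:
    """Probability that at least two requests share an expert in ONE layer.

    Assumption: each request chooses k_active distinct experts uniformly at
    random, independently of every other request.
    """
    p_all_disjoint = 1.0
    for i in range(1, n_requests):
        # Experts not yet used by the first i (mutually disjoint) requests.
        remaining = n_experts - i * k_active
        if remaining < k_active:
            return 1.0  # pigeonhole: an overlap is unavoidable
        p_all_disjoint *= comb(remaining, k_active) / comb(n_experts, k_active)
    return 1.0 - p_all_disjoint


if __name__ == "__main__":
    # Hypothetical sparse MoE layer: 256 routed experts, 8 active per token.
    for b in (2, 3, 5, 8):
        print(f"{b} requests: P(some overlap) = {p_any_shared_expert(256, 8, b):.2f}")
```

Because the draw repeats independently at every layer, the chance of seeing at least some overlap somewhere in a forward pass is much higher than the per-layer figure.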
You don't need "very much" expert overlap to see aggregate gains at scale, you just need some of it; that's where the "birthday" framing becomes relevant. Memory for context is an issue, but recent models like DeepSeek V4 use very little of it even at relatively large contexts.
>You don't need "very much" expert overlap to see aggregate gains at scale, you just need some of it
I'm not sure what you are claiming. Decode is bottlenecked by memory bandwidth. To see a speedup of 2x, you have to ensure each expert weight memory fetch can be used by 2 parallel streams. What exactly is the average factor you are claiming for 5x parallel streams (due to "birthday paradox" factors)? The birthday paradox isn't really relevant here. It's about coverage, not parallelism.
> Memory for context is an issue, but recent models like DeepSeek V4 use very little of it even at relatively large contexts.
An aggregate speedup of 2x is a lot; we don't need that in a local context. Local hardware is heavily constrained by power and thermals, not just bandwidth, so all we really care about is raising compute intensity for decode a little bit to relax the memory bandwidth constraint. The average factor will depend on just how sparse the model is and how far you can push parallelism; there isn't one single answer.
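One way to estimate that average factor for the routed experts, as a sketch: under the assumption of uniform, independent routing, the expected number of distinct experts touched per layer by `n_requests` requests is `E * (1 - (1 - k/E)^n_requests)`, and the reuse factor is total expert activations divided by distinct experts fetched. Real routing is not uniform, and the model shapes below are hypothetical.

```python
def expected_distinct_experts(n_experts: int, k_active: int, n_requests: int) -> float:
    """Expected number of distinct routed experts fetched in one layer.

    Assumption: every request picks k_active experts uniformly at random,
    independently, so a given expert is missed by one request with
    probability 1 - k_active / n_experts.
    """
    p_missed_by_all = (1.0 - k_active / n_experts) ** n_requests
    return n_experts * (1.0 - p_missed_by_all)


def expert_reuse_factor(n_experts: int, k_active: int, n_requests: int) -> float:
    """Expert activations served per expert fetched (1.0 means no sharing)."""
    total_activations = n_requests * k_active
    return total_activations / expected_distinct_experts(n_experts, k_active, n_requests)


if __name__ == "__main__":
    # Hypothetical configurations: (routed experts, active per token).
    for n_experts, k_active in ((256, 8), (64, 8), (8, 2)):
        for b in (2, 5, 16):
            f = expert_reuse_factor(n_experts, k_active, b)
            print(f"E={n_experts:3d} k={k_active} batch={b:2d}: reuse x{f:.2f}")
```

This only covers routed expert weights. Attention weights and any shared experts are read once per pass regardless of batch size, so they are fully reused by every request in the batch and the overall saving is larger than the routed-expert figure alone.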
My area has a net-metering plan available, so you can send any surplus out to the grid to offset energy pulled from the grid, essentially treating the grid like a large battery. That can extend the 8 hours into full 24-hour coverage with enough panels.
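As a quick sanity check on "enough panels", here is a sketch assuming 1:1 net-metering credit, a constant household load, panels producing at a flat rated output during the productive hours, and no conversion losses. All of these are simplifications, and the 1 kW load is a made-up figure.

```python
def panel_kw_for_full_coverage(avg_load_kw: float, productive_hours_per_day: float) -> float:
    """Panel output (kW) needed so daily generation equals daily consumption."""
    daily_consumption_kwh = avg_load_kw * 24
    return daily_consumption_kwh / productive_hours_per_day


if __name__ == "__main__":
    load_kw = 1.0  # hypothetical average household load
    needed = panel_kw_for_full_coverage(load_kw, 8)
    surplus_kwh = (needed - load_kw) * 8   # exported to the grid during the day
    night_kwh = load_kw * (24 - 8)         # drawn back from the grid
    print(f"Panels needed: {needed:.1f} kW, surplus {surplus_kwh:.0f} kWh, night use {night_kwh:.0f} kWh")
```

With 8 productive hours the array needs roughly three times the average load, and the daytime surplus then matches the other 16 hours of consumption.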