A transmission error has a strictly contained, predictable blast radius. If a packet drops, the system knows exactly how to handle it: it throws a timeout, drops a connection, or asks for a retry. The worst-case scenario is known.
A reasoning error has an unbounded, unpredictable blast radius. When an LLM hallucinates, it doesn't fail safely; it writes code that compiles perfectly and does the wrong thing. That "wrong thing" might just render a button incorrectly, or it might silently delete your production database or open a security backdoor.
You can build reliable abstractions over failures that are predictable and contained. You cannot abstract away unpredictable destruction.
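To make the "predictable and contained" half of that claim concrete, here is a minimal sketch of a bounded retry wrapper. The names `TransmissionError`, `send_with_retry` and `make_flaky_sender` are made up for illustration and don't come from any particular library. What it shows is that the worst case is fixed before anything runs: one explicit error after `max_attempts` tries.

```python
import time


class TransmissionError(Exception):
    """Hypothetical error type standing in for a dropped packet or timeout."""


def send_with_retry(send, payload, max_attempts=3, backoff_seconds=0.0):
    """Call send(payload), retrying a bounded number of times on TransmissionError."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except TransmissionError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Linear backoff; zero by default so the demo runs instantly.
                time.sleep(backoff_seconds * attempt)
    # The worst case is known in advance: one explicit failure after max_attempts tries.
    raise TransmissionError(f"gave up after {max_attempts} attempts") from last_error


def make_flaky_sender(failures_before_success):
    """Build a fake sender that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def send(payload):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransmissionError("packet dropped")
        return f"ack:{payload}"

    return send, state
```

With `make_flaky_sender(2)` the third attempt gets through. With more failures than `max_attempts`, the caller gets a single exception it can plan for, and nothing else happens.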
In late 2023, a Chevy dealership deployed an AI chatbot that confidently agreed to sell a customer a 2024 Chevy Tahoe for $1. It executed a catastrophic business failure simply because it didn't know the logic was wrong.
Sure, you can patch that specific case with guardrails, but how many unpredictable edge cases are you going to cover? It only takes a user with a bit of ingenuity to circumvent them. There are already several examples of AI agents getting stuck in infinite loops, burning through massive API bills while achieving absolutely nothing.
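For the runaway-loop case specifically, a hard budget is the usual kind of guardrail. The sketch below is only an illustration: `CallBudget`, `BudgetExceeded` and `run_agent` are hypothetical names, and `step` stands in for whatever one agent iteration does. It caps the number of calls and the spend; it does nothing to check whether the agent's output is right.

```python
class BudgetExceeded(Exception):
    """Raised when the next call would exceed a configured limit."""


class CallBudget:
    def __init__(self, max_calls, max_cost_usd):
        self.max_calls = max_calls
        self.max_cost_usd = max_cost_usd
        self.calls = 0
        self.cost_usd = 0.0

    def charge(self, cost_usd):
        """Record one call, refusing it if either limit would be exceeded."""
        if self.calls + 1 > self.max_calls:
            raise BudgetExceeded("call limit reached")
        if self.cost_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost limit reached")
        self.calls += 1
        self.cost_usd += cost_usd


def run_agent(step, budget, cost_per_call_usd):
    """Run step() until it returns True or the budget stops the loop."""
    steps_taken = 0
    while True:
        budget.charge(cost_per_call_usd)  # checked before every iteration
        steps_taken += 1
        if step():
            return steps_taken
```

An agent whose `step` never returns `True` gets stopped after `max_calls` iterations instead of running until someone notices the bill. That bounds the cost of the loop, not the damage of a wrong answer.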
You can contain a system failure, but you cannot contain a logic failure if the system doesn't know the logic is wrong.
You literally can contain a logic failure. If I execute logic on my computer that’s not connected to the internet, it can’t get out of the box. Done. Contained.
This is intentionally written vaguely. How do you ensure that these implementations make Program() run and do the right thing when there is no guarantee that Math() or its components are correct?
Normally you could use a typed programming language, unit tests, etc., but if the LLM is the ultimate abstraction, programs will be written like the one above. At some point traditional software engineering principles will need to apply.
Very few people are even beginning to understand the constraints of these systems, and none of them have yet been elevated to high enough positions of prominence to rise above the noise of all the hype. Give us some time man, jeez
A transmission error does not have a strictly contained blast radius.
A bad packet could tell a flying probe to fire all of its thrusters and deplete its fuel in 15 minutes.
What makes a transmission error controlled is all the protection mechanisms on top of it. An LLM cannot delete a production database unless you give it access to do it.
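The access point can be shown with nothing but the standard library. This is a rough sketch, assuming the model's SQL goes through a connection you opened for it; the `orders` table and the helper names are invented for the demo. SQLite's `mode=ro` URI flag opens the database read-only, so a destructive statement is refused by the database itself, whatever text the model produced.

```python
import os
import sqlite3
import tempfile


def make_demo_db(path):
    """Create a tiny database with one table and one row (demo data only)."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
    conn.execute("INSERT INTO orders (total) VALUES (19.99)")
    conn.commit()
    conn.close()


def open_read_only(path):
    """Open the database with SQLite's read-only URI mode."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def try_statement(conn, sql):
    """Run one statement and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "executed"
    except sqlite3.OperationalError as exc:
        return f"refused: {exc}"


if __name__ == "__main__":
    db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
    make_demo_db(db_path)
    conn = open_read_only(db_path)
    print(try_statement(conn, "SELECT * FROM orders"))
    print(try_statement(conn, "DROP TABLE orders"))
    conn.close()
```

The `SELECT` goes through and the `DROP TABLE` is refused. The same idea carries over to a production database as a role with read-only grants, though the exact mechanism depends on the database.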
My hot take is that many people are naturally more comfortable with deterministic systems that have clearly analyzable outcomes. Software engineering has historically primarily been oriented around deterministic systems and it has attracted that type of thinker.
But many of us, myself included, prefer chaotic systems where you can’t fully nail down every cause and effect. The challenge of building a prediction model on top of chaos is exhilarating. I don’t find nearly as many people like me in SWE as in, say, the graphic design department.
To me, that’s the underlying threat here: LLMs are rewriting a field that has previously self-selected a certain type of person and this, quite understandably, rubs them the wrong way.
Yes, but when all it takes to avoid this chaos is hiring someone with at least 5 or 10 years of experience for a reasonable wage, this entire perspective looks insane.
It's... just... not that hard to write code nor does it cost that much. There are millions of us working silently at places that aren't "big tech". We all shrugged our shoulders, took a sip of coffee, and went back to our Teams meetings where the only LLM usage is still just Copilot.
I don't need to be able to write proofs about my maths using logic and determinism. If the answer comes out in a way that I like then it has to be correct!
The comment you replied to made no statements about math or proofs. They made a statement about working in systems of non determinism effectively. Your statement seems to imply that this is dumb, as if working in a world of full determinism is an option.
When you do have the option of determinism, but intentionally eschew it in favour of a strictly inferior nondeterministic tool, then yes, it is kinda dumb.
What deterministic option are you referring to here? Humans certainly are not deterministic in how they interpret instructions and write code. If I asked you to implement a feature and a month later asked you to implement the exact same feature, you likely wouldn’t do it the same way again. Two different people certainly wouldn’t.
strictly inferior != bad. It's relative. One tool will still give the output I intended long after I'm dead and decomposed, while the other might not the very next time I run it.
This is sorta how I've felt working the past ~7 years.
Simple example, we've been striving for 90% unit test coverage and thorough code review when there's 0% integration test coverage. I blame the metrics only looking at unit tests, but also many people think unit tests should come first. I would prioritize integration. There are some small pieces that need to work reliably, but if your system relies so hard on all of them working right, it's a bad system. That, and too many things will work in pieces but not overall.
Broadly I'm gonna assume that the team will later hire solid SWEs who don't necessarily know how our stuff works, and aren't going to read 100 docs about it. If this is a backend+DB combo, get your DB right and there won't be too many wrong ways to code against it in the future; get it wrong and it becomes a black hole for SWE-time. Or if someone on their first day can't run a system locally for debugging, no matter how elegant the code is, don't count on that system getting fixed quickly during an outage.
I mean, if you're talking about packets, you're already one abstraction above the real data transmission, which is noisy. Bits can randomly flip, noise can be interpreted as bits, and bits can get lost.
A much larger blast radius
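For what it's worth, this is the layer where checksums come in. A small sketch using `zlib.crc32` from the standard library; the framing format (payload followed by a 4-byte CRC) and the helper names are made up for the example and are not any real protocol. A flipped bit turns into a detectable error instead of silently different data.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unframe(data: bytes) -> bytes:
    """Return the payload if the checksum matches, otherwise raise ValueError."""
    if len(data) < 4:
        raise ValueError("frame too short")
    payload, received = data[:-4], int.from_bytes(data[-4:], "big")
    if zlib.crc32(payload) != received:
        raise ValueError("checksum mismatch")
    return payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate line noise by flipping a single bit."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)
```

`unframe(frame(b"fire thruster 2"))` returns the payload unchanged, while `unframe(flip_bit(frame(b"fire thruster 2"), 3))` raises `ValueError`. A CRC only detects corruption; deciding what to do next (drop, retry, ask for a resend) is still up to the layer above.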
The idea of their automated rollback infrastructure sounds good on paper, but at the end of the day, this still reads like a highly sophisticated machine for generating technical debt at lightspeed, mitigated only by an aggressive rollback system. You can't have an AI review code written by an AI and call it a security gate. A true security gate requires a human being who actually understands the context and who is actually accountable if the system breaks.
This article is spot on. I'm feeling the exact same way watching the industry aggressively promote the idea that it's safe to deploy unverified code just because an AI wrote the tests.
We are playing with fire. If we keep treating "I don't read the code I ship" as a feature rather than a liability, it's going to cause a massive, real-world disaster. The resulting regulation will be so heavy that software engineering will end up needing a Bar Council or Medical Board just to ship a basic feature. We're cheering for a trend that is going to regulate us into a corner.
But people's code also causes real-world disasters; most human programmers are terrible, are never held accountable (they usually left a while ago), cannot read or comprehend code either, and cannot write tests either. Only in an echo chamber like HN can you believe that the majority of human programmers are any good, or better than a 1-bit 7B model; they are not. Go out in the real world; most people are really, really bad at what they do, including programmers.
Friendly reminder: There is no ghost in the machine. It is a system executing code, not a being having thoughts. Let’s admire the tool without projecting a personality onto it.
For me, that’s kind of the point. It’s similar to how the characters in a novel don’t really exist, and yet you can’t really discuss what happens in a novel without pretending that they do. It doesn’t really make sense to treat the author’s motivations and each character’s motivations as the same.
Similarly, we’re all talking to ghosts now, which aren’t real, and yet there is something there that we can talk about. There are obvious behavioral differences depending on what persona the LLM is generating text for.
I also like the hint of danger in “talking to ghosts.” It’s difficult to see how a rational adult could be in any danger from just talking, but I believe the news reports that some people who get too deep into it get “possessed.”
Consciousness is weird and nobody understands it. There is no good reason to assume that these systems have it. But there is also no good reason to rule it out.
usually either use Grok to optimize a mistral prompt, or you can use gemini to optimize a chatGPT prompt. It's best to keep those pairs of AIs and not cross streams!
Thanks, not being well versed in design, I just picked a small color palette I liked and stuck to it.
Approval was I think 2-3 days for Google (I had already validated the store page and opened it to preregistration a month before the final build) and a bit more than a week for the App Store, due to some back and forth over missing privacy policy links in some places in the app and stuff like that.
It does feel like planned obsolescence when companies like Apple limit software support for older hardware, while Ubuntu runs smoothly on much older devices. They could certainly do better by extending support and focusing on sustainability.
Exactly, tokens-per-dollar rates are useful, but without knowing the typical input/output token distribution for each model on this specific task, the numbers alone don’t give a full picture of cost.
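A quick worked example of why the rate card alone is misleading. All prices and token counts below are invented for illustration and are not any vendor's real numbers; the only real part is the arithmetic, which is per-task cost = input tokens × input price + output tokens × output price.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical price sheet, in USD per million tokens."""
    input_usd_per_million: float
    output_usd_per_million: float


def cost_per_task(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one task given how many tokens it actually consumed."""
    total = (input_tokens * pricing.input_usd_per_million
             + output_tokens * pricing.output_usd_per_million)
    return total / 1_000_000


# Made-up models: one looks cheap per token but emits far more output
# tokens on this task (for example, long reasoning traces).
cheap_rate = Pricing(input_usd_per_million=1.0, output_usd_per_million=4.0)
pricey_rate = Pricing(input_usd_per_million=3.0, output_usd_per_million=15.0)

cheap_task = cost_per_task(cheap_rate, input_tokens=2_000, output_tokens=20_000)
pricey_task = cost_per_task(pricey_rate, input_tokens=2_000, output_tokens=1_500)

if __name__ == "__main__":
    print(f"cheap rate:  ${cheap_task:.4f} per task")
    print(f"pricey rate: ${pricey_task:.4f} per task")
```

With these made-up numbers the "cheap" model comes out at roughly $0.082 per task and the "expensive" one at roughly $0.0285, so the per-token price points the wrong way until you know the token distribution for the task.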
That’s how they lie to us. Companies can advertise cheap prices to lure you in but they know very well how many tokens you’re going to use on average so they will still make more profit than ever, especially if you’re using any kind of reasoning model which is just like a blank check for them to print money.