distalx's comments

A transmission error has a strictly contained, predictable blast radius. If a packet drops, the system knows exactly how to handle it: it throws a timeout, drops a connection, or asks for a retry. The worst-case scenario is known.
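
A minimal Python sketch of the bounded retry described above. `send` is a hypothetical stand-in for any network call, and the attempt count and backoff values are illustrative assumptions. The point it shows is that the failure types are enumerable (timeouts, dropped connections) and the worst case is fixed in advance: after N failures, the last error is surfaced to the caller.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(send: Callable[[], T], attempts: int = 3, base_delay_s: float = 0.1) -> T:
    """Call send(); on a timeout or dropped connection, back off and retry.

    The handled failure modes are listed up front, and the worst case is
    bounded: after `attempts` failures the last error is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return send()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # known worst case: hand the error to the caller
            time.sleep(base_delay_s * 2 ** attempt)  # exponential backoff
    raise AssertionError("unreachable")
```

Anything outside the listed exception types is deliberately not retried; that is what keeps the handling predictable.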

A reasoning error has an infinite, unpredictable blast radius. When an LLM hallucinates, it doesn't fail safely; it writes perfectly compiling code that does the wrong thing. That "wrong thing" might just render a button incorrectly, or it might silently delete your production database, or open a security backdoor.

You can build reliable abstractions over failures that are predictable and contained. You cannot abstract away unpredictable destruction.


> A reasoning error has an infinite, unpredictable blast radius.

Says who? It’s quite easy to limit the blast radius of a reasoning error.


In late 2023, a Chevy dealership deployed an AI chatbot that confidently agreed to sell a customer a 2024 Chevy Tahoe for $1. It committed to a catastrophic business failure simply because it didn't know its logic was wrong.

Sure, you can patch that specific case with guardrails, but how many unpredictable edge cases are you going to cover? It only takes a user with a bit of ingenuity to circumvent them. There are already several examples of AI agents getting stuck in infinite loops, burning through massive API bills while achieving absolutely nothing.
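
The runaway-loop case has a cost side that can be capped mechanically, even though nothing here tells you whether the agent's logic is right. A minimal Python sketch, assuming a hypothetical `step` callable that wraps one model/tool call and reports its cost in dollars; the caps are illustrative, not recommendations.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when the agent loop hits its step cap or spend cap."""


@dataclass
class Budget:
    max_steps: int
    max_cost_usd: float
    steps: int = 0
    spent_usd: float = 0.0


def run_agent(step: Callable[[int], Tuple[bool, float]], budget: Budget) -> int:
    """Run `step` until it reports done, or stop hard when the budget runs out.

    `step(i)` is a hypothetical stand-in for one agent iteration; it returns
    (done, cost_usd). Returns the number of steps taken on success.
    """
    while True:
        if budget.steps >= budget.max_steps:
            raise BudgetExceeded(f"step cap {budget.max_steps} reached")
        done, cost_usd = step(budget.steps)
        budget.steps += 1
        budget.spent_usd += cost_usd
        if done:
            return budget.steps
        if budget.spent_usd >= budget.max_cost_usd:
            raise BudgetExceeded(
                f"spent ${budget.spent_usd:.2f} of ${budget.max_cost_usd:.2f} without finishing"
            )
```

This bounds how much a stuck agent can burn; it does not detect an agent that finishes confidently with the wrong answer.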

You can contain a system failure, but you cannot contain a logic failure if the system doesn't know the logic is wrong.


This would be more convincing if a single car had been exchanged for $1.

It didn't happen. Seems the bug was "contained".

Sort of undermines your point re "catastrophic business failure", don't you think?


You literally can contain a logic failure. If I execute logic on my computer that’s not connected to the internet, it can’t get out of the box. Done. Contained.


> but how many unpredictable edge cases are you going to cover?

This is the wrong question. The correct question is which specific subsets of cases you allow, as with any security question.
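
A minimal Python sketch of the "which subsets do you allow" framing, using the dealership example from upthread: default-deny on tool names, plus bounds on the arguments of the tools that are allowed. The tool names, the `example_suv` key, and the price floor are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class ToolCall:
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


# Hypothetical policy: the only actions the chatbot may take at all.
ALLOWED_TOOLS = {"answer_faq", "book_test_drive", "quote_price"}
# Hypothetical per-model minimum price; any quote below it is rejected.
PRICE_FLOOR_USD = {"example_suv": 50_000}


def authorize(call: ToolCall) -> bool:
    """Default-deny: permit only allowlisted tools, with bounded arguments."""
    if call.name not in ALLOWED_TOOLS:
        return False
    if call.name == "quote_price":
        floor = PRICE_FLOOR_USD.get(call.args.get("model"))
        price = call.args.get("price_usd")
        if floor is None or not isinstance(price, (int, float)):
            return False  # unknown model or malformed price: deny
        return price >= floor
    return True
```

Whatever text the model produces, a $1 quote or an unlisted action such as dropping a table never reaches the backend, because the check runs outside the model.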


How so?

Suppose you had:

  Math()
    Add()
    Subtract()

  Program()
    Math("calculate rate")

This is intentionally written vaguely. How do you ensure that Program() runs and does the right thing when there is no guarantee that Math() or its components are correct?

Normally you could use a typed programming language, unit tests, etc., but if the LLM is the ultimate abstraction, programs will be written like the above. At some point, traditional software engineering principles will need to apply.
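
For contrast, a minimal Python sketch of the traditional checks mentioned above: typed signatures plus example-based and property-style tests around Math()'s pieces, which Program() can rely on before calling into them. The snake_case names and the meaning of "rate" (change per unit of period) are assumptions made for illustration.

```python
def add(a: int, b: int) -> int:
    return a + b


def subtract(a: int, b: int) -> int:
    return a - b


def calculate_rate(change: float, period: float) -> float:
    """Hypothetical 'rate': change per unit of period."""
    if period <= 0:
        raise ValueError("period must be positive")
    return change / period


def check_math() -> None:
    """Checks that Program() can rely on before it calls into Math()."""
    # Example-based checks pin known answers.
    assert add(2, 3) == 5
    assert subtract(10, 4) == 6
    assert calculate_rate(120, 4) == 30
    # Property-style check: subtract undoes add across a small grid of inputs.
    for a in range(-20, 21):
        for b in range(-20, 21):
            assert subtract(add(a, b), b) == a


if __name__ == "__main__":
    check_math()
    print("Math() checks passed")
```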


Very few people are even beginning to understand the constraints of these systems, and none of them have yet been elevated to high enough positions of prominence to rise above the noise of all the hype. Give us some time, man, jeez.


A transmission error does not have a strictly contained blast radius.

A bad packet could tell a flying probe to fire all its thrusters and deplete its fuel in 15 minutes.

What makes a transmission error controlled is all the protection mechanisms on top of it. An LLM cannot delete a production database unless you give it access to do it.
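
A minimal sketch of "don't give it access", using Python's built-in `sqlite3`: the agent is handed a connection opened with SQLite's `mode=ro` URI parameter, so a destructive statement fails inside the database engine instead of depending on the model to behave. The `orders` table, the file name, and the `demo` helper are hypothetical; a server database would use its own read-only role or credentials for the same effect.

```python
import os
import sqlite3
import tempfile
from pathlib import Path
from typing import List, Tuple


def open_readonly(path: str) -> sqlite3.Connection:
    """Open an existing SQLite file so that any write is rejected by SQLite itself."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo() -> Tuple[List[Tuple[int, str]], bool]:
    """Set up a throwaway database, then try a read and a destructive write as the 'agent'."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "prod.db")
        # Set-up done by a trusted process with a normal read-write connection.
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
        setup.execute("INSERT INTO orders (item) VALUES ('widget')")
        setup.commit()
        setup.close()

        # The only handle the agent is given.
        agent_db = open_readonly(path)
        try:
            rows = agent_db.execute("SELECT id, item FROM orders").fetchall()
            try:
                agent_db.execute("DROP TABLE orders")
                write_blocked = False
            except sqlite3.OperationalError:  # "attempt to write a readonly database"
                write_blocked = True
        finally:
            agent_db.close()
    return rows, write_blocked
```

Reads work normally; the `DROP TABLE` is refused regardless of what the model intended.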

My hot take is that many people are naturally more comfortable with deterministic systems that have clearly analyzable outcomes. Software engineering has historically been oriented primarily around deterministic systems, and it has attracted that type of thinker.

But many of us, myself included, prefer chaotic systems where you can’t fully nail down every cause and effect. The challenge of building a prediction model on top of chaos is exhilarating. I really don’t find as many people like me in SWE as in, say, the graphic design department.

To me, that’s the underlying threat here: LLMs are rewriting a field that has previously self-selected for a certain type of person, and this, quite understandably, rubs them the wrong way.


Yes, but when all it takes to avoid this chaos is hiring someone with at least 5 or 10 years of experience for a reasonable wage, this entire perspective looks insane.

It's... just... not that hard to write code nor does it cost that much. There are millions of us working silently at places that aren't "big tech". We all shrugged our shoulders, took a sip of coffee, and went back to our Teams meetings where the only LLM usage is still just Copilot.


I don't need to be able to write proofs about my maths using logic and determinism. If the answer comes out in a way that I like then it has to be correct!


This is vapid condescension.

The comment you replied to made no statements about math or proofs. They made a statement about working in systems of non determinism effectively. Your statement seems to imply that this is dumb, as if working in a world of full determinism is an option.


Thank you for "vapid condescension".

I've wanted a term for this for decades!


When you do have the option of determinism but intentionally eschew it in favour of a strictly inferior nondeterministic tool, then yes, it is kinda dumb.


What deterministic option are you referring to here? Humans certainly are not deterministic in how they interpret instructions and write code. If I asked you to implement a feature and a month later asked you to implement the exact same feature, you likely wouldn’t do it the same way again. Two different people certainly wouldn’t.


When you cling to determinism and call a clearly useful and powerful tool “strictly inferior” I would say this misses the point.


Strictly inferior != bad. It's relative. One tool will still give the output I intended long after I'm dead and decomposed; the other might not, the very next time I run it.


This is sorta how I've felt working the past ~7 years.

Simple example: we've been striving for 90% unit test coverage and thorough code review while there's 0% integration test coverage. I blame the metrics for only looking at unit tests, but many people also think unit tests should come first. I would prioritize integration. There are some small pieces that need to work reliably, but if your system relies that heavily on all of them working right, it's a bad system. That, and too many things will work in pieces but not as a whole.
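
A small, hypothetical Python illustration of things that "work in pieces but not overall": two units that each pass their own tests, glued together with a milliseconds/seconds mix-up that only an end-to-end test catches. All names and values are made up for the example.

```python
from typing import Callable, Dict


def load_timeout_ms(config: Dict[str, int]) -> int:
    """Unit A: reads the retry timeout, which the config stores in milliseconds."""
    return config["timeout_ms"]


def deadline_after(now_s: float, timeout_s: float) -> float:
    """Unit B: computes an absolute deadline; both arguments are in seconds."""
    return now_s + timeout_s


def compute_deadline_buggy(config: Dict[str, int], now_s: float) -> float:
    # Glue code: hands milliseconds to a function that expects seconds.
    return deadline_after(now_s, load_timeout_ms(config))


def compute_deadline(config: Dict[str, int], now_s: float) -> float:
    # Fixed glue code: converts units at the boundary between the two pieces.
    return deadline_after(now_s, load_timeout_ms(config) / 1000)


def unit_tests_pass() -> bool:
    # Each piece, tested in isolation, behaves exactly as documented.
    return load_timeout_ms({"timeout_ms": 1500}) == 1500 and deadline_after(10.0, 1.5) == 11.5


def integration_test_passes(compute: Callable[[Dict[str, int], float], float]) -> bool:
    # End to end: a 1500 ms timeout starting at t=10 s should expire at t=11.5 s.
    return compute({"timeout_ms": 1500}, 10.0) == 11.5
```

Unit coverage on both pieces is 100% here, yet only the integration check distinguishes the buggy glue from the fixed one.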

Broadly, I'm gonna assume that the team will later hire solid SWEs who don't necessarily know how our stuff works and aren't going to read 100 docs about it. If this is a backend+DB combo, get your DB right and there won't be too many wrong ways to code against it in the future; get it wrong and it becomes a black hole for SWE time. Or, if someone can't run a system locally for debugging on their first day, then no matter how elegant the code is, don't count on that system getting fixed quickly during an outage.


Insightful.

Feels like this maps to the J/P axis of Myers-Briggs.


I mean, if you're talking about packets, you're already one abstraction above the real data transmission, which is noisy. Bits can randomly flip, noise can be interpreted as bits, and bits can get lost. That's a much larger blast radius.
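
For a sense of the layer that usually keeps raw bit flips from reaching the packet level, a minimal Python sketch of CRC-32 framing using the standard-library `zlib.crc32`. CRC-32 catches every single-bit flip (and short bursts), but it only detects errors, never corrects them, and larger error patterns can occasionally slip through. The 4-byte trailer layout is an illustrative assumption, not any particular link-layer format.

```python
import zlib
from typing import Optional


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 trailer (4 bytes, big-endian) to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unframe(data: bytes) -> Optional[bytes]:
    """Return the payload if the trailer matches, else None (caller drops or retries)."""
    if len(data) < 4:
        return None
    payload, trailer = data[:-4], data[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != trailer:
        return None
    return payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Simulate line noise by inverting a single bit."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)
```

Flipping any one bit of a framed message, payload or trailer, makes `unframe` return `None`, which turns silent corruption into a dropped packet.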


We put a supercomputer in a laptop just so the OS could struggle to draw a grid of icons. Peak modern engineering.


The idea of their automated rollback infrastructure sounds good on paper, but at the end of the day, this still reads like a highly sophisticated machine for generating technical debt at lightspeed, mitigated only by an aggressive rollback system. You can't have an AI review code written by an AI and call it a security gate. A true security gate requires a human being who actually understands the context and who is actually accountable if the system breaks.


No human has ever let through a major security vulnerability… right? Right?


This article is spot on. I'm feeling the exact same way watching the industry aggressively promote the idea that it's safe to deploy unverified code just because an AI wrote the tests.

We are playing with fire. If we keep treating "I don't read the code I ship" as a feature rather than a liability, it's going to cause a massive, real-world disaster. The resulting regulation will be so heavy that software engineering will end up needing a Bar Council or Medical Board just to ship a basic feature. We're cheering for a trend that is going to regulate us into a corner.


But human-written code also causes real-world disasters; most human programmers are terrible, are never held accountable (they usually left a while ago), cannot read or comprehend code (either), and cannot write tests (either). Only in an echo chamber like HN can you believe that the majority of human programmers are any good, or better than a 1-bit 7B model; they are not. Go out in the real world: most people are really, really bad at what they do, including programmers.


AI is coded by people.

We have not reached the state in which AI creates AI.


A few very smart, highest-paid people in the world, yes. The rest... well... I did not use universal quantifiers in the original post.


This is either going to save hours… or create very educational outages.


If the agent were able to update the model, that would be educational for the model and no one else.


Friendly reminder: There is no ghost in the machine. It is a system executing code, not a being having thoughts. Let’s admire the tool without projecting a personality onto it.


For me, that’s kind of the point. It’s similar to how the characters in a novel don’t really exist, and yet you can’t really discuss what happens in a novel without pretending that they do. It doesn’t really make sense to treat the author’s motivations and each character’s motivations as the same.

Similarly, we’re all talking to ghosts now, which aren’t real, and yet there is something there that we can talk about. There are obvious behavioral differences depending on what persona the LLM is generating text for.

I also like the hint of danger in “talking to ghosts.” It’s difficult to see how a rational adult could be in any danger from just talking, but I believe the news reports that some people who get too deep into it get “possessed.”


Consciousness is weird and nobody understands it. There is no good reason to assume that these systems have it. But there is also no good reason to rule it out.


That’s the old way of thinking about it. There is a new way.


You sound as if you have grounds for certainty about this. What are they?


What tools or process do you use to optimize your prompts?


I usually either use Grok to optimize a Mistral prompt or Gemini to optimize a ChatGPT prompt. It's best to keep those pairs of AIs together and not cross the streams!


This looks great! I'm not a guitar enthusiast myself, but the design and color tone look very slick.

Congratulations on the launch after a year of work, and I wish you all the best with it!

Just out of curiosity, how much time did it take you to get app store approval from Apple and Google in 2025?


Thanks! Not being well versed in design, I just picked a small color palette I liked and stuck to it.

Approval was, I think, 2-3 days for Google (I had already validated the store page and opened it to preregistration a month before the final build) and a bit more than a week for the App Store, due to some back-and-forth over missing privacy policy links in some places in the app and stuff like that.


It does feel like planned obsolescence when companies like Apple limit software support for older hardware, while Ubuntu runs smoothly on much older devices. They could certainly do better by extending support and focusing on sustainability.


Exactly. Tokens-per-dollar rates are useful, but without knowing the typical input/output token distribution for each model on this specific task, the numbers alone don’t give a full picture of cost.
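
The arithmetic behind that point, as a minimal Python sketch: cost per task is input tokens times the input price plus output tokens times the output price, so a model with half the per-token price can still cost more per task if it emits far more output. All prices and token counts below are hypothetical, and the output count is assumed to include any reasoning tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    usd_per_m_input: float   # price per million input tokens
    usd_per_m_output: float  # price per million output tokens (assumed to include reasoning tokens)


def cost_per_task(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task: each token count times its own per-token price."""
    return (input_tokens * p.usd_per_m_input + output_tokens * p.usd_per_m_output) / 1_000_000


# Hypothetical prices and token counts, for illustration only.
model_a = Pricing(usd_per_m_input=1.00, usd_per_m_output=4.00)
model_b = Pricing(usd_per_m_input=0.50, usd_per_m_output=2.00)  # half the per-token price

cost_a = cost_per_task(model_a, input_tokens=10_000, output_tokens=2_000)   # terse answer
cost_b = cost_per_task(model_b, input_tokens=10_000, output_tokens=12_000)  # long reasoning trace

if __name__ == "__main__":
    print(f"A: ${cost_a:.3f} per task, B: ${cost_b:.3f} per task")
```

With these made-up numbers, the "cheaper" model B comes out at $0.029 per task against $0.018 for A.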


That’s how they lie to us. Companies can advertise cheap prices to lure you in, but they know very well how many tokens you’re going to use on average, so they will still make more profit than ever, especially if you’re using any kind of reasoning model, which is like a blank check for them to print money.


I don’t think any of them are profitable, are they? We’re in the losing-money-to-gain-market-share phase of this industry.

