Hacker News | knivets's comments

He has early access to Anthropic models, of course he will hype them up, so that they will keep sharing access to preview models with him (and more traffic to his website). It also doesn't require him to perform any rigorous analysis of model performance, just share how it feels:

> But it's all vibes, if you want a more scientific comparison you'll have to look elsewhere.


> Software engineering. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

How was it measured? How was output of this magnitude verified over a period of a couple of days?


They just went by gut feeling. Classic snake oil marketing haha. No real data to back things up, just let some famous people say they feel better when using it.

I'm a little skeptical of claims like this that involve migrating things like libraries, etc. I've done big refactors like this multiple times (albeit, in an "only" 500k-1m LOC codebase) with less powerful models and it is usually just 99% the same edits, with 1% requiring a close human eye to resolve a particularly painful breaking change.

EDIT: to be clear, it's still quite a helpful thing in terms of time saved, I just don't think it's necessarily the best indication of value-added from making models smarter when cases like this can often be handled by well-directed swarms of smaller ones.


You should probably use software to do such large transformations (especially in dynamic languages). In Python LibCST is available, not sure what exists for Ruby.
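A minimal sketch of what "use software" means here, with the standard-library `ast` module standing in for LibCST so it runs without extra installs. The function names `old_fetch` and `new_fetch` are made up for illustration. The point is that a syntax-tree rewrite only touches real call sites, whereas a text search-and-replace would also hit strings and comments:

```python
import ast


class RenameCall(ast.NodeTransformer):
    """Rename calls to a bare function name, leaving everything else alone."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Call(self, node):
        # Visit nested calls first, e.g. old_fetch(old_fetch(x)).
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            node.func = ast.copy_location(
                ast.Name(id=self.new, ctx=ast.Load()), node.func
            )
        return node


def migrate(source, old, new):
    """Parse, rewrite matching call sites, and emit source again."""
    tree = RenameCall(old, new).visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


# Hypothetical input: one real call and one string that merely mentions it.
example = 'x = old_fetch(url)\nlabel = "old_fetch(url)"\n'
result = migrate(example, "old_fetch", "new_fetch")
print(result)
```

One caveat: `ast.unparse` (Python 3.9+) drops comments and original formatting, which is exactly the gap LibCST is designed to fill, so for a real migration the same visitor idea would be written against LibCST instead.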

> Over the past six months, there hasn’t been a single day where I’ve checked the HN Best RSS feed without seeing a post about how AI “writes bad code,” “introduces bugs,” “creates technical debt,” or something along those lines.

because it's true

> Users don’t care whether the code was written by AI or by hand, or which framework you used. They care that the product works.

How can you guarantee that it works though? You can verify, but it would be at the same speed as before the AI, or even slower.

> By then, enough real-world feedback would have surfaced to identify the major issues, and tools like Claude Code would make it possible to fix and ship version 2.0 at an incredible pace.

By then you have a blackbox of a codebase which is unmaintainable, or in a worst case scenario you end up losing your data or get hacked or both.


Boggles the mind that people can't see this. You gotta keep the guardrails pretty tight to succeed with LLMs and code in my experience.

>because it's true

I mean, it's just as true that most teams “write bad code,” “introduce bugs,” “create technical debt,” or something along those lines. There is this amazing thing that happens where when we talk about AI code, it's always compared against some idealized infinitely capable professional following every best practice, purposefully and carefully crafting every line.

Which is fantasy. Most of you are creating absolute slop. That's just truth. Clutch your pearls and say it ain't true, but there's this weird reality that when people inherit other people's code, it's declared 100% slop, almost all of the time. It's the reason we are constantly tossing projects in the bin and starting anew.

AI is absolutely imperfect, just like teams of human developers.

>By then you have a blackbox of a codebase which is unmaintainable

Have you ever used Claude Code to work on a project? In what reality is the code unmaintainable? How is it a "blackbox"? Are you getting it to write LLVM IR or something?

In my experience, and with careful guidance and oversight, it makes spectacularly maintainable code. Better code than any human developer I have ever worked with. It helps to occasionally do cycles of refactoring as you've built out the foundation and the core becomes more evident and clear.

>or in a worst case scenario you end up losing your data or get hacked or both

This one is amazing. The industry is awash with garbage, insecure, exploited code. We were making that garbage code by the data centre full long before LLMs joined the scene.


Yes, there have always been bad programmers. The only difference is that now thanks to AI anybody can be a bad programmer. You've got people out there contributing sloppy code like only a bad "10x engineer" could do before. Good code is still hard to write, and from what I have seen in 3 companies so far, the people who write good code with AI are pretty much the same ones that were writing good code before AI.

> AI is absolutely imperfect, just like teams of human developers.

The old "nothing's perfect therefore everything is equally imperfect" fallacy. It's not a binary. While everything is flawed, some things are more flawed than others. Welcome to the world, it's complex.

> when we talk about AI code, it's always compared against some idealized

Have you ever seen a development mailing list? Seems like when human code is scrutinized it's held to a high idealized standard. "Technical debt" is a concept that originated from looking at human code. How then can it be true that applying it to AI code is setting a higher standard? It's setting the same standard. These things existed and were applied to human code before AI exploded.

The whole "many eyes" thing can be quite brutal. We don't always get the many eyes but when we do... wars are waged. It's brutal out there for anything getting scrutiny. Currently, AI code is getting a lot of eyeballs. That's a good thing. Don't wish it away just because you're butthurt about your new pet tech being held to a standard. That's how it gets better.

The "problem" of AI being held to a supposed higher standard isn't a problem. It's a free pony.


What this looks like in practice: enterprise SaaS vendor introduces bug, I report the bug, get a canned AI response because literally nobody cares, and it never gets fixed. The product continues to deteriorate.

No, it’s not just one vendor.


This is not a serious analysis. No mention of open source LLMs and their impact on American AI companies. There’s also no evidence that LLMs can make significant scientific progress on their own.


It's a weird way to arrive at an important idea in economics: simply maximizing productivity is not good for society if the surplus of that productivity is entirely captured by a small group of people.


Some big names in AI made predictions by pulling random dates based on vibes; the author collected these and called them data.


how long until i stop seeing this nonsense shoveled at me from every direction


Stack Overflow provides marginal value when there are LLMs. Great technical books on the other hand still provide tremendous value and complement LLMs in a learning process: a book provides a fact checked, curated set of topics with clear start and finish (a structure) and LLMs help with any blockers or missing context that readers will encounter.


This is astrology for devs.


as someone who is about as llm-forward as anyone out there, this is a brilliant analogy. was equally true of all the “prompt engineer” hype as well from a couple years ago (which i admit i still think does matter)… it kinda makes me feel like an audiophile / hi-fi person talking about how 24bit/192kHz is the one true encoding format and anything less is a willful (cynical, “Quality”-hating, satisficerist, etc.) compromise. which i freely admit to being one of those people as well.

and in both cases i both “know” that i can tell the difference and “know that i cannot tell the difference”. what anyone takes from that in terms of what it says about me, personally, is a bit of a Rorschach test, but Astrology is about as apt a description as there is… xD


For higher than audible frequency sample rates there's a good chance you can tell the difference. It often causes weird aliasing and harmonics in the more audible frequencies on "real" playback equipment. You can train yourself to recognize some of these and often pretty accurately identify the higher sample rate examples. You might even mentally associate those signs with "Higher Quality".

But it's arguably less accurate to the original recording.
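A rough sketch of one mechanism that could produce this, under assumptions that are mine and not the commenter's: two ultrasonic tones (30 kHz and 33 kHz, picked arbitrarily) pass through a playback chain modeled as slightly nonlinear (`y = x + 0.1 * x**2`, a toy model, not any real amplifier). The nonlinearity creates a difference tone at 3 kHz, well inside the audible band, that was never in the recording:

```python
import cmath
import math

FS = 192_000              # sample rate in Hz
N = 19_200                # 0.1 s, so every tone here completes whole cycles
F1, F2 = 30_000, 33_000   # two ultrasonic tones (assumed for illustration)


def tone_amplitude(samples, freq):
    """Amplitude of a single frequency, measured with a one-bin DFT."""
    acc = sum(
        s * cmath.exp(-2j * math.pi * freq * n / FS)
        for n, s in enumerate(samples)
    )
    return 2 * abs(acc) / len(samples)


# The "recording": only inaudible content.
clean = [
    math.sin(2 * math.pi * F1 * n / FS) + math.sin(2 * math.pi * F2 * n / FS)
    for n in range(N)
]

# Toy model of imperfect playback equipment: a small quadratic term.
distorted = [x + 0.1 * x * x for x in clean]

audible_clean = tone_amplitude(clean, F2 - F1)
audible_distorted = tone_amplitude(distorted, F2 - F1)
print(f"3 kHz in recording: {audible_clean:.6f}")
print(f"3 kHz after playback model: {audible_distorted:.6f}")
```

In this model the 3 kHz component goes from essentially zero to about a tenth of the original tone amplitude, which is the sense in which the playback can be audibly different yet less faithful to the recording.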


People thought asking an LLM to output its reasoning steps was astrology until it was standardized and made ubiquitous.


Didn't multiple studies find the reasoning traces didn't have much to do with the final output? And even that outputting placeholder tokens during reasoning has a similar beneficial effect on benchmark scores?

(I don't think that's the full picture but, there's definitely something fishy going on there.)


reasoning itself just affords the model a ton of extra forward passes / "time to think"

the, como se dice, "misalignment" between the content of reasoning tokens and the actual output following the end of the reasoning is a separate problem, extensively studied by e.g. Anthropic


Do they have a golden calf to dance around? Without that success will be hit and miss.


i mean, maybe the golden calf people were right the whole time lol


Right about what?


Unless you can somehow provide some arguments against it, I feel like you're the one who is trying to cargo-cult stuff here.

Say what you will with proper reasoning or arguments if you feel compelled, tired reddit-commentary like that helps no one.


> Unless you can somehow provide some arguments against it,

We're year 4 into this discussion and camps have only gotten more bifurcated. There's no 1-1 discussion to have about this as of now, at least not before the crash.

Your only hope in such discourse is not trying to convince the other party how wrong they are, but appealing to an as of yet undecided party. Be it with reason, or simply pointing out how absurd some comments sound to the average person.


> Your only hope in such discourse is not trying to convince the other party how wrong they are

I don't care about convincing anyone, the ones I reply to or others, but if you take the time to leave a comment, at least make it something to read and think about instead of soundbites like "This is astrology for devs", it's plain boring to read and makes HN worse.


>I don't care about convincing anyone

That's fine. Others will care for you.

>it's plain boring to read and makes HN worse.

I chuckled at the joke. Surprising amount of layers to it.

Though I never strove to be a comic nor writer, that kind of terse, compact punch makes me envy those of such literary talent.


> I chuckled at the joke. Surprising amount of layers to it.

What joke?


i legitimately cannot divine what you are saying at all with this. there are so many dangling antecedents and modifiers that it is completely impossible. and i say this out of a genuine desire to understand what your argument is, knowing full well that i likely disagree with it.


Alright, let me explain, hopefully simpler: GP told us about their experience working with LLMs, and gave some pointers to what they found to be working. The comment I replied to just says "This is astrology for devs", which basically is a cheap putdown without any reasoning or arguments for why the commenter believes so. My comment is urging them to actually participate in the discussion, not just post the soundbite they thought of in five seconds, so HN as a whole can remain good instead of devolving into reddit (which is a tale as old as HN, I know).

Hopefully it's understandable now, and hopefully you don't disagree :)


https://news.ycombinator.com/newsguidelines.html

> Please don't post comments saying that HN is turning into Reddit. It's a semi-noob illusion, as old as the hills


Indeed, with the corollary of, please don't write Reddit-tier comments on HN either, then one wouldn't have to say it's turning into Reddit.


Two wrongs don't make a right, as they teach us as children.


Awesome, you did understand the reference I made, I was afraid I was too sneaky about it but seems it was just clear enough :)


Of course! That point in the guidelines has links to some prior art in this vein. Highly recommend it for you.

And please, do better next time!


Whenever I make joke reference to the guidelines, I do promise I'll attach a link to them, just to make it extra clear, thanks! :)


Thanks for caring and being respectful.


You can't be serious. It couldn't be more obvious what the poster was referring to: a drive-by put-down comment with no attempt to discuss anything seriously is more highly upvoted than an objection to such a comment.

What is this place for? Dang tells us, curious discussion. The guidelines explicitly state that certain comments are not in the spirit.

But the community seems to have decided otherwise, which is a shame.


Don't read too much into it, downvotes/upvotes are highly random here, saying the same thing twice will have different reactions depending on the time of day and the topic of the submission, seems certain crowds are drawn to certain topics, which isn't that surprising.

I don't mind the downvotes, the points aren't really the reason I'm here anyways, I just want fun and interesting discussions with people and to read others' perspectives, the points don't hinder that :)


Bot account - 70 days old, no submissions, all comments are hyping AI


> easier to build your own solution than to install an existing one

seriously?


In Emacs-land. Obviously clicking a button on the app store is easier than describing to an agent precisely the application that will solve your problem. But Emacs doesn't work this way. There's a whole subthread next to you that got all confused about this and started challenging Dan to like a WhatsApp duel or something; they've all missed this point completely.

