Hacker News: scrollaway's comments

> Once built, what does one do with this?

Are you asking about the church or the lego set?


Agreed. The noise in tech circles often gets founders to conflate ten different things into a product that no longer makes sense. “EU-made alternative to Kagi”? Cool, we need European search engines, sign me up. “Privacy is such a priority we’re looking to accept cash by mail”? Okay, you’re never gonna build a serious competitor, never mind.

Yeah, Mullvad, which accepts cash by post, is not a serious VPN provider at all, right.

You seem to not understand that a search engine and a VPN don't have the same audience, and certainly not the same needs for focus.

It's ok, we all have our flaws.


They are very comparable from a privacy standpoint IMO. A search engine and a VPN both get quite a lot of insight into your interests and browsing habits, because a lot of browsing sessions start with a search.

> why can’t Mythos just fix all these issues itself if it’s so smart. And test them to make sure they work?

“Why”: because you didn’t ask it. It’s not its job in this case.

You don’t hire an accountant and tell them “why can’t you fix my cash-flow problems and make me money if you’re so smart”


Ah ok, sure. The difference being the model should know how to do both based on what I’ve been told.

So why didn’t Anthropic ask it for me?


Being charitable to Larry Ellison is one of those things one cannot physically do, like being entertaining to a dead whale.

Europe.

We fund science, research and we have accelerated programs for researchers affected by these kinds of things.

If you're interested, email me (see profile). I have been helping Americans emigrate to Europe (for free) for several years.


The cost improvements reached you, you just don't see them in the table quality.

You see them in the fact that every single home you'll visit to buy or rent has a fully equipped kitchen including a fridge, oven, likely a microwave, dishwasher and even a washing machine (which alone has a huge economic impact: https://www.youtube.com/watch?v=_gvsz_vc7B0)

You see it in the fact that your home is safer from fires than it ever has been. That hot water is a cheap passive thing you don't even think about, rather than something you have to plan for. That a TV is a nice add-on to it all, rather than a huge deal to get.

Your grandparents' table was more expensive because they had fewer things, and the massive wood table that they saved months for was what was kept and stood the test of time for you to see today. Because let's not forget, this is also what furnishing 100 years ago can look like:

https://hips.hearstapps.com/hmg-prod/images/furnishing-for-h...


The real problem is that today, you rarely can pay more to get better. If you pay 3x more for your appliances (TV, dishwasher, oven, etc...) you don't get something 3x more reliable/better engineered.

Because that requires manufacturers ready to give up stealth corner cutting as the cornerstone of their earnings in favour of the hard and long task of developing an image of reliability.

------

Three cases I know enough about: cars, loudspeakers and computer monitors.

You can still buy some Mazda/Toyota models to really get more thoughtful engineering and QC for your money, but the Germans with a similar image of quality (Mercedes, BMW) have partially or fully shed the underlying quality.

Genelec remains the only (non-PA) loudspeaker manufacturer you can sincerely trust to take reliability, performance and transparency seriously. There was also Klein + Hummel (K+H) but since being bought by Sennheiser and integrated with Neumann, things have been going downhill... to the point where some curious people found CapXon caps (bottom of the barrel) in their KH80s.

Computer monitors? Since Panasonic (Eizo's supplier of yore) exited the panel market and left it as LG vs Samsung, it's been a complete disaster. Oh, you wanna pay 1~2k $currency for a fancy OLED monitor? Get used to appalling panel QC (banding, uniformity), VRR flicker and DSC crap.

The available choice for "pay more to get better" continues to dwindle...


And when you do pay more, you're paying more to someone who has figured out how to make you think you are getting better quality, not to someone who is giving you better quality. This is the "market for lemons" effect.

"If you pay 3x more for your appliances (TV, dishwasher, oven, etc...) you don't get something 3x more reliable/better engineered."

You do at the bottom of unregulated markets. For dishwashers and ovens, safety regs generally impose a high floor on the market. There is no $40 oven, because it's physically impossible to make a safety-compliant oven for $40. If it weren't for market regulation, $40 death-trap ovens would be a thing for sure.

The very cheapest compliant unit isn't _much_ worse than a mid-market unit; it might be a bit flimsier and wear out sooner. High-end luxury units aren't much better than mid-market units, because there's not much innovation driving progress at the top end. AEG and Bosch are still generally solid engineering, but there's not much point in paying more than that unless you like the aesthetics.

Mercedes and BMW - small-volume performance models aside - are like the big fashion brands, Vuitton etc., they're selling the idea of luxury to people who aren't even nouveau-riche, more like borrowing money to cosplay loudly as nouveau-riche. Compare old 1970s Merc convertibles with today's, the modern ones are just kind of ugly, aggressive and sad.

ADAM Audio loudspeakers are pretty good or were last time I bought a pair. They're designed as studio monitors but great for listening too. Perhaps they've gone downhill since being bought by a listed company a few years ago?


>ADAM Audio loudspeakers are pretty good or were last time I bought a pair. They're designed as studio monitors but great for listening too. Perhaps they've gone downhill since being bought by a listed company a few years ago?

The Focusrite buyout (unless there was another after it) seems to have improved quality and transparency (i.e. publicly available official measurements for their current range). Still, performance remains lacking for the asking price of the A/S models; the A7V has a massive port resonance near 650 Hz, for example.

Interesting post about an old Adam engineer reminiscing about A5X issues: https://www.audiosciencereview.com/forum/index.php?threads/a...


Agreed; and more generally, Microsoft's online services are terrible. Their login system is a mess, their UX is awful... our company is a Microsoft partner, but there are like 27 different ways to be one, with a bunch of different accounts, forms and systems for it. Azure UX is atrocious. And this nonsense spills into every single enterprise product they offer too (how many people complain about Teams?).

Here in Belgium, 80% of enterprise accounts use MS over Google and I genuinely don't get why. (Without getting into the fiasco of not really having an EU alternative to either of those)


> Here in Belgium, 80% of enterprise accounts use MS over Google and I genuinely don't get why. (Without getting into the fiasco of not really having an EU alternative to either of those)

Maybe because those enterprises already used on-prem AD? It's much "easier" to have a hybrid monstrosity combining on-prem AD and Azure AD than on-prem AD and Google (or anything non-MS, really). Plus, MS is already a supplier, so for large, bureaucratic entities, they already have a foot in the door.


Wtf is your problem with high school dropouts exactly?

No problem.

Tell me you don’t know how AI works without telling me you don’t know how AI works.

What are you talking about?

I’ll try to steelman this comment. Anyone who uses coding tools knows that the output is heavily affected by details of the task you give it. The same model can give you garbage code or genius code for the same problem with slightly different framing. So it’s not necessarily a limitation in the model’s training that causes it to output security bugs. The model might be great at writing secure code, but you need a different harness to elicit that behavior.

Counterargument: just because the problem can be fixed without training, doesn’t mean training isn’t a possible solution.


Counter-counter-argument: for LLMs, tokens are units of thinking. And token use is, on the margin, directly proportional to the cost of inference. So while the details of the harness, how you prompt the model, the nature of the code and docs you put in context, etc. all matter to the quality of output you get from an LLM coding tool, ultimately there's always a ceiling to how much you're willing to spend on solving a problem (say, no more than 30 minutes, or $10, on refactoring a target module or implementing a small feature), and that puts a limit on how much thinking the model can put into it.

Thing is, writing secure and efficient and readable and simple code is in many cases fundamentally over that limit. It's possible, but you can't afford (or rationally just don't want) to spend as much on it as is required for superhuman quality on all these aspects. Also, most of the time you don't want to operate at the limit: you probably expected that feature to take 30 seconds and less than $1 to implement. So you choose both what the model optimizes for and how much.

Because of that, no matter how good the model, the harness and the prompting are, $10 spent on coding is still bound to leave behind some security vulnerabilities that a subsequent $10 spent on security review will find (especially with a model post-trained for that, at the expense of general performance).


I guess I thought this should be obvious to everyone but, looking at code and finding exploits is completely different from .. writing exploits.

For one thing exploits often require completely different parts of the code to chain together. Sometimes parts of code the LLM itself isn’t writing.

And, LLMs are ALREADY trained negatively against writing buggy or exploitable code.


It's just an incremental thing. You're both right. They will slowly become less and less likely to introduce vulns due to higher intelligence and better RL. Offensive capabilities will still probably scale faster than automatic defensive-while-coding ones.

>I guess I thought this should be obvious

People in this thread are talking past and misunderstanding each other and making unrelated points.

The point of the response to the top level comment was questioning the conflict of interest in model providers creating separate revenue streams for themselves by selling a product that fixes problems their other product created, akin to OS providers selling anti-virus software back in the day.

Similarly, it should be obvious to you that a software engineer can trivially get into the mindset of writing more exploitable code by pretending the production code they're tasked with writing is hobby code or prototype code.

If profitable revenue streams with adversarial products are in place, no one should be surprised when model providers are disincentivised to improve the "garbage code quality, but hey it works!" nature of their most used code generators.

>And, LLMs are ALREADY trained negatively against writing buggy or exploitable code.

...it should also be obvious that people in this forum have wildly different experiences with the quality of the code their LLMs generate. I personally find it difficult to find anyone who argues that the LLMs they are using consistently generate high-quality code across a vast codebase.


In every prompt: "write me code without exploitable bugs".

As someone who uses AI for coding, I know it doesn't work that easily, but I do find that repeating the basics in almost every prompt keeps the AI focused.


That’s like saying “the aftermath of Hiroshima will provide strong evidence either for or against nuclear power scientists”.

It’s irrelevant and unrelated.


If nuclear power scientists claimed they had a bomb that could level an entire city, Hiroshima would prove them correct.

vividfrier claims they haven’t written a line of code (implying other employees are similar), and their big company is operating normally. Bun is a big project and the rewrite is entirely LLM-generated. If its development continues normally, it reinforces the claim’s plausibility and proves someone made a large change (rewrite) entirely using AI. If not, it provides strong doubt: either vividfrier’s company is doing something different that avoids Bun’s problems (maybe other employees are still writing code manually), or they’re misleading or lying.


The way it'll play out is, if nothing happens denialists will claim "nothing has happened YET!", and if anything happens, those same people will claim "you see, writing AI code is a terrible idea!".

People write code differently, AI models write code differently, AI systems write code differently, companies create systems that write AI-written code differently, etc.

The system that wrote Bun bears no relationship to the system that writes OP's code.

Making such absolute statements about AI-written code is as dumb as making absolute statements about human-written code on the basis that it's "human-written".


Likewise, if anything happens, AI hypists will claim they used the tools wrong, just wait 6 months, etc.

It's plausible that OP's company is succeeding with 100% AI-generated code, even if Bun fails, but it's also plausibly false. Anyone can claim anything on the internet, what separates BS from reality is evidence.

I didn't write that Bun's rewrite absolutely proves or disproves OP's claim, I wrote that it provides evidence; it does, much more than OP's word.

It's also plausible that OP's claim is true, but only because despite being in a "big tech" company, they've been working on small self-contained repos, throwaway scripts, etc. The implications of this would be much different than what their comment suggests, which is another reason evidence matters: it forces them to narrow their claim, because anyone can make an overzealous claim from a small example.


The OP said they keep repos small and self-contained in a mature codebase, and they code review everything before releasing.

That’s very different from a one-off conversion of a massive codebase to an entirely different language, relying on tests to keep it contained.

Scale and process are dramatically different from the Bun case.


If Hiroshima were the only big public nuclear plant around the world, then yes, the aftermath of Hiroshima would provide strong evidence either for or against nuclear power.

