danjc · 2026-06-13T06:21:37 1781331697

Even if they could practically restrict access to US citizens only, I would expect them not to - it would be hard to regain that once lost and they need a global market for growth.

Wowfunhappy · 2026-06-13T12:56:03 1781355363

> it would be hard to regain that once lost

Harder than regaining the ability to sell access to the model at all?

danjc · 2026-06-09T13:10:38 1781010638

As much as it's true that a novice will generally use AI to build a sloppy mess, I've also had success unsloppifying through some careful prompting.

kaydub · 2026-06-09T15:24:57 1781018697

Yeah, v1 is sloppy, then I tell the LLM to clean it up. Every 1 prompt of building tends to require 1-5 prompts of clean up. Simple, fast, clean good code.

The chasm between "Software Developer" and "Software Engineer" is getting wider. Articles like this and the comments under it give away who is an Engineer and who is just a coder.

rspeele · 2026-06-09T17:16:55 1781025415

> Every 1 prompt of building tends to require 1-5 prompts of clean up. Simple, fast, clean good code.

I have found this to be very effective as well. However, it's so easy to do, I can't imagine they won't build it in.

The harnesses will improve and the loop of "self-review, judge what needs clean up, do the refactoring, repeat until clean" will get included in the one-shot. They are already doing this somewhat, they'll just get a lot better at it and as the models get faster and cheaper to run, the refactoring churn at the end of each task won't even create a noticeable delay.

I do not think the high-level "taste" knowledge that I've built up -- when to break something off into its own service, what to put in the DB vs cache vs queues vs blob storage, how to isolate important logic in pure functional layers so it can be tested and validated independently -- is any more "unlearnable" to AI than the stuff I previously considered impressive that's now one-shottable like "write a Prolog implementation from scratch".

kaydub · 2026-06-09T17:43:42 1781027022

They have definitely built some of it in.

And yes, right now you still need the architectural and system design knowledge because the LLM will fuck that up. We'll all find out if that continues being needed in the future. From what I understand about LLMs and how they work, I doubt it, but also, yeah, I doubted it would've gotten this far when I think back 2+ years ago.

Also, maybe I should be clear, I pretty much never one-shot things. My sessions with claude or other cli tools always start with a bit of a conversation until we converge on a good plan, claude builds the code, we discuss some more, then we iterate.

yesitcan · 2026-06-09T16:05:23 1781021123

The chasm is 1-5 prompts wide?

kaydub · 2026-06-09T17:10:43 1781025043

Knowing what you're doing holistically is the chasm.

Yokohiii · 2026-06-09T16:18:25 1781021905

The distinction between developer and engineer is obviously that one adds "make no mistakes!!!1."

Cthulhu_ · 2026-06-09T13:28:45 1781011725

I wish I had current-day AI (and a big credit card) for my previous job, they had a big legacy mess made by a productive but not very good developer, but my job was to rebuild it.

If I had AI tooling at the time I'd probably be more inclined to have it both refactor / optimize the existing application, add automated regression tests etc, and use it to extract all of the features and requirements for it for a potential rebuild.

But honestly I think if that application was properly designed and factored (instead of nesting JS in HTML in strings in JS or concatenating XML from query results only for it to be converted to JSON taking up 50% of response time) its lifetime could've been extended, especially if it was then containerized into a HHVM or similar php optimizer.

But, hindsight.

jmaw · 2026-06-09T15:06:09 1781017569

Any tips on how you unsloppify things? Are you using things like claude.md/copilot.md (or similar) to guide better, do you have specific types of prompts that you run, or do you adjust your code review practices in some way to more efficiently review lots of slop code?

One of my particular complaints is how code-gen LLMs tend to re-create the same code over and over again. Case in point, a use-case where a team name is generated from a list of team member names. The LLM re-generates this code in-line every time it needs to display the team name, rather than simply writing and reusing a utility style function.

I know I need to fix this. At this point I'm planning to just prompt something like "please list all the places where team names are generated/calculated", plus manually search through the codebase, then perform the abstraction myself. But I'm unsure how to prevent this (both this example, and other cases that could benefit from similar utility functions) continuing to occur in the future.
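
A minimal sketch of the kind of shared helper described above, in Python. The function name `team_name` and the naming rule (sorted surnames joined with a slash) are hypothetical, chosen only to illustrate the idea; the real rule would be whatever the codebase already does inline.

```python
def team_name(member_names: list[str]) -> str:
    """Build a display name for a team from its members' names.

    Hypothetical rule for illustration: sort the surnames and join with '/'.
    """
    surnames = sorted(name.split()[-1] for name in member_names if name.strip())
    return "/".join(surnames) if surnames else "Unnamed team"


# Every call site reuses the helper instead of re-deriving the name inline.
print(team_name(["Ada Lovelace", "Alan Turing"]))
```

With one definition in place, a prompt or review rule like "use `team_name` wherever a team name is displayed" gives the model something concrete to reuse.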

kaydub · 2026-06-09T15:29:15 1781018955

I accept that for every prompt of building I'm going to have 1-5 prompts of refinement.

Once the LLM tells me "Okay, it's done, everything works" I always ask it to do a thorough review. I tell it to split up the work among sub-agents with each one taking on a specific responsibility (look for code smells, look for bad architecture, review the data access model, DUPLICATE CODE, testability and unit testing, etc.)

After a certain number of revisions and reviews you'll come to accept the shortcomings it comes back with. Usually there will be specific design decisions you made that the LLM keeps bringing up; once the review only brings that up and maybe some other minor issues, it's time to move on.

I don't overly rely on markdown files and directions. I don't rely on tooling around it either. I just don't trust the LLM when it says "all done", tests pass, and deployment works. I make it do multiple reviews and iterations even when it thinks it's done.

evilturnip · 2026-06-10T11:55:04 1781092504

I also ask it to explain the system back to me. Obviously it should understand the system just by reading the code. But somehow, explaining the system back to me seems to make it more effective. Then I'll ask it questions about how I should make changes to the system. Sometimes I'll agree, sometimes I'll disagree and offer an alternative and ask it to assess the alternative. Having this entire conversation in its context seems to make it way more effective at refactoring/unsloppifying code.

hamdingers · 2026-06-09T15:36:36 1781019396

> Any tips on how you unsloppify things?

Understand what you're writing. If you never build up the mental model of what the code is doing you'll never be able to discern what is slop and what isn't. There are no shortcuts.

Piling more prompts on might get you to the same end result, but without understanding you'll never know when you're there.

pu_pe · 2026-06-09T13:32:03 1781011923

Absolutely. I really don't think the future will be humans reading and picking apart an AI-generated codebase, there will be tech debt agents or whatever running overnight.

SlinkyOnStairs · 2026-06-09T13:45:19 1781012719

I think you misunderstand why tech debt lingers around. It's not a capacity or capability problem.

Organisations just don't want to deal with the accountability involved with "touching cold code". Whether it's a human or "AI agent" doesn't change the "It worked in prod, you touched it, you broke it, never touch anything again" dynamic.

pu_pe · 2026-06-09T14:25:21 1781015121

That's one dimension of it, but in the context of this thread we are talking about how maintainable a codebase is for other humans. If your codebase is messy you depend on a few key employees and it might be hard to onboard new ones, so there has always been financial incentives to reduce tech debt.

SlinkyOnStairs · 2026-06-09T15:03:32 1781017412

> so there has always been financial incentives to reduce tech debt.

Yes. In practice, this does not weigh against organisational resistance.

AI really makes it worse by adding an explicit numerical cost to doing anything.

pu_pe · 2026-06-09T20:40:26 1781037626

Um, no, actually AI makes it better because the cost is lower now. I'm not sure what point you're trying to make here, obviously organizations already fight against tech debt all the time through a variety of means?

SlinkyOnStairs · 2026-06-10T19:09:59 1781118599

The point there is that it is MUCH easier to get corporate to agree to something when the cost is nebulous and being paid anyway. If you get a senior dev to clean up some tech debt, how much did that cost the company? The dev will have multiple things going on at the same time, so you can't cleanly assign a number of hours, and maybe multiple people are involved. It's practically just an unknowable. Practically, $0.

Anthropic will send a bill with a concrete number.

james_marks · 2026-06-09T14:45:30 1781016330

Exactly this. When I reject a refactor PR (or ideally, _before_ there's a PR), it's not because it's a bad idea, per se.

But there's risk associated with every change, and it takes time to review, QA, monitor the rollout, communicate to stakeholders, etc.

The refactor itself may be the smallest part of it.

bigstrat2003 · 2026-06-09T16:58:59 1781024339

So your proposal to handle tech debt created by "AI" being unable to do good engineering is... throw more AI at it? There's a saying about the definition of insanity which comes to mind.

nicman23 · 2026-06-09T13:26:30 1781011590

or linters

danjc · 2026-06-09T13:00:13 1781010013

It's not just annoying, it's tiring

danjc · 2026-05-20T07:26:33 1779261993

Plot twist: Apple PR team created this video to make claims that they slow older devices seem less credible.

danjc · 2026-05-17T18:12:23 1779041543

A browser plugin that scores webpage content based on how likely it is to have been AI-generated would be quite useful.

Browser vendors can't build this.

nicce · 2026-05-17T18:20:04 1779042004

> A browser plugin that scores webpage content based on how likely it is to have been AI-generated would be quite useful.

I am strongly against this, because you cannot accurately detect it. People start to get blamed even more when they actually did not use the AI.

Forgeties79 · 2026-05-17T19:31:47 1779046307

Nothing new under the sun unfortunately. It’s just an easy way to dismiss people you don’t want to listen to, and people abuse it like crazy.

sigmoid10 · 2026-05-17T18:19:06 1779041946

This is virtually impossible to build. Not just because all current "AI detector" systems are fake or outright scams with accuracy comparable to a coin-flip on frontier model output, but because even if someone did build a reliable detector and released it to the public, it could be used for adversarial training and it would become worthless pretty fast.

linolevan · 2026-05-17T18:30:15 1779042615

Pangram is legit. I don't work at pangram, we integrated it in our paper website and one of the cool emergent behaviors I've seen is that on AI papers with example rollouts, it will accurately mark the paper's main text as human generated and the rollouts as AI generated.

My understanding is that they strongly believe in no false positives, so it's definitely possible to slip something by them but if it marks something as AI, it very likely is.

dolebirchwood · 2026-05-17T19:16:30 1779045390

> My understanding is that they strongly believe in no false positives

Who cares what they "believe" (or, more accurately, say they believe). What are the underlying processes that actually guarantee this, and what data supports it?

jfengel · 2026-05-17T18:43:59 1779043439

What is a rollout in this context?

woadwarrior01 · 2026-05-17T20:31:28 1779049888

> Pangram is legit.

Their 99.98% accuracy claim[1] makes me doubt that.

[1]: https://www.pangram.com/solutions/chrome-extension

Groxx · 2026-05-18T02:11:59 1779070319

Rather obviously they're choosing the one that makes them look best. Another they link to¹ shows 98% for example.

Much more importantly, 9/10 dentists agree it's the best.

1: https://arxiv.org/pdf/2501.15654, linked from² https://www.pangram.com/blog/third-party-pangram-evals (the second section)

2: the third study they link there is based entirely around the assumption that Pangram is correct, and seems to have been a collaboration or something as they're included in the credits area.

BurningFrog · 2026-05-17T19:00:00 1779044400

AI is very hard to detect and changes on a weekly basis.

But you could build something that ranks the quality of the webpage content! This would also be more useful.

Of course, that tool would have to use AI...

ssl-3 · 2026-05-17T19:04:06 1779044646

Bot detectors are broken. Even human bot detectors are broken. When I'm in the right mood, I can be quite capable of writing with very good formatting, structure, and phrasing. When I actually take the time to do this, there seems to be about a 70% chance that some nimrod will crawl out of the woodwork just to accuse me of being a bot.

Even humans who deliberately use lazy formatting and leave obvious errors uncorrected to provide "proof" of being human aren't seeing the big picture, here.

---

That bigger picture is that it's easy to instruct a bot to be lazy, or to avoid the usual quirks. I hate when I'm working on a project and see a constant outflow of negation ("Don't do x, y, or w" is a recent hit) and unfounded exclusive confidence ("The correct answer" as if this is Highlander and there can be only one). Repetitious jargon like overuse of "gate" for things other than fences and skiing is something I can't stand. Plus the usual things — like overuse of unusual punctuation — that are obvious tells.

That stuff all drives me nuts.

But the bot just follows instructions, and my bot has been instructed to avoid those things. It generally performs very well, though the instructions do need re-hashed every now and then as models ebb and flow.

It's super easy to get the bot to write some python or perl that takes a body of text and intentionally misspells some words or loses a comma while making other errors and converting — into --.
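
For a sense of how little code that takes, here is a rough Python sketch. The function name `humanize`, the `typo_rate` parameter, and the specific error types (adjacent-letter swaps, dropped trailing commas) are assumptions made up for illustration, not a description of any real tool.

```python
import random


def humanize(text: str, seed: int = 0, typo_rate: float = 0.1) -> str:
    """Inject a few 'human-looking' errors into text (illustrative only)."""
    rng = random.Random(seed)  # seeded so the same input gives the same output
    text = text.replace("\u2014", "--")  # em dash becomes a double hyphen
    out = []
    for word in text.split(" "):
        if len(word) > 3 and rng.random() < typo_rate:
            # Swap two adjacent characters to mimic a typing slip.
            i = rng.randrange(len(word) - 1)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        elif word.endswith(",") and rng.random() < typo_rate:
            word = word[:-1]  # lose a comma
        out.append(word)
    return " ".join(out)


print(humanize("Well, this is a perfectly polished sentence \u2014 honest.", typo_rate=0.3))
```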

When it comes to human error in written language, we just aren't that hard to emulate.

Now, that all said: You'll just have to take my word for it, but I do not use the bot to help with writing English. But I do have every confidence that if I woke up tomorrow and actually started bulking up my comments using a bot, none of you would be able to tell.

Groxx · 2026-05-17T18:22:09 1779042129

Everyone has failed to build this. They can only sell claims that they have built it to fools.

lelandfe · 2026-05-17T18:34:09 1779042849

I work somewhere that tries to do such detection (for fraud prevention) and it sort of feels impossible to me in the medium term. AI slop qualities are fleeting - I’ve seen Reddit AI posts that have misspelled words, no dashes, stilted sayings and so on.

People want their slop to be undetectable.

neversupervised · 2026-05-17T18:18:30 1779041910

Check out Pangram

danjc · 2026-02-28T18:28:22 1772303302

I've been waiting for someone to say this. An agent will generally produce far more code than technically necessary for the task. It's a kind of over engineering which makes it increasingly harder to wrap your head around the codebase.

truthbe · 2026-02-28T19:22:45 1772306565

Over engineered implies the codebase was inflated with some kind of rationale by the AI, but there is none. It's just code vomit with duct tape

danjc · 2025-12-07T10:51:51 1765104711

The issue is provenance. We need cameras and phones to digitally sign photos so we can easily verify an unadulterated image.

You also want to be able chain signing so that for example a news reporter could take a photo, then the news outlet could attest its authenticity by adding their signature on top.

Same principle could be applied to video and text.
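
A toy Python sketch of the chaining idea. Loud assumption: HMAC-SHA256 from the standard library stands in for a real public-key signature (a real scheme would use asymmetric keys so anyone can verify without holding the secret), and the key names and payload layout are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> bytes:
    # Stand-in for a real digital signature (assumption: HMAC-SHA256).
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, signature: bytes) -> bool:
    # Constant-time comparison of the expected and provided signatures.
    return hmac.compare_digest(sign(key, payload), signature)


camera_key = b"camera-secret"  # hypothetical device key
outlet_key = b"outlet-secret"  # hypothetical news-outlet key

photo = b"raw image bytes"
photo_hash = hashlib.sha256(photo).digest()

# Link 1: the camera signs the hash of the image it captured.
camera_sig = sign(camera_key, photo_hash)

# Link 2: the outlet signs the image hash together with the camera's
# signature, so its attestation covers the earlier link in the chain.
outlet_sig = sign(outlet_key, photo_hash + camera_sig)

print(verify(camera_key, photo_hash, camera_sig))
print(verify(outlet_key, photo_hash + camera_sig, outlet_sig))
```

Changing a single byte of the image changes `photo_hash`, so both checks fail for an edited file.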

soerxpso · 2025-12-07T12:13:45 1765109625

Signing something doesn't verify that it's real, it just verifies that you claimed that it was real, which everyone was already aware of. You can either hack a camera, or use an unhacked camera to take a picture of a fake picture.

danjc · 2025-10-27T07:18:55 1761549535

If it's processed in 2 seconds, why not just process it immediately in memory?

swiftcoder · 2025-10-27T07:25:05 1761549905

Because they are serverless, so there's currently no memory for it to be processed in at the point of upload

liqilin1567 · 2025-10-27T09:50:24 1761558624

Maybe there are too many requests, so they have to offload the videos to s3.

dboreham · 2025-10-27T14:07:31 1761574051

That's not the reason. And furthermore any buffer/re-try mechanism should be done at the edge (on the camera).

danjc · 2025-10-17T18:43:47 1760726627

It's all jit context

danjc · 2025-09-18T06:23:25 1758176605

It looks like true 0-temperature (i.e. determinism) will happen. Here's some good context: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
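
For anyone unfamiliar with what temperature does, a small Python sketch of temperature sampling over a list of logits. The function name `sample_token` and the toy logits are made up for illustration; real inference stacks are far more involved.

```python
import math
import random


def sample_token(logits, temperature, rng=None):
    """Pick a token index from logits at the given temperature."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [1.0, 3.0, 2.0]
print(sample_token(logits, 0))    # same index on every call
print(sample_token(logits, 1.0))  # may vary from call to call
```

Note that the greedy branch is only deterministic if the logits themselves come out identical from run to run.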

ayewo · 2025-09-18T11:28:46 1758194926

HN discussion https://news.ycombinator.com/item?id=45200925

FergusArgyll · 2025-09-18T11:36:50 1758195410

But 0 temp is much less "Creative" and may not be conducive to showing off the AI's latest tricks

explorigin · 2025-09-18T17:12:47 1758215567

True. It depends on the feature you're demoing...but determinism is a VERY DESIRABLE feature for giving demos.