
Even the strongest frontier model they used - GPT 5.2 - I would consider barely usable for agentic programming.

I’m not really interested in analysis of the weaknesses of such models because in my experience many weaknesses disappear entirely as models get stronger and reasoning effort is turned up. Especially if you tell them what you want them to do.

Also, it’s not surprising to learn that when more acceptance criteria are added the failure rate increases.




Oldheads remember when GPT 5.2 was at the forefront of agentic programming. December 2025 feels like eons ago, but alack it was an entire half year!

If I'm not using GPT 5.5 high reasoning I'm wasting time.

Well, maybe so, but how did you feel about 5.2 when it was OpenAI's frontier model? That's what I'm getting at – it was the equivalent of your GPT 5.5 high reasoning just six months ago.

It was a joke. I think you need to mix up models.

Gotcha. Hard to parse tone and intent through text on the internet.

They all feel the same to me now, opus, 5.5, whatever

Wait, isn't GPT 5.2 good? Or is it not thinking / not Codex? 5.2 was what sparked the late 2025 OpenAI agentic programming revolution.

5.2 still had a Codex variant, which they don't describe using. Notably, it also isn't using the Codex harness -- it does everything with open source harnesses (which obviously are worse). And while it uses two harnesses with its cheap models, it only uses the worse-performing one of those with GPT 5.2 for cost reasons. (They also don't specify the effort/thinking level used for GPT 5.2, but given that it performs worse in their baseline testing than obviously non-SOTA models, I'm guessing it wasn't set to anything high.)


