I think that that would have been Apple’s positioning for their car project, but that seems to have been axed.
Maybe they’ll bring it back someday. I hope they do, but it’s almost guaranteed that governments will rain down regulation on them for entering too many markets at once, and yes, for building operating systems whose encryption Apple refuses to backdoor.
What isn’t fair is for schools to take students’ matriculation and set them up for years of debt, apparently without any intention of educating them properly as per your comment. Better for schools to just screen based on standardized test scores.
I know, but your comment also in no way implies that you are taking into account the bigger picture here, where the criticism is directed at the admissions process, and wherein universities are honestly at fault.
If university-level classes have pre-requisites that should be taught in high school, then universities should screen for that and disqualify students who do not have the required competency. They should not be taking the students' money, admitting them into the institution, and then letting them enroll in classes that they are not prepared to succeed in. That's outright extortion. Many of those students have to take on debt to pay for their education, and besides the financial cost, it's a waste of time, and their failures would be mentally crushing and have lifelong repercussions.
I sympathize with educators in that they cannot slow the whole class down, but that's the point: universities shouldn't be putting educators in a position to compromise the teaching. Meanwhile, educators also shouldn't accept that "pointing [students] in the right direction to get caught up" is enough, because objectively speaking, it's not---that is not how a student develops an understanding of maths and sciences. For the student, that requires a focused (and in many cases, guided) study of those subject areas before university, without the stress of catching up while university-level courses are already being taken at the same time.
then why did you accuse me of not intending to educate my students?
>Meanwhile, educators also shouldn't accept that "pointing [students] in the right direction to get caught up" is enough, because objectively speaking, it's not---that is not how a student develops an understanding of maths and sciences.
you haven't bothered to ask what "pointing in the right direction" entails, and are making (wrong) assumptions.
yes, obviously, because you called me out specifically. and you are using what i said, without necessary context, and extrapolating it generally to "educators". i'm not cool with either.
> … why not keep your engineers and deliver 20x the value?
Probably because there isn’t actually an increase in demand for the capabilities of software, and engineers, product managers, and UI/UX designers are justifying the existence of their jobs by complicating software more than necessary.
Anyway, the essence of the article is that a “just say no” engineer is a person who knows how to use and enforce constraints so that complex systems remain manageable in the long-term; and that companies perceive such engineers to be irrelevant as AI coding tools become more mainstream.
I think that that has definitely happened, even with my own employer, but I think that companies of the same mindset just don’t have strong engineering cultures to begin with, and will be naturally selected into oblivion during this wave of disruption, which already coincides with a prolonged period of economic uncertainty.
AI tools are great, but they are only as good as your people’s discernment. If you’re making AI adoption a KPI in your company, you’ve already lost sight of what your business is really about, and you’ll be bankrupt by your token spending before you can beat your competition.
> solves real problems and nothing bad happens most of the time
Aaand this is why AI is taking our jobs and we all rightfully deserve to be laid off. This utter lack of risk awareness and care for quality is what created the need for autonomous agents to dig through and build upon man-made slop.
Honestly, I find it rich that we’re the ones who think that AI is the one that’s producing slop. Give any agent clear harnesses and it’ll produce better code than a human would close to 100% of the time. That’s still as nondeterministic as the way you used “most of the time”, but the deviation tends to be smaller and the quality and rigor are much higher.
Are you suggesting that AI-written code tends to be more secure than human-written code? Because there are many examples to the contrary, starting with MoltBook.
Not really, no. That's not even the point. Say for example they're just the same level of security. Then what value does a human even offer to a company if AI can do the same quality of work faster? It's not as if the company benefits from something like "human discernment", because as predicated in this thread, developers have exactly none of that, since they don't care about the security aspect of the VSCode extensions that they use. Might as well lay off the human developers and just use AI for as long as the latter is cheaper. How many people does a company really need to update its VSCode to the version that blocks the malicious extension? Do you need more than one and does that person have to be full-time?
Fair, of course price is a factor in whether one product is better than another, and yes it’s my opinion that things becoming more affordable/junkier is not always a net increase in quality of life.
Is there room for people who are already in the acceptance phase? We started aggressively adopting AI in my company this year. I think I disliked (though never hated) it for a few days, but it’s a systemic change that I can’t just push back against. I don’t believe that strong public opinion can stop technological development either—just take nuclear for example.
I think that the concerns underlying the outrage are real and honestly valid, but the question I’m asking now isn’t “how to stop it” but “what now”? Because economies are cyclical and if it wasn’t AI it’d have been something else that would threaten our survival, and there are many good alternatives right now: climate change and war.
> We started aggressively adopting AI in my company this year. I think I disliked (though never hated) it for a few days, but it’s a systemic change that I can’t just push back against.
I'm right there with you. I think AI will be bad as a whole for the world, but I use it for work every day and am pushing my team to use it more. I think it's a really effective tool for my company even if it's going to be bad for the world overall.
> I don’t believe that strong public opinion can stop technological development either—just take nuclear for example.
I see nuclear as an example of where public opinion did stop development. In the US at least, we've basically given up on nuclear power, much to our detriment.
Another example of this is human cloning, which seemed inevitable back when Dolly the sheep was first cloned.
I don't think AI is going to be as easy to give up on as nuclear. Nuclear has some long term/diffuse benefits, but in the short run it's just one among many types of electricity generation. AI is a whole category, not one substitutable member of an already common category. Us giving up on AI development would be more like giving up on electricity generation than like giving up on nuclear.
Human cloning is a solution with no corresponding problem. We can make more humans very easily, if we have someone willing to bear those humans and take care of them.
If AI becomes demonstrably useful, opting out will be incredibly challenging, since we cannot force other countries to disarm.
Am I crazy, or are these differences between the best models so marginal that you’d get roughly the same performance if you use the same high-quality harness (ie preloaded instructions from md files, including custom skills)?
You will immediately notice the difference if you use it at the threshold.
It's like most people watching a 'starting nba player' (not a superstar, just a starting player) vs one that sits on the bench.
If you were to just watch them play, work out, shoot - you'd never notice the difference.
Put them head to head and it's 98-54 and you start to see the patterns.
It's pretty interesting actually, someone tell me what the 'science' for this is, I'm sure there is some kind of information theory at work here.
Software has innumerable kinds of problems at varying levels of complexity and so it provides the perfect testbed for seeing how far models can go in practice.
Should add: you're very right to hint that harness, tooling, and models tuned to both the harness and the kinds of things people do on the harness, as well as some other things, do make an enormous difference.
By and large, SOTA Codex/Claude Code are substantially better - at least for now. That may change.
No, you're not wrong. Many people will see what you see. Enthusiasts will see squeezing out that last drop of performance as monumental. I think it is okay for enthusiasts to feel that way. I'm just satisfied with getting a tool as an aid.
Personal opinion: we need to focus more on efficiency instead of how large or complex a model can get, as that model creeps into more resource requirements. If the goal is to cost a billion dollars to operate then we've really lost the idea of what models are supposed to be achieving.
By definition the differences between "best models" are small. It's a tautology. If a model is significantly dumber than the others then it's not one of the best models.
I have the same experience. I've been running sequential agents in my own harness that is a standard SDLC pipeline (plan, design, code, build, test). It has gates between each stage to control quality.
The big benefit of automating this for so long is that I have lots of data. I analyzed it and found that I can change the models out without much of a change in the output quality.
For one-off tasks, where there is no harness and you're just YOLOing with the TUI, yes, big difference. You need a harness.
The pipeline controls the quality far more than the model, empirically.
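For anyone wondering what "gates between each stage" could look like, here is a minimal sketch. It is not the actual harness described above: `Stage`, `run_pipeline`, `make_stage`, and the retry limit are all hypothetical names and choices, and the placeholder stages just append their own name where a real stage would call a model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[str], str]    # turns the previous artifact into this stage's artifact
    gate: Callable[[str], bool]  # quality check; True means the artifact may move on


def run_pipeline(stages: List[Stage], task: str, max_retries: int = 2) -> str:
    """Run stages in order; retry a stage until its gate passes or retries run out."""
    artifact = task
    for stage in stages:
        for _ in range(max_retries + 1):
            candidate = stage.run(artifact)
            if stage.gate(candidate):
                artifact = candidate
                break
        else:
            # No attempt passed the gate, so nothing flows downstream.
            raise RuntimeError(f"gate failed at stage {stage.name!r}")
    return artifact


def make_stage(name: str) -> Stage:
    # Placeholder stage: a real one would call a model here (an assumption).
    return Stage(name, run=lambda a: f"{a}>{name}", gate=lambda a: a.endswith(name))


STAGES = [make_stage(n) for n in ("plan", "design", "code", "build", "test")]
result = run_pipeline(STAGES, "task")
print(result)
```

The point of the shape is that the model only ever sits behind `run`, so swapping models changes one callable while the gates stay the same.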
You have correctly identified that getting a "high-quality harness (ie preloaded instructions from md files, including custom skills)" is the (or at least a) hard part.
Because you have to adjust the harness to your problem space and provide that so you can say it is high-quality.
Many people will stop that discussion at the claude code vs. codex vs. opencode level and then merge that with discussing model performance.
And that is also why "Generate an SVG of a pelican riding a bicycle" is still a benchmark worth discussing. Because at least it is a defined problem space.
Show your code, or show you the door. There are so many native Mac and iOS apps out there right now perfectly capable of rendering Markdown and streaming text. You just gotta wonder what is this guy’s excuse.
OP says "you want to select a whole Markdown document built from SwiftUI primitives", but who wants that? what sort of product thinking tells us we want that? that sounds like a document editor, which has been hard to build for decades and sounds out of scope for an llm chat ui. everyone has landed on only supporting selection within each contiguous block, with a copy button for the entire message
LLMs are often used to generate Markdown because they're quite good at it and unlike HTML it's very forgiving.
Rendering text into things like chat bubbles or even just generic output panes as it comes in is a massive pain. Every new word requires redoing layout, detecting LTR versus RTL flows and overrides, figuring out word breaks and line breaks, possibly combined with resizing the containing UI element (which involves measuring the render space, which is often implemented by rendering to a dummy canvas and finding out the limits).
Document editors have it relatively easy because humans type at a relatively low speed and pasting is a single operation (although pasting large amounts of text does hit the render performance of the UI). They also often provide relatively limited features on phones.
If you want to render something like ChatGPT with similar features in native UI, you're going to need to find a fully-fledged document component or build one yourself. And, as it turns out, we have document components that work quite well: web engines.
If you embed a webview rendering just HTML and CSS, you get better performance, features, and accessibility than any home-grown renderer will provide. And with every major OS coming with a browser built in, it won't even bloat your app.
HTML is famously forgiving as well - that's the whole reason XHTML failed, because one typo in the latter will make your entire web page fail to render with an error. Markdown is probably a little more forgiving, which mattered more with previous-gen LLMs with small context windows. Any near-frontier model should have no problem generating valid HTML.
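A rough way to see the lenient-vs-strict difference, using only the Python standard library. This is an illustration under assumptions: `html.parser` is a tokenizer, not a browser engine, and `xml.etree.ElementTree` stands in here for the strict XML parsing that XHTML gets when served as XML. The helper names are made up for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML tokenizer sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


# One unclosed <b> tag: the kind of single typo discussed above.
BROKEN = "<p>Hello <b>world</p>"
VALID = "<p>Hello <b>world</b></p>"


def tags_seen_as_html(markup: str) -> list:
    # The HTML tokenizer does not raise on the unclosed tag.
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def parses_as_xml(markup: str) -> bool:
    # The XML parser rejects the whole document on a mismatched tag.
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(tags_seen_as_html(BROKEN))
print(parses_as_xml(BROKEN), parses_as_xml(VALID))
```

Same input, two outcomes: the HTML side keeps going and still reports both tags, while the XML side refuses the document outright.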
Also a lot of LLMs are trained specifically to expect Markdown in their instructions, OpenAI's models in particular (Anthropic expects more pseudo-HTML/XML, but that is different from real HTML/XML).