> I don't see why that's the case. LLM trained on binary would totally see it, not?
It would not. You find the correct version by counting the number of bytes to the destination. LLMs are famously bad at this kind of problem (counting).
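To make the "counting bytes" point concrete: on x86, a near `jmp rel32` (opcode `0xE9`) does not store the target address. It stores the signed distance from the end of the jump instruction to the target, so emitting it correctly means knowing exactly how many bytes lie in between. A minimal Python sketch, assuming x86 and the 5-byte `jmp rel32` form; the function name is hypothetical:

```python
import struct

JMP_REL32_LEN = 5  # 1 opcode byte (0xE9) + 4 displacement bytes


def encode_jmp_rel32(jmp_addr: int, target_addr: int) -> bytes:
    """Encode an x86 near `jmp rel32` placed at jmp_addr that lands on target_addr.

    The displacement is measured from the first byte *after* the jump,
    so it depends on the exact byte count between the two addresses.
    """
    disp = target_addr - (jmp_addr + JMP_REL32_LEN)
    # '<i' = little-endian signed 32-bit integer, as x86 expects
    return bytes([0xE9]) + struct.pack("<i", disp)


# A jump at 0x1000 to 0x1010 skips 11 bytes past the end of the instruction.
print(encode_jmp_rel32(0x1000, 0x1010).hex())  # prints e90b000000
```

Off by a single byte in that count and the jump lands in the middle of some other instruction, which is the kind of exact arithmetic the comment above argues LLMs handle poorly.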
> Also the tool can also be running the test and a debugger.
The test needs to provide a good amount of signal. That’s too hard if you are throwing machine code at the wall.
In order for debuggers to work, you need some kind of model that describes what the code should do and what state the computer should be in after each instruction. That model is high-level code.
I can understand the intuitive appeal of training LLMs with machine code, but all of my experience with LLMs suggests that they are incredibly ill-suited to the task, and we just don’t have the capacity to train them to make useful machine code.
I would phrase it as "LLMs are good at big picture stuff and bad at fine detail", or to put it another way, they're accurate, but imprecise and with low reproducibility.
It is my experience that it's the opposite. LLMs are very very precise but wildly inaccurate. They might give you 17 significant digits but be off by 10 orders of magnitude, to use a metaphor.
Sounds like we're in agreement, then. The 7 digits it got correct are the big picture, and the rest are the details. Are you disagreeing with my statement or with my usage of "accurate" and "precise"?
I said "big picture stuff", but I guess I should have said "broad strokes". The truly correct answer is probably similar to what the model will answer, and if your problem is such that it can work with small imperfections in a solution, then the LLM helps. If the solution needs to be exactly right, then it will probably fail.
Yesterday on a whim I tried asking a local model a question about kanji that look different in different fonts despite being the same character (to the point of strokes appearing in completely different directions), and the model hallucinated imgur links to images of the characters. If imgur could work with approximate references to data maybe that would have worked.
No, I don’t think so. LLMs are good at a lot of simple tasks, but bad at certain simple tasks. Moravec’s paradox in a new iteration.
It applies to humans too. Calculus is “simple” but it takes something like sixteen years to train a human to do it, if all goes well. Meanwhile, most humans think that inverse kinematics is, like, the easiest thing in the world (it’s a super complicated task).
Calculus is definitely the harder task, considering it took a species developing the cognitive capacity for symbolic reasoning for it to show up, whereas any animal can figure out how to position its limbs. Yeah, we figured out how to make CAS programs before inverse kinematics software, but that's because computers were made to solve numerical problems, not to replace the cerebella of chordates.
You’re only evaluating “harder” or “easier” based on the perspective of somebody who has a mammalian brain with millions of years of selective pressure to make it suitable for solving inverse kinematics problems.
The point here is that when we start constructing agents or tools with different architectures to ourselves, it makes sense to reevaluate notions of whether something is ‘hard’ or ‘easy’. LLMs are bad at counting not because counting is hard, but because their architecture makes it hard.
I'm evaluating them using an objective metric, which is how long each took to arise in the universe. It could have never been the case that calculus arose before inverse kinematics, because a thing like that could not interact with the real world.
Also, I suspect you're comparing dissimilar things, because in one case you're looking at a brain doing both inverse kinematics and "calculus" (sense 1), and in the other you're looking at a computer doing both inverse kinematics and "calculus" (sense 2). The kind of calculus a CAS does is not the same kind that a human does. It's less versatile, for one.
>The point here is that when we start constructing agents or tools with different architectures to ourselves, it makes sense to reevaluate notions of whether something is ‘hard’ or ‘easy’.
Well, no, because when someone says that calculus is hard and moving their arms is easy, they're not talking about how hard it was to create each functionality, they're talking about how hard it is to employ each. We would need to ask a computer how hard it thinks the tasks it does are to do.
I don’t think the metric is at all reasonable, and the fact that it’s “objective” doesn’t make up for its other shortcomings. I don’t think we have a basis for agreement here—I think you’ve framed the argument in a way that supports a “calculus is hard” conclusion merely by defining “hard” in such a way that supports your conclusion from the start, but I think that approach is only useful as a way to win an argument, and we’ve failed to share ideas once you start using that tactic.
>I think you’ve framed the argument in a way that supports a “calculus is hard” conclusion merely by defining “hard” in such a way that supports your conclusion from the start
It seems to me you're the one who first did that, by equivocating between what is easier to do and what is easier to make a machine do.
>we’ve failed to share ideas once you start using that tactic
Even if it could, it would be ridiculously token-inefficient to update a huge number of addresses instead, whenever a small change is made in the middle of a binary.
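A rough illustration of why a small edit ripples through a binary: inserting bytes shifts everything after the insertion point, so every relative jump that crosses that point needs a new displacement. This is a minimal Python sketch under a deliberately simplified, hypothetical model (only 5-byte x86 `jmp rel32` instructions, tracked as `Jump(addr, target)`, with the insertion falling between instructions); the names are illustrative, not from any real tool:

```python
from dataclasses import dataclass

JMP_LEN = 5  # x86 jmp rel32: 1 opcode byte + 4 displacement bytes


@dataclass
class Jump:
    addr: int    # address of the jmp instruction (hypothetical simplified model)
    target: int  # absolute address the jump lands on


def displacement(j: Jump) -> int:
    """The value actually stored in the instruction: target minus end of instruction."""
    return j.target - (j.addr + JMP_LEN)


def insert_bytes(jumps: list[Jump], at: int, count: int) -> list[Jump]:
    """Return the jumps as they look after inserting `count` bytes at address `at`.

    Everything at or after `at` moves forward by `count` bytes.
    """
    def shift(a: int) -> int:
        return a + count if a >= at else a

    return [Jump(shift(j.addr), shift(j.target)) for j in jumps]


jumps = [Jump(0x00, 0x40), Jump(0x20, 0x10), Jump(0x50, 0x60), Jump(0x58, 0x08)]
after = insert_bytes(jumps, at=0x30, count=3)

# Indices of jumps whose encoded displacement must be rewritten.
changed = [i for i, (a, b) in enumerate(zip(jumps, after))
           if displacement(a) != displacement(b)]
print(changed)  # prints [0, 3]
```

Only the jumps that span the insertion point change, but finding them means re-deriving every affected offset. This sketch also ignores absolute addresses (pointers, relocations) and the fact that x86 has both short (`rel8`) and near (`rel32`) jump forms, so a displacement that grows past the short range can force a longer encoding and shift everything again.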
But for desktop applications it is bloated and a big attack surface.
HTML/CSS is made for online documents, and using it for applications is a bit of a hack that happens to work, but it hides a huge amount of complexity behind frameworks, and frameworks of frameworks, with leaky abstractions, each with its own caveats.
I'm not a frontend developer, but seems fast to me. I'm surprised that the UI portion of this example takes so little CPU: https://youtu.be/7k0JNT6itaI
Now, the rest of the DSP code sure is faster in native.
What are examples where web UI is too slow for you?
Or do you mean large apps written in JS, which is a different topic?
But it is controlled for the wrong criteria.
"Natural" doesn't mean healthy or good for the environment.
It's just greenwashing and an "appeal to nature" fallacy.
Or do you need evidence that the bio labels are not optimizing for health or environment? Check the rules. Most of them are just there to restrict synthetic products, regardless of their impact.
There is an official EU organic label. It’s not compulsory, of course, but it’s the baseline for organic food production in and for Europe. Other (private) labels have stricter rules and are usually certified in addition to the EU label.
No, this is definitely an official government body that can fine you if you try to sell fruit as organic that doesn't follow the regulations. It IS definitely compulsory if you mark your produce as organic.
"IMHO organic labels optimize for the wrong things."
What do you mean?
I only know of "Demeter", which also has some very esoteric requirements (homeopathy, cosmic energy flow rituals), but otherwise organic labels optimize for:
- no or little pesticides and herbicides
- more space and better condition for the animals
My only other grievance is that they also all ban GMO
They optimise for "natural". So you can still have pesticides and herbicides: if you find your poison in some plant, it is fine. If you synthesise the same molecule in a factory, then it's not allowed.
As for the animal welfare, true, but there are also labels specifically for that.
Biodynamic at least requires farms to produce their own fertilizers. For that reason alone I try to buy it. Fertilizer dependency will be the end of us.
I ignore all the magic stuff (in fact, if you have some spiritual devotion to the food you're growing I think that's just fine)
I'm aware of that, but glyphosate is definitely not allowed; it's likely the result of neighbors spraying heavily in bad wind conditions (there are strict regulations in theory, but they are usually ignored in reality).
Because users and community contributors most likely already have an account and are familiar with the UI.
There is also the "gamification" aspect that GitHub has. It doesn't motivate me personally, but it could have an effect on others.
Projects on GitHub get a lot more visibility.
To the point that many projects that do not use GitHub as their main forge still mirror their repository there, and have to deal with two sources of bug reports and PRs.
> How often do you read assembly to check what your compiler is doing?
The difference is that my compiler is more or less deterministic, and tends to do exactly what the specification provided to it (the source code) says. LLMs do not currently fulfil either of those criteria.
Swear words and violence don't cause addiction; alcohol can, but it's way less likely and also easier to restrict... idk why a kid should have cigs even once, though.
There may be valid use cases in certain demographics, e.g. the disabled. To me it is evidently advantageous to teach a teenager how to have a smoke or a drink properly, so that they don't go overboard with self-directed learning of a valid activity (loosening social inhibition). We could totally teach teenagers the generation and consumption of dispassionate, violent relationship simulacra. May I ask what would be advantageous about this?
It looks like their approach could nicely solve a problem that's shared by almost every new GUI toolkit I've tried: text looks terrible, or at least out of place when surrounded by applications built with the desktop's native toolkit.
So far everything is going according to the plan. Humans are really close to making the AI that will replace them and entering the next phase of the plan.
Or do you have a better idea of what the plan exactly is?
You mean the AI that might fail and suck every last ounce of entropy or life out of the planet and suffocate it? Have you seen the insane amount of natural gas being burned to power it? Obviously I'd love it if AI solved its own energy crisis, but that hasn't even begun to happen yet. You think it will invent cold fusion? Room-temperature superconductors? Solar cells past our theoretical limits? Do you realize it's literally being controlled by human greed?
What about P vs. NP? Is auto-complete able to create P solutions and then perform NP verification by interacting with experiment or calculation IO? Couldn't it test solutions faster than a human on problems with massive solution spaces like folding proteins or aligning electron-hole pairs?