Hacker News | hatthew's comments

I'm kind of surprised that so many here on HN were dismissive/unaware of the capabilities and potential in the DALL-E days and earlier. I feel like this is the sort of forum where most people would be both aware of advancements and aware of their potential.

My moment was GANs and GPT-2 back in 2019. I feel like that's where computer-generated media went from "obviously fake" to "sometimes can be mistaken for real." RLHF for LLMs and diffusion for image generation are both important improvements, but I feel like they aren't fundamental prerequisites for the type of stuff we have today. I think the main advancements since then are just marginal improvements, larger models/datasets, and better surrounding tooling.


I think most debates about LLM understanding boil down to different definitions of the word "understand." For example, with the definition of "understand" that I typically use in my daily life, I would argue that in the Chinese room, the system as a whole "understands" Chinese.

Fair enough, but then, a pocket calculator also understands math, and a pocket translator also understands language. And a Wikipedia page that informs you about radioactivity understands nuclear physics. Some will maybe say that is the case, but if we talk about LLM capabilities as a novelty, then it implies that we are talking about something else, because otherwise, it is not novel at all and it does not make sense to pretend it is.

I'd say that broadly speaking, a system understands things when it can interact with them "correctly". I agree a pocket calculator understands math, but I'd say a pocket translator understands grammar, not language as a whole. A wikipedia page does not interact with anything, so I'm not interested in pushing the definition that far. However, if the wikipedia page were to make recommendations for nuclear safety based on some context it receives as input (say via an integrated LLM), I'd be happy to argue that it understands [that part of] nuclear physics.

I don't think that LLMs as black boxes are fundamentally novel, I just think that their internal design is novel, and their generality and ability to give correct responses on complex topics is far beyond anything that came before. For example, I would argue that Wolfram Alpha has a poor understanding of language and a very good understanding of math. I would argue that LLMs have an excellent understanding of language and a mediocre understanding of math, but are able to temporarily increase their understanding of math through document retrieval and "thinking" (or whatever you want to call the process of iteratively generating tokens that build on each other to result in a final response).


Well, then you basically agree with Chiang's article. It's just that Chiang has a more clever usage of the word "understanding" than you (more clever because more nuanced: 1) I doubt that "people on the street" will agree that obviously "brainless" objects, like a pocket calculator or an interactive Wikipedia page, understand anything; 2) Chiang is not stumbling over words: he explained his case in a way that makes clear what he means, and it is up to the interlocutor to adopt his vocabulary (because it is very legitimate here) rather than start saying "hm, no, I disagree, because for me, 'conscious' means 'print something on the screen', so LLMs are conscious". That is just missing the point).

Does a pocket calculator not understand arithmetic? What part of a fourth grader's understanding is missing?

Not sure what you mean.

I'm happy both ways:

Either you say that a pocket calculator understands arithmetic, and that LLMs understand language, which is something trivial. If a pocket calculator understands arithmetic, then previous substitutes for calculators, such as an abacus, do too. In this case, a word dictionary also understands language. And that is basically what Chiang's article says: LLMs don't understand language any more than a word dictionary does. If you disagree with Chiang, it looks like you do so only because you don't understand what he is saying, or somehow are not mature enough to realise that Chiang may use a different definition of "understanding" than yours in a fully legitimate way, like everyone always does when talking about plenty of subjects.

Or you pretend that a pocket calculator's understanding of arithmetic is somehow different from that of an abacus or other obviously inanimate objects that are obviously not thinking.


The data on the graph is real, but the labels are made up for vibes.

Yeah the graph is more for vibes than an actual analogy. The yellow slice represents the top 1%, purple 90-99%, green 50-90%, and red bottom 50%. It would make a bit more sense if those slices were labeled "1 person", "9 people", "40 people", and "50 people" respectively.

I solved it with this, a pleasingly symmetric solution. I was surprised that a solution exists with all the queens in a row.

    . . . . . . . .
    . . . . . . . Q
    . . B . . . . .
    . . . . . Q . .
    . . . . . . . .
    . . . Q . . . .
    . . . . . . . .
    . Q . . . . . .


More surprising to me is that there's a solution with no pieces on black.


I found this solution (actually, it's 4 solutions: each B is a different bishop placement, and at the same time those are the only 4 spots on the board not covered by the queens):

    . B . . . . . .
    . . . . . . Q .
    . . . B . . . .
    Q . . . . . . .
    . . . . . B . .
    . . Q . . . . .
    . . . . . . . B
    . . . . Q . . .


I started with 4 queens placed symmetrically on c7, g6, f2, b3, which covers everything except the corners. Then I shifted all of them diagonally, i.e. to d6, h5, g1, c2. And it turns out that then only a8 and b7 are not covered, which can be easily solved by placing a bishop on their diagonal, e.g. at h1.
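
For anyone who wants to check these boards mechanically, here is a minimal Python sketch. It assumes the goal is that every square is either occupied or attacked by the 4 queens and 1 bishop, with pieces blocking each other's lines as usual; that rule is my reading of the comments above, not something quoted from the puzzle page. The helper names (`parse`, `sq`, `uncovered`) are made up for illustration.

```python
ROOK_DIRS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
BISHOP_DIRS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def parse(board):
    """Return {(row, col): piece} for a board drawn with '.', 'Q' and 'B'."""
    pieces = {}
    for r, line in enumerate(board.strip().splitlines()):
        for c, ch in enumerate(line.split()):
            if ch != ".":
                pieces[(r, c)] = ch
    return pieces


def sq(name):
    """Convert algebraic notation such as 'h1' to (row, col), rank 8 on top."""
    return 8 - int(name[1]), ord(name[0]) - ord("a")


def uncovered(pieces, size=8):
    """Squares that are neither occupied nor attacked by any piece."""
    covered = set(pieces)  # occupied squares count as covered (assumption)
    for (r, c), kind in pieces.items():
        dirs = BISHOP_DIRS + (ROOK_DIRS if kind == "Q" else [])
        for dr, dc in dirs:
            rr, cc = r + dr, c + dc
            while 0 <= rr < size and 0 <= cc < size:
                covered.add((rr, cc))
                if (rr, cc) in pieces:  # the line stops at the first piece
                    break
                rr, cc = rr + dr, cc + dc
    everything = {(r, c) for r in range(size) for c in range(size)}
    return everything - covered


# The "queens in a row" board from the first comment.
FIRST = """
    . . . . . . . .
    . . . . . . . Q
    . . B . . . . .
    . . . . . Q . .
    . . . . . . . .
    . . . Q . . . .
    . . . . . . . .
    . Q . . . . . .
"""

# Queens from the second board, and the four bishop spots marked B there.
SECOND_QUEENS = {sq(s): "Q" for s in ("g7", "a5", "c3", "e1")}
SECOND_BISHOPS = {sq(s) for s in ("b8", "d6", "f4", "h2")}

# Queens from the third comment, after the diagonal shift.
THIRD_QUEENS = {sq(s): "Q" for s in ("d6", "h5", "g1", "c2")}

if __name__ == "__main__":
    print(uncovered(parse(FIRST)))                        # set()
    print(uncovered(SECOND_QUEENS) == SECOND_BISHOPS)     # True
    print(uncovered(THIRD_QUEENS) == {sq("a8"), sq("b7")})  # True
```

Under that coverage rule all three claims check out: the first board leaves nothing uncovered, the second set of queens misses exactly the four B squares, and the third misses exactly a8 and b7.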


I put `setInterval(checkBoard, 100)` in the console to do it automatically.


I wonder how reliable this is. Will AWS lightsail continue to work indefinitely for free? What if AWS changes the system in some way? What if the person hosting my locality becomes unavailable?


I suppose you can use any nameserver you like, the only problem is it’ll be a PITA to change it.

(I’ve recently registered a .bt domain by filling out a PDF form, hand-signing it, scanning and sending to a Bhutan Telecom admin. Changing a nameserver would probably be a similar procedure now, and involves a one-time fee if I recall correctly.)


In no particular order

    - Meetings
    - Reading papers
    - Understanding legacy code
    - Reading internal news
    - Ad hoc chats with coworkers
    - Writing docs
    - Editing configs
    - Thinking about solutions
    - Slacking off
    - Analyzing results
    - Testing code
    - Reviewing PRs
    - Understanding others' ongoing projects


    > Slacking off
I laughed when I read this, but there is something to it. I like to call it "intellectual relaxation" or taking a break. Sometimes getting up from your desk to do some mindless admin task like photocopying a document for HR can free up your mind. If we were line workers at a factory, this would be mandated breaks. Business/financial newspapers and factory executives love the old quote: "With robots, they never need a break, never need holiday, and can work 24x7." With the advent of agentic LLMs, a tiny fraction of that reality is leaking into the white collar world.


AI can do everything you listed except chats with coworkers and slacking off.

I just don't think you've utilized the most recent versions of codex or claude.


It's definitely theoretically possible, but not there yet. I use cursor, claude (opus 4.7), and several proprietary LLMs/LLM frameworks at my job. The institutional knowledge I have wouldn't fit in the context window, and AIs lack my mental index/intuition of where to look for answers. When my AI makes a PR, I generally have to make some important changes, without which its solution would be fundamentally broken. AI also cannot be trusted to make the right business tradeoff decisions.


Many things at my software engineering job are like this: they require constantly changing human institutional knowledge that is almost always undocumented, or that changes so quickly that it isn't relevant anymore. By the time you decide to automate it, the process changes. Tribal knowledge used to be something I hated seeing senior engineers keeping to themselves, but now it seems like an asset.


Can't access the paper, but I'm curious how they measured statistical significance. I wonder how much to interpret the result as "we didn't measure any effect" (which is a largely meaningless conclusion) versus "no effect exists." The latter wouldn't be a rigorous statement, but it seems to be the conclusion we are being led towards.
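
To illustrate why "we didn't measure any effect" says little on its own, here is a rough sketch of a power calculation. It assumes a two-sided two-sample z-test with a known standard deviation, and the effect size and sample sizes are hypothetical numbers I picked; none of this comes from the paper, which I haven't read.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(effect, sigma, n_per_group, alpha=0.05):
    """Chance that a two-sided two-sample z-test rejects "no difference".

    effect:      true difference in means between the groups (hypothetical)
    sigma:       known standard deviation within each group (assumption)
    n_per_group: number of observations in each group
    """
    z = NormalDist()  # standard normal distribution
    std_err = sigma * sqrt(2 / n_per_group)
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect / std_err
    # Probability that the test statistic lands in either rejection tail.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


if __name__ == "__main__":
    for n in (20, 200, 1000):
        p = power_two_sample(effect=0.2, sigma=1.0, n_per_group=n)
        print(f"n per group = {n:4d}: power = {p:.2f}")
```

With a true effect of 0.2 standard deviations and 20 subjects per group, this test would miss the effect the large majority of the time, so a non-significant result there is compatible with a real effect. Reporting a confidence interval or an equivalence test is what would let a paper argue for "no meaningful effect exists."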


As someone who has never heard of most of these concepts before (plane trees, catalan numbers, ballot sequences, depth vectors), I found the question "Can you think of a way to efficiently generate a random plane tree?" confusing, and I only understood the problem being solved by first trying to understand the solution. After reading through, it seems like it's asking about generating a random plane tree drawn from a uniform distribution of all possible plane trees with a given number of nodes? Cool idea once I understood it though!
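
In case it helps other readers who were as lost as I was, here is a minimal sketch of one standard way to sample a plane tree uniformly among all plane trees with a given number of nodes. It uses the cycle lemma on a shuffled sequence of up and down steps. I'm not claiming this is the article's method; the function names are mine, and representing the tree as a preorder depth vector is an assumption about what "depth vector" means there.

```python
import random


def random_dyck_path(n_edges, rng=random):
    """Uniformly random Dyck path with n_edges up-steps, via the cycle lemma."""
    steps = [1] * (n_edges + 1) + [-1] * n_edges
    rng.shuffle(steps)
    # Find the last prefix index where the running sum hits its minimum.
    total, lowest, cut = 0, 0, 0
    for i, s in enumerate(steps):
        total += s
        if total <= lowest:
            lowest, cut = total, i + 1
    # Rotating to start there makes every partial sum strictly positive.
    rotated = steps[cut:] + steps[:cut]
    # The rotation always begins with an up-step; dropping it leaves a
    # path that never goes below zero and returns to zero at the end.
    return rotated[1:]


def depth_vector(path):
    """Depths of the nodes in preorder; the root has depth 0."""
    depths, depth = [0], 0
    for s in path:
        depth += s
        if s == 1:  # each up-step visits a new node
            depths.append(depth)
    return depths


def random_plane_tree(n_nodes, rng=random):
    """Depth vector of a uniformly random plane tree with n_nodes nodes."""
    return depth_vector(random_dyck_path(n_nodes - 1, rng))


if __name__ == "__main__":
    rng = random.Random(0)
    print(random_plane_tree(6, rng))
```

Each Dyck path arises from the same number of shuffled sequences (one per rotation), which is why the result is uniform. For 4 nodes there are 5 plane trees (the Catalan number C_3), and repeated sampling should produce all 5 depth vectors about equally often.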

