Hacker News | niek_pas's comments

Unless I'm missing something, there's also nothing in the paper to indicate this is "all of human ingredients"? It looks like it's 11 data sources covering a bunch of common cuisines, with the English + Chinese sources accounting for 90% (!) of the dataset. Among others, Africa and the Arab world are not present in the data (good for about 25% of the global population).

Also, all non-English terms were AI-translated to English which is methodologically understandable but surely leaves room for error.


translation is an interesting problem in and of itself still. it's kind of a miracle we can do it at all, yet in some circumstances it seems obvious that there should be objective answers (cooking ingredients being one of them), but even then you never really know, even with human translators, if you've got it correct. even within the same language nearly every individual has their own version of it.

for example, how would you translate "chips" to another language without first knowing which version of English you are translating from? could be an american speaker with a british relative and they use the british definition of chips while otherwise mostly speaking american english.

there's a level of pragmatism in translation that needs to be assumed, and ultimately we have to accept that translated knowledge will always have low resolution. There is a layer of work that needs to be done with the involvement of the material's source to get written content to the level of formalism needed to be representative of the language it is written in. Generally, this is the work of editors. Which means successful translation for wide distribution, while still not guaranteed, is predicated on the editorial skills of the translator, which calls for dialogue with the source.

Meanwhile, AI provides this super convenient band-aid to get translation results you can't disprove.

I genuinely think people are severely underestimating the power held by these models for being translators and how literal truth is going to be determined by them deep behind the scenes under the disguise of accessibility. Not in a dangerous way necessarily, just in a way where what languages are and what words mean is going to shift towards whatever the models think they are.

In a way, over extended time, the models will not be wrong about the translations because their results will redefine what successful formal editing of language looks like, and disagreeing with them will amount to the same difference as having local slang.


Leaving out Indian, Southeast Asian and Arab cuisine means this is nigh useless.

There are 2,000+ varieties of mangoes alone. You could literally end up with a larger file using only mangoes.

Was going to give the same example with chili peppers. Tons of varieties and not exactly interchangeable

Thousands of cheeses, each of which is a unique experience. Heck, even the serving temperature completely alters the experience. Next: wines, charcuterie, ...

Pity the fool who can't taste the difference between any of these.


there are thousands of varieties of a lot of things though...

Use ChaatGPT

It's worse than useless, it's borderline criminal /s

The fabricated title aimed for sensation rather than substance, a typical scenario whenever "All" is in the title, and worst of all when it's the very first word.


> Yeah agriculture is bad for the environment, but at least it feeds us to keep us alive

This is true, but don't forget a _lot_ of agriculture feeds _animals_ that we in turn eat. If you want to make optimal use of land for human needs, most modern agriculture is not that.


The problem is feed lots.

There's no problem with the more conventional practice of letting animals graze the majority of the year. If we didn't use those fields to feed and eat the animals, the grass would turn into CO2 and methane anyway. Or turn into boring forests.

Not everything has to be optimal. That thinking leads to Thanos' snap. People generally enjoy meat. They also enjoy the landscape farmers created.


Genuine question: do you add 'please' and 'thank you' to Google searches? If not, what sets them apart?

Google searches being keyword based, rather than simulated conversations?

The same reason you wouldn't put in an entire actual question/sentence, unless you either don't know how to use Google, are pissed off, or have an actual reason to suspect that it would yield proper hits (e.g. looking up an excerpt).


Google has been optimized for sentence-like questions so much that for a good 6+ years now it has been completely useless as a keyword search.

To clarify: sentence search got slightly better at the cost of keyword search. So the result is unusable garbage.


It is rather hard to lose the habit of using a search engine with keywords, given that the change took place without much fanfare. I have no problem using sentences with the current AI tools, though.

Genuine question: do you write Google search queries in natural language?

I didn't use to, but I do now that the searches go straight to an LLM. I almost always find the model output to be much more useful than the list of search results.

I don't. I was recently doing some searching for information I thought AI would be good for: fuzzy natural language search with some conditions. And it was, but ...

Gemini at least is not great at citing and picking sources. Or providing multiple sources for the same thing.

It tends to stop at three items. So if you want more, you have to prompt it uselessly, like: "any more?"


LLMs seem more human-like, so if you treat them badly you are more likely to condition yourself to treat other living creatures badly.

Google isn’t conversational.

I searched for "Hey Google" and got this in response:

  Hey! I'm here and ready to help. What’s on your mind today? Whether you need to look up information, plan a trip, or get things done, just let me know!

That's only because Google is an LLM now.


One of the dumbest things supposedly clever people keep bringing up.

"Update your priors" is a common expression in English: https://en.wiktionary.org/wiki/update_one%27s_priors#English

Your wiktionary link indicates it is not a common expression in English but instead something "rationalist community" people say.

HN is a rationalist community hangout.

we're reading comments on a post about math proofs

No it's not. Where do you come up with this? Just because you searched the phrase on Google and there's a single result for it on a wiki? Who do you know that's using this expression regularly?


"Common" is an exaggeration.

I don't mean to sound elitist, but in a way, Haskell's difficulty is kind of the point of the language.

The thing that's so elegant about Haskell is that it allows you to express programmatic constructs at a very abstract level. Abstraction is almost by definition difficult to grasp. That's why it takes a decade and a half for (most) people to go from arithmetic to calculus.


Difficulty is most certainly not the point. Abstraction, composability, yes, but difficulty is a language smell that CAN be fixed. (I love Haskell and it's my primary language, so this comes from a place of love).

RIP my browser history, I guess

What in the world is “other native” supposed to mean? Those languages don’t have names?

"Central Yupik" for Alaska, "Lakota" for S. Dakota

creole?

What's even worse is that when dealing with human software teams, a vague requirement will (at least in a well-run org) receive demands for further specification. "What do you mean by 'get data'?", etc.

An LLM will just say, "Sure! Here's the fully implemented code that gets the data and gives it to the user." and be done with it.


ChatGPT 5.5 responds:

> What data should I retrieve, and where should I get it from? Please specify at least: ...

And it then goes on to ask just exactly what is necessary, being all constructive about it.


You're both right. The parent was a toy example, and if asked literally to an LLM, it will definitely ask for more information. Yes, it's important to be accurate but I don't think that applies here.

But the point still stands: in most contexts, the LLM will fill in the blanks with what it deems appropriate, like an overconfident intern at best and a bull in a china shop at worst.


When the cycles are short enough, though, that is to some degree the right thing. That is, it's the right thing for things the users can then immediately see and give feedback on, because it lets them give feedback on something tangible.

It's the wrong thing for important things under the hood (like durability and security requirements) that are not tangible to them.


IME you give it very precise specifications and it still fucks it up.

When we talk about "the" bottleneck being specs, it just isn't the case that it's the only thing LLMs do poorly. They're really bad at a lot of stuff in the SDLC.

They're also good at providing results which are bad but look ok if you either don't look too closely or don't know what you're looking for.


Just as poorly designed code can still compile. This is operator error, not a failure of the technology.

Bit off topic, but why in the world are people still posting on Medium? The reading experience is abhorrent; I couldn’t even finish reading this article before a full screen popup literally blocked the sentence I was reading.

Is there some incentive I’m not seeing?


They have made an honest attempt to pay writers. It's a different model than substack, but that's why.

I look at it the same way I look at pay walls for newspapers. I don't like them but I understand why they are there.


Which is why it failed though. It turns out people won't pay one dollar to read an article like "If AI writes your code, why use Python?"

The situation is very unfortunate. We had a perhaps once-in-a-lifetime chance to solve micropayments, but we fucked up (crypto).


yup, I still wonder if BAT was onto something. loved the idea, never took off. oh well


It seems like it's just the latest evolution of the writer-friendly blogging platform; easier than Wordpress to package into a newsletter, and also easier to monetize with a paid tier.


But don't we have AI to deal with the complexity of Wordpress? :-)


Insofar as AI is great at accidentally deleting your production and backup Wordpress databases, and forcing you to start from scratch with something else.


> The reading experience is abhorrent

Nothing you read in the browser can provide a hands-down best reading experience for everybody equally - the modern web model is inherently at odds with that. A plain HTML page with no CSS is a near-perfect reading experience. The problem is that almost nobody ships that, because the web also became a publishing platform where authors compete for attention. A plain-text protocol under user control is closer to "best reading experience for everybody". The web could be that. It mostly isn't.

I stopped trying to read long articles in the browser. Why would I do that, if I can easily extract all the relevant, plain text (and even structured one) and read it in my editor instead? Where I have control over fonts, colors, navigation, etc. The browser is a delivery mechanism, not a reading environment. Treating it as one is a habit, not a necessity.

Long ago I stopped trying to type anything longer than three words anywhere but my editor. Of course, why wouldn't I? It already has everything I need - spellchecking, thesaurus, etymology lookup, translation, access to all my notes, LLM integration, etc. Try it one day - it's an enormously liberating experience. And then maybe you'd stop reading long texts in the browser as well.


> A plain HTML page with no CSS is a near-perfect reading experience. The problem is that almost nobody ships that, because the web also became a publishing platform where authors compete for attention.

They don't ship it because of greed. They only want your attention because of greed. They only infest their website with ads because of greed.

> The browser is a delivery mechanism,

HTTP is a delivery mechanism. The browser is a user agent. It's supposed to display content according to the preferences of the user. If your browser isn't doing that for you, it's time to find a new browser or beat the one you have into submission until it behaves. "Reader mode" is a useful compromise.


> It's supposed to display content according to the preferences of the user.

That's right, the original idea was exactly about that, but like I said - in practice that is no longer a thing.

Using the editor for reading any content is enormously underrated. Check this out - this entire thread opens in my editor as an outline with nested structure. Meaning that all the regular outline operations are available to me - folding, imenu (interactive TOC), narrowing, quick search, contextual search, pattern-based search, sparse-tree search.

Extracting all the URLs on the page while ignoring HN-internal ones is a single keypress for me - there's a link to a YT video - I can watch it, controlling the playback directly from my editor, I can extract transcript and summarize it with an LLM request - all without opening new tabs, without switching focus.

I can narrow on the sub-thread, or select a region and export only that part to a pdf, gfm, html or LaTeX. The possibilities are virtually unlimited. A web browser - even with three hundred different extensions won't let me have complete and utter control over plain text - it's just not designed for anything like that.
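For anyone not on Emacs, the "extract all the URLs while ignoring HN-internal ones" step can be roughly approximated in a few lines of Python. This is only a sketch using the standard library, not the setup described above: the names `LinkCollector`, `external_links` and `HN_HOSTS` are made up for illustration, and the list of hosts treated as HN-internal is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: these hosts count as "HN-internal" for filtering purposes.
HN_HOSTS = {"news.ycombinator.com", "ycombinator.com", "www.ycombinator.com"}


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Relative links such as "item?id=1" become absolute HN URLs.
            self.links.append(urljoin(self.base_url, href))


def external_links(html, base_url="https://news.ycombinator.com/"):
    """Return unique http(s) links that do not point at HN, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    seen = set()
    result = []
    for url in collector.links:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            continue  # skip javascript:, mailto:, and similar
        if parsed.hostname in HN_HOSTS:
            continue  # skip votes, replies, user pages, footer links
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

Feed it the saved HTML of a thread and you get the outbound links only, which you can then pipe into whatever reader or editor you prefer.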


I'm assuming you use Emacs? Are you using a special "hacker news mode" or something more generic?


HN threads are probably not the best example because the site is pretty readable already. But it's not that difficult to fetch a thread and render it in the Org-mode outline format. hnreader.el¹ does that. For reading articles I just use eww. It has (eww-readable), which removes all the fluff like banners. The trade-off is that eww (by design) doesn't do any JavaScript. That makes it difficult to use with websites with client-side rendering (React, et al.). For that, I have a little automation elisp² that uses OSA (JXA) and extracts the rendered content off the page. I need to figure out something similar for Linux, but it's not so straightforward; the only way I know is to run the browser with the debugger port.

¹ https://github.com/thanhvg/emacs-hnreader

² https://github.com/agzam/.doom.d/blob/main/modules/custom/we...


Can you share your setup how to achieve what you described? I'm curious.


see the adjacent thread


> Why would I do that, if I can easily extract all the relevant, plain text (and even structured one) and read it in my editor instead?

Because that’s an enormous pain in the ass. Not scalable at all.


It's pretty easy with a system like Readwise. Yes, that's ANOTHER system, but it's one system to quickly just add articles like these to an inbox and read them another time, in plain text.

Of course, it doesn't work 100% and certain sites are hostile to it and do stupid JavaScript tricks "for the views".

Mostly, I use it to put it on a reading list later, and to get around really, really abusive ad driven sites.


> It's pretty easy

100%. One can use mozilla/readability to extract the content. Even if you think that would require some effort, think about it - you have to do it ONLY once and never deal with that kind of annoyance EVER again. It really baffles me seeing devs complaining about shit like that. Why? Why won't they figure out a better way? You're a friggin' programmer - computers have to obey your will. You spend your lifetime staring at the screen, reading and editing text. Why not do it on your own terms? Even if it takes some effort, why choose to be henpecked by someone else's rules FOREVER?


I beg to differ. You clearly misinterpret what I'm talking about. Please expand on "scalable", what do you mean by that?


do you use emacs?


I do, but nothing is stopping anyone from doing the same thing with nvim or VS Code. I'm pretty sure there are extensions for VS Code - it's already built atop a browser.


My best guess is momentum. Some people are very, very brand loyal and have to do things in relation to what/how others do things.

In reality it doesn't matter where something is posted, just give us a url, but some people don't operate that way.



It's a free, permanent host for your blog articles with a built-in community and monetization layer. There's only so many free hosts out there that I'd be confident will be around in 5 years, and Medium is one of them.


Yep, Medium was free and everyone donated content... then it put up reading paywalls and conned everyone. I'm also surprised when I see people writing on there.


Same reason people still post on X.


And I’m sure the boundary on what constitutes ‘badness’ is something everyone can agree on!

