
syats · 2026-06-10T23:05:18 1781132718

Great work.

syats · 2025-11-06T16:07:10 1762445230

Cool! These are indeed very common graph-building steps.

Thinking out loud here, but some of these were supposed to be solved with RML (https://rml.io/) for the RDF paradigm. I witnessed a bit of their evolution: it started with operations similar to GraFlo's and eventually they built some support for arbitrary Java code. For example, say you want your node ID to be generated by concatenating the values of the firstName column and the lastName column, but only after some weird string normalization (think of making sure everything is UTF-8)... you wouldn't want to make your schema mappings Turing-complete, so you'd eventually have to allow for calling other functions. Anyway, all of that was for RDF graphs; it's cool to see something like this for property graphs.
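To make the node ID example concrete, here is a minimal Python sketch of the kind of external function a mapping would end up calling. The names `normalize` and `make_node_id` and the underscore separator are my own assumptions for illustration, not anything from RML or GraFlo, and NFKC plus case folding is only one possible choice of normalization.

```python
import unicodedata


def normalize(value: str) -> str:
    """Normalize one column value before it becomes part of an ID."""
    # NFKC makes composed and decomposed forms of the same character equal.
    text = unicodedata.normalize("NFKC", value)
    # Collapse runs of whitespace, trim the ends, and fold case.
    return " ".join(text.split()).casefold()


def make_node_id(row: dict) -> str:
    """Build a node ID from the firstName and lastName columns.

    The underscore separator is an arbitrary choice for this sketch.
    """
    return f"{normalize(row['firstName'])}_{normalize(row['lastName'])}"


if __name__ == "__main__":
    print(make_node_id({"firstName": "  Ada ", "lastName": "LOVELACE"}))
```

The point is that the mapping itself stays declarative (which columns, in which order) while the messy part lives in an ordinary function.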

syats · 2025-06-30T14:15:49 1751292949

Been building several of these functionalities myself for a while... Happy to know someone more skilled did it also and released it publicly.

acrostoic · 2025-06-30T14:36:55 1751294215

thank you! hopefully you will find it useful

syats · on Sept 14, 2023

For anyone interested in how the obsession for organization can end: https://en.wikipedia.org/wiki/Paul_Otlet

0x445442 · on Sept 14, 2023

Or this guy... https://www.flickr.com/photos/hawkexpress/albums

syats · on Sept 7, 2023

I thought for a second the title was missing a (1936).

syats · on Sept 2, 2023

Link is broken.

syats · on Sept 2, 2023

Somewhere on HN there's a post about disrupting or revolutionizing the laundromat industry, where some person is showered in praise (and later money) for setting up this lousy system.

If it ain't broken, don't fix it.

WeAddValue · on Sept 2, 2023

A long time ago I lived in a shady building where it was broke: broken into, that is, to get the coins in the machine.

usr1106 · on Sept 2, 2023

And now the LavaWash server could be hacked to steal user data that a washing machine would never need, but the implementer chose to store without reasonable protection.

syats · on Aug 22, 2023

In countries communicating in non-English languages which are written in the Latin script, there is a very large use of Latin-1. Even when Latin-1 is "phased out", there are tons and tons of documents and databases encoded in Latin-1, not to mention millions of ill-configured terminals.

I think it makes total sense to implement this.
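For a feel of what that legacy data looks like in practice, here is a small Python sketch. The helper name `decode_legacy` is hypothetical and the try-UTF-8-first policy is my own assumption, not a description of the implementation under discussion. It relies only on the fact that Latin-1 maps every byte to exactly one character, so the fallback decode cannot fail.

```python
def decode_legacy(raw: bytes) -> str:
    """Decode bytes that may be UTF-8 or leftover Latin-1.

    Try strict UTF-8 first; if the bytes are not valid UTF-8, assume Latin-1.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 assigns a character to every byte value, so this never raises.
        return raw.decode("latin-1")


if __name__ == "__main__":
    old_record = "café".encode("latin-1")  # a lone 0xE9 byte, invalid as UTF-8
    new_record = "café".encode("utf-8")
    print(decode_legacy(old_record), decode_legacy(new_record))
```

The caveat is that some Latin-1 byte sequences also happen to be valid UTF-8, so a fallback like this is a heuristic, not a guarantee.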

syats · on Aug 16, 2023

This article contains an excellent description of the work of a mathematician. It should be part of any curriculum in the field.

creer · on Aug 16, 2023

The discussion notes are awesome too. Lots of examples raised.

syats · on July 17, 2023

Thanks for the replication, this is important.

One question, did you try to replicate the other result table (Table 3)?

If I understand correctly, top-2 accuracy would be 1 if you have only 2 classes, but it will differ from "normal" accuracy less and less as the number of classes increases (on average). So this shouldn't change the results for Table 3 thaaat much, as the datasets have large numbers of classes (see Table 1).
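A rough sketch of that point in Python. The helper `top_k_accuracy` and the toy scores are made up for illustration; this is not the paper's or the replication's evaluation code.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scored classes.

    scores: one list of per-class scores per sample.
    labels: the true class index of each sample.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Two classes: the top 2 always contain the true label.
    two_class_scores = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
    two_class_labels = [0, 1, 1]
    print(top_k_accuracy(two_class_scores, two_class_labels, k=1))
    print(top_k_accuracy(two_class_scores, two_class_labels, k=2))
```

With only two classes, k=2 covers every class, so the score is 1 no matter how bad the classifier is. With many classes, the top 2 cover a small fraction of the options and the free boost shrinks.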

In any case, top-2 accuracy of 0.685 for the 20-newsgroups dataset is pretty neat for a method that doesn't even consider characters as characters[1], let alone tokens, n-grams, embeddings and all the nice stuff that those of us working on NLP have been devoting years to.

[1] In my understanding of gzip, it considers only bit sequences, which are not necessarily aligned with words (aka. bytes).

ks2048 · on July 17, 2023

I haven't yet replicated Table 3 because most of those datasets are much larger and it will take a while to run (they said the YahooAnswers database took them 6 days).

Also, I have only tried the "gzip" row because that is all that is in the GitHub repo they referenced.

Yeah, you're right, the more classes there are, probably the lower the effect this will have.