Hacker Newsnew | past | comments | ask | show | jobs | submit | throwdbaaway's commentslogin

And their disk-based caching is amazing. I got a long 700k context session spanning more than a week, with pauses in between that was longer than a day, and some rewinds mixed in as well.

Stats from pi:

↑400k ↓438k R432M 71.9%/1.0M

Half a billion tokens, $2.12


Hah, that's because the prompt itself was only about 30 tokens. We need a much bigger prompt to properly test PP.


If I understand correctly, both the staging database and the production database share the same volume. Thus, production data was gone as well after deleting the volume.

1st hint - the API call only contains one volume:

    curl -X POST https://backboard.railway.app/graphql/v2 \
      -H "Authorization: Bearer [token]" \
      -d '{"query":"mutation { volumeDelete(volumeId: \"3d2c42fb-...\") }"}'
2nd hint - this gem from the tweet:

> No "this volume contains production data, are you sure?"


"If I understand correctly, "

You don't. You are missing the part where the LLM had a token which blocked access as expected. Then the LLM searched the source base, found a different token with the delete privs and then used that.

PS That warning happens in staging envs too, the LLM doesn't know which env is which by design.


Huh that's not what I gathered from the tweet at all. If I am going to write a five why's analysis, the immediate cause is the LLM wrongly decided to delete a volume, while the root cause is the bad design to co-locate staging and production data in the same volume. The writing was quite vague though, let's wait for a response from railway.


Should be about 10~20 GiB per session. Save/restore is exactly what DeepSeek does using its 3FS distributed filesystem: https://github.com/deepseek-ai/3fs#3-kvcache

With this much cheaper setup backed by disks, they can offer much better caching experience:

> Cache construction takes seconds. Once the cache is no longer in use, it will be automatically cleared, usually within a few hours to a few days.


Based on the release schedule of 3.5 previously, my optimistic take is that they distill the small models from the 397B, and it is much faster to distill a sparse A3B model. Hopefully the other variants will be released in the coming days.


His Vibe Coding book is invaluable as a textbook example of slop.


https://github.com/anthropics/claude-code/issues/46829#issue... - Have you checked with your colleague? (and his AI, of course)


Doesn't what's said at the link approximately agree? The 5m bug was said to be isolated to use of overage (API billing).


> EC2 instances on shared hardware showed up to 30% variance between runs due to noisy neighbors.

Based on this finding, I suppose the better way is to rely on local hardware whenever possible?


Very nice TG improvement from Flash Attention KQ fusion. Is it something that was already done in ik_llama.cpp? If not, then it will be a welcomed addition for hybrid CPU/GPU inference.


https://github.com/THUDM/IndexCache - Might be some expected issue when rolling out this. They don't have enough compute, and have to innovate.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: