And their disk-based caching is amazing. I got a long 700k context session spanning more than a week, with pauses in between that was longer than a day, and some rewinds mixed in as well.
If I understand correctly, both the staging database and the production database share the same volume. Thus, production data was gone as well after deleting the volume.
You don't. You are missing the part where the LLM had a token which blocked access as expected. Then the LLM searched the source base, found a different token with the delete privs and then used that.
PS That warning happens in staging envs too, the LLM doesn't know which env is which by design.
Huh that's not what I gathered from the tweet at all. If I am going to write a five why's analysis, the immediate cause is the LLM wrongly decided to delete a volume, while the root cause is the bad design to co-locate staging and production data in the same volume. The writing was quite vague though, let's wait for a response from railway.
Based on the release schedule of 3.5 previously, my optimistic take is that they distill the small models from the 397B, and it is much faster to distill a sparse A3B model. Hopefully the other variants will be released in the coming days.
Very nice TG improvement from Flash Attention KQ fusion. Is it something that was already done in ik_llama.cpp? If not, then it will be a welcomed addition for hybrid CPU/GPU inference.
Stats from pi:
↑400k ↓438k R432M 71.9%/1.0M
Half a billion tokens, $2.12
reply