Knowing what to build (and that it hasn't already been built or bought elsewhere in the company) requires bits of information / person-to-person networking / visibility into the state of the company that not all managers or VPs have.
In fact, most people don't have that knowledge, because they're busy with existing or "local" problems, or because they didn't know to ask Davis the DBA or Kris the Kafka Cluster Manager or Alex from accounting if we have <resource> our team can plug into and use. "Oh, yeah, El has one under their desk they kick occasionally, ask them to hook you up!"
If you solve this problem in a turnkey way, Fortune 500 companies will write you very large checks to help them prevent such duplicate waste, and your product will in turn become the 15th system they need to integrate....
That XKCD joke about "how 14 standards becomes 15 standards" also applies to the class of "one system to integrate with and report from all other systems"
You can also observe this in games like Dyson Sphere Program [1] (which is all workers and queues and buffers), where adding a buffer storage section to a belt only hides the fact that you are under-producing one of the components required.
The buffer smooths out bursty flow but you don't want that in the middle of the pipeline, as it actually represents mid-pipeline inefficiency. You should actually be fixing the upstream or downstream problem.
[1] or other automation games like Factorio, Mindustry
I'll note that speedrunners absolutely buffer mid-pipeline in Factorio, and not just for hand-crafting purposes. Sometimes you're waiting for R&D, sometimes you're just running half the machines for twice as long, giving you the same output while saving on build costs. The actual bottlenecks are constantly shifting. "I'm not speedrunning!" you might say, but every regular game could've started as a speedrun that could've gotten you to where you are faster.
Understanding the tendency of mid-pipeline buffers to hide problems is useful, but scorning them entirely is also suboptimal.
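To make the "buffer hides the shortfall" point concrete, here is a minimal sketch in Python. The rates, the buffer size, and the `simulate` function are all made up for illustration; nothing here comes from any of the games mentioned.

```python
def simulate(producer_rate, consumer_rate, buffer_start, ticks):
    """Return how many items the consumer actually received on each tick."""
    buffer = buffer_start
    consumed_per_tick = []
    for _ in range(ticks):
        buffer += producer_rate              # upstream adds items to the buffer
        taken = min(buffer, consumer_rate)   # downstream takes what it can
        buffer -= taken
        consumed_per_tick.append(taken)
    return consumed_per_tick


# Hypothetical numbers: upstream makes 3/tick, downstream wants 5/tick.
with_buffer = simulate(producer_rate=3, consumer_rate=5, buffer_start=20, ticks=15)
without_buffer = simulate(producer_rate=3, consumer_rate=5, buffer_start=0, ticks=15)

# First tick on which the consumer gets less than it wants.
first_starved_with_buffer = next(i for i, n in enumerate(with_buffer) if n < 5)
first_starved_without_buffer = next(i for i, n in enumerate(without_buffer) if n < 5)
```

With the pre-filled buffer the consumer looks healthy for a while and only starves once the stock drains; without it, the shortfall shows up on the very first tick. That delay is both the thing that hides the problem and the thing a speedrunner might deliberately be buying.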
Would you trust the operators of an orbital ring with your life though? The modern world seems to mostly be able to do skyscrapers at this point but ... ehh. We still struggle a bit even with just airliners.
As others have said, the server receives a function call request and decides what to do with it. Whether or not a user or session is currently authorized to perform the action they want is something you evaluate inside the function -- but you
E.g. https://github.com/beyond-all-reason/teiserver/blob/f6ff6d68... here, we are in a function call that handles requests to send a chat message into a game lobby. We update the flood protection timestamps above, then determine if the user has permission to send the message, and finally whether they are speaking just as a client or via the Coordinator. Then we reply with the updated state back to the websocket.
This is what I found beautiful about GenServers, by the way. It's a very explicit "starting state, consume from queue, and each message handling function returns the next system state", which makes it very clear that a state transition does not occur unless you reach the bottom of the event-handling function call, and at that point, it's an atomic state transition of the entire internal state.
In summary: don't trust the client. Independently determine, server-side, in the function itself, if the function call you just received is valid given the current state, not rate limited, etc, and then from there you can choose if you want to act on it.
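For anyone who doesn't read Elixir, the same shape can be sketched in Python. This is not Teiserver's code and not how GenServer is implemented; `LobbyState`, `handle_chat`, `run`, and the one-second flood limit are all hypothetical names and values chosen for illustration. The point is only that every check happens server-side inside the handler, and the state changes only if the handler returns a new one.

```python
from dataclasses import dataclass, field, replace

MIN_SECONDS_BETWEEN_MESSAGES = 1.0  # hypothetical flood-protection limit


@dataclass(frozen=True)
class LobbyState:
    members: frozenset = frozenset()
    muted: frozenset = frozenset()
    last_seen: dict = field(default_factory=dict)  # user -> time of last accepted message
    chat_log: tuple = ()


def handle_chat(state, user, text, now):
    """Validate one request against current state; return (reply, next_state)."""
    if user not in state.members:
        return "not_in_lobby", state           # rejected: state unchanged
    last = state.last_seen.get(user)
    if last is not None and now - last < MIN_SECONDS_BETWEEN_MESSAGES:
        return "rate_limited", state           # rejected: state unchanged
    if user in state.muted:
        return "muted", state                  # rejected: state unchanged
    # Only here, at the bottom of the handler, do we build the next state.
    next_state = replace(
        state,
        last_seen={**state.last_seen, user: now},
        chat_log=state.chat_log + ((user, text),),
    )
    return "ok", next_state


def run(state, inbox):
    """Consume messages one at a time; each handler call yields the next state."""
    replies = []
    for user, text, now in inbox:
        reply, state = handle_chat(state, user, text, now)
        replies.append(reply)
    return replies, state


start = LobbyState(members=frozenset({"alice", "bob"}), muted=frozenset({"bob"}))
inbox = [
    ("alice", "hi", 0.0),
    ("alice", "spam", 0.2),     # too soon after the previous message
    ("bob", "hello", 1.0),      # member, but muted
    ("mallory", "x", 2.0),      # never joined the lobby
    ("alice", "again", 5.0),
]
replies, final_state = run(start, inbox)
```

Whatever the client claims about itself is irrelevant here: the handler only consults the state the server holds.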
Disclaimer: Elixir noob, but I have been using Teiserver to learn.
Nice, I might try using this as I'm currently on 16 GB of RAM / 11 GB VRAM and feel like the VRAM is usually idle except for when I game or try a local LLM.
It would be nice to have dynamic scaling or even just auto-shutoff on VRAM pressure if I forget I have this enabled and then fire up a game or LLM.
While I'm already using a Raspberry Pi, NATS, docker container, etc and am happy with my setup, I am looking forward to reading over this and seeing how it works! The use of an RTOS and berry looks interesting.
Thinking about it more, on my setup I have a DVI port on the motherboard that I would be happy to use with a DVI cable, but I instead need to buy a DisplayPort <-> DVI converter cable to plug directly into my video card...
Yeah, seems like an obvious thing for some motherboard providers to want to provide.
"[would have spent] $1,199 with Anthropic, $980 with OpenAI"
How many tokens is that, input/output-wise?
(a) I'm curious if you feel like you got $2000 worth of value out of them in the last month?
(b) I'm also curious if you would have gotten similar quality out of a slightly lower-cost provider of an open-weight model? (e.g. Kimi K2.6 and DeepSeek v4 Pro) and what the spend would have been for that.
I myself have managed to spend not quite $4 on OpenRouter and have felt it was very worth it; I just have much smaller, or more targeted requests I guess. (Lately, adding features to a static site generator in Python, or setting up log forwarding via a docker compose file)
Anthropic:
Input tokens: 52,545,485
Output tokens: 5,767,253
Cache create tokens: 5,112,029
Cache read tokens: 1,475,069,465
Total tokens: 1,538,494,232
Total cost: $1,199.79
OpenAI Codex:
Input tokens: 52,598,013
Output tokens: 4,681,867
Reasoning output: 2,091,063
Cached input tokens: 1,153,844,864
Total tokens: 1,211,124,744
Total cost: $980.37
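As a rough way to read those totals, the sketch below divides each quoted cost by the quoted token total and works out what share of the tokens were cache reads. It uses only the figures listed above (the first block is the one totalling $1,199.79); the dictionary keys and the `summarize` helper are made-up names, and a blended per-million figure like this ignores the different prices of input, output, and cached tokens.

```python
usage = {
    "first block ($1,199.79)": {
        "total_tokens": 1_538_494_232,
        "cached_read_tokens": 1_475_069_465,
        "cost_usd": 1199.79,
    },
    "OpenAI Codex": {
        "total_tokens": 1_211_124_744,
        "cached_read_tokens": 1_153_844_864,
        "cost_usd": 980.37,
    },
}


def summarize(entry):
    """Return (blended USD per million tokens, share of tokens that were cache reads)."""
    per_million = entry["cost_usd"] / (entry["total_tokens"] / 1_000_000)
    cached_share = entry["cached_read_tokens"] / entry["total_tokens"]
    return per_million, cached_share


summary = {name: summarize(entry) for name, entry in usage.items()}
for name, (per_million, cached_share) in summary.items():
    print(f"{name}: ${per_million:.2f} per million tokens, {cached_share:.1%} cache reads")
```

Both come out at well under a dollar per million tokens overall, mostly because around 95% of the tokens in each case are cached reads.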
I'm confident I got value out of OpenAI - I've been mainly on Codex for the last few weeks.
Not so sure I got that value from Claude, just because I've been using it a lot less and somehow the price came to about the same as OpenAI.
Given the code I've been able to build in the past month I genuinely do think I got value for the API price version, and (don't tell OpenAI or Anthropic) I think I'd have paid full price.
I've not spent nearly enough time with GLM-5.1 and co to compare, but I do know that the prompts I'm using with the agents are not prompts I would have expected to work just three months ago.
If it were me I'd be asking "How long would it have taken me to do that, and what's the rate I'd have been charging for the work I would have been doing otherwise?"
Personally, I've probably spent $60 or so on OpenRouter in the last month or so and got a working project out of it that it would probably have taken me a fortnight to knock together (which is inevitably an under-estimate because it covered things I'd have to learn but K2.5/6 already knew). There's an orders-of-magnitude gap there.
I don't think your counterpart is arguing that OpenRouter created DeepSeek. Rather I suspect their argument is that there are 13 providers listed on OpenRouter for DeepSeek v4 Pro that are competing on price. (That's the default balancing algorithm in OpenRouter, roughly: weighted towards the lowest-priced providers that were available in the last 30 seconds)
If any providers are able to sustainably turn a profit, OpenRouter allows them to compete in an open market to process your tokens (or anyone else's tokens).
Thus anyone subsidizing tokens bears the brunt of the compute load and gains not much more than name recognition and tokens to train on, but since switching to a different provider is a matter of changing one setting in the config panel (and can be set to auto-switch based on price), switching costs are very low. Providers of open models via OpenRouter have almost zero ability to lock-in users.
So this claim that all 13 providers are selling subsidized inference is... a tough claim to swallow. Maybe some of them are, but all of them? I assume at least some providers want to show profitability, and are pricing their service accordingly.
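To illustrate why price-weighted routing leaves providers so little room for lock-in, here is a rough sketch of that kind of selection in Python. It is not OpenRouter's actual implementation: the `Provider` fields, the inverse-square weighting, the fallback when nobody is healthy, and the example prices are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    price_per_million: float      # USD per million tokens (hypothetical)
    seconds_since_outage: float   # time since the last observed failure


def routing_weights(providers, outage_window=30.0, exponent=2.0):
    """Return normalized selection weights, favoring cheap, recently healthy providers."""
    healthy = [p for p in providers if p.seconds_since_outage >= outage_window]
    candidates = healthy or providers  # assumption: fall back to everyone if none are healthy
    raw = {p.name: 1.0 / p.price_per_million ** exponent for p in candidates}
    total = sum(raw.values())
    return {name: weight / total for name, weight in raw.items()}


def pick_provider(providers, rng=random):
    """Pick one provider at random according to the routing weights."""
    weights = routing_weights(providers)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


providers = [
    Provider("A", price_per_million=1.0, seconds_since_outage=600.0),
    Provider("B", price_per_million=2.0, seconds_since_outage=600.0),
    Provider("C", price_per_million=0.5, seconds_since_outage=5.0),  # cheapest, but just failed
]
weights = routing_weights(providers)
choice = pick_provider(providers, rng=random.Random(0))
```

Under these assumptions a provider charging twice as much gets a quarter of the weight, so anyone pricing above the pack loses most of the traffic without the user doing anything.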
I asked the LLM to explain, rephrase, or rewrite things until I was happy. Some examples:
I asked for examples of how the algorithm worked. I asked for examples of how to call the code. I asked for a happy-path unit test and a simple error-handling unit test. I asked it to rewrite something as a pure function. I pointed out an obvious race condition and told it to guard against that issue. I asked it to rewrite a function in the style of this other function. I told it to separate one function into two separate functions that handle the first step and the second step separately.
Etc etc.
If you don't understand it, ask for more or better comments, or better variable names, or cut down the scope into a smaller section, or more examples.
Edit: also I almost entirely leave the LLM in read-only mode... I tell it to make the smallest change possible, and tell it I will only copy-paste in its proposed change when I understand the change and where it needs to be made. That way it's my hands on the keyboard, interacting with the code by making recommended changes... 80% of the code is touched by me (via copy-paste) most-of-the-way before I will 'git commit'.
Sure, there was one recursive folder descent function that found the most recent file modification time that I didn't fully understand, but it's self-contained in a function, I don't care to learn every corner of file modification times, and it appears to work, so I left it as is for my static site generator.
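For reference, a function of that kind can be fairly small. The version below is a sketch of one way to write it, not the code the LLM produced; `latest_mtime` is a made-up name, and skipping symlinks is a choice made here for simplicity.

```python
import os


def latest_mtime(root):
    """Return the newest file modification time under root (seconds since the
    epoch), or None if the tree contains no regular files."""
    newest = None
    with os.scandir(root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                candidate = latest_mtime(entry.path)   # recurse into subfolder
            elif entry.is_file(follow_symlinks=False):
                candidate = entry.stat(follow_symlinks=False).st_mtime
            else:
                continue                               # symlinks and special files are skipped
            if candidate is not None and (newest is None or candidate > newest):
                newest = candidate
    return newest
```

A static site generator could compare this value for a source folder against the output's timestamp to decide whether a rebuild is needed.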