Luckily, very few people are both able to configure local models and interested in running them. But your nearby datacenter running Chinese open-weight models is also good enough.
My point is that DRAM demand is mostly orthogonal to whether everyone is using open-weight models or secret-weight models. Heavy demand for local models (whether secret or open weight) will require even more aggregate DRAM than shared hosting would.
Demand will only go down if people reduce their use of these AI tools. Given how much folks here complain about quotas, I'm very skeptical that will happen willingly.
Open weight models allow for repurposing existing hardware locally, and there's a lot of it around - far more than the amount of new RAM being supplied. So they add some short-term downward pressure to the price. (But not very much, since these datacenter builds are long-term investments that are targeted at eventually running far larger models.)
If regular people can repurpose old hardware, so can shared providers, who can extract more value from the hardware and thus afford to pay more.
In a constrained market, supply and demand favors folks who can most efficiently extract rent. Local models only make sense in a world with abundant compute and energy.
The difficult part is the place and route algorithm, not the bitstream. The proprietary ones already take quite a long time to solve: I regularly have 12-24h runs. Perhaps an open source one could do better? But it's not quite as straightforward as reverse engineering a proprietary bitstream.
As someone actively working on nextpnr support for a fairly new FPGA architecture, it really is amazing that we have something like that in the open source world.
YosysHQ are one of my favorite companies to exist.
Nextpnr and Project X-Ray are amazing projects. Reverse engineering the physical map of, say, a 7-series FPGA is no small feat. However, I wonder if they'll ever be able to really compete with Vivado without getting access to the characterization models for timing. I would love to switch over, but the Fmax of my project routed with nextpnr is less than half of what I get with Vivado.
When I first started doing chip design my boss paid more for tools per year than he paid me ... nowadays open source tool chains are leaping ahead ... I don't need a boss (or VCs) in order to design chips
I have to admit that I haven't looked too closely into this, but my understanding is that place & route is essentially an NP-hard optimization problem. Would it be possible to translate this into a SAT problem and solve it with a state-of-the-art SAT solver?
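To make the question concrete, here is a toy sketch of what the placement half of such an encoding could look like. Everything in it is an assumption for illustration: the cell and site names are made up, only the "one site per cell, one cell per site" legality constraints are encoded, and a brute-force enumeration stands in for a real SAT solver. Routing and timing, which are the expensive parts, are left out entirely.

```python
from itertools import combinations, product

CELLS = ["lut_a", "lut_b"]              # hypothetical netlist cells
SITES = ["site_0", "site_1", "site_2"]  # hypothetical FPGA sites

# One boolean variable per (cell, site) pair, numbered from 1 as in DIMACS CNF.
var = {(c, s): i + 1 for i, (c, s) in enumerate(product(CELLS, SITES))}

clauses = []
for c in CELLS:
    # Each cell is placed on at least one site.
    clauses.append([var[c, s] for s in SITES])
    # Each cell is placed on at most one site.
    for s1, s2 in combinations(SITES, 2):
        clauses.append([-var[c, s1], -var[c, s2]])
for s in SITES:
    # Each site holds at most one cell.
    for c1, c2 in combinations(CELLS, 2):
        clauses.append([-var[c1, s], -var[c2, s]])


def satisfies(assignment, cnf):
    """A clause holds if any literal is true; the CNF holds if all clauses do."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in cnf
    )


def brute_force_models(num_vars, cnf):
    """Stand-in for a real SAT solver: try every assignment (toy sizes only)."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if satisfies(assignment, cnf):
            yield assignment


models = list(brute_force_models(len(var), clauses))
# Decode each satisfying assignment back into a cell -> site placement.
placements = [{c: s for (c, s), v in var.items() if m[v]} for m in models]
print(len(clauses), "clauses,", len(placements), "legal placements")
```

Even this toy needs one variable per (cell, site) pair, so the variable count grows with the product of cells and sites before any routing resources are modeled.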
It's surely possible but if it's, for example, 10% slower, that easily eats into execution time and that directly translates into a sense of "maybe it's just worth it to pay the license fee for this year" after just a few 20h place and route runs.
Of course, if it were faster, that would be a huge win for the open source implementation.
For reverse engineering, you still need access to the FPGA tools provided by the vendor, to see what changes in the bitstream when you change the design.
If the bitstream is encrypted, you will not see the changes, so the only way is to reverse engineer the Vivado executables.
You need not only the bitstream, but also a huge amount of timing parameters. In theory, they could be obtained by fuzzing, but that would require a huge number of executions of the Vivado tools. So again the most plausible method is to reverse engineer the Vivado executables, to get the timing parameter database.
In some countries that should be legal, as such reverse engineering might become the only way to use the AMD FPGAs that one buys legally.
I think folks in this thread are underestimating how expensive it is to serve a SoTA model at 100 tokens a second. In addition to the $500k in capital costs, you also have significant electricity costs.
This stuff is expensive because supply is much lower than demand. If everyone was to run their own hardware with a batch size of 1, we'd have 100x more demand for inference hardware and electricity than we do now, and people would be even more frustrated. Efficiency is everything, and we need all the economies of scale we can get to meet demand.
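A back-of-the-envelope sketch of where a figure like 100x can come from, under a deliberately crude assumption: decoding is memory-bandwidth bound, so one accelerator performs a fixed number of decode steps per second and each step yields one token per sequence in the batch. All numbers below are hypothetical, and the model ignores KV-cache memory, compute limits, and latency.

```python
import math


def accelerators_needed(users, tokens_per_sec_per_user, steps_per_sec, batch_size):
    """Devices required if each decode step emits `batch_size` tokens.

    Crude assumption: steps_per_sec is fixed by the time to stream the
    weights from memory and does not depend on batch size.
    """
    total_tokens_per_sec = users * tokens_per_sec_per_user
    tokens_per_sec_per_device = steps_per_sec * batch_size
    return math.ceil(total_tokens_per_sec / tokens_per_sec_per_device)


# Hypothetical workload: 1000 users, each wanting 100 tokens/s,
# on devices that manage 100 decode steps per second.
local = accelerators_needed(1000, 100, steps_per_sec=100, batch_size=1)
shared = accelerators_needed(1000, 100, steps_per_sec=100, batch_size=100)
print(local, shared, local / shared)
```

In this model the hardware requirement scales as 1 / batch size, so the ratio between the two cases is simply the ratio of batch sizes.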
But that's why you shouldn't expect local models to provide quick real-time answers, at least not with the same smarts as SOTA models running in the cloud. Slow batched inference (if possible - RAM capacity can obviously be a challenge with typical models and end-user hardware) can be a lot more effective.
My point is that it is WAY more efficient if we put the world's DRAM supply into a shared inference pool instead of stranding it in local machines where it won't have as high a batch size or utilization.
The cost of not being efficient is even higher DRAM costs than we have now, given supply and demand.
Much of the world's DRAM stock is sitting idle in consumers' local machines and on-prem servers. If that DRAM gets some use, even "inefficiently", that's a meaningful decrease in demand.
That DRAM would get even more use if it was removed from these machines and placed into a shared pool :)
I joke, but thanks to the brutal DRAM market there has been some movement in this direction lately...
I think the question of who controls the model is far more pressing than the question of who owns the DRAM.
It's easy to rattle off a half-dozen different vectors of likely enshittification over the next few years -- ranging from increasing censorship, to lower rate limits, to removal of existing features and forced addition of unwelcome new ones, to extortionate price increases, to unexplained and irreversible account bans. The only way to avoid them all is by running weights you own on hardware you control.
How smart and how fast is your local model? Those are certainly important questions, but "Does it exist at all?" is more important.
There isn't enough hardware in the world for everyone to run their own SoTA model. The only hope we have is if we work together to host these on shared infrastructure, benefiting from >50x economies of scale due to batching, etc. That infrastructure doesn't have to be owned by greedy corporations.
Honest question. Commons, Guava, Spring, and more seem to take this approach successfully (as in, the drawbacks are outweighed by the benefits in convenience, quality, and security) in Java. Are benefits in binary size really worth that complexity?
And before someone says “just have a better standard library”, think about why that is considered a solution here. Languages with a large and capable standard library remain more secure than the supply-chain fiascos on NPM because they have a) very large communities reviewing and participating in changes and b) extremely regulated and careful release processes. Those things aren’t likely to be possible in most small community libraries.
Why? It's the essence of "Simple Made Easy": you don't have other code to complect with. You have a smaller interface, focused on a singular goal. When a library has to work as a standalone project, it can't be accidentally entangled with other components of a larger project.
Smaller implementations are also easier to review against malware, because there are fewer places to hide. You don't have to guess how a component may interact with all the other parts of a large framework, because there aren't any.
There are also practical Rust-specific concerns. Fine-grained code reuse helps with compile times (a smaller component can be reused in more projects, and more crates increase build parallelism).
It makes testing easier. Rust doesn't have enough dynamic monkey-patching for mocking of objects, so testing of code buried deep in a monolith is tricky. Splitting code into small libraries surfaces interfaces that are easily testable in isolation.
It helps with semver. A semver-major upgrade of one large library that everyone uses requires everyone to upgrade the whole thing at the same time, which can stall like the Python 2-to-3 transition. Splitting a monolith into smaller components allows versioning them separately, so the stable parts stay stable, and the churning parts affect smaller subsets of users.
worth clarifying the build parallelism is because the fundamental unit of compilation in rust is the crate. so the main option available to cut down the running time of a large crate is to split it into smaller crates.
> Honest question. Commons, Guava, Spring, and more seem to take this approach successfully (as in, the drawbacks are outweighed by the benefits in convenience, quality, and security) in Java.
Commons and Spring have spent significant effort to break themselves up in the past, and would probably come as aggregations of much smaller pieces if they could be started today with the benefit of hindsight.
That dead code might have "dead dependencies" - transitive dependencies of its own, that it pulls in even though they are not actually used in the parts of the crate you care about.
In the worst case, you can also have "undead code" - event handlers, hooks, background workers etc that the framework automatically registers and runs and that will do something at runtime, with all the credentials and data access of your application, but that have nothing to do with what you wanted to do. (Looking at you, Spring...)
All those things greatly increase the attack surface, I think even more than pulling in a single-purpose library.
The same issue occurs whether you bundle all the code together or not, it's just that if you bundle it together you don't see what's happening and you can't use only part of it easily.
Yeah, I’d agree that multiple crates under one project is basically the same as 1 large crate. The real problem is how many people you’re trusting, and in that case it’s all coming from the same person.
PoE is lousy for sensors. The switch will cut the power if you draw less than 10mA (480 mW), so regardless of PHY efficiency (which is terrible compared to most RS-485, CAN, or even radio ICs), you are REQUIRED by the spec to generate heat that will mess up your sensor measurements.
>The switch will cut the power if you draw less than 10mA (480 mW), so regardless of PHY efficiency [...] you are REQUIRED by the spec to generate heat that will mess up your sensor measurements.
Out of genuine curiosity, could you elaborate on this further, or share some sources I could read more on? I knew that was once the case, but my understanding was that significant improvements were made for the Maintain Power Signature (MPS) requirements with dual signature and PD standards in the 802.3bt update. According to [0], in the section on 145.3.9 PD MPS:
>"To further reduce minimum standby power consumption for PoE systems, Type 3 and Type 4 dual-signature PDs can make use of optimized MPS timings when connected to a Type 3 or Type 4 PSE, as shown in Figure 19. PDs assigned to Class 1 through 5 must draw a current of 10 mA for at least 7 ms with no more than 310 ms between pulses. This translates to an average power consumption of 12 mW per pairset, or about 1/10th (12 mW/ 124.6 mW ) of the Type 1 / Type 2 minimum pulse average power consumption."
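As a rough sanity check on those numbers, here is a sketch of the duty-cycle arithmetic. Two inputs are assumptions rather than things stated in the quote: the 54 V port voltage (picked because it reproduces the quoted 12 mW and 124.6 mW figures) and the Type 1 / Type 2 timing of 10 mA for 75 ms with up to 250 ms between pulses.

```python
def mps_average_power_mw(pulse_ma, on_ms, off_ms, volts):
    """Average power of a repeating current pulse: I * duty cycle * V."""
    duty = on_ms / (on_ms + off_ms)
    return pulse_ma * duty * volts  # mA * V = mW


PORT_VOLTS = 54.0  # assumption: chosen to match the quoted figures

# Type 3/4 timing from the quoted text: 10 mA for 7 ms, 310 ms between pulses.
bt_mw = mps_average_power_mw(10, 7, 310, PORT_VOLTS)

# Type 1/2 timing (assumed): 10 mA for 75 ms, 250 ms between pulses.
at_mw = mps_average_power_mw(10, 75, 250, PORT_VOLTS)

# Drawing 10 mA continuously at 48 V, as in the parent comment's 480 mW figure.
continuous_mw = mps_average_power_mw(10, 1, 0, 48.0)

print(f"Type 3/4 pulsed: {bt_mw:.1f} mW")
print(f"Type 1/2 pulsed: {at_mw:.1f} mW")
print(f"Continuous:      {continuous_mw:.0f} mW")
```

Under these assumptions the 480 mW figure corresponds to holding 10 mA continuously, while the pulsed signatures average far less.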
So my assumption was that the spec had significantly improved on this front starting around 7 years ago? I mean, I'm aware that there can be a very, very great deal of lag time between specs and sufficiently cheap and developed new chipsets taking advantage, but I don't think that's the spec's fault either. In principle if the market was there (and yes, it isn't) the tech could meet it right? My extremely limited experience too is that typical wireless battery powered setups can be sensitive to heat as well in the few applications I've dealt with where it's significant, which makes me wonder if in practice in some cases it might be better to use an IR sensor aimed at a semi-closed or closed but air separated material with known (presumably as close to 1.00 as feasible?) thermal emissivity.
Still, there's lots of sensor use cases where it just doesn't matter, but it'd be nice to be able to hard wire+network on the cheap stuff that's very isolated from wireless signal and physically awkward to get at. I'm fully cognizant though that it's a dream unlikely to be realized, just a personal wish there was more PoE IOT stuff (and while we're at it with magical dream lands that it all had open fully local APIs and everyone worked on first class Home Assistant support and...).
Good luck finding reasonably priced switches and low power PD ICs that support type 3 or type 4 PoE.
Also, supporting those tiny pulses requires large capacitors to hold a charge in between pulses. That plus the required magnetics make PoE sensors way more bulky and expensive to manufacture than old fashioned RS-485 sensors.
I remember using this thing when I was a kid, trying to figure out how all the switching effects worked, so stumbling on this manual many years later was really satisfying...
I had the misfortune of writing a complicated WPF app from scratch circa 2010-2011. Performance using the WPF widgets was terrible compared to HTML/Javascript/Blink; we ended up throwing away most of the WPF code other than the main shell and a few dialogs, reimplementing the important stuff with immediate-mode Direct3D/Direct2D to get the necessary speed.
I recall wasting a lot of time staring at decompiled .NET bytecode trying to understand how to work around many problems with it, and it was clear from the decompiler output that WPF's architecture was awful...
Most software uses 10x more memory than is necessary to solve the problem. In an ideal world, developers would stop building bloatware if their customers can't afford the DRAM.
I agree. OTOH there are many very cool things that we can build if we're able to assume a user can spare 2GB of RAM that we'd otherwise have to avoid entirely, like 3D scenes with Three.js or in-browser video/photo editing. We should be making sure that extra memory is enabling genuinely richer functionality, not just compensating for developer laziness (fewer excuses now than ever for that).
While Blackstone and other PE firms are involved in buying those assets directly (part of my old job), Blackrock is also indirectly involved by buying up massive portions of the REITs listed by these firms, which validated the business in the first place. By pumping extremely insane amounts of money into these structures, all for some measly 4-5% return (laughable for most sophisticated investors but apparently good enough for these guys), Blackrock, Vanguard and State Street were able to put the accelerant to the fire. Neither BX nor any other PE firm would be doing this model if a market didn't exist for it.
While I'm obviously biased here, imo Blackstone is much better still because you don't see Steve Schwarzman go around pontificating while using the voting rights of passive investors to force certain behaviors upon the boards of nearly every company.
Surely it is a more efficient use of DRAM to run inference on shared hardware with large batch sizes and more utilization.