Heh. I built "Fusion" a few months ago as an MCP using OpenRouter. The idea was to give Claude a "panel of experts" to go talk to when it got stuck.
After extensive testing and benchmarking I discovered that when you ask one model to judge another's response you don't actually get a better answer. You are just asking it "how closely does this resemble the answer you would have given me." Additional rounds and all the "obvious" solutions that pop into your mind reading the proceeding sentence are essentially just cranking up the temperature.
I did find a solution, but it is insanely expensive. Maybe if this gains traction I'll release mine.
Prompt matters. Obviously if you want another model opinion you must generate from the scratch using the same prompt and then you can try to synthesize, but working with an existing response can work if desired. I use explicit instructions to find issues with assigned severities and then these are going through the panel of judges, only issues passing certain threshold are fixed in the original response.
I'll share a revelation which vastly improved my results: tell judges to evaluate truth and usefulness/should-be-fixed axis separately. Because inevitably with a prompt that is forcing to find issues you will end up with nitpicks. Plus truth axis allows to better evaluate the issue-finder models for your use case.
That's some part of what happens when I generate explanations like this one: https://hanzirama.com/character/%E6%9D%A5#explain - at this point the site is a small side product of my LLMs-evaluation machinery.
Bonus content for patient readers: if you need top quality you will likely need to pin provider(s) on OR, :exacto is not enough to get good repeatable results especially for open-weights models.
I've found that if I tell a judge that the answer came from a small and weak local LLM, it will pick the answer apart brutally...but since I have not done this systematically, I dont know how well it generalizes past my vibes.
Anyone else fell like if you can trick the LLM into a mode where it "feels" superior, it will act the asshole very well?
Yeah. I usually do this by telling it to be adversarial and find gaps and holes. Not fool proof but it does seem to increase the quality. It has helped when using local models in particular.
I made a rough version of this in 2024[0], interesting to see that the idea is still around. I had the ability to set "quality thresholds", but it didn't seem to matter, the frontier models pretty much always agreed with each other and scored the answer highly, I should revisit it since it is a whole different ballgame than it was 2 years ago.
I think it depends on whether the answer is verifiable.
I have tested two judge models in my apps:
1. Judge model for a resume tailor. It evaluated the result resume vs the base resume and JD and judged it out of 10 on fit and honesty. It worked well and was useful.
2. Review model in my LLM trading bot platform. It reviews decisions from the Main model. The problem here is that the bot is navigating ambiguity. So unless the Review model catches an outright blunder (e.g. making a decision on wrong candle price or a BUY when it should be a SELL), the Review model can do more harm than good.
First, it adds latency to decisions, decisions take twice the amount of time (like be 60s instead of 30s for Gemma 4 31B). Second, it can make the bot too cautious, because Review model only runs on BUY/SELL decisions and not HOLD decisions, so the bot will only make less trades instead of review model increasing number of trades (because of latency and cost).
So overall, I think you'll get better results with a better model single shotting it rather than a review model if the answer isn't easily verifiable. But then why do you need a judge model and not just have the same agent review itself?
ALSO, if you read the reasoning text for a reasoning model (like Gemma 4), you see that it ALREADY reviews itself. So it's doing its best, re-review isn't really adding information. It's an interesting experiment, but you need to evaluate on a case by case basis.
I've started to have different models review things like architectural planning docs- and I think for these more "fuzzy" outputs the differences between the outputs can be quite different and I can use my own "taste" to pick the best one.
I don't think it would work without a human in the loop but it is surprising to me how varied models' vibes are and how a system design varies by what it thinks is important to include and emphasize.
I’d be interested in the benchmarking if you ever write it up! People do seem to assume LLM as a judge/panel improves outcomes (and arguably it does in cases like code review?) but I suspect it is very situational and the priors from human panel of experts don’t always translate cleanly.
TLDR: Adafruit found out Flux was being dishonest about their user numbers. They also found and responsibly disclosed that they could get their Firebase keys by opening up Chrome's devtools.
> pretty sure this is stems from the insane US legal requirement to not export SSL technology to enemy countries
This is most likely OFAC. Lets Encrypt could apply for a license to do business with sanctioned entities, and given their use case it would most likely be approved.
OFAC regulates commerce, not speech. Let's Encrypt is not doing "business", they're operating a free informational service. Lots of organizations interpret any information exchange as subject to OFAC regulation, and you and Let's Encrypt have good company in this interpretation, but I think it's unnecessarily ceding ground.
The government may use as wide of an interpretation of commerce as they can get away with. We've seen this happen before [0]. Sure, Let's Encrypt isn't taking money from the entities they offer certificates to. But the OFAC desk jockey assigned to that case only has to concoct some sufficiently plausible-sounding trail of money connecting the backing 501(c)3 and a sanctioned entity in order to levy penalties, and the legal team will not like that risk, even if it's unlikely for OFAC to win on appeal in a court.
This is true, of course, and I understand why some companies don't want to take the risk. But I would hope that Let's Encrypt would take the opposite stance. They were born out of the EFF and have EFF & ACLU board members! These orgs live for this type of legal fight.
IANAL, but it seems like the argument from Wickard v Filburn would apply to LE. They may not be taking money but they do impact the commerce of the market for certificates.
I disagree with that ruling, and I have some serious problems with sanctions against entire countries/regions, but it definitely makes sense that LE would interpret it as being impacted by OFAC.
Providing information (website, CT log, CRL) is fine, but creating a certificate on request is clearly a service. How is that different than providing a computation or LLM output in response to a prompt? Moreover, it is clearly not just the physical act of signing a CSR, but the verification of ownership that comes with it. That's just as much as service fully automated as if a human were doing it.
Now, does this serve a policy purpose? Perhaps not--US computers trust plenty of non-US CAs that could continue to serve these customers. But that's not how comprehensive sanctions are set up, they are effectively a complete embargo.
A better question is whether telecom carveouts (general licenses) in the sanctions may allow this. That is a country by country question as each one is worded differently.
OFAC has authority to regulate commercial services under the Commerce Clause. Not all services are commercial in nature. There is no economic exchange inherent in running a certificate authority. If LE charged money for certificates, that would be a different matter. LE's differentiating factor from the previous era of CAs is that they are non-commercial.
In an alternate universe, Let’s Encrypt has a chat with someone and then states, publicly, like a speech, that they think that person owns a domain.
In our universe, Let’s Encrypt lets a client open an “account”, enters into a contract with the client (the contract is the topic of this entire post), and gives the client an API by which the client requests a certificate. Then Let’s Encrypt grants the certificate. Maybe the certificate is somehow speech. The rest sure doesn’t sound like speech to me.
To be clear, the differential here occurs because OpenSSL does the wrong thing. Go is correct to fail closed here, and it’s very hard to imagine a setting in which Go failing closed is a relevant security differential.
Just to be clear, OpenSSL isn't doing the wrong thing, based on the description in the blog post. The specification allows and even requires behavior similar to that.
You either die a hero, or live long enough to see yourself become Akamai.
The reason everyone came running when Cloudflare first started was obviously the "burn VC money to gain marketshare" but it was also the sheer simplicity. They had one product and a handful of features.
Until someone on the business side takes a step back and says "when I mouse over 'Products' on the homepage, why the fuck is there a 'See All Products' link" it will be impossible to have a usable customer experience. Start killing things and making them features.
Good meeting, appreciate the engagement, added new action items "add yet another way to do AI in Workers, say something about long schedule agentic driven" and "go to market with new unrelated Web Application Firewall product".
> Yeah, but Google has the money for this. They are quite literally the most profitable company in the world.
"Alphabet announced that its 2026 capital expenditures are expected to be $180-$190 billion, and that it expects 2027 capital expenditures to significantly increase [...] over the 12 months ended March 31, 2026, Alphabet generated $174 billion of operating cash flow"
Which is just wildly backwards. It is the same mindset of the cyberpunk "privacy advocates" of the early 2000s, move your stuff to Sealand or Switzerland.
The fundamental flaw with this plan is if your fear is genuinely of the United States, your data is far more protected inside the US. The intelligence community has no restrictions operating on foreign networks and servers.
Rather than go to a FISA court for approval, we just hack your box and take your data. Or ask a European intelligence service to use the much more lax laws to compel its disclosure.
Yes, data collection happens on US soil. But ask anyone who has worked on the inside how much of a pain it is to view or process USPER data.
>The intelligence community has no restrictions operating on foreign networks and servers.
there have been several bombshell revelations in the last 1-2 decades which indisputably show that the US intelligence community also has (effectively) no restrictions operating on US citizen networks and servers, and often does so with the direct help of US companies.
the legal standards are worthless when they can just be ignored without consequence. when the standards happen to work, just buy the data from the private sector.
secondly, these changes are also about mitigating any retaliatory decisions made when the US government gets upset at how tall another country's leader is, or whatever.
I wish I believed that they have to go to the FISA court for much of anything any more. Instead they go to Palantir and the like which simply buy the data and aggregate it. Very similar to the process of money laundering. And for the data that can't be bought there's the five eyes work around.
As an advocate (and practitioner) of European digital sovereignty, let me tell you, at least from my perspective, it has absolutely nothing to do with fear of US intelligence agencies spying on us, and everything to do with the catastrophic consequences of an unreliable and unstable American government pulling the plug on our vital infrastructure, or at least the very least weaponizing our dependency on American companies.
I live in Denmark, a country whose primary threat at the moment is the USA, and the thought of Donald Trump effectively having a kill-switch to our highly digitalized society is absolutely frightening. Reducing our dependence on American tech means that we are less vulnerable to a hostile power using it to extort us out of our territory. We cannot remove the threat entirely, but we can make the pain less extreme.
Other EU countries are also seeing things this way, that the US no longer has a stable government and is no longer a friendly country. Who cares about American spying when the real threat is your country being turned off?
As a Canadian who has been listening to the "51st state" wordvomit coming out of US administration your comment is very apt.
For some reason I can't fully grasp, a LOT of US citizens are ignorant to how the rest of the world is perceiving them at the current moment. There's countless US articles talking about US/Canada relations as if it is a trade dispute and that they think Canadians are eager to re-unite and go back to the way things were without ever addressing the threats to our sovereignty. Then you have comments like the parent to your post who is....wildly off the mark thinking that in a point of contention we'd prefer to keep our data on US controlled systems because their government would need to follow their own legal processes to acquire data of a foreign/hostile state??????
This becomes even more striking if you look at who they surveyed:
> They asked citizens across the G7 (Canada, France, Germany, Italy, Japan, the U.K., and the U.S.)
They're not even asking those from the half of the world that has been bombed or coup d'etated by the US in the last half century. They're asking those who should on paper dislike the US the least.
People from those countries ranking the US below India, Mexico, South Africa and Turkey is quite something. Israel coming in at 55th out of 60, below Saudi Arabia, is also fantastic proof of how incredibly unrepresentative these "representative democracies" are of their populace. The US and Germany are even 2 of the 7 surveyed countries! Without them I wouldn't be surprised if they came in last, under Iran and China.
For some reason I cannot grasp Canadians think the US citizens think about them at all. We may as well not have a northern neighbor, all that most of us think exist between Michigan and Alaska is snowy wilderness.
The parent to their post was saying your risk assessment of which country should host is incorrect, given who you believe to be your biggest threat, i.e. your preferences are not aligned with reducing your risk.
> Is the rest of the world going to stop trying to immigrate here, though
Yes. If you look for example at people from the world leader in science and technology, China, then there is a very noticeable drop in the number of young Chinese people wanting to study in,or after graduation stay in, the US.
>the more US citizens have to worry that that foreigners attempting to immigrate here for the long-term have a plan involving exercising influence on US politics from their position inside the country, in order to punish existing US citizens.
I am not aware of any, but I would love to hear any and all plans to punish US citizens.
> worry that that foreigners attempting to immigrate here for the long-term have a plan involving exercising influence on US politics from their position inside the country, in order to punish existing US citizens.
This is indeed a valid concern for immigrants like Elon Musk, Peter Thiel, and Rupert Murdoch. They’ve all had a significant deleterious effect on the US.
For the average immigrant just trying to live their life, it’sa much less valid concern. You could equally point the finger at the millions of natural-born US citizens who believe that the US is a “Christian nation”, who are well organized, and are trying to change the laws in that direction.
The entire reason that people in Europe care about moving their digital infrastructure from American cloud companies to European cloud companies is because they're upset about current American politics, particularly Trump being president. Immigration is one of the biggest issues in American politics right now, and is also a pretty big issue in the politics of basically every European country.
The comment I was responding to was claiming "For some reason I can't fully grasp, a LOT of US citizens are ignorant to how the rest of the world is perceiving them at the current moment." in the context of making an argument that American citizens should be concerned about people in foreign countries feeling threatened by potential actions of the US government and reacting to this by reducing their dependence on American cloud companies.
And my genuine response to this comment is that US citizens are less ignorant of how the rest of the world perceives them than the commenter thinks - because the rest of the world is still trying to physically come here (and often still trying to come here illegally, or remain here illegally - ICE is still arresting tons of people).
Immigrants who dislike the US generally don't stop talking about their problems with the US when they move here - but now they use their status as an immigrant to make a moral claim, that they are more authentically American than those who didn't move here, and so their understanding of what America is and should be is better and more moral than that of existing American citizens. You can go to any college campus and see what the foreign students are saying about the US, including what they're saying about the very policy of giving visas to foreign students to begin with. Or you can go to congress and see what the immigrant members of the house of representatives are saying about the US. There's no conspiracy.
The fear is not "NSA is snooping on our customer data", it's "Trump has a beef with our premiere minister/president, and Jeff Bezos accepted Trumps request to turn off AWS from them" that's the fear.
We're far beyond the default assumption that NSA snoops on absolutely everything, and more about protection ourselves from trade wars, tariffs and similar blockages as what Microsoft did with the ICC.
Businesses are scared to lose access to data hosted at US entities, because this recently happened, so they have good reason to fear something like that.
AFAIK, the US has never done that with IP space, but if we did see evidence of that, then you'd see similar worries about that for sure. But I think most of us see it as pretty implausible to happen, since the consequences of such move would be huge, and would probably end the internet as we know it today.
The US won't want to do that because China will have an alternative ready within a day and every China-friendly country will migrate to it. Now US leadership is demented, luckily they've never heard of IPs and I really don't believe it would happen. I think the likelihood of them starting WW3 is more likely than using IPs for power games.
Compelling Microsoft to turn off your Office 365 at least requires Microsoft to be complicit. Sovereign infrastructure didn't protect Venezuela or Iran.
If you rely on services provided by the US, you are one signature away from the current president forbidding US companies to provide service to you. This could be extremely disruptive.
The internet worked for so long because people responsible for each little island did what was for the most part in the best interests of the rest of the islands. If you didn't, other islands would shut off their links to you. Law enforcement was a last resort because 1. the courts don't move at the speed of the internet and 2. nobody wanted the internet getting top down governmental regulation because it was trans-national.
Cloudflare spent a bunch of venture capital to give away expensive things for free and buy market share. If you convince all the grocery stores to move to your island, you can operate a den of criminal activity with no fear of everyone else shunning you.
Talk to anyone who fights botnets, malware, or online scams. Once you hit the Cloudflare dead end you just have to give up. Law enforcement isn't going to take up a case where only 7,000 peoples computers are infected, and Cloudflare isn't going to investigate and take action themselves.
I do fight botnets, malware and scams. Criminals flock to any service where they can spread their stuff and appear legitimate. Google, Facebook, Vercel, Netlify, Amazon, Oracle, Microsoft, OVH, etc. In my experience, Cloudflare is not any more or less of a dead end than any of the other providers, there are some others in that list who deserve being called out a lot more.
Yes, Cloudflare has always been really shitty and automated at responding to abuse reports, and because they are the front-end connection, it is impossible to pursue the report against the 'real' host unless Cloudflare is willing to provide you with information about where that host is: which they won't typically do, even if you are a fellow infrastructure provider. It's been several years, so maybe they have gotten better, but I would be surprised.
reply