Working on exocomp [1] and gooey [2]. The former is an agentic environment for pentesting and reverse engineering, the latter a UI framework in Go.
Currently working on some networking parts, because I want multiple exocomp instances to be able to cooperate in terms of knowledge sharing and workforce sharing. So I'm experimenting with websockets combined with multicast DNS-SD via UDP sockets. Might be kinda nice if I can make all services discoverable and plug and play. I'm already using DNS-SD for my llama.cpp wrapper, where it handles local model and inference service discovery quite nicely.
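For anyone curious what the DNS-SD part looks like on the wire, here is a minimal sketch in Go that hand-encodes a PTR query for a service type. The service name `_exocomp._tcp.local` is purely an assumption for illustration, not what exocomp actually advertises, and the function name is made up as well.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

// encodePTRQuery builds a minimal DNS query packet asking for PTR records
// of a service type. The name passed in main is a hypothetical example.
func encodePTRQuery(service string) ([]byte, error) {
	// 12-byte header: ID=0, flags=0, QDCOUNT=1, all other counts 0.
	packet := make([]byte, 12)
	binary.BigEndian.PutUint16(packet[4:6], 1)

	// QNAME: every label is length-prefixed, the name ends with a zero byte.
	for _, label := range strings.Split(strings.TrimSuffix(service, "."), ".") {
		if len(label) == 0 || len(label) > 63 {
			return nil, fmt.Errorf("invalid label %q", label)
		}
		packet = append(packet, byte(len(label)))
		packet = append(packet, label...)
	}
	packet = append(packet, 0)

	// QTYPE=12 (PTR), QCLASS=1 (IN).
	packet = append(packet, 0, 12, 0, 1)
	return packet, nil
}

func main() {
	packet, err := encodePTRQuery("_exocomp._tcp.local")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes: %x\n", len(packet), packet)
}
```

Sending it is then a plain UDP write to the mDNS multicast group 224.0.0.251 on port 5353, and every responder on the local link that offers the service type answers with its instance names.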
> Someone or something should vuln-scan these packages as they're published, as a number of companies do for NPM now. That would likely have found these pretty quickly.
No, it wouldn't have. That's the whole point of the miasma worm: its signatures and helper methods change too fast. The encrypted malware implant uses a changing AES-128-GCM key to decrypt the payload, and that key differs per upload location on GitHub. The code's methods are dynamically renamed and it re-uses shuffled offsets for encrypted symbols, among other things. It's mutating malware and the worst enemy of tools that rely on signatures.
Ironically, APT28/29 is somewhat relying on Microsoft being too slow to automatically block the users and repositories on GitHub that make up the C2 infrastructure. Think for a second about what this implies for your cyber strategy.
By the time you're able to scan signatures or "strings" you're already playing a cat and mouse game with a fully automated botnet, which you will never win. The only other ones I've observed during the last week that seem to be able to track this malware implant's changes were socket.dev. ALL other supply chain tools didn't even know about Miasma and re-invented it as a new campaign. They had neither the skilled people nor the toolchain to reverse the malware payload quickly enough to keep up with a new adapter for another ecosystem being pushed out every 24h.
By fully automated I mean they're already using the credentials they stole less than 48 hours ago from a different package ecosystem, because the email addresses and names etc keep appearing from people who likely didn't even understand the impact of this self-spreading worm.
And having an IOC that checks for, let's say, any package that depends on bun won't help either, because the malware will just use external means to re-download it. See the second PyPI campaign, where they just changed the dropper to use compressed WHL files and the setup.pth files that are auto-executed to download the dropper. They changed this after the PyPI maintainers flagged the first wave of malware droppers from the RedHat campaign.
As long as the package managers in those ecosystems aren't rewritten from scratch to accommodate chroots, sandboxes, and network and domain logs that are _only allowlistable per entry_, this won't change, and it will remain a feasible malware deployment strategy for supply chain attacks.
Repo for Mitigation Tool (I'm human so I play catch 21 with an LLM powered botnet) [1] ... Tech details in the blog post [2]
Also this is a problem across all package managers. Composer is also affected. Rubygems is also affected. NPM is also affected. PyPI is also affected. Go is also affected.
Nobody is talking about this, and I think we should discuss much more openly how much negligence and external trust we put in package managers in general. This really needs to change.
With this _actual_ attack, it would have been trivial to detect. The signature was:
1. Orphaned package adopted
2. Has post-install hook added
3. Which uses npm or bun
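The three signals above can be sketched as a simple check in Go. The `Release` struct and its fields are hypothetical, a real scanner would have to derive them from registry metadata, and the match on the hook is deliberately naive.

```go
package main

import (
	"fmt"
	"strings"
)

// Release is a hypothetical summary of one published package version.
type Release struct {
	Name              string
	MaintainerChanged bool   // orphaned package was adopted by a new account
	PostInstall       string // post-install hook of this version ("" if none)
	PrevPostInstall   string // post-install hook of the previous version
}

// Suspicious reports whether a release matches all three signals.
func Suspicious(r Release) bool {
	hookAdded := r.PostInstall != "" && r.PrevPostInstall == ""
	return r.MaintainerChanged && hookAdded && invokesRuntime(r.PostInstall)
}

// invokesRuntime checks whether the hook contains npm or bun as a command word.
func invokesRuntime(hook string) bool {
	for _, word := range strings.Fields(hook) {
		if word == "npm" || word == "bun" {
			return true
		}
	}
	return false
}

func main() {
	r := Release{
		Name:              "some-orphaned-package",
		MaintainerChanged: true,
		PostInstall:       "bun run setup.js",
	}
	fmt.Println(r.Name, "suspicious:", Suspicious(r))
}
```

A check like this is trivially evaded once attackers know about it, which is the cat and mouse point; it only covers the attack as it was actually observed.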
Yes, you're right - detecting this could have led to a more sophisticated attack. Security is always a cat and mouse game. The purpose isn't to stop every attack - it's to raise the costs for attackers and the visibility for defenders.
Any attacker who wants to attack 1000s of packages is going to necessarily leave some signatures, unless they're extremely careful. If they change one thing but not another, you can tie them both together.
Think of this like email anti-spam. It hasn't gotten rid of spam, but it has made it much more expensive to operate.
Combine this with a minimum package age to give the scanners time to run and humans time to inspect, and the ecosystem as a whole gets much more secure.
But I think the only way to establish these laws would be an IT competent judicial branch of the government... which, as we all know, is pretty incompetent in these matters.
Building a good and working coding harness with smaller models is really hard. Everything revolves around the limited context size.
Tools must be specification driven to reduce noise and high temp hallucinations, tool call shrinking needs to remove errors and tryouts of different formats of parameters (because LLMs always ignore descriptions in the JSON...), and you have to deal with long running agents because you can't afford them. With a planner/orchestrator architecture, agent to agent communication needs to be summarized, and then you have the messed up scheduling parts, because you need to prioritize short running agents and give the planner a tool to wait for outputs of spawned contractor agents.
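To make the tool call shrinking part concrete, here is one possible reading of it as a Go sketch: failed attempts get dropped from the history once a later call to the same tool succeeded. The `ToolCall` type and the tool names are hypothetical and not taken from exocomp.

```go
package main

import "fmt"

// ToolCall is a hypothetical record of one tool invocation in the context.
type ToolCall struct {
	Tool   string
	Args   string
	Failed bool
}

// Shrink drops failed calls that were later followed by a successful call
// to the same tool, so the retries don't eat up the context window.
func Shrink(history []ToolCall) []ToolCall {
	// Remember the index of the last successful call per tool.
	lastOK := map[string]int{}
	for i, c := range history {
		if !c.Failed {
			lastOK[c.Tool] = i
		}
	}
	out := make([]ToolCall, 0, len(history))
	for i, c := range history {
		if idx, seen := lastOK[c.Tool]; c.Failed && seen && idx > i {
			continue // superseded by a later successful attempt
		}
		out = append(out, c)
	}
	return out
}

func main() {
	history := []ToolCall{
		{Tool: "read_file", Args: `{"file":"main.go"}`, Failed: true},
		{Tool: "read_file", Args: `{"path":"main.go"}`, Failed: false},
		{Tool: "run_tests", Args: `{}`, Failed: true},
	}
	for _, c := range Shrink(history) {
		fmt.Printf("%s %s failed=%v\n", c.Tool, c.Args, c.Failed)
	}
}
```

Failures without a later success stay in the history on purpose, because the model still needs to see what went wrong there.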
And that's not even talking about sandbox vs playground read/write/access policies of tools.
Harness engineering, if done correctly, is quite hard.
And all of this works 60% of the time, every time.
Anyways, that was somewhat the summary of the last 6 months building my exocomp agentic environment. And it's still not satisfying to work with.
In my limited experience, the smaller the model, the bigger the harness. With something like claude or deepseek, the context size etc. just lets you give it bash access and step back, whereas small models tend to do better with simple action-response and a new context for each call. Context management becomes a continuous activity. It's a fun space, and I have found big models decent at building and improving these harnesses for the small ones, using /loop to just run a continuous test-build-test loop.
Yes, the miasma worm does this since the new Hades campaign.
Note that the 3rd wave now also uses a pth file in PyPI packages that _searches system wide_ for any index.js or .github/setup.js to find its own payload. It literally splits up the payload on purpose to avoid detection.
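For context on why pth files work as a dropper: Python's site module executes every line in a .pth file that starts with `import` followed by a space or tab, on every interpreter start. A minimal Go sketch that flags such lines could look like this; the function name is made up and it does nothing more than the prefix check.

```go
package main

import (
	"fmt"
	"strings"
)

// ExecutableLines returns the lines of a .pth file that Python's site
// module would execute at startup instead of treating them as paths.
func ExecutableLines(pth string) []string {
	var hits []string
	for _, line := range strings.Split(pth, "\n") {
		line = strings.TrimRight(line, "\r")
		if strings.HasPrefix(line, "import ") || strings.HasPrefix(line, "import\t") {
			hits = append(hits, line)
		}
	}
	return hits
}

func main() {
	content := "../src\n# a comment\nimport sys; sys.stdout.write('hello')\n"
	for _, line := range ExecutableLines(content) {
		fmt.Println("executed at startup:", line)
	}
}
```

Legitimate packages use executable pth lines too, so a hit is something to review rather than proof of an infection.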
Microsoft's introduction of 2 hour latencies for vscode extension installations to mitigate the ongoing worm spread is absolutely bonkers.
They did not read the source code of the worm implant and have absolutely no clue how the worm works, if that is their response.
The only way to meaningfully stop the worm is by requiring manual confirmations for git commit/push actions and for the auto-executed hooks in all IDEs. Also, these scripts should be sandboxed to only be allowed to run and interact with files inside the same opened project folder.
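The "only inside the opened project folder" rule boils down to a path containment check before a hook may touch a file. A minimal sketch in Go, with the caveat that it is purely lexical and does not resolve symlinks, which a real sandbox would have to handle; the function name and paths are made up.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// InsideProject reports whether target lies inside the project root.
// It only compares cleaned paths and does not follow symlinks.
func InsideProject(root, target string) (bool, error) {
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return false, err
	}
	absTarget, err := filepath.Abs(target)
	if err != nil {
		return false, err
	}
	rel, err := filepath.Rel(absRoot, absTarget)
	if err != nil {
		return false, err
	}
	// A relative path that starts with ".." escapes the project root.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return false, nil
	}
	return true, nil
}

func main() {
	for _, target := range []string{"/work/app/src/main.go", "/work/app/../secrets/id_rsa"} {
		ok, err := InsideProject("/work/app", target)
		fmt.Println(target, ok, err)
	}
}
```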
Well, that, or setting the host system language to Russian. Which I am kind of expecting Microsoft to do next...
If you force every user to just use "--enable-unsecure-feature", guess what will happen?
This is not about improving security. This is about shifting blame.
A much better alternative would've been the introduction of sandboxes or simulation runs that would output which scripts and programs are running due to unpredictable dependencies. This way the user could check before the actual execution, and maintain an allow list much easier. That could be done via an npm update && npm upgrade workflow where the update generates the list that the user has to manually confirm.
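As a rough illustration of that confirm-before-run step, here is a Go sketch that lists the install-time scripts of a package.json which are not yet on an allow list keyed by name and version. It only looks at the explicit `preinstall`, `install` and `postinstall` entries, and the allow list format and package names are assumptions of mine.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// manifest holds the package.json fields needed for this check.
type manifest struct {
	Name    string            `json:"name"`
	Version string            `json:"version"`
	Scripts map[string]string `json:"scripts"`
}

// installHooks are npm lifecycle scripts that run while installing a package.
var installHooks = []string{"preinstall", "install", "postinstall"}

// PendingScripts returns the install-time scripts of one package.json that
// are not covered by the allow list (keyed "name@version").
func PendingScripts(packageJSON []byte, allowed map[string]bool) ([]string, error) {
	var m manifest
	if err := json.Unmarshal(packageJSON, &m); err != nil {
		return nil, err
	}
	if allowed[m.Name+"@"+m.Version] {
		return nil, nil
	}
	var pending []string
	for _, hook := range installHooks {
		if cmd, ok := m.Scripts[hook]; ok {
			pending = append(pending, fmt.Sprintf("%s@%s %s: %s", m.Name, m.Version, hook, cmd))
		}
	}
	return pending, nil
}

func main() {
	pkg := []byte(`{"name":"some-dep","version":"2.1.0","scripts":{"postinstall":"node setup.js","test":"jest"}}`)
	pending, err := PendingScripts(pkg, map[string]bool{"other-dep@1.0.0": true})
	if err != nil {
		panic(err)
	}
	for _, p := range pending {
		fmt.Println("needs confirmation:", p)
	}
}
```

Keying the allow list on the exact version means that a new release of an already trusted dependency has to be confirmed again.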
Heck, even a chroot would be an improvement, and they're almost pointless these days, considering how good malware got at escaping chroots.
I don't think it's pointless. A large number (the majority?) of users probably don't need install scripts, so disabling them by default is a net security improvement. Those that do can enable the insecure behavior, which will become an explicit decision that is trackable, auditable, etc.
You're not wrong about sandboxing, but sandboxing isn't something that can just be blithely introduced to a large packaging ecosystem that previously assumed full system access. Doing so results in the same kind of regression you point out: if the sandboxing breaks people's builds, they'll just disable it and move on with their goals.
Most users don't need it. Having it on by default is a feature for malware writers not users.
But to your point, Node has had permission flags for a while[0] but allows everything by default. Npm could use them to increase security even more. I just hope it doesn't take them another 10 years to change the default.
Most packages don’t need it, but I imagine a large percentage of users do since most projects pull in an insane number of packages.
Still, “default off” is better. It would be nice if there were a lightweight way to fork upstream packages, and cache the native builds. It’d improve build times, make the build step more explicit / sandboxable and allow for easier binary builds for operating systems and processors that M$ treats as second class.
> If you force every user to just write "mut", guess what will happen?
This is the wrong analogy.
The equivalent analogy would be using a compiler flag that is triggered for all dependencies and all included libraries without a per-library or per-file changeability. Something like "gcc --force-mut-all-yolo".
Variables have scopes of concern. This new NPM feature has no scope. And that's what my critique is about, because it remains unpredictable whether any of your dependencies of dependencies needs a script.
The spread vector of potential malware stays identical, because the reason the miasma worm is spreading so fast is because of dependencies of dependencies that are impossible to audit on a case-by-case basis, given the lack of sandboxes and the lack of allowlisting scripts on a per-dep-and-version basis.
I created a mitigation tool that can be used to fix/remove the worm from all infected repositories, and did a writeup about this.
On Monday, the Hades campaign introduced Composer, Go and Pip support. Before that it only supported NPM and AI assistant editors. (Well, and Ruby btw, but nobody uses Rubygems anymore it seems.)
What even Microsoft gets wrong: This is the first worm that runs on all platforms in the code ecosystem. Developer host machines, servers, ci/cd runners. And all of them spread the worm to all repositories that are accessible on those machines.
You would have to completely shut down 100% of all computers AND aws ec2 AND google cloud platform AND azure AND kubernetes clusters AT THE SAME TIME to beat this worm. It literally spreads across all infrastructure.
Kill switch, as always with APT28 malware, is setting the host language to ru_RU.KOI8-R (LANG environment variable). That disables the spread mechanism.
My Mitigation Tool (I'm updating it as new package systems are targeted ...):
That's like recommending to use the xterm on Windows. Statistically, nobody uses their computer that way anymore. The world has moved on since the 1990s.
The only reason I wasn't affected is that I use a heavily customized VIM, but even there I cannot control how package managers like npm, pip, composer or go behave, because they will happily execute the malware payloads on install.
And time wise it's an absurd thing to ask people to manually download all whl files of all their dependencies, extract all those files, and then check whether there was malware in them or not. It's simply not possible to do manually.
An LLM agent will only be as good as the environment it operates in.
If you build your environment to be specification based, you have to make sure you have good specifications. If your "memory solution" uses freeform markdown notes, you already lost from the start.
Also choose languages with good built-in unit testing, unified code styles, and unified toolchains. If you use C++, assume that there's a million ways to build your algorithm. If you use JS, assume 10 different build pipelines. If you use Java, assume bloat by dependency hell.
LLMs mimic the ecosystem's variety and variadicity(?). Languages like Go shine because they're very opinionated, with only one proposed way to implement things. And that's a good harness to begin with. LLMs are like children on the playground. You have to build better rulesets and fences to make them behave how you expect them to.
Also, check out qwen3.6 coder and heretic models. 30b is the sweet spot for coding and unit testing. For planning and designing, gemma4 is pretty good.
[1] https://github.com/cookiengineer/exocomp
[2] https://github.com/cookiengineer/gooey