> AUR is just a collection of user-produced PKGBUILDs.
Is that much different from the entire PyPI ecosystem, and npm, and Docker Hub (people disable SELinux, --privileged turns off seccomp and AppArmor, sandbox escape CVEs exist)?
Not much different, no. People have equally bad practices around programming-language package managers.
The entire dev ecosystem has terrible security hygiene, largely because of the pressure to move fast: real security controls by their nature limit flexibility and can slow most processes down.
The main LLM will refuse to scan, whether or not issues have been flagged, and the cheap model will not do a good enough scan on its own.
For models designed/marketed for defensive cybersecurity uses, any predictable refusal mechanism is a vulnerability. It is like being able to cause a kernel panic or segmentation fault.
Even if the gate is fail-reject, an attacker can overwhelm human-in-the-loop (HITL) reviews with many false positives, which is a DoS vector in its own right.
Intriguing... "After months of misdiagnoses Dr Souhel Najjar, employs a test asking Susannah to draw a clock. Instead of the customary clock face, her condition led her to draw all the numbers 1 through 12 on the right side of the clock. This was the breakthrough moment; it was this clock drawing that enabled Dr Najjar to understand that the right side of Susannah’s brain was inflamed, further test revealed this inflammation was a result of anti-NMDA receptor encephalitis, initiating her path to recovery"
"since 2019, on the advice of the National Agency for the Safety of Medicines and Health Products, French health workers have been told not to treat fever or infections with ibuprofen." [1]
And yet in some countries pediatricians will liberally prescribe it to toddlers.
Also, from [2]: "In this systematic review of NSAID use during acute lower respiratory tract infections in adults, we found that the existing evidence for mortality, pleuro-pulmonary complications and rates of mechanical ventilation or organ failure is of extremely poor quality, very low certainty and should be interpreted with caution."
One of the problems is that if you give it to kids with chickenpox it can cause complications. There were also some hints early in the pandemic that ibuprofen had a similar effect on COVID-19. However, as the source you link to says, the data doesn't really support that view anymore.
Anthropic/OpenAI could own this space. They should offer a paid service that provides a mirror of packages that have been LLM-scanned and sandbox-evaluated with their next-gen models. Free for individuals, orgs can subscribe to it.
OpenAI just acquired Astral, who have an index service called pyx, so they would have a head start.
My understanding though is most corporations that take security seriously either build everything themselves in a sandbox, or use something like JFrog's Artifactory with various security checks, and don't let users directly connect to public indexes. So I'm not sure what the market is.
Judging by curl shutting down its bug bounty program due to AI slop, a likely outcome would be that this mirror has no packages because they are all blocked by false positives.
> “The agent acted like a hyperparameter optimization algorithm with some basic reasoning baked in.”
Good lens.
The crux of the auto research repo is basically one file, program.md, which is a system prompt that can be summarized as “do this in a loop: improve train.py, run the training, run evals, record result. Favor simplicity”. The other files are an arbitrary ML model that is being trained.
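To make the "hyperparameter optimization with reasoning" lens concrete, here is a rough sketch of that loop in Python. It is not the repo's code: `propose_change` and `run_experiment` are hypothetical stand-ins (a random perturbation of a learning rate and a toy loss) for the agent editing train.py and for the real training and eval run, and the "favor simplicity" part is not modelled at all.

```python
import random


def run_experiment(config):
    """Stand-in for 'run the training, run evals'; returns a loss (lower is better)."""
    # Toy objective with its minimum at lr = 0.1. A real run would train a model.
    return (config["lr"] - 0.1) ** 2


def propose_change(config, rng):
    """Stand-in for the agent editing train.py: perturb one hyperparameter."""
    candidate = dict(config)
    candidate["lr"] = max(1e-4, config["lr"] * rng.uniform(0.5, 1.5))
    return candidate


def research_loop(config, steps=50, seed=0):
    rng = random.Random(seed)
    best_loss = run_experiment(config)
    log = [(dict(config), best_loss)]  # "record result"
    for _ in range(steps):
        candidate = propose_change(config, rng)
        loss = run_experiment(candidate)
        log.append((candidate, loss))
        if loss < best_loss:  # keep only changes that improve the eval
            config, best_loss = candidate, loss
    return config, best_loss, log
```

Swap the random perturbation for an LLM that reads the log and edits the training script, and you get roughly the shape described in the quote: a greedy search where the proposal step has some reasoning behind it.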
This is something I could almost never be bothered to do before, but I can now very lazily set up large parameter sweeps and visualization scripts to really probe things. There's a danger of "analysis paralysis" but I've still found it quite useful. Although I'm not sure it saves me time as much as sanity.
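For anyone who hasn't set one up, a sweep like that can be very small. This is only a sketch under my own assumptions: `toy_objective` and the grid values are made up for illustration, and a real sweep would call a training run where the toy function is.

```python
import itertools


def sweep(objective, grid):
    """Evaluate objective on every combination in grid; return (score, params) rows, best first."""
    names = list(grid)
    rows = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        rows.append((objective(**params), params))
    rows.sort(key=lambda row: row[0])  # lower score is better
    return rows


def toy_objective(lr, batch_size):
    # Hypothetical score with its minimum at lr=0.01, batch_size=64.
    return (lr - 0.01) ** 2 + abs(batch_size - 64) / 1000


grid = {"lr": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}
results = sweep(toy_objective, grid)
best_score, best_params = results[0]
```

The full `results` list is what you would hand to a plotting script; keeping every row, not just the best one, is what makes the "probing" part possible.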