First, thanks for sharing this link, it was an interesting read! A few remarks below.
I had a hard time reading the wc code in the article. First I had to go to the GitHub repository to understand that "da" stands for dynamic array, and then to realize that what the author calls wc is not at all the Linux wc command. By default, wc gives you the number of lines, words, and bytes in a file, not the number of occurrences of each word in the file, which is what the proposed code computes.
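To make the difference concrete, here is a minimal sketch of what the default wc counts boil down to, one pass over a buffer. The names wc_counts and wc_count are made up for this example and come neither from the article nor from coreutils, and it assumes plain isspace() word splitting, which ignores locale and multibyte handling.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type: the three totals that wc prints by default. */
struct wc_counts {
    size_t lines;
    size_t words;
    size_t bytes;
};

/* Count newlines, whitespace-separated words, and bytes in buf[0..len). */
struct wc_counts wc_count(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    struct wc_counts c = {0, 0, len};
    int in_word = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)buf[i];
        if (ch == '\n')
            c.lines++;
        if (isspace(ch)) {
            in_word = 0;          /* whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;          /* first non-space byte starts a new word */
            c.words++;
        }
    }
    return c;
}
```

A per-word occurrence count, as in the article, additionally needs a map from word to counter, which is a different program.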
Also, since I had to read the GitHub README, another remark: it says that sp_io uses pthreads rather than fork and exec. Both of those approaches (but especially pthreads) contradict the explicit goal of programming against the lowest-level interfaces. I believe the lowest-level syscall is clone3 [1], which gives you more fine-grained control over what is shared between the parent and child processes, allowing you to implement either fork or threads.
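For illustration, a minimal sketch of a fork-like call made directly through clone3 on Linux. The helper names fork_via_clone3 and run_child_and_get_status are made up for this example and are not from the project. It assumes Linux 5.3 or later (kernel and headers); some sandboxes block clone3, in which case the helper just returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/sched.h>   /* struct clone_args */
#include <signal.h>        /* SIGCHLD */
#include <string.h>
#include <sys/syscall.h>   /* SYS_clone3 */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: with no CLONE_* flags set, nothing is shared with
 * the parent, so the call behaves like fork(). Returns 0 in the child,
 * the child's pid in the parent, or -1 on error. */
static pid_t fork_via_clone3(void)
{
    struct clone_args args;
    memset(&args, 0, sizeof(args));
    args.exit_signal = SIGCHLD;   /* signal sent to the parent on child exit */
    return (pid_t)syscall(SYS_clone3, &args, sizeof(args));
}

/* Hypothetical helper: start a child that exits with `code`, wait for it,
 * and return the exit status the parent observed (-1 on any failure). */
int run_child_and_get_status(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork_via_clone3();
    if (pid < 0)
        return -1;                /* e.g. ENOSYS when clone3 is unavailable */
    if (pid == 0)
        _exit(code);              /* child: exit immediately */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Thread-like behavior would come from setting sharing flags such as CLONE_VM in args.flags and providing a stack for the child, which is considerably more delicate and is left out here.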
By the time you know enough to reasonably use clone3, you have also learned that doing so is an exceptionally bad idea save for very rare circumstances.
To be portable it's probably best to use the pthreads API for everything, make no additional assumptions, and rely on the user to provide the implementation. Consider what happens when someone is working with OpenMP, CUDA, or similar and attempts to make use of a dependency that in turn makes use of your library. The easier it is to understand the assumptions made by your library the better.
I agree with that. I'm just stating that it's contradictory with the project's own principles.
> Program directly against syscalls
It's the very first of the listed principles. The paragraph after this title even puts "must" in italics to insist on it, and there's a footnote defining what they mean, which makes it very clear that pthreads should be out according to this principle.
The author of this paper holds that quantum computers will never be able to go above the limit of a thousand qubits.
> Hence, insofar as a classical computer will never factor a 2,048-bit RSA integer, RaQM (rational quantum mechanics) predicts that a quantum computer will not either. This predicted breakdown of QM could be testable in less than 5 y.
People here who know about quantum computing, what do you think of this work?
I know more about Quantum Chemistry than Quantum Computers. Both use the same Physics theory, the only differences are the goals and implementation details :)
Usually you have a molecule and the most stable state is a sum of products of orbitals (Slater determinants). You can write it using a basis of orbitals, which mathematically is just an orthonormal basis. In a Quantum Computer the basis states are localized far apart (on atomic scales) to protect them against noise. In Quantum Chemistry, they all overlap and it's a mess that is impossible to separate, but that is what real molecules have.
The real physical properties do not depend on the orthonormal basis you choose. A usual trick in Quantum Chemistry is to pick an orthonormal basis where the solution looks almost like a single product of orbitals, so you have a simple representation in a simulation on a Classical Computer and you can do the calculations very fast.
But you can also pick a nasty basis, where the most stable state looks like a "maximal N-qubit superposition/entanglement". It's stupid because the calculations are slower. But in some calculations you first pick the basis, then try to get the most stable state, and then perhaps try to get a new orthonormal basis that is nicer for the calculations.
So, I can't see how it is possible that there are surprising restrictions that appear in the nasty basis and don't change the results in the nice basis.
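To spell out the basis-independence argument used above, here is the standard textbook identity, written with my own notation (the symbols $c$, $A$, $U$ are not from the paper): changing the orthonormal basis is a unitary transformation, and expectation values do not change under it.

```latex
% State expanded in an orthonormal basis {phi_i}, with coefficient vector c:
%   |psi> = sum_i c_i |phi_i>
% New orthonormal basis obtained with a unitary matrix U (U^\dagger U = U U^\dagger = I):
%   |phi'_j> = sum_i U_{ij} |phi_i>
\begin{align}
  c' &= U^\dagger c, &
  A' &= U^\dagger A\, U, \\
  \langle A \rangle'
    &= c'^\dagger A'\, c'
     = c^\dagger U\, U^\dagger A\, U\, U^\dagger c
     = c^\dagger A\, c
     = \langle A \rangle .
\end{align}
```

The coefficients can look simple (nearly one product of orbitals) or maximally spread out depending on $U$, but every measurable quantity $\langle A \rangle$ is the same in both descriptions.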
The same thing happened with my blog a few weeks ago. It was well referenced for years and suddenly almost all of my entries are not indexed anymore. The Search Console indicates that the URLs were crawled but are currently not indexed, and unlike with technical problems, there is nothing I can do to fix it. I just have to accept that most of my articles cannot be found via Google anymore.
EDIT: I don't actually think it is related, but now that I think of it, the timing corresponds with when I started setting up TDMRep to forbid using my content to train LLMs.
Same. I've been running a personal blog for over 20 years. Last year, I couldn't find any links to my blog on Google. Went to Google Search Console to find all my links are "Crawled but not indexed", with no reason given.
If Google already slurped up any training data from your site, then not indexing it probably gives them something of a moat over anyone using Google search for site discovery.
I moved my domains and mailboxes from Gandi to Infomaniak when Gandi went from "no bullshit" to full shit hole after TWS bought them. The service is top quality and their customer service was really helpful in transferring my third-level .name domain, which has always been a hassle. This news makes me even more glad I chose Infomaniak.
Not just for LLMs, but in general if code is produced automatically by a tool and isn't going to be a hundred percent proofread and tested by humans who could have written it manually, it's always better to use the safest possible language so that the compiler can catch most of the errors. So yeah, Rust or OCaml are good candidates. Performance is also a good point but it's a secondary issue in my opinion.
Oh, since you're here, I want to say thank you for all your guides! I learned from them yeaaars ago, and still recommend them to my own students to this day, especially your network programming guide which is linked from all of the network lab session sheets of my systems and networks course. Thanks!
I teach such a course, and we don't have that. First, students must work on an existing issue of the project they choose, and are only allowed (for my course) to submit an issue or a PR unrelated to an existing issue if they have already finished a first contribution that has been merged by the maintainers into the same project. The course grade is based on multiple factors, and the code of the contribution itself is far from being the most important. The most important aspects are communication with the developers (and being respectful and polite certainly is significant) and the ability to identify and then respect the (often implicit) conventions of the project, as well as the proper use of the forge workflow for submitting a PR (fork, clone, branch, PR, discuss, etc.). Getting the contribution actually merged into the project is a neat bonus on the grade but is not required to pass the course.
Also, I totally ban using LLMs, and unmotivated students often choose to work on very simple issues like easy refactoring or cosmetic aspects of web projects. It's okay with me for two reasons: first because it keeps unmotivated students from working on important issues and giving useless review work to open source maintainers, but also because we have all the other courses to do complex projects. Here the point is to teach them, by practice, the workflow of contributing to an actual project, discussing with actual people, etc.
For some students it's already a good thing to have been able to get a copy of the latest development version of a given project, to install all of its development dependencies and tools, to compile it, and to reproduce the bug they chose to work on. It's not enough to pass the course, but it's a necessary first step to contribute to any project and it's quite a different experience from what they're used to with small school projects that are designed for teaching or that they entirely wrote themselves.
In the CS bachelor degree I'm responsible for, we have exactly that in the third and last year (it's in France, so as in ~all of Europe the licence lasts three years and then students continue their studies with a two-year master).
I've been teaching this course for ten years now, and it's been fantastic. A lot of open source contributions, mostly trivial, but some more significant than others too, have been made, to a lot of different projects. It teaches students to actually work on a real code base, using a real workflow (fork, clone, branch, commits, PR, review, commits, review, … hopefully merge), talking (in English) with maintainers, having to update tests and documentation not just code, and having to respect a lot of conventions that are not always explicitly listed anywhere (a first task that I always ask them to do is to present the project they have chosen, its tools, platforms, and languages, and to list all the programming conventions (indentation, naming, etc.) they can identify). At the end of it, it also makes them realize what they can do, because at the beginning of the semester most of them think they will never be able to actually make a contribution to a real project.
This year alone there were contributions to NewPipe, Cartes.app, Immich, Fossify apps, PyGameEngine, Jax, Shortcut, Wikimedia Commons App, Godot, …
Some years ago I even had students contributing to ls (yes, in the GNU coreutils).
I'm responsible for a CS bachelor degree in France (licence informatique). In our curriculum, we teach how to learn programming and then programming paradigms, not languages, and we use a lot of different programming languages for that from the first semester. At the end of the first year, our students have already approached 4 programming paradigms using ~10 different programming languages. During the second year this is reinforced by the introduction of a new paradigm (OOP) and some more advanced courses using different languages (for example, the introduction to functional programming in the first year mostly uses Racket, but our second functional programming course in the second year uses OCaml).
[1] https://manpages.debian.org/trixie/manpages-dev/clone3.2.en....