I am a muggle, but the three biggest use cases for me have been:
1) Boring stuff like JSON schema/JSON example modification and validation
2) Rubber ducky
3) Using this system prompt to walk me through areas in which I have no experience [0]
You are a very helpful code-writing assistant. When the user asks you for a solution to a long, complex problem, first, you will provide a plan with a numbered list of steps, each with the sub-items to complete. Then, you will ask the user if they understand and if the steps are satisfactory. If the user responds positively, you will then provide the specific code for step one. Next, you will ask the user if they are satisfied and understand. If the user responds positively, you will then proceed to step two. Continue the process until the entire plan is completed.
I recently finally used the OpenAI API in a project. It was gpt4-o to analyze news story sentiment. The ease of use and quality of output is impressive.
[0] I should add that I have been using "presets" in the LibreChat GUI to allow me to have many system prompts easily available. It's kind of like Custom GPTs. Also, using LibreChat for work feels better as I believe that OpenAI states that they do not train on data provided via API.
This seems like a legit way to do it. When I use ChatGPT, I treat it kind of like the Enterprise Computer. It's capable of providing information, interpreting data, conjecture, and suggestions using context.
The weakest use of AI is definitely treating it like a lossy database of everything it's learned.
This is the best use case for voice mode I've found. I also use voice mode to take notes/brainstorm, then have it put them into a CSV or whatever format. The prompt I use has something like "ask any questions if things aren't clear" and the rugby ducky aspect emerges.
I wonder how long it will be before AI can program financial independence. Loosely it would look like a mix between One Red Paperclip, drop shipping, Angi, trading bots, etc. The prompt might be "create a $1000 per month income stream in the next 30 days". Maybe the owner of the first AI to pass this could get a $1 million prize. 2030?
Then equally interesting would be to see how the powers that be maneuver to block this residual income. 2035?
Then after that, perhaps a contest to have an AI acquire resources equivalent to residual income so that it can't be stopped. For example by borrowing for cheap land, installing photovoltaics and a condenser to supply water, then building out a robotic hydroponic garden, carbon collector, mine, smelter, etc, enough to sustain one person off-grid continuously in a scalable and repeatable fashion. 2040?
That almost sounds like a science fiction scenario. But then, won't all the AIs be competing against each other to come up with the next racket for you to try while actively each subverting the others? Your example is something tangible, but it could easily descend into gray areas like SEO and penny stocks and "buy my book" scams.
My favorite thing to do with ChatGPT and coding is making fast prototypes. Since you know hallucinations might be a problem, and since ChatGPT struggles with larger contexts and files, just play to its strengths. I have lots of ideas of "small" to medium webapp ideas, and you can often get ChatGPT to write most of the code of a prototype and have it work quite quickly.
Prototypes are fun! Obviously production code or serious projects are different. But I've found a new joy in building software since GPT-4 came out - it's more fun than ever to build small ideas.
I do believe that ChatGPT as a programmer productivity tool is its most valuable value proposition to date. In addition to generating code, it makes it much easier for me to chase down obscure error messages and potential root causes and workarounds. It's definitely not perfect. But, as part of the Interact->Generate->Verify cycle that modern AI has transformed other disciplines (eg. materials science, protein folding, mathematics, and more), it serves as a valuable component of each of these.
Yes, and I will fill in definitions of obscure functions. If its named well, ChatGPT often can figure out what its doing. (No idea how this works!) I'm sure there are more streamlined ways, but this works well enough for me.
I have my reservations about the quality of LLM generated code, but since I have neither studied ML in depth, nor compared different LLMs enough, I'll refrain from addressing that side of the debate - except maybe for noting that "I test the code" is not good enough for any serious project because we know that tests (manual or automated) can never prove the absence of bugs.
Instead, I offer another point of view: I don't want to use LLMs for coding because I like coding. Finding a good and elegant solution to a complex problem and then translating it into an executable by way of a precise specification is, to me, much more satisfying than prompt engineering my way around some LLM until it spits out a decent answer. I find doing code reviews to be an extremely draining activity and using an LLM would mean basically doing code reviews all the time.
Maybe that will mean that, at some point, I'll have to quit my profession because programming has been replaced by prompt engineering. I guess I'll find something else to do then.
(That doesn't mean that there aren't individual use cases where I have used ChatGPT - for example for writing simple bash scripts, given that nobody in their right mind really understands bash fully. But that's different from having my entire coding workflow based on an LLM.)
Most of my time coding is spent on none of: elegant solutions, complex problems, or precise specifications.
In my experience, LLMs are useful primarily as rubber ducks on complex problems and rarely useful as code generation for such.
Instead, I spend most of my time between the interesting work doing rote work which is preventing me from getting to the essential complexity, which is where LLM code gen does better. How do I generate a heat map in Python with a different color scheme? How do I parse some logs to understand our locking behavior? What flags do I pass to tshark to get my desired output?
So, I spend less time coding the above and more time coding how we should redo our data layout for more reuse.
> Most of my time coding is spent on none of: elegant solutions, complex problems, or precise specifications.
I find that deeply sad and it's probably one of the reasons why I'm partially disillusioned in programming as a profession. A lot of it is just throwing stuff at the wall and seeing what sticks. LLMs will probably accelerate that process.
> I don't want to use LLMs for coding because I like coding. Finding a good and elegant solution to a complex problem and then translating it into an executable by way of a precise specification is, to me, much more satisfying than prompt engineering my way around some LLM until it spits out a decent answer. I find doing code reviews to be an extremely draining activity and using an LLM would mean basically doing code reviews all the time.
Exactly how I feel about it. This is why I don't find "coding with LLMs" as fun as many others seem to find it. Code reviews are draining because you start with a bunch of unknown unknowns or unproven pitfalls this code might have fallen into. And then you eliminate them on by one as you run the algorithm in your mind. It is a lot more draining than coding is because an experienced coder can take mindful steps and build something solid on a base that he trusts won't collapse. Code Review doesn't get any less draining as you get more experienced, unless you start outsourcing responsibility.
I'm left a bit confused still about using ChatGPT and Claude as someone who's still learning (2nd year CS student) and nowhere near being a professional dev.
I'm sure if I was was dev who had learnt and worked in the pre-GPT era I'd have no problem using these tools as much as possible, but having started learning in the GPT era I feel conflicted. I make sure I understand each line of code generated whenever I use AI. Despite that I have a feeling I'm handicapping myself using these tools? Will it just make me a code reviewer/copy-paster rather than someone who can write something from scratch?
If it is reasonable to use these tools, at what point does it become so? like at what point can I consider myself well enough at programming to be able to use it like in the post.
Right now I'm purposely restraining myself from using these tools too much because what I can make using them is much better than what I can make myself, so as to get upto a certain level myself before I start making use of these capabilities
Am I thinking about this the right way? At what point does it make sense to start using these tools more freely without worrying about handicapping my learning?
You remember a fraction of things you merely touch. More if you narrow your focus.
If you have the passion for programming it will manifest itself by itself over time and with training.
Use all the tools available for learning and doing. Question it just as much. Explore the mind and the different axis of the field. Read the books but think for yourself.
Programming without money involved is creative and fun. Make sure you remember this. It's not a science or backed by natural laws. Don't worry about being wrong, we all are without guidance, but seek the truth within yourself for what you believe is the best path for becoming a good programmer.
> Am I thinking about this the right way? At what point does it make sense to start using these tools more freely without worrying about handicapping my learning?
None of us can see the future so it's all just bloviating, but let me give you my two cents as someone who's been programming since I was a kid and went the long way around starting with assembler: it probably mostly depends on what kind of programmer you want to be.
If you're in CS for the career and you have other things you want to do that have nothing to do with technology after the 9-5, then it's probably not going to be a problem. There's a lot more to software engineering than the code and getting good at those areas - like solving non technical coworkers problems with software - is just as important as writing the code. Most code is also not high-risk and doesn't need to be perfect, just maintainable. Learning the skill of translating requirements to LLM conversations now may well pay dividends in the coming decades because it's pretty obvious that LLMs are here to stay.
If you really like programming, want to be an architect later in your career, or want to be the technical cofounder in a successful tech heavy startup or something, then you'll want to minimize your use of those tools to rubber ducking and minor questions. There's a lot of value in developing "grit", that may very well be hampered by use of LLMs. You need to absorb a lot of foundational knowledge so that you can make intuitive decisions about what the LLM is writing. You might be able to use an LLM as the primary guide to developing that knowledge, but it's risky.
To be honest, when I was younger I figured all the people who started learning in college and didn't know how the low level basics of CPUs or memory worked would be at a disadvantage, but that has proven to be dead wrong. The majority of people did just fine using Python or Javascript without any of that knowledge or experience and I figure it will play out the same with LLMs.
Think of ChatGPT (and its ilk) as just another tool, no different from a Google search in which you look for code to solve your problem. If you prioritize learning to code on your own, you'll avoid these tools. If you prioritize getting working code, you'll use these tools. It's a sliding spectrum, and there is no right or wrong answer.
Even if you rely on these tools heavily, you can't help but learn from them because you must still examine the their code, even if briefly. It will be a different type of learning than what you'd learn from, say, cracking open a language manual (yes, we used to do this back in the pre-Web days), but you will learn nonetheless.
I suggest you try all the tools available to you and the decide when it's appropriate for you to use each.
LLMs can accelerate your learning. I've been programming for 25+ years and the rate at which I'm learning new skills and tools has gone up a very material amount now that I can bounce things through ChatGPT and Claude.
Should you worry that if you don't struggle against a weird compiler message for 3 hours (which ChatGPT would have told you how to fix in 30 seconds) you won't be gaining essential crank-on-a-frustrating-problem-for-three-hours experience?
I'm personally not convinced that the frustration is worth it. I'd rather have spent those three hours learning a bunch of other stuff.
But maybe the only reason I'm a competent programmer today is that I worked through that pain earlier in my career?
Since you're clearly thoughtful and conscientious, I suggest trying both. Some days, go all-in on LLMs. Other days limit yourself and work through challenges without them. Experiment like that for a month or so to see which learning style appears to be working best for you.
Honestly, the first article I've seen where the actual usage is explained clearly and matches my own experience. Maybe because I tend to write software the same way.
The thing I've heard the most from other developers, particular those new to the profession, is that you "have to know most of what you're asking already to know if what you get from the LLM is right." You can use the LLM to learn, but for the actual programming they struggle because they don't have the background to understand the responses well enough to continue the implementation.
Also, for the record, C# and .NET, huge enterprise/ecommerce software, so not quite as malleable as bash scripts and what not.
Having a depth of experience helps a lot, because it means you can tell the thing exactly what to do. Effectively you get to treat it as a super-productive intern, very good at taking instructions, occasionally prone to getting stuck in an error loop or falling for weird conspiracy theories.
"I can now write clojure code without ever touching anything lisp-y before"
I have trouble with doing that myself. When I personally don't understand the code completely I worry I'm about to step in it. Maybe it's a flaw in my work ethic, but when stuff slips past code reviews and makes it even to QA because I didn't comprehend it well enough I feel bad about it. Maybe a bit too personally attached to the results.
My favorite thing is now being able to code fluently in new languages. I’d never used rust before but I was able to immediately get something up and running and I got a detailed explanation of how it works. With a programming background I can instantly understand most code once I get past the new syntax and idioms.
I was impressed with chatgpt in many a situation where it could understand me better than another person ona forum would. Will present me with a solution without creating an environment where I need to be careful how I'm gonna come across.
I've enjoyed finding answers to things and suggestions on how to do them differently of how I was thinking of doing them
I've enjoyed receiving answers to questions I asked google with no match from what I'm asking.
Can't bring myselt to use it to code for me though, but all the above leads me to believe it shouldn't far now until I'm on board too
Regarding the formatting, i use Llama 3 mainly, to generate org-mode documentation about functions, i tell it to enclose lines with *Line x-y* and begin_src for the corresponding lines.
It generates most of time perfect formatting which i readily export to markdown with org-mode-html-export.
Showcase of the generated formatting as a screenshot [1].
The prompt is what the screenshot shows. I give also a small example, and i put the code in the Real Definition. The whole command is shown above [1]. Replace the parse header with the function name/names.
I tried this command with GPT-O-mini, GPT-3.5, GPT-4.0, Llama-3-8b, Llama-3-70b, each one of them generates perfect formatting almost every time. No real differences between any of the models.
Using the factory defaults for formatting, GPT is better than any other model, but as soon as the user can control the formatting then every model performs well. But i like the defaults of GPT better, i agree with the article author about that.
I recently asked for a Dockerfile that builds an image which passes Docker's vulnerability scan. It gave me a file that failed over a dozen vulnerability tests.
Did you paste the test failures back into it to see what it did about them?
(This is the kind of test I'd expect it to fail at because of its knowledge cut-off - vulnerability scans are mostly looking for things like out-of-date packages, so any tool with a knowledge cut-off a few months in the past is very likely to include package versions that would no longer pass a scanner)
I googled, found the solution I was looking for. Zero vulnerabilities. Can't be arsed with training LLMS, writing a Dockerfile for my project was one of many little steps that I have no desire to turn into training sessions for an LLM.
Yeah, same use here. I work with data analysis, so 95% of the code I write is to wrangle data, or do something in the ETL-pipeline.
Almost all my ChatGPT use comes down to writing queries for loading or transforming data. Getting rid of the boilerplate has helped immensely on my productivity.
EDIT: I should note, the vast majority of errors I get using solutions from LLMs, tend to be code it includes that contain legacy or dead libraries. Sometimes it starts to mix old and new libraries in the same code snippet, which will either outright fail, or output some weird results.
Which makes sense, as some answers are the product of being trained on 14 year old StackOverflow posts, while others are trained on newer stuff.
I'm increasingly building entire functional prototypes from start to finish using Claude 3.5 Sonnet. It's an amazing productivity boost. Here are a few recent examples:
datasette-checkbox is a Datasette plugin adding toggle checkboxes to any table with is_ or has_ columns. Animated demo and prompts showing how I built the initial prototype here: https://simonwillison.net/2024/Aug/16/datasette-checkbox/
I still see some people arguing that LLM-assisted development like this is a waste of time, and they spend more effort correcting mistakes in the code than if they had written it from scratch themselves.
I couldn't disagree more. My development process has always started with prototypes, and the speed at which I can get a proof-of-concept prototype up and running with these tools is quite frankly absurd.
I'll echo this sentiment. LLMs have made it so easy to create bash scripts to make things easier, write code in languages I don't write in, or even just for writing up the README.md just based on the code I've written.
I just finished my initial prototype for a Raspberry Pi Zero W based audio recorder. These are all the things it does:
- It has a single button connected to GPIO that starts/stops recording
- An LED indicator connected to GPIO
- Queries the available recording devices, choosing the external USB audio interface if available and setting the appropriate bit-depth
- Launches ALSA's `arecord` in a subprocess to handle the actual recording (<5% CPU on a Pi Zero W when recording 48k @ 32bit)
- The script is setup as a system service
- Nginx is running to serve up the WAV files using a web UI
- There is a simple service to start/stop recording and get status via an API the web UI can use
It is all in Python, which I don't really write, but is the default for these kind of Pi projects. 100% of the code, service definitions, Nginx config, README.md, etc were written by LLMs, mostly Claude.
This is also doable with one line of bash via imagemagick in any terminal that supports the Kitty or iTerm2 graphics protocols, which is most mainstream ones (iTerm2 itself, Kitty, Konsole, etc.)
I don't say this to bash your techniques but to point out that there are other methods which permit quick prototyping and easy iteration, and that LLMs are, at least in my experience, not a paradigm-shifting improvement in this vein.
Imagemagick doesn't solve this particular problem. My goal here is to process an image so I can use it on my blog. I want to find the right balance between image size and visible quality - since images on my blog have a maximum size, I can often get away with a lower quality JPEG because it well be effectively treated as a "retina" image.
But to make that decision, I need to see the images. I could run a bash script to generate those images in a bunch of different qualities and then view them with some kind of image viewer, but that's extra steps - and it involves creating a bunch of temporary files that I then need to clean up.
With the web version I can snap a screenshot with CleanShot X and then drag that screenshot straight onto the web page. I instantly see the different images, pick one that looks good to me, download that and then drag it into my S3 uploading software (Transmit).
All of that said... if I was going to prototype this with imagemagick I would 100% use an LLM for that, too. I don't remember the imagemagick flags for this kind of thing; ChatGPT and Claude know those flags already.
> But to make that decision, I need to see the images. I could run a bash script to generate those images in a bunch of different qualities and then view them with some kind of image viewer, but that's extra steps - and it involves creating a bunch of temporary files that I then need to clean up.
That's not correct at all. You can, in fact, do all of these steps in a single command line program with Konsole (or iTerm2 on Mac, or Kitty - whatever terminal you're using, as long as it supports these features), imagemagick, and bash.
$ for size in $(seq 10 10 100); do; convert -resize $size% input.png output_$size.webp; timg output_$size.webp; done
timg, here, is https://github.com/hzeller/timg, but you could use anything that speaks iTerm2 or kitty. This approach generalizes easily, too; you can easily use this to vary any parameter imagemagick supports, like webp compression or posterization or dithering, and print out any parameters of the image, like size, along with the image itself.
> With the web version I can snap a screenshot with CleanShot X and then drag that screenshot straight onto the web page. I instantly see the different images, pick one that looks good to me, download that and then drag it into my S3 uploading software (Transmit).
In my workflow, I edit in Showfoto or Darktable, resize (or, in my case, more often dither and resize) as demonstrated, and then `cp` the appropriate selected image into my blog's main image folder. Hardly more difficult, and while you might not enjoy it, that's exactly my point - we can both make things we like, but you're asserting that LLMs massively changed the landscape overall, while I'm not using them at all.
I also have a script, which took about one minute to write, that cleans up the extra temporary images, so that's not much of a concern either.
I really think you overestimate how hard it is to use imagemagick. Unless you're doing really deep wizardry - or editing video on the command line, but I'm almost certain the LLM will have more trouble with that, too - it's not hard. I didn't take that command line in the above command from a script, by the way; I rewrote it from memory, in about thirty seconds, because it's not hard. There might be a syntax error, but that'd be, what, another ten seconds?
I'm not asserting that LLMs aren't useful. You demonstrate successfully that, to you, they are. I'm asserting that they're not orders of magnitude more useful than powerful, composeable interfaces to the software we already have - and they are orders of magnitude more environmentally and socially destructive.
Comparing them to "powerful, composeable interfaces to the software we already have" doesn't make sense to me, because I'm using them in conjunction with that - I'm far more able to take advantage of the software we have now because LLMs mean I don't have constantly look up zsh escaping rules, or ffmpeg invocations, or how to run AppleScript from my terminal, or how to get a fetch() to work cross-domain via CORS or whatever.
It's funny you say "editing video on the command line, but I'm almost certain the LLM will have more trouble with that, too" because ffmpeg is the perfect example of software that is almost unusable for me without LLMs, but thanks to LLMs is something I solve problems with every week: https://til.simonwillison.net/macos/quicktime-capture-script
I won't disagree on the environmentally and socially destructive aspects. The question for me is if the benefits they provide compensate for the many downsides - like the fact that I can prototype 5-10x faster, while people who have never programmed before are finally able to get started on that learning curve without being put off the first time they forget a semicolon.
Will I look back in ten year's time and say "wow, the positive applications of LLMs really had nothing on the negatives, I wish this had never been invented"? I don't know. Currently I'm leaning "it's worth it".
Could you not go a step further and use gpt4o-mini’s vision to check if the text is readable? Basically, extract the text from the biggest screenshot. Then compare it to extracts from increasingly smaller versions until the text degrades / diff crosses a threshold.
I don't think GPT-4o (or any of the other vision models) "see" at a high enough resolution to help here. The differences between high and low quality images are very slight - the text almost always remains legible in the smaller images, just with slightly more JPEG artifacts that are visible.
It may be possible to train a custom machine learning model that can identify the quality I'm looking for, but honestly it's very much a human judgement thing here.
Your tool is really cool, thanks for making it and sharing.
I notice that a lot of times, the multi-line output I get from my prompts is truncated or the model just aborts or something. The exit code is still 0 but it seems like something went wrong.
So you'd just type "use ffmpeg to convert 'my input.avi' to NTSC output and make the audio track quieter" => Alt+E => replaced with `ffmpeg -i "my input.avi" -target ntsc-dvd -af "volume=0.5" "output.mpg`
> I couldn't disagree more. My development process has always started with prototypes, and the speed at which I can get a proof-of-concept prototype up and running with these tools is quite frankly absurd.
I do agree. But I think people is focusing on speed as if it was the only important thing to measure. It may be for some people-in-suites, but what about professionals?
A friend of mine who had minor knowledge about internet (used to do wbesites in HTML with iframes back with Dreamweaver) was able to deploy a PHP app using GPT. Great! It saved him a lot of time. Now I asked him if he understood or was able to do a change and he told me that he mostly did copy and paste and asked GPT to change things for him.
Not saying GPT is going to dissappear and no one is able to maintain this, because that sounds silly, but even with GPT at your side, it's improtant to understand what's happening in case you need to do a change.
Two notes, before anyone jump to this: I am not talking about simonw particular case, he seems a developer who knows his way around Python and could do all of this by himself, it's just saving his time.
Secondly, probably in the future a LLM will be much better at coding and will be able to fix or improve your code even if you don't understand it and will be able to do an startup from zero, scale it, deploy servers, etc. Who knows?
Based on the majority of web sites on the Internet: Do we really care if the code is copy-pasted out of ChatGPT or out of Stack Overflow?
A lot of folks do not have the necessary knowledge to deeply understand the code they're writing - and I don't mean that as a diss. There's a lot of things they understand that better SWEs might not.
ChatGPT and StackOverflow both allow them to build things they otherwise couldn't, the same way e.g. cars are tools that allow people to travel faster even if they can't fix the cars or make smarter choices about where they go. At some point, that lack of knowledge will cause them harm, but that's not specific to any given tool, and the harm from ChatGPT advice isn't worse than the harm from SO advice.
Yeah, I'll second this. Our company has an aggressive push to use LLMs for development, and it has quite certainly been the the source of a lot of obscure problems that made it into production. Definitely going a bit too fast for their own good.
I think (based on observations from my admittedly flawed in many different ways workplace) the difference with Stackoverflow assisted development is scale. ChatGPT seems to have massively boosted the productivity of mediocre developers.
Obviously not saying that LLMs aren't empowering competent developers and spawning useful projects. But their ubiquity seems to have coincided with an avalanche of terrible code, at least at the fairly disorganized organization I work at.
1) Boring stuff like JSON schema/JSON example modification and validation
2) Rubber ducky
3) Using this system prompt to walk me through areas in which I have no experience [0]
I recently finally used the OpenAI API in a project. It was gpt4-o to analyze news story sentiment. The ease of use and quality of output is impressive.[0] I should add that I have been using "presets" in the LibreChat GUI to allow me to have many system prompts easily available. It's kind of like Custom GPTs. Also, using LibreChat for work feels better as I believe that OpenAI states that they do not train on data provided via API.