One of the key insights is that every program is representable as an integer, and you can map essentially any integer into a program. (If this seems strange, think of a file on a computer--traditionally a stream of bytes--as just a large base-256 integer.) This means we can iterate every possible program, as there are only a countably infinite number of programs.
I should also note that this is required by our conventional definition of program (specifically, a Turing machine). There's definitely models of computation more powerful than Turing machines, but these are not believed to be physically realizable (the Church-Turing Thesis); in any case, the diagonalization form of the proof of the halting problem's uncomputability indicates that these more powerful methods are also susceptible to their own variant of the halting problem.
I should also note that this is required by our conventional definition of program (specifically, a Turing machine). There's definitely models of computation more powerful than Turing machines, but these are not believed to be physically realizable (the Church-Turing Thesis); in any case, the diagonalization form of the proof of the halting problem's uncomputability indicates that these more powerful methods are also susceptible to their own variant of the halting problem.