I've read the D20 docs (e.g., HMM ...) and I can't find a reference to this anywhere, so I would like to confirm: D20 memory doesn't need to be added in sets of 2 or 3, right (i.e., DIMMs can be added one-by-one)?
In general, the D20 will handle "unbalanced" DIMM configs. Systems have been tested in combinations anywhere from 1 to 6 DIMMs per CPU with no issues (other than occasionally not seating a DIMM correctly). That being said, it's generally best for performance to fill across the channels first, and to keep those channels balanced. If you look closely at the motherboard (or the system label), you'll notice the slot numbering initially appears a bit erratic. The numbering scheme is meant to ensure that DIMMs are spread across the channels first, and only go to 2 DIMMs per channel once you have enough DIMMs. Filling in order (1,2,3,4,5,6) according to the slot numbering on the board will accomplish this.
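To make the fill order concrete, here is a rough Python sketch of how DIMMs would land on channels when slots are filled in numbered order. It assumes 3 memory channels per CPU and that slots 1-3 are the first slot on each channel and slots 4-6 the second; that mapping is my reading of "fill across the channels first", not something taken from the board documentation, and the function name is made up for illustration.

```python
def dimms_per_channel(n_dimms, channels=3):
    """Return how many DIMMs end up on each channel of one CPU.

    Assumption (not from the D20 docs): slot n sits on channel
    (n - 1) % channels, so filling slots 1, 2, 3, ... in order
    spreads DIMMs across channels before doubling any of them up.
    """
    if not 0 <= n_dimms <= 2 * channels:
        raise ValueError("expected between 0 and 2 DIMMs per channel")
    counts = [0] * channels
    for slot in range(1, n_dimms + 1):
        counts[(slot - 1) % channels] += 1
    return counts


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, "DIMMs ->", dimms_per_channel(n))
```

Under these assumptions, 3 DIMMs give a balanced 1DPC layout, 4 or 5 DIMMs give a working but unbalanced layout, and 6 DIMMs give a balanced 2DPC layout.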
Now to add just a bit of additional complexity, actual performance will differ for 1DPC (1 DIMM per channel) vs. 2DPC depending on your CPU and the memory speed you're trying to run at. For Nehalem CPUs, 1333MHz memory will be clocked down to 1066MHz if a 2DPC config is detected. If that memory is also quad ranked, a 2DPC config on Nehalem will be clocked down even further, to 800MHz. Westmere doesn't suffer from this MRC limitation, and can fully support 1333MHz DIMMs in 2DPC configs.
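The down-clocking rules above can be summarized as a small lookup. This is only a sketch of the cases mentioned in this post; the function name and the "nehalem"/"westmere" labels are made up for illustration, and any combination not described here (for example quad-ranked DIMMs on Westmere) is simply passed through at the rated speed rather than modeled.

```python
def effective_speed_mhz(cpu, rated_mhz, dimms_per_channel, quad_ranked=False):
    """Estimate the memory clock for the cases described in the post.

    Only these rules are encoded:
      - Nehalem, 2DPC: capped at 1066MHz
      - Nehalem, 2DPC, quad-ranked DIMMs: capped at 800MHz
      - everything else: runs at the DIMM's rated speed (assumption
        for cases the post does not cover)
    """
    if dimms_per_channel not in (1, 2):
        raise ValueError("dimms_per_channel must be 1 or 2")
    if cpu.lower() == "nehalem" and dimms_per_channel == 2:
        cap = 800 if quad_ranked else 1066
        # min() keeps slower DIMMs from being reported above their rating
        return min(rated_mhz, cap)
    return rated_mhz


if __name__ == "__main__":
    print(effective_speed_mhz("nehalem", 1333, 1))
    print(effective_speed_mhz("nehalem", 1333, 2))
    print(effective_speed_mhz("nehalem", 1333, 2, quad_ranked=True))
    print(effective_speed_mhz("westmere", 1333, 2))
```

So with 1333MHz DIMMs on a Nehalem system, staying at 1DPC (up to 3 DIMMs per CPU) keeps the full memory clock, while going to 2DPC trades clock speed for capacity.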