I learned (assembly) programming on a chip that had no multiplication instruction. It was the 6510 (a variant of the popular 6502), and I failed to see any benefit in that simplicity. Back then every multiplication had to be done via repeated addition in a loop, and division via repeated subtract-and-compare (except for special cases like powers of 2, where one could simply bit-shift). You can imagine how slow it was. I was envious of my friends with Amigas (68k CPU), whose chips could multiply in hardware. It seems obvious that a properly tuned hardware implementation is always going to be faster than doing the same thing in software. Taken to the extreme, this is the crux of the old RISC vs CISC debate.