Ah, I see. Well, remember that with analog circuits we are talking about subthreshold currents. These are orders of magnitude smaller than the currents in a digital circuit (nA vs uA). Correspondingly, the power consumption will be negligible in comparison, even if you expand the current range -- and that is only a fraction of the total power consumption. Adding more bits to a digital circuit increases total power linearly, dominated by interconnect capacitance.
That was an important observation. Fighting noise is one of the primary reasons the first digital computers were invented.
To give a bit of a dramatic illustration: if your circuit has on the order of 1 nV of thermal noise and you wanted to do the linear analog equivalent of 64-bit integer arithmetic, you would need a signal on the order of 10,000,000,000 V to have enough precision. In terms of power consumption it's even worse, since power scales with the square of the voltage: if the 1 nV signal consumes something like 1 pW, you would need something like the total power output of the Sun (on the order of 10^26 W). A bit of an expensive multiplication, no? :) That's how crazy it is!
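If you want to sanity-check that back-of-envelope estimate, here's the arithmetic spelled out (the 1 nV noise floor and 1 pW figures are the illustrative assumptions from above, not measured values):

```python
# Back-of-envelope check (illustrative numbers, not measurements):
# a ~1 nV thermal noise floor, and we want 64-bit integer precision
# in a single linear analog signal, i.e. 2**64 distinguishable levels.
noise_v = 1e-9                 # assumed 1 nV noise floor
levels = 2**64                 # 64-bit precision -> 2^64 levels
signal_v = noise_v * levels    # required full-scale signal voltage

# Power scales with the square of voltage (P = V^2 / R), so scale the
# assumed 1 pW dissipated at the noise-floor level accordingly.
p_noise = 1e-12                # assumed 1 pW at the 1 nV level
p_signal = p_noise * (signal_v / noise_v) ** 2

print(f"signal ~ {signal_v:.1e} V")   # ~1.8e+10 V, i.e. tens of gigavolts
print(f"power  ~ {p_signal:.1e} W")   # ~3.4e+26 W, roughly the Sun's output
```

The voltage comes out around 18 GV and the power around 3x10^26 W, which is indeed the order of the Sun's total luminosity (~4x10^26 W).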
Again, if you can get away with less than 8 bits of precision and imperfect linearity the picture changes, but I wouldn't declare it superior a priori without looking at the numbers.
Or, you could split your 64 bit computation into 8 bit computations, which could be done with analog circuits, and still save a lot of power! :-)
But yes, I understand your point. Both analog and digital implementations have their strengths and weaknesses. If you value power over precision, go with analog. If the opposite - go with digital.
Right, but note you can't even split it if you are thinking of linear circuits. Precision necessarily means how your signal compares to the thermal noise floor, and it is possible to show that you can't compose 8-bit precision linear units to get a >8-bit precision result. What happens is actually the opposite: if the noise of the units is uncorrelated, errors will propagate and grow on the order of sqrt(number of operations). Avoiding error propagation is another advantage of digital operations.
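To make the sqrt(N) growth concrete, here's a quick Monte Carlo sketch: chain N linear stages that each add independent Gaussian noise, and measure how the accumulated error scales (the unit noise sigma is arbitrary here):

```python
import random
import statistics

# Toy model: N linear stages in a chain, each injecting independent
# Gaussian noise of standard deviation sigma. The total error is the
# sum of N independent noise terms, whose standard deviation grows
# like sqrt(N) * sigma -- so chaining low-precision linear units
# never yields a higher-precision result.
def accumulated_error(n_ops, sigma=1.0, trials=20000, seed=0):
    rng = random.Random(seed)
    totals = [sum(rng.gauss(0.0, sigma) for _ in range(n_ops))
              for _ in range(trials)]
    return statistics.pstdev(totals)

for n in (1, 4, 16, 64):
    print(n, round(accumulated_error(n), 2))   # stddev ~ sqrt(n) * sigma
```

With sigma = 1, the measured standard deviation comes out near 1, 2, 4, 8 for N = 1, 4, 16, 64, i.e. doubling every time N quadruples.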
The reason NNs don't exhibit strong error propagation is the non-linearities between the linear layers, which perform operations analogous to thresholding/majority voting and have error-correcting properties.
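Here's a minimal sketch of that error-correction effect, using a hard comparator-style threshold as a stand-in for the nonlinearity (the noise level and stage count are arbitrary illustration parameters):

```python
import random

# Sketch: a chain of stages, each injecting Gaussian noise. With a
# hard threshold between stages (like a comparator or digital gate),
# the signal is restored to a clean +1/-1 level each time, so small
# errors are wiped out instead of accumulating as in the linear chain.
def run_chain(stages, sigma, threshold=True, trials=5000, seed=1):
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        x = 1.0                              # ideal signal level: +1
        for _ in range(stages):
            x += rng.gauss(0.0, sigma)       # noise injected per stage
            if threshold:
                x = 1.0 if x > 0 else -1.0   # restore to a clean level
        if x <= 0:                           # sign flipped -> error
            wrong += 1
    return wrong / trials

print("with thresholding:   ", run_chain(50, 0.3, threshold=True))
print("without thresholding:", run_chain(50, 0.3, threshold=False))
```

With these numbers the thresholded chain ends up with an error rate around a couple percent, while the purely linear chain is wrong roughly a third of the time, despite seeing identical per-stage noise.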
Interesting, but then how do you explain that rectified linear operations between layers work better than sigmoids?
According to your logic, shouldn't ReLU have worse error-propagation properties than squashing functions?
I'm going to reply to your question below here since HN is preventing a reply (anti-flaming/long threads I guess).
Be careful about jumping to conclusions: I never even mentioned ReLUs or sigmoids in my post! I don't have any opinion on which non-linearity is better; I only know both are dramatic non-linearities. My claims were about linear circuits. You should use whatever nonlinear element works best in your neural network, of course (and I've heard ReLUs have good advantages).