We know for a fact that current LLMs are massively inefficient, this is not a new thing. But every optimization you make will allow you to run more inference with this hardware, there's not a reason for it to make it meaningless any more than more efficient cars didn't obsolete roads.