The way ML people use "dimension", each free parameter is an extra dimension. A high-resolution 2D image is considered to have millions of dimensions - one for each pixel.
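A small NumPy sketch of that counting (the 1000×1000 size is just an illustrative choice): flattening an RGB image into a vector makes each pixel-channel value one coordinate, i.e. one dimension of the input space.

```python
import numpy as np

# A "high-resolution" RGB image: 1000 x 1000 pixels, 3 channels.
img = np.zeros((1000, 1000, 3), dtype=np.float32)

# Flattened into a single vector, each entry is one free parameter,
# i.e. one dimension of the input space.
vec = img.reshape(-1)
print(vec.shape)  # (3000000,) -- three million dimensions
```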
That’s not just ML, that’s linear algebra. Moreover, across fields of math the dimension of an object is defined differently, but it’s always a fundamental property that loosely captures how complex the object is. Often, that complexity comes down to how many numbers you need to describe the object.
These are all expressions of the same concept. The confusion is that physicists are describing the dimensionality of specific systems (space, spacetime, superstring theory, supergravity); that doesn't limit the dimensionality of other systems, which is where laymen often get tripped up.
We usually do HxWxC, for height, width, and channels, so each pixel is addressed via the first two dims of the input, and then it has 3 channels. Of course, you can transpose the tensor to CxHxW or CxWxH. Different orderings behave differently with respect to memory locality.
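A quick NumPy sketch of the layouts described above (the small 4×5 size is arbitrary). Note that `transpose` returns a view with different strides, so the bytes don't move until you copy:

```python
import numpy as np

h, w, c = 4, 5, 3
hwc = np.arange(h * w * c, dtype=np.float32).reshape(h, w, c)  # HxWxC

# Pixel (row=2, col=1) is addressed by the first two dims; it has 3 channels.
pixel = hwc[2, 1]  # shape (3,)

# Transpose to CxHxW. Same underlying data, different strides, so
# scanning along H and W within one channel is no longer contiguous.
chw = hwc.transpose(2, 0, 1)
assert chw.shape == (c, h, w)
assert np.array_equal(chw[:, 2, 1], pixel)

# To make the CxHxW order actually contiguous in memory, copy:
chw_contig = np.ascontiguousarray(chw)
assert chw_contig.flags["C_CONTIGUOUS"]
```

This is why the ordering matters for performance: operations that walk memory in its physical layout order get much better cache behavior.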