Of course, if you do it that way then the longest string you can store with the small string optimization is probably ~15 bytes instead of ~23. So although you do save 1/3 on the size of each string, on average you're probably still going to end up doing more dynamic allocations because of the reduced small-string capacity. Unless you know a priori that a sufficiently large portion of your strings will be > 23 bytes anyway (so they'd hit the heap under either layout), which the implementors of std::string almost certainly don't know.
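To put rough numbers on that trade-off, here is a minimal sketch. It assumes one byte of the string object is reserved for bookkeeping, which is where the ~23 and ~15 figures come from; real implementations differ in the details. `inline_capacity` and `heap_allocations` are made-up helper names, not anything from a standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: one byte of the string object is reserved for bookkeeping
// (a tag or the terminator), so the rest can hold characters inline.
constexpr std::size_t inline_capacity(std::size_t object_size) {
    return object_size - 1;
}

// Counts how many of the given string lengths would not fit inline
// and would therefore need a dynamic allocation.
inline std::size_t heap_allocations(std::size_t object_size,
                                    const std::vector<std::size_t>& lengths) {
    assert(object_size > 0);
    std::size_t count = 0;
    for (std::size_t len : lengths) {
        if (len > inline_capacity(object_size)) ++count;
    }
    return count;
}
```

For strings of length 5, 12, 18, 22 and 40, the 24-byte layout spills one string to the heap and the 16-byte layout spills three. Whether that is a net win depends entirely on the length distribution.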
Edit: I failed to notice the part about the length being in the data block (doh). I guess the disadvantage to putting the length there would be that an extra indirection is required to get the length, which is a rather common operation. And as others have pointed out, that only saves 4 bytes, which will be eaten by alignment padding anyway.
If you want to get really fancy, you can do it in 8 bytes. Pointers only use 48 bits on typical 64-bit platforms today, so you can squeeze a 16-bit size field into the unused upper bits. If the size overflows that, then you can use a cookie stored just before the string data to find the size. Capacity could be stored in such a cookie too, or junked entirely, relying on your memory allocator to report the size of the allocation (small-string optimization obviously not even being considered in this model).
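A minimal sketch of that packing, assuming data pointers really do fit in the low 48 bits (true for typical user-space addresses on x86-64 with 4-level paging, but nothing in C++ guarantees it). `PackedString` and the constants are hypothetical names, and the overflow cookie is left out: sizes above 0xFFFF are simply rejected here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: data pointers fit in the low 48 bits. C++ does not
// guarantee this; it is a property of the platform's address space.
constexpr unsigned kPointerBits = 48;
constexpr std::uint64_t kPointerMask = (std::uint64_t{1} << kPointerBits) - 1;
constexpr std::size_t kMaxPackedSize = 0xFFFF;

// Hypothetical 8-byte string handle: [16-bit size | 48-bit pointer].
class PackedString {
public:
    PackedString(const char* data, std::size_t size) {
        const auto address =
            static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(data));
        assert((address & ~kPointerMask) == 0);  // upper 16 bits must be free
        assert(size <= kMaxPackedSize);          // larger sizes would need the cookie
        bits_ = address | (static_cast<std::uint64_t>(size) << kPointerBits);
    }

    // Masks off the size field to recover the original pointer.
    const char* data() const {
        return reinterpret_cast<const char*>(
            static_cast<std::uintptr_t>(bits_ & kPointerMask));
    }

    // The size lives in the top 16 bits.
    std::size_t size() const {
        return static_cast<std::size_t>(bits_ >> kPointerBits);
    }

private:
    std::uint64_t bits_;
};

static_assert(sizeof(PackedString) == 8, "handle should be one 64-bit word");
```

Every access to the data pays for a mask, and the scheme breaks on any platform that hands out addresses above 48 bits, so this is more of a curiosity than something a general-purpose std::string could ship.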
Eh, I was thinking about pretty much exactly what you said. "Length in the data block" would be when length doesn't fit in 32 bits. (I.e. >2GB or maybe >4GB.)
It would require an additional branch to test for huge strings, but it would almost never be taken, and I think modern CPUs are pretty good at predicting such branches...
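Roughly what I have in mind, as a sketch only. It assumes a 64-bit platform, and `StringHeader`, `kHugeMarker` and `string_length` are names invented for illustration: the 32-bit size field holds a marker value when the real length doesn't fit, and the full 64-bit length then sits in the 8 bytes just before the character data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Marker in the 32-bit size field meaning "the real length is in the data block".
constexpr std::uint32_t kHugeMarker = 0xFFFFFFFFu;

// Hypothetical 16-byte layout: pointer plus two 32-bit fields.
struct StringHeader {
    const char* data;          // for huge strings, an 8-byte length precedes this
    std::uint32_t size32;      // real size, or kHugeMarker
    std::uint32_t capacity32;
};

static_assert(sizeof(StringHeader) == 16, "assumes a 64-bit platform");

inline std::size_t string_length(const StringHeader& s) {
    if (s.size32 != kHugeMarker) {
        return s.size32;  // common path: no extra indirection
    }
    // Rare path: read the full length stored just before the characters.
    std::uint64_t full = 0;
    std::memcpy(&full, s.data - sizeof(full), sizeof(full));
    return static_cast<std::size_t>(full);
}
```

Ordinary strings never touch the data block to get their length, so the extra indirection only shows up for strings that are multiple gigabytes long.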