You could say that while most strings are small, vectors come in all kinds of sizes and T itself can be large, so the optimization is less of a win for a general-purpose container.
But it is mostly for historical reasons. I believe the original STL used SSO (the small string optimization) [1], so there was never any assumption about the stability of references to string elements, while there is a lot of code that assumes that references to vector elements do not change.
[1] The SGI STL, directly derived from the original HP STL, had an extensive rationale for why it didn't implement COW; libstdc++, which I believe also traces its roots to it, decided to do COW instead. The rest is history.
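
To make the reference-stability point concrete, here is a minimal sketch. The function names are made up for illustration, and the string result depends on the implementation: with an SSO string I would expect the short string's buffer to move along with the object, but that is an assumption about your standard library, not something the standard promises.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: true if the element buffer keeps its address
// when the vector is move-constructed into a new object.
bool vector_buffer_survives_move() {
    std::vector<int> a{1, 2, 3};
    const int* before = a.data();
    std::vector<int> b = std::move(a);  // b takes over a's heap buffer
    return b.data() == before;
}

// Hypothetical helper: true if the character buffer keeps its address
// when a short string is move-constructed into a new object.
// Assumption: with SSO the characters live inside the string object
// itself, so this would typically return false.
bool short_string_buffer_survives_move() {
    std::string a = "hi";
    const char* before = a.data();
    std::string b = std::move(a);
    return b.data() == before;
}
```

Code that holds a pointer or reference into a vector and then moves the vector relies on the first function returning true. A small-buffer vector would break that, the same way a short SSO string typically does.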