
WTF-8 and WTF-16 are a thing: https://simonsapin.github.io/wtf-8/

Basically WTF-16 is any sequence of 16-bit integers, and is thus a superset of UTF-16 (because UTF-16 doesn't allow certain combinations of integers, mainly surrogate code points that exist outside of surrogate pairs).
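To make the distinction concrete, here is a minimal sketch (the function name `unpaired_surrogates` is made up for illustration, and the input is assumed to be a plain list of 16-bit integers). It reports the positions that make a sequence invalid as UTF-16 while still being perfectly fine as WTF-16:

```python
def unpaired_surrogates(units):
    """Return the indexes of surrogate code units that are not part of a pair.

    `units` is assumed to be a sequence of integers in the range 0..0xFFFF.
    An empty result means the sequence is well-formed UTF-16.
    """
    bad = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High (lead) surrogate: valid only if a low surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= u <= 0xDFFF:
            # Low (trail) surrogate with no preceding high surrogate.
            bad.append(i)
        i += 1
    return bad


print(unpaired_surrogates([0x48, 0xD83D, 0xDE00]))  # [] (valid UTF-16)
print(unpaired_surrogates([0xD800, 0x41]))          # [0] (lone high surrogate)
```

Any list of 16-bit integers is acceptable as WTF-16; only the ones where this returns an empty list are also UTF-16.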

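Then WTF-8 is what you get if you naively transcode potentially invalid UTF-16 into UTF-8: proper surrogate pairs become the usual four-byte sequences, and any unpaired surrogate is encoded as a three-byte sequence like any other code point. It is a superset of UTF-8.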
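A rough sketch of that naive transformation, written by hand rather than taken from any library (the names `wtf16_to_wtf8` and `_encode_code_point` are hypothetical, and the input is assumed to be a list of 16-bit integers):

```python
def _encode_code_point(cp):
    """Encode one code point with the UTF-8 bit layout, surrogates included."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def wtf16_to_wtf8(units):
    """Transcode 16-bit code units to bytes, letting lone surrogates through."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if not 0 <= u <= 0xFFFF:
            raise ValueError(f"not a 16-bit code unit: {u!r}")
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # Well-formed pair: combine into one supplementary code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP code point, or an unpaired surrogate passed through as-is.
            cp = u
            i += 1
        out += _encode_code_point(cp)
    return bytes(out)


print(wtf16_to_wtf8([0xD83D, 0xDE00]))  # b'\xf0\x9f\x98\x80' (ordinary UTF-8)
print(wtf16_to_wtf8([0xD800]))          # b'\xed\xa0\x80' (not valid UTF-8)
```

For well-formed input the output is byte-for-byte ordinary UTF-8, which is why WTF-8 is a superset. The second result is exactly the kind of sequence a strict UTF-8 decoder rejects.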

This is very useful when dealing with languages like Java and JavaScript that treat strings as sequences of 16-bit code units, even though not all such strings are valid UTF-16.



> Basically WTF-16 is any sequence of 16-bit integers, and is thus a superset of UTF-16 (because UTF-16 doesn't allow certain combinations of integers, mainly surrogate code points that exist outside of surrogate pairs).

If WTF-16 is the ability in potentia to store and return invalid UTF-16 without signalling errors, I don't know of any actual UTF-16 system out there, with the possible exception of… HFS+, maybe?


APFS continues to normalize codepoints as well.



