Purely anecdotal, but I hoard a lot of personal documents (shopping receipts, co...

mcswell · 2025-12-27T23:15:26 1766877326

BOM is normally used with UTF-16, not with UTF-8 (both of which, along with UTF-32, are encodings of Unicode).

I've worked with lots of minority languages in academic situations, but I've never run into anything that couldn't be encoded in Unicode. There's a procedure for adding characters (or blocks of characters) for characters or character sets that aren't already included. There are fewer and fewer of those. The main requirement is documentation.

makeitdouble · 2025-12-28T02:27:36 1766888856

Thanks!

On adding new characters to Unicode, as for any commitee there will be rejection and cases where going through the whole process is cumbersome/not worth it.

It's more commonly discussed in the CJK circles, it reminded me of the Wikipedia entry (unsurprisingly with no English equivalent)

https://ja.wikipedia.org/wiki/Wikipedia:%E8%A1%A8%E7%A4%BA%E...

> minority languages

More archaic that minority, but one language I had in mind was one using color coded strings and knots representation. There are latin alphabet mappings, so as long as we trust the translation record keeping per se works in Unicode, but if one wanted to keep the exact original writing it would obviously not work out in plain text. I imagined it's not an isolated instance, but I'm also way out of my depth on this one

https://en.wikipedia.org/wiki/Quipu

Archelaos · 2025-12-27T16:37:44 1766853464

> stuff I saved only 10 years ago

There have been a lot of practical options around in the last three decades for using Unicode. To name just a few: Unicode is around since 1991. UTF-16 was supported in Windows NT in 1993. XML (1998) was specified based on Unicode code points. ...

makeitdouble · 2025-12-27T17:10:25 1766855425

As for many standards, the question is less what's available/supported and more what's the format actually used irl.

Half the mail I received from that period was in iso-2022 (a JIS variant), most of the rest was latin-1. I have an auto-generated mail from google plus(!) from 2015 in iso-2022-jp, I actually wonder when Google decided it was safe to fully move to utf-8.