
A case-insensitive filesystem pushes a localization and UI problem down into the filesystem layer. Case-insensitivity is pretty easy for US-ASCII, but in release 2 of your filesystem, you realize you didn't properly handle Latin wide characters, the Cyrillic alphabet, etc. In release 7 of your FS, you finally get case-insensitivity correct for Klingon, but some popular video game relied on everything except Klingon being case-insensitive on your FS, and now all of the users are complaining.
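To make the "release 1 vs. release 2" gap concrete, here is a minimal sketch in Python. The function names `ascii_fold` and `unicode_fold` are made up for illustration and are not taken from any real filesystem; `str.casefold()` stands in for whatever folding table a later release might ship.

```python
def ascii_fold(name: str) -> str:
    """Fold only A-Z to a-z, roughly what a naive first release might do."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def unicode_fold(name: str) -> str:
    """Apply Unicode case folding via Python's built-in str.casefold()."""
    return name.casefold()


# Cyrillic upper/lower case versions of the same (illustrative) file name.
cyr_upper = "\u0424\u0410\u0419\u041b.txt"  # uppercase Cyrillic
cyr_lower = "\u0444\u0430\u0439\u043b.txt"  # lowercase Cyrillic

# ASCII-only folding sees two different files; Unicode folding sees one.
print(ascii_fold(cyr_upper) == ascii_fold(cyr_lower))
print(unicode_fold(cyr_upper) == unicode_fold(cyr_lower))

# German sharp s: full case folding maps it to "ss", so these collide too.
print(unicode_fold("STRASSE") == unicode_fold("stra\u00dfe"))
```

Any program that relied on the first behavior (two distinct files) breaks the moment the filesystem switches to the second, which is the compatibility trap described above.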

How do you handle the case where the only difference between two file names is that one uses Latin wide characters (Unicode calls them fullwidth forms) and the other uses ordinary Latin characters?

This one bit me when writing a CAPTCHA system back in 2004. (Long story, but existing systems wouldn't work between a credit card processing server that had to validate in Perl and a web form that had to be written in PHP, where the two systems couldn't share a file system. It's simple enough to do using HMAC and a key shared between the two servers, but for some reason, none of the available solutions did it.) I noticed that Japanese users had a disturbingly high CAPTCHA failure rate.

It turns out that many East Asian languages have characters that are roughly square, while most Latin characters are roughly half as wide as they are tall, so mixing the two looks odd. So, Unicode has a whole set of Latin wide characters that are the same as the Latin characters we use in English, except they're roughly square, so they look better when mixed with Unified Han and other characters. Apparently most Japanese web browsers (or maybe it's an OS-level keyboard layout setting) will by default emit Latin wide Unicode code points when the user types Latin characters.

Whether or not to normalize wide Latin characters to Latin characters is a highly context-dependent choice. In my case, it was definitely necessary, but in other cases it will throw out necessary information and make documents look ugly or odd. Good arguments can be made both ways about how a case-insensitive filesystem should handle Latin wide characters, and that's a relatively simple case.
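As a rough sketch of the kind of normalization involved (in Python here, not the Perl/PHP of the original system, and not the code that was actually used), Unicode compatibility normalization (NFKC) maps the wide Latin letters and digits to their ordinary ASCII counterparts. The helper name `normalize_answer` is hypothetical.

```python
import unicodedata


def normalize_answer(text: str) -> str:
    """Map compatibility characters (including wide Latin) to their plain
    forms with NFKC, then case-fold so the comparison ignores case."""
    return unicodedata.normalize("NFKC", text).casefold()


# "Abc12" typed with wide Latin letters and digits (U+FF21, U+FF42, ...).
wide = "\uff21\uff42\uff43\uff11\uff12"
plain = "Abc12"

print(wide == plain)                                      # raw strings differ
print(normalize_answer(wide) == normalize_answer(plain))  # equal after NFKC

# The same normalization also discards distinctions that may matter in other
# contexts: a circled digit becomes a plain digit, a ligature becomes two letters.
print(unicodedata.normalize("NFKC", "\u2460"))  # CIRCLED DIGIT ONE
print(unicodedata.normalize("NFKC", "\ufb01"))  # LATIN SMALL LIGATURE FI
```

For a CAPTCHA answer this lossy mapping is what you want; for a document or a file name it may not be, which is why the choice depends on context.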

Most users never type the names of existing files; they access files exclusively through menus, file pickers, and the OS's graphical shell (Finder/Explorer). So, if you want to keep users from getting confused over similar file names, that can be handled at file creation time via UI improvements, along with subtler issues that are actually more likely to confuse users, such as file names that contain two consecutive spaces.
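A creation-time check along these lines could live in the save dialog rather than the filesystem. The sketch below is only an illustration: `comparison_key` and `creation_warnings` are hypothetical names, and using NFKC plus case folding as the "looks the same" rule is an assumption, not something any particular OS is known to do.

```python
import re
import unicodedata


def comparison_key(name: str) -> str:
    """Key under which two names are treated as confusingly similar
    (assumed policy: NFKC normalization followed by case folding)."""
    return unicodedata.normalize("NFKC", name).casefold()


def creation_warnings(new_name: str, existing_names: list[str]) -> list[str]:
    """Return human-readable warnings to show before creating new_name."""
    warnings = []
    if re.search(r" {2,}", new_name):
        warnings.append("contains consecutive spaces")
    if new_name != new_name.strip():
        warnings.append("has leading or trailing whitespace")
    key = comparison_key(new_name)
    for other in existing_names:
        # An exact match is an ordinary "file exists" case, not a look-alike.
        if other != new_name and comparison_key(other) == key:
            warnings.append(f"looks the same as existing file {other!r}")
    return warnings


existing = ["report.txt", "notes.txt"]
print(creation_warnings("Report.txt", existing))
print(creation_warnings("my  notes.txt", existing))
print(creation_warnings("budget.txt", existing))
```

The filesystem itself stays byte-exact and case-sensitive; the UI decides whether to warn, rename, or refuse, and that policy can change per locale without breaking stored data.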


