28th December 2004
goulo
Unicode, UTF-8, etc.

Windows most certainly does handle UTF-8 files, as do many modern applications. All modern web browsers easily display webpages with charset=utf-8. UTF-8 has arguably become the best solution for handling non-ASCII Unicode text. ASCII characters remain 1 byte each in UTF-8 (so any plain ASCII file is trivially also valid UTF-8), while non-ASCII characters are encoded with 2 to 4 bytes. Any Unicode character is representable in UTF-8.
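
To make the byte counts concrete, here is a small Python sketch (Python is my choice for illustration, it has nothing to do with BSPlayer itself). The helper name `utf8_len` is made up for this example.

```python
def utf8_len(ch: str) -> int:
    """Return how many bytes one character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))

# An ASCII letter, an Esperanto letter, the euro sign, and an emoji.
for ch in ["A", "ĉ", "€", "😀"]:
    print(f"U+{ord(ch):04X} takes {utf8_len(ch)} byte(s)")

# A plain ASCII string produces exactly the same bytes in ASCII and in UTF-8.
ascii_text = "plain old subtitle text"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print("ASCII bytes identical to UTF-8 bytes:", same_bytes)
```

The four sample characters come out as 1, 2, 3 and 4 bytes respectively, which covers the whole range UTF-8 uses.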

I am personally interested in this for making subtitles in Esperanto. Currently, as far as I know, BSPlayer requires me to use Latin-3 (aka ISO-8859-3 aka South European) encoding, which limits the number of fonts available to me and is a generally less appealing, older encoding method. UTF-8 nicely handles all of Unicode instead of forcing you to use different encodings for different languages. UTF-8 thus also permits different languages to be mixed together. That is not possible with Latin-3 and the like, which are all 1-byte encodings and so can represent only 256 characters rather than all of Unicode. That sucks if, e.g., some character has a French or German or other foreign name that needs letters not in Latin-3.
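
A rough illustration of the mixing problem, again in Python. The helper `can_encode` and the sample subtitle lines are invented for this example; `iso8859_3` is Python's codec name for Latin-3.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical subtitle lines, made up for illustration.
esperanto_only = "Ĉu vi ŝatas ĉi tiun filmon?"
mixed_languages = "Ĉu vi konas Dvořák kaj Москва?"

for line in (esperanto_only, mixed_languages):
    print(repr(line))
    print("  Latin-3:", can_encode(line, "iso8859_3"))
    print("  UTF-8:  ", can_encode(line, "utf-8"))
```

The pure Esperanto line fits in Latin-3, but the line with a Czech name and a Cyrillic word does not, while UTF-8 takes both.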

Markus Kuhn has a nice FAQ about Unicode, UTF-8, and all that:
http://www.cl.cam.ac.uk/~mgk25/unicode.html

BTW, as Brdja observes, UltraEdit easily detects whether a file is UTF-8. Yes, a program that was written assuming every character is 1 byte will need some rewriting. But processing UTF-8 text is not a fundamentally hard problem.
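
For what it's worth, one simple way to guess whether some bytes are UTF-8 is just to try decoding them strictly. I have no idea how UltraEdit actually does it, so treat this Python sketch as one possible approach only; `looks_like_utf8` is a name I made up.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if the bytes decode as strict UTF-8 without errors."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

text = "Ĉu vi ŝatas ĉi tiun filmon?"
as_utf8 = text.encode("utf-8")
as_latin3 = text.encode("iso8859_3")

print("UTF-8 bytes pass:  ", looks_like_utf8(as_utf8))
print("Latin-3 bytes pass:", looks_like_utf8(as_latin3))

# The "1 byte per character" assumption breaks: the byte count exceeds the
# character count once non-ASCII letters appear.
print(len(text), "characters,", len(as_utf8), "bytes in UTF-8")
```

Note that a pure ASCII file passes the check too, which is fine since ASCII is valid UTF-8. Text in a 1-byte encoding like Latin-3 nearly always fails as soon as it contains accented letters, because those byte values rarely form valid UTF-8 sequences by accident.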