Character Encoding
A character encoding assigns a distinct number to each character in a given character set.
Example
In the ASCII character set, the character A is represented by the number 65, B by 66, C by 67, and so on in increasing sequence. These numbers are often referred to as 'codes' or 'character codes'. Internally, software uses sequences of these numbers to represent text.
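The mapping described above can be checked with a short Python sketch. Python is used here only for illustration and is not part of IMan; the variable names are arbitrary.

```python
# Illustrative only: show the ASCII codes behind the text "ABC".
text = "ABC"

# ord() returns the character code of a single character.
codes = [ord(ch) for ch in text]

# Encoding the string as ASCII yields one byte per character,
# and each byte holds the same number as the character code.
encoded = text.encode("ascii")

# Decoding the numbers as ASCII bytes restores the original text.
decoded = bytes(codes).decode("ascii")

print(codes)          # [65, 66, 67]
print(list(encoded))  # [65, 66, 67]
print(decoded)        # ABC
```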
Supported Encodings
IMan supports the following encodings, covering the current Unicode encodings as well as other encodings:
Name               Description
asmo-708           Arabic (ASMO 708)
big5               Chinese Traditional (Big5)
cp866              Cyrillic (DOS)
cp875              IBM EBCDIC (Greek Modern)
dos-720            Arabic (DOS)
dos-862            Hebrew (DOS)
euc-jp             Japanese (JIS 0208-1990 and 0212-1990)
euc-jp             Japanese (EUC)
euc-kr             Korean (EUC)
gb2312             Chinese Simplified (GB2312)
ibm00858           OEM Multilingual Latin I
ibm037             IBM EBCDIC (US-Canada)
ibm437             OEM United States
ibm500             IBM EBCDIC (International)
ibm737             Greek (DOS)
ibm775             Baltic (DOS)
ibm850             Western European (DOS)
ibm852             Central European (DOS)
ibm855             OEM Cyrillic
ibm857             Turkish (DOS)
ibm860             Portuguese (DOS)
ibm861             Icelandic (DOS)
ibm863             French Canadian (DOS)
ibm864             Arabic (864)
ibm865             Nordic (DOS)
ibm869             Greek, Modern (DOS)
ibm870             IBM EBCDIC (Multilingual Latin-2)
iso-2022-jp        Japanese (JIS)
iso-2022-jp        Japanese (JIS-Allow 1 byte Kana - SO/SI)
iso-2022-kr        Korean (ISO)
iso-8859-1         Western European (ISO)
iso-8859-13        Estonian (ISO)
iso-8859-15        Latin 9 (ISO)
iso-8859-2         Central European (ISO)
iso-8859-3         Latin 3 (ISO)
iso-8859-4         Baltic (ISO)
iso-8859-5         Cyrillic (ISO)
iso-8859-6         Arabic (ISO)
iso-8859-7         Greek (ISO)
iso-8859-8         Hebrew (ISO-Visual)
iso-8859-9         Turkish (ISO)
koi8-r             Cyrillic (KOI8-R)
koi8-u             Cyrillic (KOI8-U)
ks_c_5601-1987     Korean
macintosh          Western European (Mac)
shift_jis          Japanese (Shift-JIS)
unicodefffe        Unicode (Big endian)
us-ascii           US-ASCII
utf-32             Unicode (UTF-32)
utf-32be           Unicode (UTF-32 Big endian)
utf-7              Unicode (UTF-7)
utf-8              Unicode (UTF-8)
windows-1250       Central European (Windows)
windows-1251       Cyrillic (Windows)
windows-1252       Western European (Windows)
windows-1253       Greek (Windows)
windows-1254       Turkish (Windows)
windows-1255       Hebrew (Windows)
windows-1256       Arabic (Windows)
windows-1257       Baltic (Windows)
windows-1258       Vietnamese (Windows)
windows-874        Thai (Windows)
x-mac-arabic       Arabic (Mac)
x-mac-ce           Central European (Mac)
x-mac-chinesesimp  Chinese Simplified (Mac)
x-mac-chinesetrad  Chinese Traditional (Mac)
x-mac-croatian     Croatian (Mac)
x-mac-cyrillic     Cyrillic (Mac)
x-mac-greek        Greek (Mac)
x-mac-hebrew       Hebrew (Mac)
x-mac-icelandic    Icelandic (Mac)
x-mac-japanese     Japanese (Mac)
x-mac-korean       Korean (Mac)
x-mac-romanian     Romanian (Mac)
x-mac-thai         Thai (Mac)
x-mac-turkish      Turkish (Mac)
x-mac-ukrainian    Ukrainian (Mac)
Byte Order Mark (Unicode Files)
The byte order mark (BOM) is a Unicode character used to signal the ‘endianness’ (byte order) of a text file or stream. Its code point is U+FEFF. Use of the BOM is optional; if used, it should appear at the start of the text stream. Beyond its specific use as a byte-order indicator, the BOM may also indicate which of the several Unicode representations the text is encoded in.
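To make this concrete, the following Python sketch encodes the single code point U+FEFF in several Unicode encodings and prints the resulting bytes. It is an illustration using Python's standard codecs, not a description of how IMan writes files; the codec names are Python's own and differ from the names in the table above.

```python
# Illustrative only: the byte sequence produced by U+FEFF depends on
# the Unicode encoding and byte order in use.
BOM_CHAR = "\ufeff"

bom_bytes = {
    name: BOM_CHAR.encode(name)
    for name in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le")
}

for name, raw in bom_bytes.items():
    print(f"{name:10} {raw.hex(' ')}")
# utf-8      ef bb bf
# utf-16-be  fe ff
# utf-16-le  ff fe
# utf-32-be  00 00 fe ff
# utf-32-le  ff fe 00 00
```

Because each encoding yields a different byte sequence, the first few bytes of a stream that starts with a BOM identify both the encoding and its byte order.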
Because Unicode text can be encoded as sequences of 16-bit or 32-bit integers (UTF-16 and UTF-32), a computer receiving Unicode text from arbitrary sources needs to know which byte order those integers are stored in. The BOM gives the producer of the text a way to describe the text stream's ‘endianness’ to the consumer without requiring a separate contract or metadata outside the text stream itself.
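A consumer can use the BOM along the following lines. This is a minimal Python sketch, not IMan's actual detection logic; the function name detect_bom and the choice to return Python codec names are assumptions made for the example.

```python
import codecs

# Hypothetical lookup table for this example. Longer BOMs are listed
# first because the UTF-32 little-endian BOM (ff fe 00 00) begins with
# the same two bytes as the UTF-16 little-endian BOM (ff fe).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (codec_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


# Usage: strip the BOM, then decode with the detected encoding.
raw = codecs.BOM_UTF16_BE + "Hi".encode("utf-16-be")
name, skip = detect_bom(raw)
text = raw[skip:].decode(name)
print(name, text)  # utf-16-be Hi
```

When no BOM is present the function returns (None, 0), and the consumer has to fall back on a default or on information supplied outside the stream.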