UTF-16

The UTF-16 encoding represents characters in the Basic Multilingual Plane (code points U+0000 through U+FFFF) as a single 16-bit code unit, but encodes the less common supplementary characters using a surrogate pair consisting of two 16-bit code units called the high and low surrogates. (The high surrogate is so named because it comes first, but high-surrogate code units, U+D800 through U+DBFF, are actually numerically less than low-surrogate code units, U+DC00 through U+DFFF.)
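The surrogate-pair arithmetic can be sketched in a few lines. This is an illustrative helper, not from any particular library: subtract 0x10000 from the supplementary code point to get a 20-bit value, then split it into two 10-bit halves.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (above U+FFFF) into a surrogate pair."""
    v = cp - 0x10000            # a 20-bit value in the range 0..0xFFFFF
    high = 0xD800 + (v >> 10)   # top 10 bits select the high surrogate
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits select the low surrogate
    return high, low

# U+1F600 (the grinning-face emoji) becomes the pair D83D DE00.
print(tuple(hex(u) for u in to_surrogate_pair(0x1F600)))
# → ('0xd83d', '0xde00')
```

Note that the high surrogate (0xD83D) is indeed numerically smaller than the low surrogate (0xDE00), even though it comes first in the encoded stream.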

Although the presence of surrogate pairs makes UTF-16 more complex, the fact that common characters are represented by 16 bits means UTF-16 usually takes up only half the space of UTF-32. On the other hand, UTF-16 representations of English text are twice as large as US-ASCII or Latin-1 representations.
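The size comparison above can be checked directly. A minimal sketch, assuming Python's standard codec names; the little-endian variants are used here so that no byte-order mark inflates the counts:

```python
text = "hello"  # plain ASCII English text

ascii_size = len(text.encode("ascii"))      # 1 byte per character
utf16_size = len(text.encode("utf-16-le"))  # 2 bytes per character
utf32_size = len(text.encode("utf-32-le"))  # 4 bytes per character

print(ascii_size, utf16_size, utf32_size)
# → 5 10 20
```

As the claim says: for this text, UTF-16 takes half the space of UTF-32 but twice the space of US-ASCII.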

UTF-16 is used to represent strings in Java and C#, and text files on Windows machines use a variant of UTF-16.
