UTF-8

UTF-8 is a variable-width encoding that represents each US-ASCII character as a single 8-bit byte holding its 7-bit code point, and uses two, three, or four bytes for each non-ASCII character.
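The variable width can be observed directly. The following is a minimal Python sketch, not a reference implementation; the sample characters and the name utf8_lengths are chosen here purely for illustration. It encodes one character from each width class and records how many bytes each needs.

```python
# Illustrative sample characters (an assumption of this sketch):
# U+0041 (ASCII), U+00E9, U+20AC, and U+1F600.
samples = ["A", "é", "€", "😀"]

# Map each character to the number of bytes in its UTF-8 encoding.
utf8_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, n in utf8_lengths.items():
    # Show the code point, the byte count, and the encoded bytes in hex.
    print(f"U+{ord(ch):04X}: {n} byte(s), hex {ch.encode('utf-8').hex()}")
```

The ASCII letter occupies one byte whose value equals its code point, while the other three characters take two, three, and four bytes respectively.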

Variable-width encoding makes UTF-8 more complex to process than the fixed-width UTF-32, but UTF-8 text usually takes less space than the same text in UTF-16. For English text, which consists almost entirely of ASCII characters, UTF-8 takes half the space of UTF-16 and a quarter of the space of UTF-32.
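The size ratios for English text can be checked with a short Python sketch. The helper name encoded_sizes and the sample strings are hypothetical, introduced only for this illustration. The little-endian codec variants are used so that no byte order mark is added to the counts.

```python
def encoded_sizes(text):
    """Return the byte count of text in UTF-8, UTF-16, and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

# Pure-ASCII English text: one byte per character in UTF-8.
sizes = encoded_sizes("Hello, world")
print(sizes)

# Counterexample to "always smaller": these characters take
# three bytes each in UTF-8 but only two each in UTF-16.
cjk_sizes = encoded_sizes("日本語")
print(cjk_sizes)
```

For the 12-character English string the counts are 12, 24, and 48 bytes. The second string shows why the comparison with UTF-16 holds only "usually": text made up mostly of characters that need three bytes in UTF-8 is larger in UTF-8 than in UTF-16.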

UTF-8 is used for Unicode text files on Unix, Linux, and Macintosh machines, and to represent strings in some programming languages and libraries.
