UTF-8 is a variable-width character encoding for Unicode. Because Unicode covers (almost) every writing system on Earth, UTF-8 can be used for text in nearly any language, which is one of the reasons for its popularity.
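The UTF-8 text encoder tool stores each Unicode character in a variable number of bytes. JavaScript (like many other languages) represents strings internally as 16-bit UTF-16 code units, and code points above U+FFFF need two such units (a surrogate pair) in UTF-16 and four bytes in UTF-8. This conversion utility therefore only handles code points up to U+FFFF, which take one to three bytes: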
| Byte 1 | Byte 2 | Byte 3 | Comment |
|---|---|---|---|
| 0xxxxxxx | - | - | U+0000 to U+007F: ASCII (7-bit) characters are stored unchanged |
| 110xxxxx | 10xxxxxx | - | U+0080 to U+07FF |
| 1110xxxx | 10xxxxxx | 10xxxxxx | U+0800 to U+FFFF |
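
The bit patterns in the table above can be sketched in JavaScript as follows. This is only an illustration, not the tool's actual source: the function name `encodeUtf8Bmp` is hypothetical, and the sketch assumes the input contains no surrogate pairs (that is, no code points above U+FFFF).

```javascript
// Hypothetical helper: encode each 16-bit code unit of a string as UTF-8,
// using the 1-, 2- and 3-byte patterns from the table.
// Assumption: the input has no surrogate pairs; a surrogate would be encoded
// on its own here, which is not valid UTF-8.
function encodeUtf8Bmp(text) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const cp = text.charCodeAt(i); // 16-bit code unit, 0x0000..0xFFFF
    if (cp < 0x80) {
      // 0xxxxxxx: ASCII is copied through unchanged
      bytes.push(cp);
    } else if (cp < 0x800) {
      // 110xxxxx 10xxxxxx: 5 high bits, then 6 low bits
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else {
      // 1110xxxx 10xxxxxx 10xxxxxx: 4 + 6 + 6 bits
      bytes.push(
        0xe0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return bytes;
}

// Usage: the euro sign U+20AC falls in the three-byte range.
const euroBytes = encodeUtf8Bmp("\u20ac"); // [0xE2, 0x82, 0xAC]
```

In practice, the standard `TextEncoder` API available in browsers and Node.js performs the same conversion and also covers code points above U+FFFF.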
By setting the input encoding to UTF-8, the same tool can also decode a raw byte stream in UTF-8 format, or repair garbled text whose UTF-8 bytes were displayed as individual characters.
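
A rough sketch of the decoding direction is given below. The names `decodeUtf8Bmp` and `repairGarbled` are hypothetical and not taken from the tool. The repair step assumes the text was garbled by reading UTF-8 bytes as ISO-8859-1 (Latin-1), so every garbled character maps back to exactly one byte; other causes, such as a Windows-1252 misreading, are not handled. Overlong encodings are not rejected either.

```javascript
// Hypothetical helper: decode an array of UTF-8 bytes, limited to the
// 1-, 2- and 3-byte forms (code points up to U+FFFF).
function decodeUtf8Bmp(bytes) {
  let text = "";
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    let cp;
    let len;
    if (lead < 0x80) {
      cp = lead; // 0xxxxxxx
      len = 1;
    } else if ((lead & 0xe0) === 0xc0) {
      cp = lead & 0x1f; // 110xxxxx
      len = 2;
    } else if ((lead & 0xf0) === 0xe0) {
      cp = lead & 0x0f; // 1110xxxx
      len = 3;
    } else {
      throw new Error("Unsupported lead byte at index " + i);
    }
    if (i + len > bytes.length) {
      throw new Error("Truncated sequence at index " + i);
    }
    for (let k = 1; k < len; k++) {
      const cont = bytes[i + k];
      if ((cont & 0xc0) !== 0x80) {
        throw new Error("Invalid continuation byte at index " + (i + k));
      }
      cp = (cp << 6) | (cont & 0x3f); // append 6 payload bits
    }
    text += String.fromCharCode(cp);
    i += len;
  }
  return text;
}

// Hypothetical helper: treat each character of a garbled string as one byte
// (assumes a Latin-1 misreading) and decode those bytes as UTF-8.
function repairGarbled(text) {
  const bytes = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit > 0xff) {
      throw new Error("Character at index " + i + " is not a single byte");
    }
    bytes.push(unit);
  }
  return decodeUtf8Bmp(bytes);
}

// Usage: "Ã©" (U+00C3 U+00A9) is what U+00E9 looks like after such a misreading.
const repaired = repairGarbled("\u00c3\u00a9");
```

For general use, the standard `TextDecoder` API covers the full UTF-8 range and validates its input more strictly than this sketch.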
See the Wikipedia article on UTF-8 for more info.