HTML Character Sets
To display an HTML page correctly, a web browser must know the character set used in the page.
The Importance of UTF-8
Character encoding is the process of converting characters into a binary format that computers can understand. Over the years, many different encoding standards have been used, such as ASCII, ANSI (the common name for Windows-1252), and ISO-8859-1.
However, the modern standard for the web is UTF-8 (Unicode Transformation Format - 8-bit). UTF-8 can represent any character in the Unicode standard, which covers the scripts of the world's written languages as well as symbols and emojis. This makes it the most flexible and widely supported character set.
Specifying UTF-8 in HTML
To set the character encoding to UTF-8 for an HTML page, include the following <meta> tag as the very first element in your <head> section:
<meta charset="UTF-8">
Declaring the character encoding early is important because it tells the browser how to interpret the bytes of the HTML file into readable text.
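To see where the tag sits in a complete page, here is a minimal sketch of an HTML document that declares UTF-8. The page title and the sample text in the body are illustrative assumptions, chosen only to show characters from several scripts.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Declare the encoding before any other content in the head -->
  <meta charset="UTF-8">
  <title>Character Set Example</title>
</head>
<body>
  <!-- Sample text (illustrative): each line uses characters outside plain ASCII -->
  <p>French: Café</p>
  <p>Greek: Γεια σου</p>
  <p>Japanese: こんにちは</p>
  <p>Emoji: 😀</p>
</body>
</html>
```

The declaration only describes the file; it does not convert it. For the text above to display correctly, the file itself must also be saved as UTF-8 in your editor. The HTML standard additionally requires the charset declaration to appear within the first 1024 bytes of the document, which placing it first in the head takes care of.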
Test Yourself with an Exercise
How do you specify the character set for an HTML document?