HTML special characters are symbols and letters not found on standard keyboards, or characters reserved for HTML syntax, that require specific entity codes to display correctly in web browsers. Also called HTML entities, character references, or symbol entities, these codes prevent the browser from misinterpreting your content as markup. For marketers and SEO practitioners, proper escaping ensures that title tags, meta descriptions, and user-generated content render correctly without breaking page structure or creating security vulnerabilities.
What are HTML Special Characters?
HTML special characters include any symbol that cannot be typed directly into HTML source code without causing display issues or parsing errors. This encompasses reserved characters that form the HTML language itself, such as the less-than sign (<) and ampersand (&), as well as extended symbols like currency signs (€), mathematical operators (∑), and accented letters (é).
You represent these characters using entity codes. An entity code can be a named entity, such as `&copy;` for the copyright symbol, or a numeric reference, such as `&#169;` (decimal) or `&#xA9;` (hexadecimal). All three formats render the same symbol in the browser, but named entities are easier to read when editing code.
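As a quick check that the named, decimal, and hexadecimal forms really are equivalent, the sketch below decodes all three with Python's standard `html` module. Python is used here only as a convenient way to resolve references outside a browser; the variable names are illustrative.

```python
import html

# The three reference forms for the copyright sign.
named = "&copy;"
decimal = "&#169;"
hexadecimal = "&#xA9;"

# html.unescape resolves character references following the HTML5 rules.
decoded = [html.unescape(ref) for ref in (named, decimal, hexadecimal)]
print(decoded)
```

All three entries in `decoded` are the same single character, U+00A9.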
Why HTML Special Characters matter
- Prevent page breakage. Unescaped reserved characters can cause the browser to interpret your text as HTML tags, breaking layout or hiding content. Escaping `<` and `>` ensures they display as literal characters.
- Block XSS attacks. Converting characters like `<` and `>` to `&lt;` and `&gt;` in user input prevents malicious scripts from executing in visitors' browsers.
- Maintain SEO integrity. Reserved characters in title tags, meta descriptions, or structured data can corrupt HTML and hurt search visibility. Proper entities ensure crawlers read your markup correctly.
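- Ensure cross-browser consistency. Entity codes resolve to the same character in every browser and operating system, regardless of the author's keyboard layout or the file's encoding. Whether a glyph is available still depends on the fonts installed on the reader's device.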
- Support international content. Entities allow you to display Western European accented characters (ISO 8859-1) and symbols without requiring UTF-8 input methods.
How HTML Special Characters work
When a browser parses HTML, it looks for specific syntax to identify tags and attributes. If it encounters a reserved character in text content, it may try to parse it as markup, causing errors.
To display a special character, you insert an entity code into your HTML source:
- Identify the character. Determine if you need a reserved character (like `<`) or an extended symbol (like `©`).
- Choose the format. Use either a named entity (`&name;`), a decimal number (`&#number;`), or a hexadecimal reference (`&#xhex;`).
- Insert the code. Place the entity directly in your HTML where you want the symbol to appear.
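- Declare encoding. Ensure your document declares a character set (such as UTF-8) that matches how the file is actually saved. Entity references themselves resolve the same way under any encoding, but literal non-ASCII characters and server-side escaping functions depend on it, and a mismatch can produce empty strings or corrupted output.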
The browser replaces the entity code with the corresponding character during rendering. For example, typing `&euro;` tells the browser to display the euro symbol (€), while `&lt;` displays a literal less-than sign rather than starting a new HTML tag.
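The "choose the format" step can be sketched in Python: given a character, derive its decimal and hexadecimal references from the code point and look up a named entity if one exists. The helper `references_for` is hypothetical, and `codepoint2name` only covers the HTML 4 entity names, so newer HTML5 names will come back as `None`.

```python
from html.entities import codepoint2name


def references_for(char: str) -> dict:
    """Return the named (if any), decimal, and hex references for one character."""
    code = ord(char)
    name = codepoint2name.get(code)  # HTML 4 names only (assumption: good enough here)
    return {
        "named": f"&{name};" if name else None,
        "decimal": f"&#{code};",
        "hex": f"&#x{code:X};",
    }


euro = references_for("€")
less_than = references_for("<")
print(euro)
print(less_than)
```

Any of the three strings returned for a character can be pasted into HTML source and will render as that character.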
Types of HTML Special Characters
These characters fall into distinct categories based on usage and origin.
Reserved Characters
These five characters form the HTML language syntax. You must escape them in content to prevent parsing errors.
| Character | Entity Name | Decimal | Use case |
|---|---|---|---|
| " | `&quot;` | `&#34;` | Quotation marks |
| ' | `&apos;` | `&#39;` | Apostrophe |
| & | `&amp;` | `&#38;` | Ampersand |
| < | `&lt;` | `&#60;` | Less-than sign |
| > | `&gt;` | `&#62;` | Greater-than sign |
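A minimal sketch of escaping the five reserved characters, using a translation table keyed on their named entities. The function name `escape_reserved` is illustrative. Because `str.translate` works in a single pass, the ampersands introduced by earlier replacements are not escaped a second time.

```python
# The five reserved characters mapped to their named entities.
RESERVED = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&apos;",
})


def escape_reserved(text: str) -> str:
    """Replace each reserved character with its entity in one pass."""
    return text.translate(RESERVED)


escaped = escape_reserved('<a href="x">Tom & Jerry\'s</a>')
print(escaped)
```

In production code a maintained library function is usually a better choice than a hand-rolled table; this version exists only to make the mapping visible.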
General Symbols
Includes currency (€ £ ¥), punctuation (em-dash, curly quotes), and spacing characters (non-breaking space). These enhance readability but are not on standard keyboards.
Mathematical Symbols
Operators like summation (∑), square root (√), and Greek letters used in scientific content.
ISO 8859-1 Characters
Western European accented letters (à, é, ñ, ü) and symbols covering French, German, Spanish, and other Latin-based languages.
Best practices
- Escape all user input. Before displaying form data or database content in HTML, convert `&`, `<`, and `>` to entities using a function like PHP's `htmlspecialchars()`. This blocks XSS attacks.
- Use double quotes for attributes. When escaping attribute values, wrap them in double quotes. If you use single quotes, single quotes in the value may go unescaped and break the attribute, depending on your flags.
- Specify UTF-8 explicitly. Declare `<meta charset="UTF-8">` in your document head and ensure your source files are saved as UTF-8. Mismatched encoding can cause `htmlspecialchars()` to return empty strings.
- Prefer entity names for readability. Use `&copy;` instead of `&#169;` when editing templates manually. Names are easier to scan and remember than numeric codes.
- Avoid ambiguous ampersands. Always terminate entity names with a semicolon. Write `&copy;` not `&copy` to prevent validation errors.
- Set comprehensive PHP flags. Use `ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5` to convert both single and double quotes, replace invalid sequences with the Unicode replacement character, and support the broadest set of named entities.
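To illustrate the attribute-quoting advice, here is a sketch in Python rather than PHP. `html.escape` with `quote=True` converts both quote characters, which is roughly comparable to passing `ENT_QUOTES` to `htmlspecialchars()`. The `render_input` helper and the sample payload are hypothetical.

```python
import html


def render_input(value: str) -> str:
    """Build an <input> tag with the value escaped for a double-quoted attribute."""
    # quote=True also converts " and ', so the value cannot close the attribute.
    safe = html.escape(value, quote=True)
    return f'<input type="text" value="{safe}">'


# A value that tries to break out of the attribute and add an event handler.
tag = render_input('" onfocus="alert(1)')
print(tag)
```

The injected quotes arrive in the markup as `&quot;`, so the browser treats the whole payload as the text of the `value` attribute.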
Common mistakes
- Mistake: Typing reserved characters directly into content. Writing `if x < y` in a paragraph can break the page if the browser interprets `<` as a tag opening. Fix: Use `&lt;` for less-than and `&gt;` for greater-than.
- Mistake: Leaving ampersands unescaped in URLs. Query strings like `?a=1&b=2` contain raw ampersands that should be `&amp;` in HTML attributes. Fix: Convert `&` to `&amp;` within href and src attributes.
- Mistake: Using single quotes around HTML attributes after escaping. If you escape quotes with `htmlspecialchars()` but wrap the attribute in single quotes, the value may truncate at the first single quote. Fix: Use double quotes for HTML attributes, or ensure you use the `ENT_QUOTES` flag.
- Mistake: Assuming `htmlspecialchars()` sanitizes for SQL databases. This function only prepares data for HTML display. It does not protect against SQL injection. Fix: Use database-specific escape functions (like `pg_escape_string`) or, better, parameterized queries for SQL.
- Mistake: Mismatched character sets between files and functions. Saving files as ISO-8859-1 but calling `htmlspecialchars()` with UTF-8 (the default since PHP 5.4) results in empty strings when special characters appear. Fix: Save source files as UTF-8 and explicitly declare the encoding in your function calls.
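The character-set mismatch is easy to reproduce outside PHP. In the Python sketch below, text saved as ISO-8859-1 is read back as UTF-8. Python raises an error where PHP would return an empty string, and `errors="replace"` behaves roughly like `ENT_SUBSTITUTE`; that comparison is an analogy, not a claim that the two are identical.

```python
# "café" saved as ISO-8859-1: the é becomes the single byte 0xE9.
raw = "café".encode("iso-8859-1")

# Reading those bytes as UTF-8 fails because 0xE9 starts an incomplete sequence.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Substituting U+FFFD keeps the rest of the text instead of losing everything.
substituted = raw.decode("utf-8", errors="replace")

# Declaring the encoding the bytes were actually written in recovers the text.
correct = raw.decode("iso-8859-1")

print(strict_ok, substituted, correct)
```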
Examples
- Displaying copyright: Use
© 2024 Company Nameor© 2024 Company Nameto render © without relying on the character map. - Escaping user input in PHP:
echo htmlspecialchars($comment, ENT_QUOTES | ENT_HTML5, 'UTF-8');converts a malicious<script>alert('xss')</script>input into<script>alert('xss')</script>, rendering it harmless text. - Mathematical notation: Write
The sum ∑ of valuesto display "The sum ∑ of values" without needing a math keyboard. - Non-breaking spaces: Use
100 kmto prevent the number and unit from splitting across two lines.
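The escaping and non-breaking-space examples can be reproduced with Python's standard library, shown here as a rough counterpart to the PHP call. One difference worth noting: `html.escape` writes the apostrophe as `&#x27;`, a numeric reference for the same character, rather than a named entity.

```python
import html

# Escaping a script injection attempt so it renders as plain text.
comment = "<script>alert('xss')</script>"
safe_comment = html.escape(comment)
print(safe_comment)

# Decoding restores the original text, so nothing is lost by escaping.
round_trip = html.unescape(safe_comment)

# &nbsp; resolves to U+00A0, which keeps "100" and "km" on the same line.
distance = html.unescape("100&nbsp;km")
```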
FAQ
What characters must I escape in HTML5?
You must escape the less-than sign (<) and ambiguous ampersands (& followed by something that could start an entity name but isn't valid). In attribute values, also escape the quote character used to delimit the value.
What is the difference between `&copy;` and `&#169;`?
Both produce the copyright symbol (©). `&copy;` is a named entity that is easier to remember. `&#169;` is a decimal numeric reference that works even if the entity name is not defined in the document type.
Why does htmlspecialchars() return an empty string?
The input string likely contains byte sequences that are invalid in the character set specified in the function. Since PHP 5.4, the default encoding is UTF-8. If your data is ISO-8859-1 and contains non-ASCII characters, the function returns an empty string unless you set `ENT_SUBSTITUTE` or `ENT_IGNORE`. Ensure both match.
Should I use htmlentities() or htmlspecialchars()?
Use htmlspecialchars() for security and UTF-8 compatibility. It only converts the five critical characters and is faster. htmlentities() converts all applicable characters to entities, which can unnecessarily bloat your HTML and corrupt UTF-8 strings unless you specify the encoding correctly.
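The size difference between the two approaches can be sketched in Python. `escape_special` and `escape_all` are hypothetical stand-ins for `htmlspecialchars()` and `htmlentities()`, not ports of them: the second one simply converts every character that has an HTML 4 entity name.

```python
import html
from html.entities import codepoint2name


def escape_special(text: str) -> str:
    """Rough analog of htmlspecialchars(): only the reserved characters."""
    return html.escape(text, quote=True)


def escape_all(text: str) -> str:
    """Rough analog of htmlentities(): every character with an HTML 4 name."""
    return "".join(
        f"&{codepoint2name[ord(ch)]};" if ord(ch) in codepoint2name else ch
        for ch in text
    )


sample = "Café & crème: €5"
special = escape_special(sample)
everything = escape_all(sample)
print(special)
print(everything)
```

On a UTF-8 page both outputs display identically, but the second is longer and harder to read in source.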
Can I use HTML entities in meta descriptions and titles?
Yes, but reserved characters must be escaped to prevent breaking the tag. For example, use `&quot;` for quotes within a meta description attribute.
What is an ambiguous ampersand?
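Strictly, the HTML standard defines it as an ampersand followed by alphanumeric characters and a semicolon, where the name does not match any defined entity (like `&bogus;`). A closely related error is a known entity name that lacks its terminating semicolon (like `&copy` instead of `&copy;`). Both cause HTML validation errors and can render unpredictably.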
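A small checker for these ampersand problems might look like the sketch below. It flags two cases: a name-like reference that matches no known entity, and a known entity name with no semicolon. The function name and messages are made up for illustration, the lookup relies on Python's `html.entities.html5` table, and a real validator handles many more cases (attribute context, numeric references, and so on).

```python
import re
from html.entities import html5  # keys include the semicolon, e.g. "copy;"

# An ampersand, a run of alphanumerics, and an optional semicolon.
NAME_AFTER_AMP = re.compile(r"&([A-Za-z0-9]+)(;?)")


def check_ampersands(markup: str) -> list:
    """Return (reference, problem) pairs for suspicious named references."""
    problems = []
    for match in NAME_AFTER_AMP.finditer(markup):
        name, semicolon = match.groups()
        if semicolon:
            if name + ";" not in html5:
                problems.append((match.group(0), "unknown entity name"))
        elif name + ";" in html5:
            problems.append((match.group(0), "missing semicolon"))
    return problems


found = check_ampersands("&copy; 2024, &copy 2025, &bogus; text, &#169; ok")
print(found)
```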