HTML Special Characters: Entity Codes and Usage Guide

Use HTML special characters to display symbols and prevent XSS attacks. Apply entity codes and named entities to ensure consistent web rendering.

HTML special characters are symbols and letters not found on standard keyboards, or characters reserved for HTML syntax, that are written as entity codes so they display correctly in web browsers. Also called HTML entities, character references, or symbol entities, these codes prevent the browser from misinterpreting your content as markup. For marketers and SEO practitioners, proper escaping ensures that title tags, meta descriptions, and user-generated content render correctly without breaking page structure or creating security vulnerabilities.

What Are HTML Special Characters?

HTML special characters fall into two groups. Reserved characters form the HTML language itself, such as the less-than sign (<) and ampersand (&); typing them directly into text can cause parsing errors or display issues. Extended symbols, such as currency signs (€), mathematical operators (∑), and accented letters (é), are hard to type on a standard keyboard and are often written as entities instead.

You represent these characters using entity codes. An entity code can be a named entity, such as &copy; for the copyright symbol, or a numeric reference, such as &#169; (decimal) or &#xA9; (hexadecimal). All three formats render the same symbol in the browser, but named entities are easier to read when editing code.
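To see that the three formats are interchangeable, here is a minimal sketch in Python (used only for illustration; the article's own examples use PHP). It builds the decimal and hexadecimal references from a character's code point and looks up a named entity in the standard-library table html.entities.codepoint2name, which covers only the HTML 4 entity set. The helper name entity_forms is hypothetical.

```python
import html
from html.entities import codepoint2name


def entity_forms(char: str) -> dict[str, str]:
    """Return decimal, hexadecimal, and (if known) named references for one character."""
    code = ord(char)  # Unicode code point, e.g. 169 for ©
    forms = {
        "decimal": f"&#{code};",
        "hexadecimal": f"&#x{code:X};",
    }
    # codepoint2name covers only HTML 4 entity names, so some symbols have none.
    name = codepoint2name.get(code)
    if name is not None:
        forms["named"] = f"&{name};"
    return forms


forms = entity_forms("©")
print(forms)  # {'decimal': '&#169;', 'hexadecimal': '&#xA9;', 'named': '&copy;'}

# Every spelling decodes back to the same character.
assert all(html.unescape(ref) == "©" for ref in forms.values())
```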

Why HTML Special Characters Matter

  • Prevent page breakage. Unescaped reserved characters can cause the browser to interpret your text as HTML tags, breaking layout or hiding content. Escaping < and > ensures they display as literal characters.
  • Block XSS attacks. Converting characters like < and > to &lt; and &gt; in user input prevents malicious scripts from executing in visitors' browsers.
  • Maintain SEO integrity. Reserved characters in title tags, meta descriptions, or structured data can corrupt HTML and hurt search visibility. Proper entities ensure crawlers read your markup correctly.
  • Ensure cross-browser consistency. Entity codes decode to the same character in every browser and on every operating system, regardless of the author's keyboard layout. The visitor's fonts must still include the glyph, though.
  • Support international content. Entities let you include accented characters (such as the Western European letters in ISO 8859-1) and other Unicode symbols even when your keyboard or editor cannot produce them directly.

How HTML Special Characters Work

When a browser parses HTML, it looks for specific syntax to identify tags and attributes. If it encounters a reserved character in text content, it may try to parse it as markup, causing errors.

To display a special character, you insert an entity code into your HTML source:

  1. Identify the character. Determine if you need a reserved character (like <) or an extended symbol (like ©).
  2. Choose the format. Use either a named entity (&name;), a decimal number (&#number;), or a hexadecimal reference (&#xhex;).
  3. Insert the code. Place the entity directly in your HTML where you want the symbol to appear.
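  4. Declare encoding. Specify a character set (such as <meta charset="UTF-8">) that matches the encoding your file is actually saved in. Entity references themselves are plain ASCII, but any characters you type directly will display as corrupted text if the declared and actual encodings differ.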

The browser replaces the entity code with the corresponding character during rendering. For example, typing &euro; tells the browser to display the euro symbol (€), while &lt; displays a literal less-than sign rather than starting a new HTML tag.
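The replacement step can be imitated with Python's standard html.unescape(), which decodes named, decimal, and hexadecimal references using HTML5 rules. This is only an approximation of what a browser's parser does with text content, not the browser's own code.

```python
import html

# Entity-encoded source, as it might appear inside a <p> element.
source = "Price: 10&nbsp;&euro; &mdash; type &lt;b&gt; for bold"

# html.unescape() swaps each reference for the character it names.
text = html.unescape(source)
print(text)

# &lt; and &gt; became literal < and > characters: text, not a tag.
assert text == "Price: 10\u00a0€ \u2014 type <b> for bold"
```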

Types of HTML Special Characters

HTML special characters fall into a few categories based on usage and origin.

Reserved Characters

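These five characters are part of HTML syntax. Escaping all five in any dynamic or user-supplied text is the safe default. Strictly, text content only requires escaping < and ambiguous &, and an attribute value also requires escaping the quote character that delimits it.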

Character   Entity name   Decimal   Description
"           &quot;        &#34;     Quotation mark
'           &apos;        &#39;     Apostrophe
&           &amp;         &#38;     Ampersand
<           &lt;          &#60;     Less-than sign
>           &gt;          &#62;     Greater-than sign
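A minimal sketch of escaping the five reserved characters by hand, in Python for illustration. The helper escape_reserved is hypothetical; in practice Python's html.escape() or PHP's htmlspecialchars() does this job. It shows the detail that most often goes wrong: & has to be replaced first. It writes the apostrophe as &#39; because &apos; is not defined in HTML 4.

```python
# Order matters: "&" goes first. Replacing it later would turn the "&" inside
# "&lt;", "&gt;", ... into "&amp;", double-escaping the output.
RESERVED = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#39;"),  # numeric form also works in HTML 4, where &apos; is undefined
]


def escape_reserved(text: str) -> str:
    for char, entity in RESERVED:
        text = text.replace(char, entity)
    return text


print(escape_reserved('Fish & "chips" <b>it\'s</b>'))
# Fish &amp; &quot;chips&quot; &lt;b&gt;it&#39;s&lt;/b&gt;
```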

General Symbols

This group includes currency signs (€ &euro;, £ &pound;, ¥ &yen;), punctuation (em dash &mdash;, curly quotes &ldquo; and &rdquo;), and spacing characters (non-breaking space &nbsp;). These improve readability but are not on standard keyboards.

Mathematical Symbols

Operators such as summation (∑, &sum;) and square root (√, &radic;), plus Greek letters (α, &alpha;) used in scientific content.

ISO 8859-1 Characters

Western European accented letters (à, é, ñ, ü, written &agrave;, &eacute;, &ntilde;, &uuml;) and symbols covering French, German, Spanish, and other Latin-script languages.
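If you cannot be sure a page will be served with the right encoding, one option is to convert every non-ASCII character to a numeric reference so the markup is pure ASCII. Python's built-in xmlcharrefreplace error handler does exactly that. It appears here only as an illustration. For content that is UTF-8 end to end, typing the letters directly is usually simpler.

```python
text = "Crème brûlée, jalapeño, Müller"

# Characters that ASCII cannot represent are written as decimal references (&#NNN;).
ascii_safe = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ascii_safe)
# Cr&#232;me br&#251;l&#233;e, jalape&#241;o, M&#252;ller
```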

Best practices

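  • Escape all user input. Before displaying form data or database content in HTML, convert &, <, >, and quotes to entities using a function like PHP's htmlspecialchars(). This blocks XSS in HTML text and quoted attribute values. Other contexts, such as inline JavaScript or URLs, need their own encoding or validation.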
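  • Use double quotes for attributes. Wrap escaped attribute values in double quotes. Some escaping settings leave single quotes untouched (for example, htmlspecialchars() without ENT_QUOTES, which was the default before PHP 8.1), so an apostrophe in the data can close a single-quoted attribute early.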
  • Specify UTF-8 explicitly. Declare <meta charset="UTF-8"> in your document head and save your source files as UTF-8. If the input is not valid in the encoding htmlspecialchars() expects (UTF-8 by default), the function returns an empty string unless ENT_SUBSTITUTE or ENT_IGNORE is set.
  • Prefer entity names for readability. Use &copy; instead of &#169; when editing templates manually. Names are easier to scan and remember than numeric codes.
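  • Terminate every entity with a semicolon. Write &copy; not &copy. Browsers still decode some legacy names without the semicolon, but validators report an error and the result depends on the surrounding text. Write a literal ampersand as &amp;.
  • Set comprehensive PHP flags. Use ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5 to convert both single and double quotes, replace invalid code unit sequences with the Unicode replacement character (U+FFFD) instead of returning an empty string, and apply HTML5 rules (for example, single quotes become &apos;).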
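The attribute rules above can be sketched in Python rather than PHP. The standard html.escape() with quote=True converts &, <, >, and both quote characters (it writes ' as &#x27;), roughly what htmlspecialchars() does with ENT_QUOTES. The meta_description helper is hypothetical.

```python
import html


def meta_description(text: str) -> str:
    # quote=True also escapes " and ', so the data cannot close the attribute early.
    safe = html.escape(text, quote=True)
    # The attribute value is wrapped in double quotes.
    return f'<meta name="description" content="{safe}">'


print(meta_description('Compare "Pro" & "Basic" plans <2024>'))
# <meta name="description" content="Compare &quot;Pro&quot; &amp; &quot;Basic&quot; plans &lt;2024&gt;">
```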

Common mistakes

  • Mistake: Typing reserved characters directly into content. Writing if x<y in a paragraph can break the page, because a < followed by a letter starts a tag. Fix: Use &lt; for less-than and &gt; for greater-than.
  • Mistake: Leaving ampersands unescaped in URLs. Query strings like ?a=1&b=2 contain raw ampersands that should be &amp; in HTML attributes. Fix: Convert & to &amp; within href and src attributes.
  • Mistake: Using single quotes around HTML attributes after escaping. If htmlspecialchars() runs without ENT_QUOTES, single quotes stay unescaped, so a single-quoted attribute value ends at the first apostrophe in the data. Fix: Use double quotes for HTML attributes, and pass the ENT_QUOTES flag.
  • Mistake: Assuming htmlspecialchars() sanitizes for SQL databases. This function only prepares data for HTML display. It does not protect against SQL injection. Fix: Use parameterized queries (prepared statements, for example via PDO::prepare()), and fall back to database-specific escape functions such as pg_escape_string() only when parameters are unavailable.
  • Mistake: Mismatched character sets between files and functions. Saving files as ISO-8859-1 but calling htmlspecialchars() with UTF-8 (the default since PHP 5.4) returns an empty string whenever a non-ASCII character appears, unless ENT_SUBSTITUTE or ENT_IGNORE is set. Fix: Save source files as UTF-8 and explicitly declare the encoding in your function calls.
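The ampersand-in-URL mistake involves two separate layers of escaping, which this Python sketch keeps apart (the example.com URL and the parameters are placeholders). urllib.parse.urlencode() percent-encodes an & inside a value as %26. html.escape() then turns the & that separates parameters into &amp; for the href attribute, and the browser decodes it back to & before following the link.

```python
import html
from urllib.parse import urlencode

# Layer 1: URL encoding. The "&" inside the value becomes %26;
# the "&" between parameters stays a raw separator.
params = {"q": "fish & chips", "page": 2}
url = "https://example.com/search?" + urlencode(params)

# Layer 2: HTML escaping for the attribute. The separator "&" becomes "&amp;".
href = html.escape(url, quote=True)
link = f'<a href="{href}">Results</a>'
print(link)
# <a href="https://example.com/search?q=fish+%26+chips&amp;page=2">Results</a>
```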

Examples

  • Displaying copyright: Use &copy; 2024 Company Name or &#169; 2024 Company Name to render © without typing the symbol directly.
  • Escaping user input in PHP: echo htmlspecialchars($comment, ENT_QUOTES | ENT_HTML5, 'UTF-8'); converts a malicious <script>alert('xss')</script> input into &lt;script&gt;alert(&apos;xss&apos;)&lt;/script&gt;, which displays as harmless text. Without ENT_HTML5, the single quotes become &#039; instead.
  • Mathematical notation: Write The sum &sum; of values to display "The sum ∑ of values" without needing a math keyboard.
  • Non-breaking spaces: Use 100&nbsp;km to prevent the number and unit from splitting across two lines.

FAQ

What characters must I escape in HTML5?

You must escape the less-than sign (<) and ambiguous ampersands (& followed by something that could start an entity name but isn't valid). In attribute values, also escape the quote character used to delimit the value.

What is the difference between &copy; and &#169;?

Both produce the copyright symbol (©). &copy; is a named entity that is easier to remember. &#169; is a decimal numeric reference that works even if the entity name is not defined in the document type.

Why does htmlspecialchars() return an empty string?

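The input string likely contains bytes that are not valid in the character set passed to the function. Since PHP 5.4, the default encoding is UTF-8, so input from an ISO-8859-1 file makes the function return an empty string unless ENT_SUBSTITUTE or ENT_IGNORE is set (ENT_SUBSTITUTE is part of the default flags since PHP 8.1). Make sure the file encoding and the function's encoding match.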
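The same mismatch can be reproduced outside PHP. This Python sketch saves "café" as ISO-8859-1 bytes and then reads them as UTF-8. Where htmlspecialchars() would return an empty string, Python raises UnicodeDecodeError; errors="replace" plays a role similar to ENT_SUBSTITUTE by inserting U+FFFD. The analogy is loose, since the two languages handle the failure differently.

```python
# "é" is stored as the single byte 0xE9 in ISO-8859-1, which is not valid UTF-8 here.
latin1_bytes = "café".encode("iso-8859-1")

try:
    latin1_bytes.decode("utf-8")  # strict decoding rejects the invalid byte
except UnicodeDecodeError as err:
    print("strict UTF-8 decode failed:", err.reason)

# Similar in spirit to ENT_SUBSTITUTE: keep the text, mark the bad byte with U+FFFD.
substituted = latin1_bytes.decode("utf-8", errors="replace")
print(substituted)

# The real fix: decode with the encoding the bytes were actually saved in.
fixed = latin1_bytes.decode("iso-8859-1")
print(fixed)
```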

Should I use htmlentities() or htmlspecialchars()?

Use htmlspecialchars() in most cases. It converts only the characters that carry meaning in HTML (&, <, >, and quotes, depending on flags), which is enough to prevent markup injection on a UTF-8 page and keeps output compact. htmlentities() converts every character that has a named entity, which bloats your HTML without adding protection and can garble UTF-8 text if the encoding argument does not match the input.

Can I use HTML entities in meta descriptions and titles?

Yes, but reserved characters must be escaped to prevent breaking the tag. For example, use &quot; for quotes within a meta description attribute.

What is an ambiguous ampersand?

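Per the HTML specification, it is an ampersand followed by letters or digits and a semicolon that do not match any named entity (like &foo;). A closely related error is a real entity name missing its semicolon (like &copy). Browsers still decode some legacy names in that form, but validators report both cases and the rendered result depends on the surrounding text.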
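A rough checker, sketched in Python, that flags an & followed by a semicolon-terminated name that is not a real entity (&foo;) and a real entity name missing its semicolon (&copy). It uses the standard html.entities.html5 table, whose keys include entity names with their trailing semicolon (such as "copy;"). The function name is hypothetical, and the check is simplified: it ignores numeric references and does not model every browser rule, such as the special handling inside attribute values.

```python
import re
from html.entities import html5  # maps names like "copy;" (and legacy "copy") to characters

# "&" followed by ASCII letters or digits, optionally closed by ";".
# Numeric references such as &#169; start with "#" and are not matched.
REFERENCE = re.compile(r"&([A-Za-z0-9]+)(;?)")


def find_ampersand_issues(text: str) -> list[tuple[str, str]]:
    issues = []
    for match in REFERENCE.finditer(text):
        name, semicolon = match.group(1), match.group(2)
        known = (name + ";") in html5
        if semicolon and not known:
            # e.g. &foo; : reference syntax, but no such entity
            issues.append(("ambiguous-ampersand", match.group(0)))
        elif not semicolon and known:
            # e.g. &copy : a real entity name without its semicolon
            issues.append(("missing-semicolon", match.group(0)))
    return issues


print(find_ampersand_issues("&copy 2024 &foo; &amp; &#169;"))
# [('missing-semicolon', '&copy'), ('ambiguous-ampersand', '&foo;')]
```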
