Punycode Explained: How it Encodes Unicode for DNS

Convert Unicode domain names into ASCII using Punycode. Understand the xn-- prefix, prevent homograph attacks, and manage international DNS records.

Punycode is an encoding system that converts Unicode domain names into ASCII strings using only letters, digits, and hyphens. It enables internationalized domain names (IDNs) containing characters like Japanese (例) or German umlauts (ü) to resolve in the DNS infrastructure as ASCII-compatible equivalents (e.g., xn--fsq.com). For marketers and SEO practitioners, understanding Punycode prevents technical errors in global campaigns and protects against homograph phishing attacks that spoof your brand using visually similar Unicode characters.

What is Punycode?

Punycode is an instance of the Bootstring algorithm, defined in [RFC 3492] (IETF), that encodes Unicode strings into the LDH subset (letters, digits, hyphens) required by DNS hostnames. Punycode itself adds no prefix; IDNA (Internationalized Domain Names in Applications) processing prepends the ACE prefix xn-- to each encoded label, signaling that the label belongs to an internationalized domain name. Modern implementations follow [RFC 5891] (RFC 5891) for IDNA processing.

The algorithm processes each label by first copying its ASCII characters, then encoding the non-ASCII characters as variable-length integers appended after a hyphen separator. For example, the German word "München" encodes to Mnchen-3ya as raw Punycode, which becomes xn--mnchen-3ya when used in a domain, because IDNA processing lowercases the label and adds the prefix.
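
The difference between raw Punycode and the prefixed domain form can be seen with Python's standard library. This is a minimal sketch, assuming Python 3; note that the built-in "idna" codec implements the older IDNA2003 rules (RFC 3490), not IDNA2008, so it is shown here only to illustrate the prefix and lowercasing.

```python
# Raw Punycode (RFC 3492): ASCII letters keep their case and no prefix is added.
raw = "München".encode("punycode").decode("ascii")

# Python's built-in "idna" codec (IDNA2003): lowercases the label and
# prepends the ACE prefix xn--.
ace = "München".encode("idna").decode("ascii")

print(raw)  # Mnchen-3ya
print(ace)  # xn--mnchen-3ya
```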

Why Punycode matters

  • Enables global SEO: Supports native language domains in Cyrillic, CJK, Arabic, and other scripts, allowing region-specific domain strategies that improve local relevance and click-through rates.
  • Creates homograph risk: Browsers may display the decoded Unicode name (e.g., apple.com spelled with Cyrillic characters) while the DNS lookup uses the Punycode form, enabling sophisticated phishing that mimics legitimate brands.
  • Technical necessity: DNS infrastructure only accepts ASCII characters. Without Punycode, international domains would fail to resolve.
  • Browser display risks: Modern browsers automatically convert Punycode to Unicode in address bars, masking the suspicious xn-- prefix from users.

How Punycode works

  1. ASCII separation: The algorithm copies all ASCII characters from the input to output. If any ASCII characters exist, it appends a hyphen to mark the boundary (e.g., bücher becomes bcher-).
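  2. Non-ASCII encoding: The remaining characters are processed in order of increasing code point. Each insertion is encoded as a single number (a delta) that captures both which character to insert and where, written as a generalized variable-length integer using the letters a-z and digits 0-9. For the first non-ASCII character, the delta is insertionPoints × (codePoint − 128) + index, where insertionPoints is the number of characters already output plus one. For bücher, that is 6 × (252 − 128) + 1 = 745, which is written as kva. Later deltas count onward from the previous insertion.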
  3. ACE prefix addition: For DNS use, the system prepends xn-- to the encoded string (e.g., bcher-kva becomes xn--bcher-kva).
  4. Decoding: The process reverses: remove the xn-- prefix, separate ASCII and encoded portions, and reconstruct Unicode characters from the variable-length integers.
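
The encoding steps above can be sketched in Python as follows. This is an illustrative re-implementation of the encoder pseudocode in RFC 3492 (section 6.3), not production code; the function names are chosen for this example, and overflow checks and case handling are omitted. In practice the built-in "punycode" codec does the same job.

```python
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N = 72, 128
DIGITS = "abcdefghijklmnopqrstuvwxyz0123456789"


def adapt(delta, num_points, first_time):
    """Bias adaptation from RFC 3492: keeps later deltas short."""
    delta = delta // DAMP if first_time else delta // 2
    delta += delta // num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def encode_varint(q, bias):
    """Write one delta as a generalized variable-length integer."""
    out = []
    k = BASE
    while True:
        t = max(TMIN, min(TMAX, k - bias))  # threshold for this digit position
        if q < t:
            break
        out.append(DIGITS[t + (q - t) % (BASE - t)])
        q = (q - t) // (BASE - t)
        k += BASE
    out.append(DIGITS[q])
    return "".join(out)


def punycode_encode(text):
    # Step 1: copy the ASCII characters, then add the hyphen delimiter.
    basic = [c for c in text if ord(c) < 128]
    out = "".join(basic)
    if basic:
        out += "-"
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    h = b = len(basic)  # h counts the code points handled so far
    # Step 2: insert the non-ASCII code points in increasing order.
    while h < len(text):
        m = min(ord(c) for c in text if ord(c) >= n)
        delta += (m - n) * (h + 1)
        n = m
        for c in text:
            if ord(c) < n:
                delta += 1
            elif ord(c) == n:
                out += encode_varint(delta, bias)
                bias = adapt(delta, h + 1, h == b)
                delta = 0
                h += 1
        delta += 1
        n += 1
    return out


print(punycode_encode("bücher"))  # bcher-kva
print("xn--" + punycode_encode("bücher"))  # step 3: xn--bcher-kva
```

For bücher the first (and only) delta is 745, which encode_varint(745, 72) writes as kva.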

Best practices

Verify raw Punycode before clicking: Inspect suspicious international links by converting them to their ASCII form first. A domain displaying as "micrоsoft.com" might contain a Cyrillic о in place of the first Latin o, in which case it actually resolves as xn--micrsoft-qbh.com. Use conversion tools to reveal the underlying ASCII.
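
A check of this kind can be scripted. The sketch below is one possible approach, assuming Python 3 and its built-in "idna" codec (IDNA2003 rules); inspect_hostname is a hypothetical helper name. It returns the ASCII form of a hostname together with the Unicode names of any non-ASCII characters it contains.

```python
import unicodedata


def inspect_hostname(host):
    """Return (ascii_form, [(char, unicode_name), ...]) for a hostname."""
    ascii_form = host.encode("idna").decode("ascii")
    non_ascii = [
        (ch, unicodedata.name(ch, "UNKNOWN")) for ch in host if ord(ch) > 127
    ]
    return ascii_form, non_ascii


# "\u0430" is CYRILLIC SMALL LETTER A, which looks like Latin "a".
ascii_form, non_ascii = inspect_hostname("\u0430pple.com")
print(ascii_form)  # xn--pple-43d.com
print(non_ascii)   # [('а', 'CYRILLIC SMALL LETTER A')]
```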

Force ASCII display in Firefox: [Configure the setting network.IDN_show_punycode to true] (Huntress). This exposes the raw xn-- strings in the address bar, which makes visually spoofed domains easier to spot.

Monitor for fraudulent IDN registrations: Enroll in domain monitoring services that track Unicode variations of your brand name. Attackers register visually identical domains using different scripts to capture credentials from international users.
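
To get a feel for what such a watchlist contains, the sketch below generates single-character look-alike variants of a brand label and their ASCII forms. It is a toy illustration only: the LOOKALIKES map is a tiny hand-picked assumption covering five Cyrillic letters, whereas real monitoring services rely on far larger confusable tables.

```python
# Hypothetical, deliberately tiny map of Latin letters to Cyrillic look-alikes.
LOOKALIKES = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
}


def single_swap_variants(label):
    """Return (unicode_label, ace_label) pairs with one letter swapped."""
    variants = []
    for i, ch in enumerate(label):
        if ch in LOOKALIKES:
            spoof = label[:i] + LOOKALIKES[ch] + label[i + 1:]
            # Built-in "idna" codec (IDNA2003) gives the xn-- form.
            variants.append((spoof, spoof.encode("idna").decode("ascii")))
    return variants


for spoof, ace in single_swap_variants("apple"):
    print(spoof, ace)
```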

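Use IDNA2008 standard tools: When converting domains, ensure your tools implement the [IDNA2008 standard with Unicode UTS #46 compatibility processing] (Unicode Consortium). This avoids conflicts between IDNA2003 and IDNA2008 implementations, which can convert the same name differently. For example, IDNA2003 maps the German ß to ss, while IDNA2008 treats ß as a valid character in its own right.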
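
The kind of conflict involved can be reproduced with Python's standard library, whose built-in "idna" codec follows IDNA2003. The sketch below contrasts its output for straße.de with the label obtained by Punycode-encoding ß directly, which is the form an IDNA2008 tool that keeps ß would be expected to produce. The second value is built by hand here as an illustration, not by an IDNA2008 library.

```python
# IDNA2003 (Python's built-in codec) maps ß to "ss" before encoding.
idna2003 = "straße.de".encode("idna").decode("ascii")

# Punycode-encoding the label directly keeps ß as a distinct character.
kept_eszett = "xn--" + "straße".encode("punycode").decode("ascii") + ".de"

print(idna2003)     # strasse.de
print(kept_eszett)  # xn--strae-oqa.de
```

The two results are different domain names, so mixing tools from both generations can send users or DNS records to the wrong host.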

Common mistakes

Falling for homograph attacks: Malicious actors register domains using visually identical Unicode characters (e.g., Cyrillic а instead of Latin a). Users see a familiar brand name but enter credentials on a spoofed site. Fix: Always verify the underlying Punycode string for suspicious international communications.
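
One rough way to flag such names automatically is to test whether a label mixes scripts. The sketch below is a heuristic, not a complete defense: Python's standard library has no script property, so it assumes the first word of each character's Unicode name (LATIN, CYRILLIC, and so on) is a usable stand-in. It will miss spoofs written entirely in one script and will flag legitimate mixed-script names such as Japanese labels that combine kana and kanji.

```python
import unicodedata


def scripts_in_label(label):
    """Collect the leading word of each letter's Unicode name as a script hint."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])
    return scripts


def is_mixed_script(label):
    return len(scripts_in_label(label)) > 1


print(is_mixed_script("apple"))        # False
print(is_mixed_script("\u0430pple"))   # True: Cyrillic а followed by Latin pple
```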

Double-encoding Punycode: Running a naive Punycode conversion on an already-encoded string creates an invalid domain (e.g., encoding xn--mnchen-3ya produces xn--xn--mnchen-3ya-). Fix: Check whether the label starts with xn-- before processing.
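
The failure and the guard can be sketched as follows. Both function names are hypothetical, and the use of str.lower() is a simplification of the mapping step that real IDNA tools perform.

```python
ACE_PREFIX = "xn--"


def naive_to_ace(label):
    """Blindly Punycode-encodes whatever it is given (shows the bug)."""
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def safe_to_ace(label):
    """Leaves already-encoded and plain ASCII labels untouched."""
    if label.lower().startswith(ACE_PREFIX) or label.isascii():
        return label.lower()
    return ACE_PREFIX + label.lower().encode("punycode").decode("ascii")


print(naive_to_ace("xn--mnchen-3ya"))  # xn--xn--mnchen-3ya-
print(safe_to_ace("xn--mnchen-3ya"))   # xn--mnchen-3ya
print(safe_to_ace("München"))          # xn--mnchen-3ya
```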

Relying on browser protection: Modern browsers display Unicode versions by default, hiding the encoded xn-- prefix that would alert users to spoofing attempts. Fix: Manually decode suspicious domains using verification tools before accessing them.

Using deprecated Node.js modules: The [punycode module bundled in Node.js has been deprecated since version 7.0.0] (Node.js Documentation). Continuing to use it may result in compatibility issues. Fix: Switch to the userland Punycode.js module or the WHATWG URL API for domain encoding.

Examples

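Entries beginning with xn-- are complete ACE labels as used in DNS; the others show raw Punycode output without the prefix.

Input | Punycode | Notes
München | xn--mnchen-3ya | German umlaut (lowercased by IDNA)
例.com | xn--fsq.com | CJK character (Japanese/Chinese)
mañana | maana-pta | Spanish tilde
bücher | bcher-kva | Books (German)
правда | xn--80aafi6cg | Russian Cyrillic (truth)
😉 | n28h | Winking emoji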
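
The entries above can be checked by decoding them with Python's built-in "punycode" codec. This is a small sketch; decode_label is a hypothetical helper that strips the xn-- prefix when present and expects Punycode input, not arbitrary ASCII labels.

```python
ACE_PREFIX = "xn--"


def decode_label(label):
    """Decode one Punycode label, with or without the ACE prefix."""
    if label.lower().startswith(ACE_PREFIX):
        label = label[len(ACE_PREFIX):]
    return label.encode("ascii").decode("punycode")


for encoded in ["xn--mnchen-3ya", "xn--fsq", "maana-pta",
                "bcher-kva", "xn--80aafi6cg", "n28h"]:
    print(encoded, "->", decode_label(encoded))
```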

Punycode vs Unicode

Feature | Punycode | Unicode
Goal | DNS compatibility | Universal character representation
Character set | ASCII only (LDH subset) | Multi-script (CJK, Cyrillic, emoji, etc.)
Domain usage | Required for IDN registration and DNS records | Displayed to users in browsers
Prefix | Requires xn-- ACE prefix in domains | No prefix required
Security risk | Lets look-alike names be registered and resolved in DNS | Visually similar characters across scripts enable spoofing when displayed
Example | xn--maana-pta | mañana

Rule of thumb: Use Unicode for content display and user interface text; use Punycode for DNS records, server configurations, and domain validation checks.

FAQ

What does the "xn--" prefix mean?

The xn-- prefix marks ACE (ASCII Compatible Encoding), signaling to IDNA-aware software that the rest of the label is Punycode representing Unicode characters. RFC 3492 specifies the Punycode encoding itself, while the prefix is defined by the IDNA specifications (RFC 3490 and its successors). The prefix keeps encoded labels from being confused with ordinary ASCII labels that happen to contain hyphens.

Why do I see Unicode in my browser but "xn--" in my DNS records?

Browsers automatically decode Punycode to display human-readable Unicode for usability. However, DNS infrastructure only accepts ASCII characters, so zone files and technical records must use the Punycode format starting with xn--.

How do I convert a domain to Punycode?

Use conversion tools that implement the [IDNA2008 standard with Unicode UTS #46 compatibility processing] (Unicode Consortium). Input the Unicode domain, and the tool outputs the ASCII-compatible version, with the xn-- prefix on each label that contains non-ASCII characters.

Is Punycode secure for international domains?

The encoding itself is technically neutral, but it facilitates homograph attacks where malicious domains use visually identical characters from different scripts (e.g., Cyrillic vs. Latin) to spoof legitimate brands. Security teams should [force ASCII display in browsers] (Huntress) and monitor for fraudulent registrations.

Why is Punycode deprecated in Node.js?

The [bundled punycode module has been deprecated since Node.js v7.0.0] (Node.js Documentation) and will be removed in future major versions. Developers should migrate to the userland Punycode.js module or use the WHATWG URL API for domain encoding operations.

Can Punycode encode emojis?

Yes. For example, the winking emoji encodes to n28h. However, practical use in live domains is limited by registry policies and browser support restrictions on emoji domains.
