Punycode Phishing and IDN Homograph Attacks
How attackers use Punycode lookalike domains for phishing — and how browsers protect against it.
Tags: Punycode phishing attacks, IDN homograph attack, Unicode domain spoofing
Internationalized domain names (IDNs) make the internet usable in every writing system. They also enable a class of phishing attacks that is nearly undetectable without specialized tools. Here is how the attacks work and how to defend against them.

The Homograph Attack: Background

The IDN homograph attack was first described in detail by Evgeniy Gabrilovich and Alex Gontmakher in their 2002 paper "The Homograph Attack." As a proof of concept, they registered a lookalike domain spelled with Cyrillic characters and demonstrated that it was indistinguishable from the genuine domain in browser address bars…
Frequently Asked Questions
What is an IDN homograph attack?
An IDN homograph attack registers a domain name using Unicode characters that look visually identical to the characters in a legitimate domain. For example, Cyrillic 'а' (U+0430) is nearly indistinguishable from Latin 'a' (U+0061). A phishing site at 'аpple.com' (Cyrillic а) appears to be 'apple.com' in many fonts and early browser address bars.
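To see why the spoof works, compare code points rather than rendered glyphs. This sketch uses only Python's standard library; `genuine` and `spoofed` are illustrative names, and the Punycode result comes from Python's built-in `idna` codec, which implements the older IDNA 2003 rules rather than IDNA 2008.

```python
import unicodedata

genuine = "apple.com"
spoofed = "\u0430pple.com"  # U+0430 CYRILLIC SMALL LETTER A in place of Latin 'a'

# The strings render almost identically but compare unequal.
print(genuine == spoofed)  # False

# Name the first character of each string to expose the substitution.
for tag, host in (("genuine", genuine), ("spoofed", spoofed)):
    first = host[0]
    print(f"{tag}: U+{ord(first):04X} {unicodedata.name(first)}")
# genuine: U+0061 LATIN SMALL LETTER A
# spoofed: U+0430 CYRILLIC SMALL LETTER A

# The ASCII-compatible form is what DNS actually resolves: the genuine
# name is unchanged, while the spoof becomes an xn-- label.
print(genuine.encode("idna"))  # b'apple.com'
print(spoofed.encode("idna"))  # b'xn--pple-43d.com'
```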
How do browsers display Punycode domains?
Modern browsers use display policies to decide when to show the Unicode form vs the Punycode (xn--) form. A domain is typically shown in Unicode if each label uses characters from a single script, or from an accepted combination such as Latin with Han and Kana for Japanese, and the TLD's policy permits it. If a label mixes scripts (e.g., Latin + Cyrillic), uses known confusable characters, or triggers other heuristics, the browser shows the Punycode xn-- form to alert users.
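The single-script part of such a policy can be approximated in a few lines. This is a hypothetical, simplified sketch, not any browser's actual algorithm: `script_of` and `display_form` are illustrative names, and because Python's standard library does not expose the Unicode Script property, the script is guessed from the first word of each character's Unicode name.

```python
import unicodedata

def script_of(ch: str) -> str:
    """Guess a character's script from its Unicode name (a rough stand-in
    for the real Script property). ASCII digits and '-' are script-neutral."""
    if ch.isascii() and not ch.isalpha():
        return "COMMON"
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def display_form(host: str) -> str:
    """Return the host as this simplified policy would show it: Unicode for
    single-script labels, the xn-- form for everything else."""
    # Normalize to ASCII-compatible form so Unicode and xn-- input behave alike.
    ascii_host = host.encode("idna").decode("ascii")
    shown = []
    for label in ascii_host.split("."):
        if not label.lower().startswith("xn--"):
            shown.append(label)  # plain ASCII label: nothing to decide
            continue
        decoded = label.encode("ascii").decode("idna")
        scripts = {script_of(ch) for ch in decoded} - {"COMMON"}
        shown.append(decoded if len(scripts) == 1 else label)
    return ".".join(shown)

print(display_form("\u0430pple.com"))  # xn--pple-43d.com (Latin + Cyrillic)
print(display_form("пример.com"))      # пример.com (Cyrillic only)
```

This rule alone would still show a label spelled entirely with Cyrillic lookalikes in Unicode; that whole-script case is why browsers also consult confusable data.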
Which Unicode characters look like ASCII?
Many Unicode scripts contain characters visually similar to ASCII letters. Cyrillic а (U+0430) looks like Latin a. Greek ο (U+03BF) looks like Latin o. Cyrillic с (U+0441) looks like Latin c. The Unicode Consortium maintains a confusables database listing thousands of character pairs that can be confused in common fonts.
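Confusable data is typically applied by reducing each string to a "skeleton" and comparing skeletons, the approach Unicode Technical Standard #39 describes (normalize to NFD, map each character to its prototype, normalize again). The sketch below is simplified: `CONFUSABLES` is a hypothetical three-entry stand-in for confusables.txt covering only the pairs named above, and `skeleton` and `looks_like` are illustrative names.

```python
import unicodedata

# Hypothetical three-entry subset of confusables.txt: lookalike -> prototype.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}

def skeleton(text: str) -> str:
    """Simplified UTS #39 skeleton: NFD, map confusables, NFD again."""
    nfd = unicodedata.normalize("NFD", text)
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in nfd)
    return unicodedata.normalize("NFD", mapped)

def looks_like(candidate: str, protected: str) -> bool:
    """True if candidate is a different string with the same skeleton."""
    return candidate != protected and skeleton(candidate) == skeleton(protected)

print(looks_like("\u0430pple.com", "apple.com"))         # True
print(looks_like("g\u03bf\u03bfgle.com", "google.com"))  # True
print(looks_like("example.com", "apple.com"))            # False
```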
How do I detect Punycode phishing domains?
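Check whether any label of the domain begins with the xn-- prefix, which marks a Punycode-encoded label, then decode it and inspect which Unicode characters it contains. Look for characters from scripts you would not expect given the domain's claimed language. The Unicode confusables.txt file (part of Unicode Technical Standard #39) lists thousands of known lookalike mappings. Libraries such as Python's idna package, which handles encoding and decoding, and the confusables package, which matches lookalike characters, can help automate this detection.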
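A minimal version of the inspection step, using only Python's standard library. It assumes the built-in `idna` codec (IDNA 2003) is acceptable for decoding; `inspect_host` is an illustrative name, and a production tool would more likely use the third-party idna package, which implements IDNA 2008.

```python
import unicodedata

def inspect_host(host: str) -> list:
    """Report every non-ASCII character hidden in a host name as
    (ACE label, code point, Unicode character name)."""
    # Normalize to the ASCII-compatible form so Unicode and xn-- input
    # are handled the same way.
    ascii_host = host.encode("idna").decode("ascii")
    findings = []
    for label in ascii_host.split("."):
        if not label.lower().startswith("xn--"):
            continue  # pure ASCII label: nothing hidden
        for ch in label.encode("ascii").decode("idna"):
            if not ch.isascii():
                name = unicodedata.name(ch, "UNKNOWN")
                findings.append((label, f"U+{ord(ch):04X}", name))
    return findings

for finding in inspect_host("xn--pple-43d.com"):
    print(finding)
# ('xn--pple-43d', 'U+0430', 'CYRILLIC SMALL LETTER A')
```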
What is mixed-script detection?
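Mixed-script detection identifies domain labels that combine characters from multiple Unicode scripts (e.g., Latin + Cyrillic). Legitimate labels rarely mix scripts, apart from established combinations such as Latin with Han and Kana in Japanese names, so an unexpected mix is a strong signal of a homograph attack. The script-mixing rules come from Unicode Technical Standard #39 (Unicode Security Mechanisms), which defines restriction levels for identifiers; IDNA 2008 is an IETF protocol that governs which characters a domain name may contain, not how browsers display it. Modern browsers build on the UTS #39 restriction levels, plus their own additional checks, to decide whether to display a domain in Unicode or fall back to Punycode.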
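Mixed-script detection can be sketched as follows. Assumptions: the allowed combinations are a hand-picked subset of what the UTS #39 "Highly Restrictive" level permits, scripts are guessed from Unicode character names because the standard library lacks the Script property, and `ALLOWED_COMBINATIONS`, `scripts_in`, and `is_suspicious_mix` are illustrative names.

```python
import unicodedata

# Hypothetical subset of the script combinations UTS #39 "Highly Restrictive"
# allows. "CJK" stands in for Han, since Han ideographs are named
# "CJK UNIFIED IDEOGRAPH-xxxx" in Unicode.
ALLOWED_COMBINATIONS = [
    {"LATIN", "CJK", "HIRAGANA", "KATAKANA"},  # Japanese
    {"LATIN", "CJK", "HANGUL"},                # Korean
]

def scripts_in(label: str) -> set:
    """Approximate the scripts in a label from character names, skipping
    script-neutral ASCII digits and hyphens."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if not (ch.isascii() and not ch.isalpha())
    }

def is_suspicious_mix(label: str) -> bool:
    """Flag a label that mixes scripts outside the allowed combinations."""
    scripts = scripts_in(label)
    if len(scripts) <= 1:
        return False
    return not any(scripts <= allowed for allowed in ALLOWED_COMBINATIONS)

for label in ("apple", "\u0430pple", "\u6771\u4eacabc"):
    print(label, is_suspicious_mix(label))
# apple False
# аpple True
# 東京abc False
```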