Confusable Usernames: Security Risks
How Unicode lookalike characters enable username confusion attacks, and strategies to prevent them.
Tags: confusable username attacks, username spoofing Unicode, homograph username
A confusable username attack registers a username that looks identical to a legitimate user's handle by swapping ASCII characters for visually similar Unicode characters. The attacker impersonates a trusted account without compromising it, exploiting only the gap between what computers store (code points) and what human eyes see (glyphs).

---

What is the anatomy of a confusable attack?

Unicode encodes more than 100,000 characters across more than 150 scripts, and many of them look nearly identical:

| Visible text | Code points | Script |
|---|---|---|
| admin | U+0061 U+0064 U+006D U+0069 U+006E | Latin |
| аdmin | U+0430 U+0064 U+006D U+0069 U+006E | Cyrillic + Latin |
| аdmіn | U+0430 U+0064 U+006D U+0456 U+006E | Cyrillic + Latin (а and і are Cyrillic; і = Ukrainian i) |

The second and third rows are visually identical to…
Frequently Asked Questions
How do attackers create confusable usernames?
Attackers register usernames that look identical to a target username by replacing one or more ASCII characters with visually similar Unicode characters — for example, replacing lowercase 'a' (U+0061) with Cyrillic 'а' (U+0430). The database sees them as different strings, but human eyes see the same name.
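To see what the database sees, list each character's code point and Unicode name. The sketch below uses only Python's standard `unicodedata` module; the helper name `describe` is hypothetical, chosen for illustration.

```python
import unicodedata


def describe(username: str) -> list[str]:
    """Return 'U+XXXX NAME' for every character, exposing hidden lookalikes."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in username]


latin = "admin"          # all Latin letters
spoof = "\u0430dmin"     # Cyrillic small a (U+0430) followed by Latin "dmin"

print(latin == spoof)    # False: same glyphs, different code points
for line in describe(spoof):
    print(line)          # first line: U+0430 CYRILLIC SMALL LETTER A
```

Writing the spoofed string with a `\u` escape, rather than pasting the lookalike character, keeps the difference visible in source code and code review.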
How do I prevent Unicode username spoofing?
Normalize all usernames to NFC before storage, check each registration against the confusables data published with UTS #39 (confusables.txt, which is not part of the UCD), restrict each username to a single script or a small allowed set of scripts, and display a visual warning when a new username is confusable with an existing one.
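A registration check can be sketched as: parse the confusables data, compute a skeleton for every existing username, and look up the skeleton of each new candidate. This is a minimal sketch under stated assumptions. The three data lines are hand-written in the confusables.txt layout (`source ; target ; type # comment`), not copied from the real file, and all function names (`parse_confusables`, `skeleton`, `build_index`, `conflicting_username`) are hypothetical. The `skeleton` function is a simplified form of the UTS #39 transform (NFD, per-character prototype mapping, NFD again); check the current revision of the standard for any additional steps.

```python
import unicodedata

# Hand-written lines in the confusables.txt layout; illustrative only.
# In production, load the full file from the UTS #39 data directory.
SAMPLE_CONFUSABLES = [
    "0430 ;\t0061 ;\tMA\t# Cyrillic small a -> Latin small a",
    "0456 ;\t0069 ;\tMA\t# Cyrillic small byelorussian-ukrainian i -> Latin small i",
    "043E ;\t006F ;\tMA\t# Cyrillic small o -> Latin small o",
]


def parse_confusables(lines):
    """Map each source character to its prototype string."""
    mapping = {}
    for raw in lines:
        data = raw.split("#", 1)[0].strip()        # drop comment and whitespace
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or not fields[0]:
            continue                               # blank, comment-only, or malformed
        source = "".join(chr(int(cp, 16)) for cp in fields[0].split())
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        mapping[source] = target
    return mapping


def skeleton(text, mapping):
    """Simplified skeleton: NFD, map each character to its prototype, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(mapping.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def build_index(usernames, mapping):
    """Index existing (already NFC-normalized) usernames by skeleton."""
    return {skeleton(name, mapping): name for name in usernames}


def conflicting_username(candidate, index, mapping):
    """Return the existing username the candidate is confusable with, or None."""
    normalized = unicodedata.normalize("NFC", candidate)
    return index.get(skeleton(normalized, mapping))


mapping = parse_confusables(SAMPLE_CONFUSABLES)
index = build_index(["admin", "alice"], mapping)
print(conflicting_username("\u0430dm\u0456n", index, mapping))  # admin
print(conflicting_username("bob", index, mapping))              # None
```

Skeletons are comparison keys, not display strings: the full data also maps some Latin letters (for example, "m" to "rn"), which is harmless because both sides of the comparison are mapped the same way. When reading the real file, opening it with `encoding="utf-8-sig"` tolerates a leading byte order mark if one is present.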
Should I restrict usernames to ASCII?
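ASCII-only usernames eliminate cross-script homographs such as Cyrillic 'а' for Latin 'a', but they exclude legitimate users whose names use non-ASCII characters, and ASCII still has its own lookalikes (l/I/1, O/0, rn/m). A middle path: allow a single Unicode script per username (e.g. all-Cyrillic or all-Latin) and block mixed-script strings. This matches the Single Script restriction level described in UTS #39, which also defines slightly looser levels that permit common combinations such as Latin with Han and Japanese kana.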
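A mixed-script check can be sketched with the standard library alone, under one loud assumption: the script is guessed from the first word of each letter's Unicode name (for example, "CYRILLIC SMALL LETTER A"). That is a heuristic, not the Unicode Script property; a production check should use Script_Extensions data, for example through ICU's spoof checker (`USpoofChecker`). The function names are hypothetical.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Guess a letter's script from its Unicode name.

    Heuristic only: this is NOT the Unicode Script property.
    """
    if not ch.isalpha():
        return "Common"  # digits, '_', '-', '.' etc. treated as script-neutral
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else "Unknown"


def scripts_used(username: str) -> set[str]:
    """Set of non-neutral (guessed) scripts appearing in the username."""
    return {s for s in map(rough_script, username) if s != "Common"}


def is_single_script(username: str) -> bool:
    """True if every letter comes from the same (guessed) script."""
    return len(scripts_used(username)) <= 1


print(is_single_script("admin"))                           # True
print(is_single_script("\u0430dmin"))                      # False: Cyrillic + Latin
print(is_single_script("\u0430\u0434\u043c\u0438\u043d"))  # True: all Cyrillic
```

Because this sketch is strictly single-script, a Japanese name mixing kanji and kana would be rejected; the looser UTS #39 levels exist for exactly that case.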
What is the Unicode Security Considerations document?
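Unicode Technical Report #36 (UTR #36, "Unicode Security Considerations") and Unicode Technical Standard #39 (UTS #39, "Unicode Security Mechanisms") describe security risks from Unicode text, including confusable characters, mixed scripts, and bidi overrides. UTR #36 is stabilized and no longer updated; UTS #39 is actively maintained and publishes the confusables data file (confusables.txt) used for detection.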
How do I normalize usernames for comparison?
Strip zero-width characters, apply NFC normalization, and convert to lowercase using Unicode case folding (not locale-specific lowercasing), then apply NFC again, because case folding can produce unnormalized text. For extra safety, also compare the skeleton form defined in UTS #39, which maps each character listed in the confusables data to a prototype character so that lookalike strings produce the same skeleton.
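The normalization steps above can be sketched as a single comparison-key function. The set of invisible characters is an illustrative assumption, not an exhaustive list, and `comparison_key` is a hypothetical name. Note that ZWJ and ZWNJ have legitimate uses in some scripts (for example, Persian and Indic text), so stripping them is a policy choice.

```python
import unicodedata

# A few invisible characters commonly used to pad lookalike names.
# Illustrative, not exhaustive.
INVISIBLES = frozenset({
    "\u200b",  # ZERO WIDTH SPACE
    "\u200c",  # ZERO WIDTH NON-JOINER
    "\u200d",  # ZERO WIDTH JOINER
    "\u2060",  # WORD JOINER
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE
})


def comparison_key(username: str) -> str:
    """Key for duplicate detection: strip invisibles, NFC, case-fold, NFC again."""
    visible = "".join(ch for ch in username if ch not in INVISIBLES)
    composed = unicodedata.normalize("NFC", visible)
    folded = composed.casefold()                 # full Unicode case folding, no locale
    return unicodedata.normalize("NFC", folded)  # folding can leave text unnormalized


print(comparison_key("Ad\u200bmin"))                               # admin
print(comparison_key("Stra\u00dfe") == comparison_key("STRASSE"))  # True
```

This key catches case, normalization, and invisible-padding tricks, but not cross-script lookalikes: "\u0430dmin" still gets a different key from "admin", which is why the skeleton comparison is worth adding on top.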
theproductguy.in