Zero-Width Characters for Text Encoding
How zero-width characters are used to encode hidden data in text — steganography and watermarking.
Tags: zero-width text steganography, hidden text encoding unicode, text steganography unicode
Zero-width characters are Unicode code points that render with no visible width: they exist in the byte stream but produce no glyph. By mapping these characters to binary values, it's possible to embed a hidden payload inside any visible text without changing a single visible character.

These characters are defined in the Unicode Standard; the relevant code points are documented in Unicode Technical Report #36 (Unicode Security Considerations) as they relate to text spoofing and steganography risks.

---

The Zero-Width Character Set

Unicode defines several characters that render invisibly:

| Character | Code point | UTF-8 bytes | Common name |
|-----------|-----------|-------------|-------------|
| ZERO WIDTH SPACE | U+200B | E2 80 8B | ZWSP |
|…
Frequently Asked Questions
How can I hide data in plain text?
Zero-width Unicode characters (ZWSP, ZWNJ, ZWJ, and WORD JOINER) produce no visible glyph in most renderers. By encoding binary data as sequences of these characters, for example mapping 0→ZWSP and 1→ZWNJ, you can embed hidden data anywhere in visible text. The payload cannot be seen on screen but is still present in the byte stream.
What is text-based steganography?
Text steganography hides information within text without altering its visible content. Methods include inserting zero-width characters, varying whitespace, substituting Unicode homoglyphs, or encoding data in the choice of line endings. Unlike image steganography, text steganography needs no binary carrier file: the carrier is human-readable text.
How do zero-width characters encode bits?
A common scheme uses two zero-width characters to represent binary digits: for example, Zero-Width Space (U+200B) = bit 0, Zero-Width Non-Joiner (U+200C) = bit 1. A sequence of 8 such characters encodes one byte. The encoded data is inserted between visible characters of the cover text.
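The scheme above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (`encode_payload`, `embed`, `extract`) are hypothetical, and placing the whole payload right after the first visible character is just one assumed placement among many.

```python
ZERO = "\u200b"  # ZERO WIDTH SPACE      -> bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER -> bit 1


def encode_payload(payload: bytes) -> str:
    """Turn each byte into 8 zero-width characters, most significant bit first."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    return "".join(ONE if bit == "1" else ZERO for bit in bits)


def embed(cover: str, payload: bytes) -> str:
    """Insert the encoded payload after the first character of the cover text.

    The insertion point is an assumption made for this sketch.
    """
    return cover[:1] + encode_payload(payload) + cover[1:]


def extract(stego: str) -> bytes:
    """Collect the two marker characters and rebuild the original bytes."""
    bits = "".join("1" if ch == ONE else "0" for ch in stego if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8  # ignore a trailing partial byte
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


stego = embed("Hello", b"hi")
recovered = extract(stego)
```

Here `stego` still displays as "Hello" in most renderers, but it is 16 characters longer than the cover text, because a 2-byte payload needs 2 × 8 zero-width characters.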
How do I detect zero-width steganography?
Paste the suspicious text into a tool that renders zero-width characters visibly (like the Zero-Width Detector on this site). Programmatically, scan for code points U+200B, U+200C, U+200D, U+FEFF, U+2060–U+2064, and U+FFF9–U+FFFB. Ordinary text contains few or none of these characters, so long runs of them are a strong sign of an embedded payload. Isolated ZWNJ or ZWJ characters can be legitimate, for example in emoji sequences and in scripts such as Persian.
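A programmatic scan for the code points listed above might look like the following sketch. The function name `find_invisible` and the shape of its return value are assumptions made for illustration; only the standard-library `unicodedata` module is used.

```python
import unicodedata

# Code points named in the answer above.
SUSPECT = (
    {0x200B, 0x200C, 0x200D, 0xFEFF}
    | set(range(0x2060, 0x2065))   # U+2060 to U+2064
    | set(range(0xFFF9, 0xFFFC))   # U+FFF9 to U+FFFB
)


def find_invisible(text: str) -> list:
    """Return (index, code point label, Unicode name) for each suspect character."""
    hits = []
    for index, ch in enumerate(text):
        if ord(ch) in SUSPECT:
            label = f"U+{ord(ch):04X}"
            hits.append((index, label, unicodedata.name(ch, "UNKNOWN")))
    return hits


hits = find_invisible("a\u200bb\u200cc")
```

A handful of hits is not proof of a payload on its own; the count and clustering of hits relative to the length of the text is what matters.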
What tools detect hidden text in documents?
The Zero-Width Detector tool highlights all invisible characters in pasted text. For documents, open the file in a hex editor and search for the byte sequences of zero-width code points (e.g. E2 80 8B for U+200B in UTF-8). LibreOffice's 'Show Formatting Marks' also reveals non-printing characters.
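The hex-editor search can also be approximated in code. The sketch below looks for the UTF-8 byte sequence E2 80 8B in raw bytes; the helper name `find_byte_offsets` is hypothetical. It assumes the data is uncompressed UTF-8 text, so it would not match inside zipped formats such as .docx or .odt without unpacking them first.

```python
ZWSP_UTF8 = bytes.fromhex("E2808B")  # U+200B encoded as UTF-8


def find_byte_offsets(data: bytes, needle: bytes = ZWSP_UTF8) -> list:
    """Return every byte offset at which `needle` occurs in `data`."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets


sample = "a\u200bb\u200b".encode("utf-8")
offsets = find_byte_offsets(sample)
```

For a file on disk, the same function could be applied to the result of reading the file in binary mode, for example `open(path, "rb").read()`, where `path` is whatever file is under inspection.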