Binary Encoding in Programming
How computers encode text as binary — ASCII, UTF-8, and Unicode in Python, JavaScript, and Java.
Published:
Tags: binary encoding programming, text encoding binary code, binary string Python
Binary Encoding in Programming Part of our complete guide to this topic — see the full series. Every string in every programming language is ultimately a sequence of bytes. This guide covers how Python, JavaScript, Java, and Go represent text as binary — with practical code examples for each. --- All the tools discussed here are available for free at theproductguy.in — client-side, no sign-up required. What is The Encoding Stack? Text encoding works in layers: Unicode code point: a number assigned to each character (e.g., 'A' = U+0041 = 65) Encoding: a rule for representing code points as bytes (ASCII, UTF-8, UTF-16) Binary: the actual bytes stored in memory or written to disk Most modern languages use UTF-8 as the default string encoding. UTF-8 represents: Code points 0–127 as a single…
Frequently Asked Questions
How does Python represent strings in binary?
In Python 3, strings are Unicode by default (internally stored as UTF-32 or UTF-8 depending on content). To get the binary representation, you first encode the string to bytes using str.encode('utf-8') or str.encode('ascii'), then convert each byte to its binary form using format(byte, '08b'). Python does not expose a raw binary string type — you work with bytes objects.
How do I convert a string to bytes in JavaScript?
Use the TextEncoder API: const encoder = new TextEncoder(); const bytes = encoder.encode('Hello'); This returns a Uint8Array of UTF-8 encoded bytes. To go back, use TextDecoder: const text = new TextDecoder().decode(bytes). For ASCII-only strings you can use charCodeAt(i) to get the byte value of each character.
How does Java encode strings?
Java strings are stored internally as UTF-16 (since Java 9, compact strings use Latin-1 or UTF-16 depending on content). To convert to a byte array in a specific encoding, use str.getBytes(StandardCharsets.UTF_8). To convert bytes back to a string, use new String(bytes, StandardCharsets.UTF_8). Always specify the charset explicitly — the system default can vary.
What is the ord() function in Python?
ord() returns the Unicode code point of a single character. ord('A') returns 65, ord('é') returns 233, ord('😀') returns 128512. It is the inverse of chr(): chr(65) returns 'A'. For multi-byte characters in UTF-8, ord() returns the Unicode code point, not the individual bytes — to get bytes, use char.encode('utf-8').
How do I print the binary of a string in Go?
In Go, convert the string to a byte slice with []byte(s), then iterate and format each byte: for _, b := range []byte(s) { fmt.Printf("%08b\n", b) }. Go strings are UTF-8 by default; ranging over a string with range gives runes (Unicode code points), while ranging over []byte(s) gives individual bytes.
All articles · theproductguy.in