Base64, JWT, and URL Encoding: A Developer's Reference
A junior developer on our team once spent three hours debugging an API integration. The payload was correct, the authentication was valid, the endpoint was right. The problem? A + character in a Base64-encoded value was being interpreted as a space when passed through a URL query parameter. Standard Base64 and URLs have conflicting opinions about what + means, and nobody told him that.
Encoding is one of those topics that feels simple until it breaks. Base64, URL encoding, and JWT tokens each solve different problems, use different character sets, and fail in different ways. This reference covers how each one works, when to use which, and the specific mistakes that waste hours of debugging time.
Base64: Turning Binary Into Text
Base64 isn't encryption. It isn't compression. It's a way to represent binary data using only printable ASCII characters. That's its entire purpose.
Why would you need that? Because many transport protocols—email (SMTP), JSON, XML, HTML attributes—are text-based. They choke on raw binary data. If you need to embed an image in an email, attach a file to a JSON payload, or stuff a binary value into an HTML attribute, you Base64-encode it first.
The encoding takes every 3 bytes of input and converts them into 4 Base64 characters, using an alphabet of A-Z, a-z, 0-9, +, and /, with = for padding. This means Base64-encoded data is always about 33% larger than the original. A 750KB image becomes roughly 1MB when Base64-encoded.
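The 3-bytes-in, 4-characters-out rule is easy to see with Python's standard `base64` module. This is a minimal sketch; `encoded_size` is a hypothetical helper written for this example, not a library function.

```python
import base64

# Exactly 3 bytes of input become exactly 4 Base64 characters.
print(base64.b64encode(b"Man"))   # b'TWFu'

# Input that is not a multiple of 3 bytes is padded with '='.
print(base64.b64encode(b"Ma"))    # b'TWE='
print(base64.b64encode(b"M"))     # b'TQ=='


def encoded_size(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (hypothetical helper)."""
    return 4 * ((n_bytes + 2) // 3)


# A 750 KB input grows to about 1 MB of text.
print(encoded_size(750_000))      # 1000000
```

The size formula rounds up to the next group of 3 bytes, which is where the "about 33%" figure comes from.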
Base64 vs. Base64URL
This is where the confusion starts. Standard Base64 uses + and /, which are special characters in URLs. If you put a standard Base64 string in a URL query parameter, the + gets interpreted as a space, and the / can be mistaken for a path separator by anything that treats the value as part of a path. Everything breaks.
Base64URL solves this by swapping + for - and / for _, and dropping the = padding. Same encoding scheme, URL-safe characters. JWTs use Base64URL internally, which is why JWT segments can safely appear in URLs.
| Feature | Standard Base64 | Base64URL |
|---|---|---|
| Character 62 | + | - |
| Character 63 | / | _ |
| Padding | = (required) | Omitted |
| URL-safe? | No | Yes |
| Used by | Email, data URIs, PEM certs | JWTs, URL parameters |
A Base64 encoder/decoder handles both variants. When you're debugging, the first question is always: standard or URL-safe? Getting that wrong produces output that decodes to garbage.
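Here is a small sketch of the two alphabets side by side, using Python's standard `base64` module. The input bytes are chosen on purpose so the output lands on characters 62 and 63. Note that Python's `urlsafe_b64encode` keeps the = padding, so the sketch strips it by hand; `b64url_decode` is a hypothetical helper that puts it back before decoding.

```python
import base64

raw = bytes([0xFB, 0xFF])  # picked so the output uses characters 62 and 63

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
url_safe_unpadded = url_safe.rstrip("=")

print(standard)            # +/8=
print(url_safe)            # -_8=
print(url_safe_unpadded)   # -_8


def b64url_decode(text: str) -> bytes:
    """Decode Base64URL with or without '=' padding (hypothetical helper)."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


print(b64url_decode(url_safe_unpadded) == raw)   # True
```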
When to Use Base64 (and When Not To)
Good uses: embedding small images in CSS (data: URIs), encoding binary data in JSON payloads, encoding email attachments, encoding certificates and keys in PEM format.
Bad uses: "encrypting" data (it's trivially reversible), storing large files (the 33% size overhead adds up), anything where a direct binary transfer is possible (HTTP supports binary bodies natively).
A common anti-pattern: encoding a 2MB image as Base64 and embedding it in HTML. The image is now 2.7MB of inline text that the browser can't cache separately. Serve the image as a normal file and let the browser cache it.
URL Encoding: Making Strings URL-Safe
URLs have a defined set of allowed characters. Letters, digits, -, _, ., ~ are safe. Everything else (spaces, ampersands, question marks, non-ASCII characters) must be percent-encoded. A space becomes %20. An ampersand becomes %26. The Japanese character 中 becomes %E4%B8%AD.
This matters because URLs have structural characters. ? separates the path from the query string. & separates query parameters. # marks the fragment. If your data contains these characters and you don't encode them, the URL parser interprets them as structure instead of data.
Example of what goes wrong:
// User searches for "cats & dogs"
// Without encoding:
/search?q=cats & dogs
// Parser sees: q = "cats " and a second, valueless parameter named " dogs" (broken)
// With encoding:
/search?q=cats%20%26%20dogs
// Correct: q = "cats & dogs"
A URL encoder/decoder converts strings to and from percent-encoded form. It's especially useful when debugging API calls where parameter values contain special characters, or when constructing URLs with user-provided input.
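The same round trip can be sketched with Python's standard `urllib.parse` module. The `/search` path and the `q` parameter mirror the example above; the variable names are just for illustration.

```python
from urllib.parse import parse_qs, quote, urlencode

query = "cats & dogs"

# Percent-encode a single value. safe="" means nothing is left unescaped.
print(quote(query, safe=""))    # cats%20%26%20dogs
print(quote("中", safe=""))      # %E4%B8%AD

# Build the whole query string from a dict instead of concatenating by hand.
url = "/search?" + urlencode({"q": query}, quote_via=quote)
print(url)                      # /search?q=cats%20%26%20dogs

# Parsing the encoded form recovers the original value.
print(parse_qs("q=cats%20%26%20dogs"))   # {'q': ['cats & dogs']}

# Parsing the unencoded form splits on '&' and loses half the search term.
print(parse_qs("q=cats & dogs"))         # {'q': ['cats ']}
```

Building query strings from a dictionary rather than by string concatenation is usually the safer habit, because the encoding step cannot be forgotten.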
The + vs. %20 Confusion
Spaces in URLs can be encoded two ways: %20 (standard percent encoding) or + (form encoding, application/x-www-form-urlencoded). These are NOT interchangeable everywhere. + is only valid as a space in query string parameters submitted via HTML forms. In the URL path, + means a literal plus sign.
This is why a Base64 string containing + breaks in a URL—the server interprets it as a space. Use Base64URL encoding or percent-encode the entire Base64 string before putting it in a URL.
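A short sketch of the two space encodings, and of the Base64 failure described above, using Python's standard `urllib.parse`. The parameter name `sig` and the value `+/8=` are made up for this example.

```python
from urllib.parse import parse_qs, quote, quote_plus, unquote, unquote_plus

# Two ways to encode a space.
print(quote("a b"))          # a%20b  (percent encoding)
print(quote_plus("a b"))     # a+b    (form encoding)

# Two ways to read a plus sign.
print(unquote("a+b"))        # a+b    (plus stays literal)
print(unquote_plus("a+b"))   # a b    (plus becomes a space)

# A standard Base64 value dropped into a query string unencoded:
b64_value = "+/8="
print(parse_qs("sig=" + b64_value))            # {'sig': [' /8=']}  <- corrupted

# Percent-encoding the value first keeps it intact.
encoded = quote(b64_value, safe="")
print(encoded)                                 # %2B%2F8%3D
print(parse_qs("sig=" + encoded))              # {'sig': ['+/8=']}
```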
JWT Tokens: The Three-Part Sandwich
JSON Web Tokens show up in almost every modern authentication system. They look like three chunks of gibberish separated by dots:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
The first two sections are Base64URL-encoded JSON. The third is the Base64URL-encoded signature, which is raw bytes rather than JSON. Decode them and you get:
Header (algorithm and token type):
{"alg": "HS256", "typ": "JWT"}
Payload (claims—the actual data):
{"sub": "1234567890", "name": "John Doe", "iat": 1516239022}
Signature (verification that nothing was tampered with).
The critical thing to understand: the header and payload are NOT encrypted. Anyone with the token can decode them. The signature only proves the token wasn't modified—it doesn't hide the contents. Never put sensitive data (passwords, credit card numbers, SSNs) in a JWT payload unless you're also encrypting the token (JWE).
A JWT decoder splits the token into its three parts and shows you the decoded header and payload. This is invaluable for debugging authentication issues: expired tokens (exp claim), wrong audience (aud), missing scopes, or clock skew between services.
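What such a decoder does can be sketched in a few lines of standard-library Python. The token is the sample shown earlier; `decode_jwt_unverified` and `b64url_decode` are hypothetical helper names. This only decodes. It does not check the signature, so treat the result as untrusted.

```python
import base64
import json

TOKEN = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ."
    "SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
)


def b64url_decode(segment: str) -> bytes:
    """Restore the padding that JWTs omit, then decode Base64URL."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def decode_jwt_unverified(token: str):
    """Split a JWT and decode header and payload WITHOUT verifying the signature."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    signature = b64url_decode(signature_b64)  # raw bytes, not JSON
    return header, payload, signature


header, payload, signature = decode_jwt_unverified(TOKEN)
print(header)            # {'alg': 'HS256', 'typ': 'JWT'}
print(payload)           # {'sub': '1234567890', 'name': 'John Doe', 'iat': 1516239022}
print(len(signature))    # 32
```

No key was needed to read the claims, which is exactly why the payload is not a place for secrets.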
JWT Security Pitfalls
- The "none" algorithm attack. Some JWT libraries accepted tokens with "alg": "none", which means no signature verification. Always validate the algorithm server-side and reject "none".
- Algorithm confusion. If a server expects RS256 (asymmetric) but an attacker sends HS256 (symmetric) signed with the public key, some libraries accepted it. Pin the expected algorithm in your verification code.
- No expiration. JWTs without an exp claim live forever. Always set expiration, and keep it short (15-60 minutes for access tokens).
- Storing in localStorage. Accessible to any JavaScript on the page, including XSS payloads. HttpOnly cookies are more secure for token storage, though they introduce CSRF considerations.
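The first three pitfalls can be illustrated with a hand-rolled HS256 check built only on Python's standard library. Treat it as a sketch, not production code: `sign_hs256` and `verify_hs256` are hypothetical helpers, the secret and claims are made up, and a real service would normally rely on a maintained JWT library. The point is the order of checks: pin the algorithm, verify the signature, then require an unexpired exp claim.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(payload: dict, secret: bytes) -> str:
    """Create an HS256 token (hypothetical helper, used here to have something to verify)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    mac = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url_encode(mac)}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the token passes every check; raise ValueError otherwise."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))

    # Pin the algorithm: "none", RS256, or anything unexpected is rejected outright.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")

    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")

    payload = json.loads(b64url_decode(payload_b64))
    exp = payload.get("exp")
    if exp is None:
        raise ValueError("missing exp claim")
    if (time.time() if now is None else now) >= exp:
        raise ValueError("token expired")
    return payload
```

A quick usage check, with an assumed secret and a 15-minute lifetime:

```python
secret = b"example-secret"  # assumption: any shared secret bytes
token = sign_hs256({"sub": "42", "exp": int(time.time()) + 900}, secret)
print(verify_hs256(token, secret)["sub"])   # 42
```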
Hashing: One-Way Encoding That Isn't Encoding
Hashing gets conflated with encoding, but they're fundamentally different. Encoding is reversible—you can decode Base64 back to the original. Hashing is a one-way function: you can compute the hash of a value, but you can't reverse it to get the original value back.
Common hash algorithms:
- MD5: 128-bit output. Broken for security purposes (collision attacks are practical), but still used for checksums and non-security contexts.
- SHA-1: 160-bit output. Also broken for security purposes: a practical collision was demonstrated in 2017. Don't use for new projects.
- SHA-256: 256-bit output. Currently the standard for most security applications.
- SHA-3: The newest member of the SHA family, designed as a fallback if SHA-2 ever breaks.
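A hash generator computes these values instantly. Practical uses: verifying file integrity (does the downloaded file match the published checksum?), generating content-based cache keys, and deduplication. Password storage also relies on hashing (never store plaintext passwords), but it calls for a deliberately slow, salted algorithm such as bcrypt, scrypt, or Argon2 rather than a bare MD5 or SHA-256.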
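For reference, here is how those digests look with Python's standard `hashlib`. This is a minimal sketch: `sha256_stream` is a hypothetical helper, and the in-memory `io.BytesIO` stream stands in for a downloaded file opened in binary mode.

```python
import hashlib
import io

data = b"hello world"

# Same input, four algorithms, four fixed-size outputs.
for name in ("md5", "sha1", "sha256", "sha3_256"):
    h = hashlib.new(name, data)
    print(f"{name:9} {h.digest_size * 8:3d} bits  {h.hexdigest()}")


def sha256_stream(stream, chunk_size: int = 65536) -> str:
    """Hash a binary stream in chunks so large files never sit fully in memory."""
    h = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


# Integrity check: compare the computed digest with the published one.
published = hashlib.sha256(data).hexdigest()   # stand-in for a checksum from a download page
print(sha256_stream(io.BytesIO(data)) == published)   # True
```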
UUIDs: Unique Identifiers Without a Central Authority
Not strictly encoding, but UUIDs come up in the same conversations because developers often need to generate and parse them alongside encoded values. A UUID (Universally Unique Identifier) is a 128-bit value formatted as 32 hexadecimal digits with hyphens: 550e8400-e29b-41d4-a716-446655440000.
The magic of UUIDs is that any system can generate one independently with near-zero collision probability. No central server required, no coordination needed. UUID v4 (random) has 2^122 possible values. You'd need to generate about 2.71 quintillion UUIDs to have a 50% chance of a single collision.
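The collision figure comes from the standard birthday approximation: with N equally likely values, roughly sqrt(2 · N · ln 2) random draws give a 50% chance of at least one repeat. A quick check in Python, assuming the 122 random bits of a v4 UUID:

```python
import math

RANDOM_BITS = 122          # a v4 UUID fixes 6 of its 128 bits (version and variant)
space = 2 ** RANDOM_BITS   # number of possible v4 UUIDs

# Birthday approximation for a 50% collision probability.
n_half = math.sqrt(2 * space * math.log(2))
print(f"{n_half:.2e}")     # on the order of 2.7e18, i.e. about 2.7 quintillion
```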
A UUID generator produces version 4 UUIDs instantly. Use them for database primary keys, session IDs, correlation IDs in distributed systems, or any situation where you need an identifier that is unique for all practical purposes without a sequence generator.
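Generating and parsing UUIDs takes a couple of lines with Python's standard `uuid` module. The parsed value below is the sample UUID from earlier in this section.

```python
import uuid

# Generate a random (version 4) UUID. The value differs on every run.
new_id = uuid.uuid4()
print(new_id)
print(new_id.version)          # 4

# Parse an existing UUID string and inspect it.
parsed = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
print(parsed.version)          # 4
print(parsed.hex)              # 550e8400e29b41d4a716446655440000
print(len(parsed.bytes) * 8)   # 128
```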
Quick Reference: Which Encoding When
| Scenario | Use | Why |
|---|---|---|
| Binary data in JSON | Base64 | JSON can't represent raw bytes |
| Token in URL parameter | Base64URL | No +, /, or = to break URL parsing |
| User input in query string | URL encoding | Prevent special chars from breaking URL structure |
| Authentication token | JWT (Base64URL internally) | Self-contained, verifiable, standardized |
| File integrity check | SHA-256 hash | Detects any modification to the file |
| Password storage | bcrypt/scrypt/Argon2 hash | Intentionally slow, salted, irreversible |
| Unique identifier | UUID v4 | No central authority needed |
| Hiding data from users | Actual encryption (AES) | Encoding is NOT encryption |
Debugging Encoding Issues: A Systematic Approach
When something isn't decoding correctly, work through this checklist:
- Identify the encoding. Is it Base64, Base64URL, URL-encoded, or hex? Padded Base64 strings often end with = or ==. URL-encoded strings have % followed by two hex digits. Hex is all 0-9 and a-f.
- Check for double encoding. If you see %253D instead of %3D, the value was URL-encoded twice. %25 is the encoding of % itself. Decode once and you get %3D; decode again and you get =.
- Check for encoding mismatch. Base64 decoded with a Base64URL decoder (or vice versa) produces garbled output. The + vs. - and / vs. _ substitution looks like random corruption.
- Check for truncation. Padded Base64 strings are always a multiple of 4 characters long, and an unpadded string can never have a length of 4n + 1. If the string was cut off during transport, decoding fails or produces partial results.
- Check character encoding. A Base64 string represents bytes. Those bytes might be UTF-8, UTF-16, or ASCII text. Decoding the Base64 gives you bytes; interpreting those bytes requires knowing the character encoding.
The fastest debugging approach: paste the problematic string into a Base64 decoder and a URL decoder separately. One of them will produce readable output and immediately tell you which encoding was used.
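The first checklist steps can also be roughed out in code. The sketch below uses only Python's standard library; `url_decode_layers` and `guess_encoding` are hypothetical helpers, and the classification is a heuristic. A short hex string is also valid Base64, for example, so treat the answer as a hint rather than a verdict.

```python
import re
from typing import List
from urllib.parse import unquote


def url_decode_layers(value: str, max_layers: int = 5) -> List[str]:
    """URL-decode repeatedly until nothing changes; more than two entries means double encoding."""
    layers = [value]
    for _ in range(max_layers):
        decoded = unquote(layers[-1])
        if decoded == layers[-1]:
            break
        layers.append(decoded)
    return layers


def guess_encoding(value: str) -> str:
    """Heuristic guess at how a string was encoded (checks run from most to least specific)."""
    if re.search(r"%[0-9A-Fa-f]{2}", value):
        return "url-encoded"
    if re.fullmatch(r"(?:[0-9A-Fa-f]{2})+", value):
        return "hex"
    if re.fullmatch(r"[A-Za-z0-9+/]+={0,2}", value) and len(value) % 4 == 0:
        return "base64"
    if re.fullmatch(r"[A-Za-z0-9_-]+={0,2}", value) and len(value.rstrip("=")) % 4 != 1:
        return "base64url"
    return "unknown"


print(url_decode_layers("%253D"))     # ['%253D', '%3D', '=']
print(guess_encoding("%253D"))        # url-encoded
print(guess_encoding("deadbeef"))     # hex
print(guess_encoding("+/8="))         # base64
print(guess_encoding("-_8"))          # base64url
```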
Encoding is plumbing. Nobody notices it when it works. But when a + becomes a space and your authentication breaks at 2 AM, knowing the difference between Base64 and Base64URL saves you hours. Bookmark the tools. You'll need them more often than you'd think.