Base64 Isn't Encryption: What It Actually Does (and Why the Web Needs It)

It looks scrambled, so people assume it's secret — but Base64 hides nothing. Here's what Base64 really is, how 3 bytes become 4 characters, why it's 33% bigger, and where it quietly runs the internet.

Open a website's stylesheet and you might stumble across something that looks like a cat walked across the keyboard: a small icon crammed into the CSS as a wall of gibberish that begins data:image/png;base64,iVBORw0KGgo…. That gibberish is a real image, smuggled into a text file as plain characters. The trick that makes it possible is Base64 — one of those invisible technologies the entire internet quietly leans on, and one almost everyone misunderstands.

The problem Base64 was invented to solve

To appreciate Base64, you have to remember an awkward truth about the early internet: a lot of it could only handle text — and not even all text, just a narrow set of English characters.

Email is the classic example. The original email systems were built to carry 7-bit ASCII: letters, digits, and basic punctuation. Each byte has 8 bits, but mail servers of the era often used or stripped that 8th bit, and they treated certain byte values as special control signals. Plain English survived fine. But a photo, a PDF, a ZIP file — any binary data — is made of arbitrary bytes using all 8 bits, including values that a text-only mail server would mangle or interpret as commands. Send a JPEG straight through, and what arrived on the other end was confetti.

So the question became: how do you push binary data through a pipe that only safely carries a small alphabet of text characters? The answer is to translate the binary into nothing but those safe characters — and translate it back on the other side. That translation is Base64.

How Base64 actually works

The name is the whole spec in disguise: it encodes data using a base of 64 characters. The chosen alphabet is deliberately boring and universally safe: the 26 uppercase letters A–Z, the 26 lowercase a–z, the 10 digits 0–9, and two extra symbols, + and /. That's 64 symbols, every one of which survives a trip through text-only systems.

Here's the clever part. Computers store data in 8-bit bytes, but Base64 works in 6-bit chunks — because 6 bits gives exactly 2⁶ = 64 possibilities, one for each character in the alphabet. So the algorithm takes your data 3 bytes at a time (that's 24 bits), then re-slices those same 24 bits into four groups of 6 bits. Each 6-bit group becomes one character from the alphabet. Three bytes in, four characters out — every single time.
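The re-slicing described above can be sketched by hand for a single 3-byte group. This is only an illustration of the idea, not how any particular library implements it, and the helper name `encode_group` is made up for this example. The standard library's `base64.b64encode` is printed alongside as a cross-check.

```python
import base64

# The standard Base64 alphabet: A-Z, a-z, 0-9, then + and /
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes as 4 Base64 characters (hypothetical helper)."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    # Pack the 3 bytes into one 24-bit integer.
    n = int.from_bytes(three_bytes, "big")
    # Peel off four 6-bit slices, most significant first.
    indices = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # Each 6-bit value (0..63) picks one character from the alphabet.
    return "".join(ALPHABET[i] for i in indices)


print(encode_group(b"Man"))               # TWFu
print(base64.b64encode(b"Man").decode())  # TWFu
```

The bytes of "Man" are 01001101 01100001 01101110; regrouped as 010011 010110 000101 101110 they give the indices 19, 22, 5 and 46, which map to T, W, F and u.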

When the data doesn't divide neatly into groups of three, Base64 pads the end with one or two = signs. The equals sign isn't part of the data; it's a little note that says "the last group was short — ignore the filler." That's why so many Base64 strings end in = or ==.
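To see the padding rule in action, here is a small sketch using Python's standard `base64` module. The helper `padding_count` is a made-up name for this example; it just computes how many = signs a given input length produces.

```python
import base64


def padding_count(n_bytes: int) -> int:
    """Number of '=' characters for an input of n_bytes (hypothetical helper)."""
    return (3 - n_bytes % 3) % 3


for data in (b"M", b"Ma", b"Man"):
    encoded = base64.b64encode(data).decode("ascii")
    print(len(data), "byte(s) ->", encoded, "padding:", padding_count(len(data)))

# 1 byte(s) -> TQ== padding: 2
# 2 byte(s) -> TWE= padding: 1
# 3 byte(s) -> TWFu padding: 0
```

One leftover byte needs two = signs, two leftover bytes need one, and a full group of three needs none.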

Why Base64 is about 33% bigger

This 3-bytes-to-4-characters ratio has an unavoidable consequence: the output is always larger than the input. Four characters to represent three bytes is a 4∶3 ratio, which works out to roughly 33% bigger, rounded up to a whole group of four characters and slightly more again if line breaks are added. Encode a 600 KB image and you'll get about 800 KB of text.
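The arithmetic is easy to check. The sketch below assumes padded output with no line breaks; `encoded_length` is a hypothetical helper name, and the 600,000 zero bytes simply stand in for a 600 KB file.

```python
import base64


def encoded_length(n_bytes: int) -> int:
    """Padded Base64 length for n_bytes of input, without line breaks."""
    # Every started group of 3 bytes costs 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


raw = bytes(600_000)  # stand-in for a 600 KB file
encoded = base64.b64encode(raw)

print(len(raw), "->", len(encoded))  # 600000 -> 800000
print(f"overhead: {len(encoded) / len(raw) - 1:.1%}")  # overhead: 33.3%
```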

That overhead is the price of safety. You're trading size for the guarantee that the data will pass cleanly through systems that only trust text. For an email attachment or a tiny icon embedded in a page, that trade is usually worth it. For a huge file, it's a real cost — which is one reason you embed small images as Base64 data URIs but still link to large ones as separate files. If you ever want to see the exact inflation for your own data, a Base64 encoder/decoder will show the before-and-after sizes side by side.

The myth that won't die: "Base64 is encryption"

If you remember one thing, make it this: Base64 is not encryption, and it never was.

This is comfortably the most common misconception in the field, and it has caused real security incidents. Because Base64 output looks scrambled and unreadable to a human, people assume it's hiding something. It isn't. Base64 is encoding — a reversible, public transformation with no key and no secret. Anyone, anywhere, can decode a Base64 string back to the original in milliseconds. There's no password involved because there was never any protection involved.

Encryption is about secrecy: turning data into something only the holder of a key can read. Encoding is about format: turning data into a shape that some channel can carry. They solve completely different problems. Base64-ing a password before storing or sending it provides exactly zero protection — it's the digital equivalent of writing your PIN backwards and calling it a code. Every developer eventually learns this, hopefully not the hard way.
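A few lines of Python make the point concrete. The password "hunter2" is a made-up value for illustration; notice that the decode step takes no key, no password, and no secret of any kind.

```python
import base64

secret = "hunter2"  # hypothetical password, for illustration only

# "Protecting" the password with Base64...
encoded = base64.b64encode(secret.encode("utf-8")).decode("ascii")
print(encoded)  # aHVudGVyMg==

# ...is undone by anyone with one function call and no key.
recovered = base64.b64decode(encoded).decode("utf-8")
print(recovered)  # hunter2
```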

Where you bump into Base64 every day

Once you know what to look for, Base64 is everywhere:

  • Data URIs. That data:...;base64,... string lets you embed an image, font, or small file directly inside HTML or CSS, saving a separate network request.
  • Email attachments. Nearly every photo you've ever emailed was Base64-encoded by MIME on the way out and decoded on the way in. You just never saw it.
  • JSON Web Tokens (JWT). The familiar xxxxx.yyyyy.zzzzz token is three Base64url segments — a header and payload anyone can decode, plus a signature that actually provides the security.
  • HTTP Basic authentication. When a browser sends credentials, it Base64-encodes username:password. The official spec even uses a charming example: the username "Aladdin" and password "open sesame" encode to QWxhZGRpbjpvcGVuIHNlc2FtZQ==. Decode that and the secret falls right out — which is precisely why Basic auth must only ever be used over HTTPS.
  • Certificates and keys. The -----BEGIN CERTIFICATE----- blocks you see in PEM files are Base64-encoded binary.
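The HTTP Basic authentication item in the list above can be reproduced directly. This is a minimal sketch: `basic_auth_header` is a hypothetical helper, not part of any HTTP library, and it assumes the credentials are encoded as UTF-8.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header (hypothetical helper)."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_auth_header("Aladdin", "open sesame")
print(header)  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

# Anyone who sees the header can reverse it without a key.
token = header.split(" ", 1)[1]
user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
print(user, "/", password)  # Aladdin / open sesame
```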

URL-safe Base64: a small but important variant

There's a wrinkle. Two characters in the standard alphabet — + and / — already mean something special in URLs and filenames. A / looks like a path separator; a + can be read as a space in query strings. Drop a normal Base64 string into a URL and it can quietly break.

So a variant called base64url was standardized: it keeps everything the same but swaps + for - and / for _, and it often omits the = padding too (since = is also awkward in URLs). It's the same idea, lightly disguised for travel. JWTs use it, and any time you put encoded data in a link or filename, it's the version you want. Good Base64 tools let you flip between the two with a single toggle.
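The difference between the two alphabets shows up as soon as the input produces a + or a /. The sketch below uses Python's standard `base64` module; the two bytes are chosen only because they hit both characters, and `b64url_decode` is a hypothetical helper that restores stripped padding before decoding.

```python
import base64

raw = b"\xfb\xff"  # chosen because it encodes to both '+' and '/'

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")
print(standard)  # +/8=
print(url_safe)  # -_8=

# Many base64url users (JWTs, for example) also drop the padding.
unpadded = url_safe.rstrip("=")
print(unpadded)  # -_8


def b64url_decode(text: str) -> bytes:
    """Decode base64url text whether or not the '=' padding was stripped."""
    padded = text + "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(padded)


print(b64url_decode(unpadded) == raw)  # True
```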

A little history

Base64 didn't appear fully formed. Unix systems had uuencode in the early days for shoving binary through text-only links. The secure-email project Privacy-Enhanced Mail (PEM) used a similar Base64 scheme in the late 1980s — which is why certificate files still carry the "PEM" name today.

The version we all use was nailed down by MIME (Multipurpose Internet Mail Extensions) in 1992, the standard that finally taught email how to carry images, audio, and any file type. Base64 was MIME's workhorse for binary content. Years later, RFC 4648 tidied the whole family into a single reference, defining standard Base64, the URL-safe variant, and relatives like Base32 and Base16 (which is just hexadecimal). That document is why implementations across every programming language agree precisely on the rules — the same quiet, boring consistency that makes Base64 so dependable.

A few quirks worth knowing

Base64 is simple, but it has sharp edges that trip people up. The most infamous lives right in the browser. JavaScript ships with built-in btoa() and atob() functions for Base64, but they work on "binary strings" in which every character stands for one byte, so they only understand Latin-1. Feed btoa() an emoji or a Chinese character and it throws an error; run atob() on Base64 that holds UTF-8 text and you get garbled characters unless you decode the bytes yourself. The fix is to convert the text to UTF-8 bytes first and then Base64-encode those bytes, which is exactly what a properly built encoder does behind the scenes. It's the reason "just use btoa" advice fails the moment real-world text shows up.
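The same "text to UTF-8 bytes first, then Base64" fix can be sketched in Python. This is an analogue rather than the browser API: the failed Latin-1 encode below only stands in for the limitation described above, and the sample text is made up for the example.

```python
import base64

text = "héllo 😀 你好"  # sample text with characters outside Latin-1

# Latin-1 has one byte per character, so it cannot represent this text.
try:
    text.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False
print("fits in Latin-1:", fits_latin1)  # fits in Latin-1: False

# The fix: turn the text into UTF-8 bytes, then Base64-encode the bytes.
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")

# Decoding reverses both steps: Base64 to bytes, then bytes to text.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded == text)  # True
```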

Then there's whitespace. Classic MIME email inserts a line break at least every 76 characters so the encoded blob doesn't become one impossibly long line. MIME decoders are therefore required to ignore line breaks, and many general-purpose decoders skip stray newlines and spaces too, which is why you can usually paste a messy, wrapped Base64 string and still get a clean result. Strict decoders may reject it, so strip the whitespace first if one complains.
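Python's standard library happens to show both halves of this behaviour. In the sketch below, `base64.encodebytes` produces MIME-style wrapped output, and `base64.b64decode` in its default lenient mode discards characters outside the alphabet, including newlines and spaces. The 256 sample bytes are arbitrary.

```python
import base64

raw = bytes(range(256))  # arbitrary sample data

# encodebytes wraps its output MIME-style, 76 characters per line.
wrapped = base64.encodebytes(raw).decode("ascii")
lines = wrapped.splitlines()
print(len(lines), "lines, longest:", max(len(line) for line in lines))
# 5 lines, longest: 76

# The default (non-strict) decoder skips the line breaks...
decoded = base64.b64decode(wrapped)
print(decoded == raw)  # True

# ...and tolerates extra spaces as well.
messy = wrapped.replace("\n", "  \n ")
print(base64.b64decode(messy) == raw)  # True
```

Passing `validate=True` to `base64.b64decode` switches to strict behaviour, in which the same wrapped input raises an error instead.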

Base64 also has a family. The same RFC defines Base32, which uses 32 case-insensitive characters (handy when data might be read aloud or typed by hand, like some 2FA secrets), and Base16, which is just plain hexadecimal. Base64 is the densest of the three that still sticks to safe printable characters, which is why it became the default for moving binary around.
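The density difference is easy to measure with the three encoders in Python's standard `base64` module. The 30 zero bytes are an arbitrary sample, picked because 30 divides evenly into both 3-byte and 5-byte groups, so no padding distorts the comparison.

```python
import base64

data = bytes(30)  # arbitrary sample: 30 bytes, no padding in any encoding

sizes = {
    "Base16": len(base64.b16encode(data)),
    "Base32": len(base64.b32encode(data)),
    "Base64": len(base64.b64encode(data)),
}
for name, size in sizes.items():
    print(f"{name}: {size} characters for {len(data)} bytes")

# Base16: 60 characters for 30 bytes
# Base32: 48 characters for 30 bytes
# Base64: 40 characters for 30 bytes

# The same two bytes in each encoding:
print(base64.b16encode(b"hi").decode())  # 6869
print(base64.b32encode(b"hi").decode())  # NBUQ====
print(base64.b64encode(b"hi").decode())  # aGk=
```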

Boring on purpose

Base64 will never be glamorous. It doesn't protect anything, it makes data bigger, and its output is unreadable to humans. But that's exactly the point. It does one humble job — make arbitrary bytes survive a text-only world — and it does that job identically everywhere, decade after decade, without drama.

So the next time you spot a long ribbon of letters and digits ending in ==, you'll know it's not encryption and it's not magic. It's just three bytes becoming four characters, over and over, quietly carrying a picture or a token or a certificate across an internet that, deep down, still prefers plain text.

Written by Gaurav Singh
