Reading different Character Sets

A cartoon owl shrugs at a table with three letters: one in English, one in Arabic, and one in Chinese. Above it, bold text reads “CHARACTER ENCODINGS.”

Character Encoding

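Files aren’t just letters; they’re bytes plus an encoding that says how to turn those bytes into characters. The safe default is UTF-8, so pass encoding="utf-8" explicitly when you open a text file rather than relying on the platform default. If you use the wrong encoding, you’ll see strange characters or get a UnicodeDecodeError.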

It’s a bit like receiving a letter written in French but trying to read it as if it’s Spanish. Some words look fine, others look like gibberish, and a few make no sense at all.
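Here is a minimal sketch of both failure modes, using in-memory bytes rather than a real file. The sample word and the variable names are only illustrative.

```python
# The same bytes, decoded two different ways.
data = "café".encode("utf-8")        # b'caf\xc3\xa9'

right = data.decode("utf-8")         # 'café'
wrong = data.decode("latin-1")       # 'cafÃ©': each byte is read as its own character

# Bytes that are not valid UTF-8 raise an error instead of decoding quietly.
latin_bytes = "café".encode("latin-1")   # b'caf\xe9'
try:
    latin_bytes.decode("utf-8")
    failed = False
except UnicodeDecodeError:
    failed = True

print(right, wrong, failed)
```

Decoding as latin-1 never raises, because every byte value maps to some character. That is why a wrong guess can produce gibberish silently, while a wrong guess of UTF-8 tends to fail loudly.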

So why are there different encodings in the first place? The short answer is history. Computers were first built with limited memory and could only handle simple alphabets, like English letters and numbers. As time went on, people needed to represent accented letters, then whole new alphabets such as Cyrillic, Arabic, and Chinese. Different regions invented their own encodings, which led to a confusing mixture. UTF-8 eventually became the standard because it can represent almost every character in the world while staying efficient for plain English text.
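To make the "efficient for plain English text" point concrete, this small sketch counts how many bytes UTF-8 uses for a few sample characters. The characters chosen are just examples.

```python
# UTF-8 is variable-width: ASCII stays at one byte, other characters take more.
samples = ["A", "é", "中", "😊"]
sizes = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, size in sizes.items():
    print(ch, size)
```

Plain English letters cost one byte each, while accented letters, Chinese characters, and emoji take two, three, and four bytes here.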

Try this:

  • Save a file with an emoji and open it as latin-1 to see how strange it looks.
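One way to run that experiment without touching your own files is sketched below. It assumes a throwaway temporary directory, and the file name emoji.txt is made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "emoji.txt"   # hypothetical file name
    path.write_text("Hello 😊", encoding="utf-8")

    good = path.read_text(encoding="utf-8")
    garbled = path.read_text(encoding="latin-1")

print(good)
# repr() makes the invisible control characters show up as \x escapes
print(repr(garbled))
```

The emoji is stored as four bytes, so reading it as latin-1 turns it into four separate characters, most of them unprintable.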

Newlines on Different Systems

Windows uses \r\n, while Linux and macOS use \n. In text mode, Python normally smooths this out by translating every line ending to \n when reading. If you need the original line endings, open the file with newline="".

Think of it as accents in handwriting — different regions have slightly different ways of ending lines, but Python usually tidies them up for you.
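The difference is easiest to see by reading the same file twice. This sketch writes Windows-style line endings as raw bytes into a temporary file (the file name is made up) and then reads them back with and without newline="".

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "windows.txt"   # hypothetical file name
    # Write raw bytes so the \r\n endings land on disk exactly as given
    path.write_bytes(b"first\r\nsecond\r\n")

    # Default text mode: line endings are translated to \n
    with open(path, encoding="utf-8") as f:
        translated = f.read()

    # newline="" turns translation off, so \r\n is preserved
    with open(path, encoding="utf-8", newline="") as f:
        untouched = f.read()

print(repr(translated))
print(repr(untouched))
```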

Binary Files

Not everything is text. For images, PDFs, or audio, open the file in binary mode ("rb") and don’t pass an encoding. You get raw bytes back instead of strings.
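As a rough sketch, the snippet below builds a stand-in "image" in a temporary directory so it runs anywhere: the first eight bytes are the real PNG file signature, and the rest is made-up filler, not a valid picture. With a real image you would only need the open(path, "rb") part.

```python
import tempfile
from pathlib import Path

# The 8-byte signature that starts every PNG file
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fake.png"   # hypothetical file, not a valid image
    path.write_bytes(PNG_SIGNATURE + bytes(range(32)))

    # "rb" means read + binary: no encoding, no newline translation
    with open(path, "rb") as f:
        head = f.read(20)           # first 20 bytes, returned as a bytes object

print(head)
print(head.startswith(PNG_SIGNATURE))
```

Checking the first few bytes like this is a common way to guess a file's real type, since many formats start with a fixed signature.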

Try this:

  • Open an image in binary mode and print the first 20 bytes. You’ll see a bytes value full of \x escapes that looks like random symbols.

Summary

The reading files tutorial showed how to open files safely, read them, and handle common errors.

This part introduced encodings (and why the wrong one can feel like decoding a mystery letter), newline quirks, and binary data.

Master these two parts, and you’ll be comfortable handling just about any file in Python.
