Lesson 10 - Working with Files

Reading and Writing Data

In real-world data analytics, you’ll constantly work with data stored in files—CSV files from databases, text files with logs, JSON files from APIs, and more. Being able to read data from files and save your analysis results is an essential skill.

Up until now, we’ve been creating data directly in our Python code using lists and dictionaries. But imagine having to manually type out thousands of book records! That’s not practical. Instead, we read the data from files.

By the end of this lesson, you’ll be able to:

  • Open and read files in Python
  • Write data to files
  • Work with CSV files using the csv module
  • Use the with statement for safe file handling
  • Handle file paths correctly
  • Understand different file modes

Let’s start by learning how to read a simple text file.


Opening and Reading Files

Python provides a built-in open() function to work with files. Here’s the basic syntax:

file = open('filename.txt', 'r')
content = file.read()
print(content)
file.close()

Let’s break this down:

  • open('filename.txt', 'r') opens the file in read mode ('r')
  • file.read() reads the entire content of the file as a string
  • file.close() closes the file when we’re done

Important: Always close files after opening them to free up system resources.
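
One way to make sure close() always runs, even if reading fails partway through, is a try/finally block. The sketch below is only an illustration: the temporary folder and the setup step that creates the sample file are assumptions added so the code runs on its own (write mode is covered later in this lesson).

```python
import os
import tempfile

# Setup (assumption for this sketch): create a small sample file
# in a temporary folder so the example is self-contained.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'book_titles.txt')
setup = open(path, 'w')
setup.write('Whispering Leaves\nBerry Tales\n')
setup.close()

file = open(path, 'r')
try:
    content = file.read()
finally:
    # This line runs whether or not read() raised an error
    file.close()

print(content)
print(file.closed)  # True once close() has been called
```

The closed attribute is a quick way to check whether a file object has already been closed.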

Let’s say we have a file called book_titles.txt with this content:

Whispering Leaves
Berry Tales
Moon Over Pine
Desert Echoes
Golden Harvest

We can read it like this:

file = open('book_titles.txt', 'r')
content = file.read()
print(content)
file.close()

Output:

Whispering Leaves
Berry Tales
Moon Over Pine
Desert Echoes
Golden Harvest

Reading Files Line by Line

Sometimes you don’t want to read the entire file at once. You can read it line by line:

file = open('book_titles.txt', 'r')

for line in file:
    print(line)

file.close()

Output:

Whispering Leaves

Berry Tales

Moon Over Pine

Desert Echoes

Golden Harvest

Notice the extra blank lines? That’s because each line in the file ends with a newline character (\n), and print() adds another one. We can fix this using strip(), which removes leading and trailing whitespace, including the newline:

file = open('book_titles.txt', 'r')

for line in file:
    print(line.strip())

file.close()

Output:

Whispering Leaves
Berry Tales
Moon Over Pine
Desert Echoes
Golden Harvest

You can also read all lines into a list:

file = open('book_titles.txt', 'r')
lines = file.readlines()
file.close()

print(lines)

Output:

['Whispering Leaves\n', 'Berry Tales\n', 'Moon Over Pine\n', 'Desert Echoes\n', 'Golden Harvest\n']
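
If you want the list without the trailing \n characters, two common options are a list comprehension with rstrip('\n') or the string method splitlines(). This is a minimal sketch; the lines and content values are typed in by hand here to stand in for what readlines() and read() would return.

```python
# Stand-ins for the results of file.readlines() and file.read()
lines = ['Whispering Leaves\n', 'Berry Tales\n', 'Moon Over Pine\n']
content = 'Whispering Leaves\nBerry Tales\nMoon Over Pine\n'

# Option 1: remove only the trailing newline from each line
clean_lines = [line.rstrip('\n') for line in lines]

# Option 2: split the whole text on line boundaries
split_lines = content.splitlines()

print(clean_lines)  # ['Whispering Leaves', 'Berry Tales', 'Moon Over Pine']
print(split_lines)  # ['Whispering Leaves', 'Berry Tales', 'Moon Over Pine']
```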

Exercise: Create a text file with three book titles. Write code to read the file and print each title in uppercase.


The with Statement

There’s a better way to work with files that automatically closes them for you—the with statement:

with open('book_titles.txt', 'r') as file:
    content = file.read()
    print(content)

# File is automatically closed here

The advantages of using with:

  • The file is automatically closed when the block ends
  • The file is closed even if an error occurs
  • Your code is cleaner and more readable

This is the preferred way to work with files in Python.
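
To see the automatic closing in action, you can check the file’s closed attribute inside and after the block. The sketch below assumes a temporary folder and a small setup step that creates the sample file (write mode is covered in the next section); neither is part of the lesson’s own files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'book_titles.txt')

    # Setup (assumption for this sketch): create the sample file
    with open(path, 'w') as setup:
        setup.write('Whispering Leaves\nBerry Tales\n')

    with open(path, 'r') as file:
        first_line = file.readline().strip()
        closed_inside = file.closed   # still open inside the block

    closed_after = file.closed        # closed as soon as the block ends

print(first_line)      # Whispering Leaves
print(closed_inside)   # False
print(closed_after)    # True
```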

Here’s how to read line by line with with:

with open('book_titles.txt', 'r') as file:
    for line in file:
        print(line.strip())

From now on, we’ll use the with statement for all file operations.


Writing to Files

To write data to a file, open it in write mode ('w'):

titles = ['Whispering Leaves', 'Berry Tales', 'Moon Over Pine']

with open('new_books.txt', 'w') as file:
    for title in titles:
        file.write(title + '\n')

This creates a file called new_books.txt with:

Whispering Leaves
Berry Tales
Moon Over Pine

Important notes about write mode:

  • If the file doesn’t exist, it will be created
  • If the file already exists, its contents will be completely replaced
  • You must add \n yourself if you want newlines
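
The second note is easy to overlook, so here is a small sketch of it. The file is written twice in write mode, and only the second write survives. The temporary folder is an assumption added so the example does not touch your own files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'new_books.txt')

    # First write: creates the file with two titles
    with open(path, 'w') as file:
        file.write('Whispering Leaves\n')
        file.write('Berry Tales\n')

    # Second write: opening with 'w' again erases the old contents
    with open(path, 'w') as file:
        file.write('Moon Over Pine\n')

    with open(path, 'r') as file:
        content = file.read()

print(repr(content))  # 'Moon Over Pine\n'
```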

If you want to add content to an existing file without erasing it, use append mode ('a'):

with open('new_books.txt', 'a') as file:
    file.write('Desert Echoes\n')
    file.write('Golden Harvest\n')

Now new_books.txt contains:

Whispering Leaves
Berry Tales
Moon Over Pine
Desert Echoes
Golden Harvest

Exercise: Write a program that creates a file called ratings.txt and writes five ratings (like 4.5, 4.2, etc.) to it, one per line.


File Modes Summary

Mode    Description        Creates if doesn’t exist?    Overwrites existing?
'r'     Read only          No (raises error)            No
'w'     Write only         Yes                          Yes
'a'     Append only        Yes                          No (adds to end)
'r+'    Read and write     No (raises error)            No
'w+'    Read and write     Yes                          Yes
'a+'    Read and append    Yes                          No (adds to end)

For most data analytics work, you’ll primarily use 'r' for reading and 'w' or 'a' for writing.
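
The combined modes behave a little differently from each other, as the following sketch tries to show. It assumes a temporary folder and a hypothetical file name, notes.txt. Two details are worth noticing: in 'a+' mode you need seek(0) to jump back to the start before reading, and in 'r+' mode writing starts at the beginning of the file and replaces existing characters in place rather than erasing the whole file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'notes.txt')  # hypothetical file name

    with open(path, 'w') as file:
        file.write('Whispering Leaves\n')

    # 'a+': writes go to the end; seek(0) moves back to the start to read
    with open(path, 'a+') as file:
        file.write('Berry Tales\n')
        file.seek(0)
        after_append = file.read()

    # 'r+': writing starts at position 0 and replaces characters in place
    with open(path, 'r+') as file:
        file.write('WHISPERING')
        file.seek(0)
        after_overwrite = file.read()

print(repr(after_append))     # 'Whispering Leaves\nBerry Tales\n'
print(repr(after_overwrite))  # 'WHISPERING Leaves\nBerry Tales\n'
```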


Working with CSV Files

CSV (Comma-Separated Values) files are one of the most common formats for storing tabular data. Python’s csv module makes it easy to work with them.

Let’s say we have a file called books.csv:

title,price,rating
Whispering Leaves,12.99,4.2
Berry Tales,0.00,4.5
Moon Over Pine,7.50,4.6
Desert Echoes,15.00,4.4
Golden Harvest,9.99,4.3

Here’s how to read it:

import csv

with open('books.csv', 'r') as file:
    reader = csv.reader(file)

    for row in reader:
        print(row)

Output:

['title', 'price', 'rating']
['Whispering Leaves', '12.99', '4.2']
['Berry Tales', '0.00', '4.5']
['Moon Over Pine', '7.50', '4.6']
['Desert Echoes', '15.00', '4.4']
['Golden Harvest', '9.99', '4.3']

Each row is read as a list of strings. Notice that the first row is the header.

Here’s how to skip the header and work with the data:

import csv

with open('books.csv', 'r') as file:
    reader = csv.reader(file)
    next(reader)  # Skip the header row

    for row in reader:
        title = row[0]
        price = float(row[1])
        rating = float(row[2])
        print(f"{title}: ${price:.2f} - {rating} stars")

Output:

Whispering Leaves: $12.99 - 4.2 stars
Berry Tales: $0.00 - 4.5 stars
Moon Over Pine: $7.50 - 4.6 stars
Desert Echoes: $15.00 - 4.4 stars
Golden Harvest: $9.99 - 4.3 stars
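
Accessing columns by position (row[0], row[1]) works, but it is easy to mix up the numbers. As an alternative, the csv module also provides csv.DictReader, which uses the header row as keys so each row becomes a dictionary. This is a minimal sketch; the temporary folder and the shortened two-book file are assumptions so the code runs on its own.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'books.csv')

    # Setup (assumption for this sketch): a shortened copy of books.csv
    with open(path, 'w', newline='') as file:
        file.write('title,price,rating\n')
        file.write('Whispering Leaves,12.99,4.2\n')
        file.write('Berry Tales,0.00,4.5\n')

    with open(path, 'r', newline='') as file:
        reader = csv.DictReader(file)   # the header row becomes the keys
        books = list(reader)            # the header is not included as a row

for book in books:
    # Values are still strings, so convert numbers before using them
    print(book['title'], float(book['price']), float(book['rating']))
```

The header row is consumed automatically, so there is no need to call next(reader) here.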

Or you can read all rows at once:

import csv

with open('books.csv', 'r') as file:
    reader = csv.reader(file)
    data = list(reader)

header = data[0]
books = data[1:]

print(header)
print(books[0])

Output:

['title', 'price', 'rating']
['Whispering Leaves', '12.99', '4.2']

Exercise: Using the books.csv file, write code to calculate the average rating of all books.


Writing CSV Files

You can also write data to CSV files:

import csv

books = [
    ['title', 'price', 'rating'],
    ['Whispering Leaves', 12.99, 4.2],
    ['Berry Tales', 0.00, 4.5],
    ['Moon Over Pine', 7.50, 4.6]
]

with open('output_books.csv', 'w', newline='') as file:
    writer = csv.writer(file)

    for row in books:
        writer.writerow(row)

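The newline='' parameter lets the csv module control line endings itself. Without it, extra blank lines can appear between rows on Windows systems. The csv module’s documentation recommends passing newline='' when opening CSV files for reading as well.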

You can also write all rows at once:

import csv

books = [
    ['title', 'price', 'rating'],
    ['Whispering Leaves', 12.99, 4.2],
    ['Berry Tales', 0.00, 4.5]
]

with open('output_books.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(books)  # Note: writerows (plural)
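
One reason to use csv.writer instead of joining values with commas yourself is that it quotes fields that contain a comma. The sketch below writes a row whose title contains a comma and reads it back. The title 'Leaves, Whispering' and the temporary folder are made up for illustration.

```python
import csv
import os
import tempfile

rows = [
    ['title', 'price', 'rating'],
    ['Leaves, Whispering', 12.99, 4.2],  # hypothetical title with a comma
]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'output_books.csv')

    with open(path, 'w', newline='') as file:
        csv.writer(file).writerows(rows)

    # Look at the raw text to see the quotes the writer added
    with open(path, 'r', newline='') as file:
        raw_text = file.read()

    # Read it back with csv.reader: the title comes back as one field
    with open(path, 'r', newline='') as file:
        rows_back = list(csv.reader(file))

print(raw_text)
print(rows_back[1])  # ['Leaves, Whispering', '12.99', '4.2']
```

Note that the numbers come back as strings, just like in the reading examples earlier.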

Exercise: Create a list of books with their information, then write it to a CSV file called my_books.csv.


Working with File Paths

So far, we’ve opened files by name only, which works when the file sits in the current working directory (usually the folder you run your script from). But often, files are in different locations.

Relative paths:

# Same directory
with open('books.csv', 'r') as file:
    pass

# Subdirectory
with open('data/books.csv', 'r') as file:
    pass

# Parent directory
with open('../books.csv', 'r') as file:
    pass

Absolute paths:

# macOS/Linux
with open('/Users/username/projects/data/books.csv', 'r') as file:
    pass

# Windows
with open('C:\\Users\\username\\projects\\data\\books.csv', 'r') as file:
    pass

For cross-platform compatibility, build paths with os.path.join from the os module, which inserts the correct separator for the operating system:

import os

file_path = os.path.join('data', 'books.csv')
with open(file_path, 'r') as file:
    pass
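
Another option from the standard library is the pathlib module, which represents paths as objects and lets you join parts with the / operator. This is a small sketch of the idea rather than a full tour of pathlib; the data folder is the same hypothetical one used above and does not need to exist for this code to run.

```python
from pathlib import Path

# Join path parts with / (works on Windows, macOS, and Linux)
file_path = Path('data') / 'books.csv'

print(file_path.name)      # books.csv
print(file_path.suffix)    # .csv
print(file_path.parent)    # data
print(file_path.exists())  # True only if the file is actually there

# A Path object can be passed directly to open()
if file_path.exists():
    with open(file_path, 'r') as file:
        first_line = file.readline()
```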

Practical Example: Analyzing Book Sales

Let’s create a complete program that reads book data, performs analysis, and writes results to a new file:

import csv

# Read the data
with open('books.csv', 'r') as file:
    reader = csv.reader(file)
    next(reader)  # Skip header

    free_books = []
    paid_books = []

    for row in reader:
        title = row[0]
        price = float(row[1])
        rating = float(row[2])

        if price == 0.0:
            free_books.append([title, rating])
        else:
            paid_books.append([title, price, rating])

# Calculate statistics
if free_books:
    avg_free_rating = sum(book[1] for book in free_books) / len(free_books)
else:
    avg_free_rating = 0

if paid_books:
    avg_paid_rating = sum(book[2] for book in paid_books) / len(paid_books)
    avg_price = sum(book[1] for book in paid_books) / len(paid_books)
else:
    avg_paid_rating = 0
    avg_price = 0

# Write results to a file
with open('analysis_results.txt', 'w') as file:
    file.write('Book Analysis Results\n')
    file.write('=' * 50 + '\n\n')
    file.write(f'Total free books: {len(free_books)}\n')
    file.write(f'Average rating (free): {avg_free_rating:.2f}\n\n')
    file.write(f'Total paid books: {len(paid_books)}\n')
    file.write(f'Average price (paid): ${avg_price:.2f}\n')
    file.write(f'Average rating (paid): {avg_paid_rating:.2f}\n\n')
    file.write('Top rated free books:\n')

    # Sort free books by rating
    free_books_sorted = sorted(free_books, key=lambda x: x[1], reverse=True)
    for book in free_books_sorted[:3]:  # Top 3
        file.write(f'  - {book[0]}: {book[1]} stars\n')

print("Analysis complete! Results saved to analysis_results.txt")

This program:

  1. Reads book data from a CSV file
  2. Separates free and paid books
  3. Calculates statistics
  4. Writes a formatted report to a text file
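
The program repeats the same "average, or 0 if the list is empty" check three times. One possible refinement is to move that logic into a small helper function. The average function below is a hypothetical helper, not part of the program above, and the sample lists are typed in by hand in place of data read from books.csv.

```python
def average(values):
    """Return the mean of values, or 0 if there are none."""
    values = list(values)  # also accepts generator expressions
    if not values:
        return 0
    return sum(values) / len(values)


# Hand-typed sample data in the same shape the program builds
free_books = [['Berry Tales', 4.5]]
paid_books = [
    ['Whispering Leaves', 12.99, 4.2],
    ['Moon Over Pine', 7.50, 4.6],
]

avg_free_rating = average(book[1] for book in free_books)
avg_price = average(book[1] for book in paid_books)
avg_paid_rating = average(book[2] for book in paid_books)

print(f'Average rating (free): {avg_free_rating:.2f}')
print(f'Average rating (paid): {avg_paid_rating:.2f}')
```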

Common File Errors

When working with files, you might encounter these errors:

FileNotFoundError:

# File doesn't exist
with open('nonexistent.csv', 'r') as file:
    pass
# FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent.csv'

PermissionError:

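# Don't have permission to write
# (assumes /system/protected.txt exists but is write-protected;
#  if the folder itself is missing, you get FileNotFoundError instead)
with open('/system/protected.txt', 'w') as file:
    pass
# PermissionError: [Errno 13] Permission denied: '/system/protected.txt'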

IsADirectoryError:

# Trying to open a directory as a file (on Windows this raises PermissionError instead)
with open('data/', 'r') as file:
    pass
# IsADirectoryError: [Errno 21] Is a directory: 'data/'
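
Until then, one simple way to avoid the first and third errors is to check what a path points to before opening it. The sketch below uses os.path.isfile and os.path.isdir; the describe_path function is a hypothetical helper written for this example, and the temporary folder is an assumption so the code runs anywhere.

```python
import os
import tempfile


def describe_path(path):
    """Return 'file', 'directory', or 'missing' for the given path."""
    if os.path.isfile(path):
        return 'file'
    if os.path.isdir(path):
        return 'directory'
    return 'missing'


with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, 'books.csv')
    with open(csv_path, 'w') as file:
        file.write('title,price,rating\n')

    results = [
        describe_path(csv_path),                                 # a real file
        describe_path(folder),                                   # a directory
        describe_path(os.path.join(folder, 'nonexistent.csv')),  # not there
    ]

print(results)  # ['file', 'directory', 'missing']
```

A check like this does not replace exception handling, since a file can still disappear between the check and the open() call, but it is enough for small scripts.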

In the next lesson, we’ll learn how to handle these errors gracefully using exception handling.


Looking Ahead

You’ve now learned how to work with files in Python—reading data, writing results, and working with CSV files. This is a crucial skill for any data analytics project.

In the next lesson, you’ll learn about exception handling, which will help you write more robust code that can handle file errors and other unexpected situations gracefully.

Exercise: Create a complete data processing program that:

  1. Reads a CSV file with book data
  2. Filters books with ratings above 4.0
  3. Writes the high-rated books to a new CSV file
  4. Creates a summary text file with statistics about the filtered books