Lesson 2 - Software Testing Fundamentals
On this page
- Welcome to Software Testing Fundamentals
- Why Automated Tests Matter
- The Testing Pyramid: Three Levels, Three Trade-offs
- Writing Unit Tests for Ledgerly’s Pricing Functions
- Test Doubles: Faking the Payment Gateway
- One Broad End-to-End Test vs. Several Unit Tests
- What Test Coverage Means, and Where It Falls Short
- Practice Exercises
- Summary
- Next Steps
- Continue Building Your Skills
Welcome to Software Testing Fundamentals
In the last lesson, Ledgerly’s team rewrote a tangled pricing function into five small, well-named functions: calculate_subtotal, apply_tier_discount, apply_loyalty_discount, add_tax, and add_late_fee. Clean code answers one question: can a person read this and understand it? Testing answers a different question: does this code keep doing the right thing, today and after every future change?
Without automated tests, the only way to check calculate_invoice_total still works is to run Ledgerly by hand, create a fake customer, add line items, and read the number off the screen. That takes minutes, and a developer has to remember to do it every time they touch pricing code. This lesson replaces that manual check with automated tests that run in a fraction of a second, so Ledgerly’s three-person team can change pricing logic without fear of quietly breaking a customer’s bill.
By the end of this lesson, you will be able to:
- Explain why automated tests catch regressions that manual testing misses
- Describe the testing pyramid and the speed-versus-realism trade-off at each level
- Write pytest unit tests for a pricing function using the Arrange-Act-Assert pattern
- Replace a real payment gateway with a fake test double so tests run without a network call
- Explain what a code coverage percentage does and does not tell you about test quality
Why Automated Tests Matter
An automated test is a small program that runs another piece of code and checks the result against an expected value, without a person watching the screen. Once written, it can run thousands of times, on every developer’s machine and before every deploy, at no ongoing cost beyond a few milliseconds of computer time.
Manual testing does not scale the same way. If Ledgerly has ten pricing rules and a developer spends even a minute checking each one by hand before every release, that is ten minutes of repetitive, error-prone work, every single time. A developer under deadline pressure is far more likely to skip that check than a computer is to skip running a test file.
Automated tests also catch a specific kind of bug that manual testing is bad at: a regression, meaning old behavior that used to work but silently breaks because of an unrelated change elsewhere in the code. A test for add_late_fee will fail the moment someone changes the late fee rate by accident, even if that person was only trying to fix a typo in an unrelated function.
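As a rough illustration of a regression being caught, the sketch below re-creates the late fee rule with the rate passed in as a parameter, so the same check can be run against the intended rate and against an accidental edit. The late_fee_rate and threshold_days parameters and the late_fee_check_passes helper are assumptions made for this sketch; they are not part of Ledgerly's code.

```python
import math


def add_late_fee(amount, days_overdue, late_fee_rate=0.05, threshold_days=30):
    """Late fee rule, with the rate as a parameter so this sketch can vary it."""
    if days_overdue > threshold_days:
        return amount * (1 + late_fee_rate)
    return amount


def late_fee_check_passes(late_fee_rate):
    """Return True when an invoice of 100, 45 days overdue, comes back as 105."""
    result = add_late_fee(100, days_overdue=45, late_fee_rate=late_fee_rate)
    return math.isclose(result, 105.0)


print(late_fee_check_passes(0.05))  # intended rate: prints True
print(late_fee_check_passes(0.5))   # accidental edit, 0.05 became 0.5: prints False
```

The check itself never changes. Only the code under it does, which is why the same test keeps guarding the rule long after the person who wrote it has moved on to other work.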
A test is a specification that runs
A well-written test does two jobs at once. It checks that the code behaves correctly right now, and it documents, in runnable code, exactly what “correctly” means for that function. A new developer reading test_add_late_fee_when_overdue learns the late fee rule faster than they would from a paragraph of prose, because the test shows the real input and the real expected output side by side.
The Testing Pyramid: Three Levels, Three Trade-offs
Not every test needs to exercise the whole system, and not every test should be limited to one function. The testing pyramid is a guideline for how many tests to write at each of three levels, based on how fast each level runs and how much of the system it touches.
Unit tests check one function or one class in isolation, with every dependency faked or removed. A unit test for apply_tier_discount needs only a subtotal and a tier name; it never touches a database, a network, or another function’s internals. Unit tests run in milliseconds, so a project can have hundreds of them without slowing anyone down.
Integration tests check that two or more real components work together correctly, such as BillingService talking to an actual database instead of a fake one. They run slower, seconds rather than milliseconds, because they do real work like opening a database connection. A project needs fewer of these than unit tests, aimed at the seams between components rather than the internal logic each component already covers on its own.
End-to-end tests check a complete user workflow through the real system, the way an actual customer would experience it: sign up, create an invoice, pay it. They run the slowest, often taking several seconds or more per test, because they exercise every layer at once. A project should have only a handful of these, reserved for the workflows that matter most.
The pyramid shape reflects a simple trade-off: tests lower down are faster and more stable, so a project wants many of them; tests higher up are slower and more realistic, so a project wants only as many as it needs to trust the whole system works together.
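The lesson's code examples cover the unit and end-to-end levels, so here is a minimal sketch of what the middle level might look like. It assumes a hypothetical InvoiceStore class, which is not part of Ledgerly's code, and uses Python's built-in sqlite3 module with an in-memory database as the real component. An in-memory SQLite database is much faster than a networked database server, so treat the speed here as optimistic.

```python
import sqlite3


class InvoiceStore:
    """Hypothetical storage component, invented for this sketch."""

    def __init__(self, connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS invoices "
            "(customer_token TEXT, total_cents INTEGER)"
        )

    def save(self, customer_token, total_cents):
        self.connection.execute(
            "INSERT INTO invoices (customer_token, total_cents) VALUES (?, ?)",
            (customer_token, total_cents),
        )
        self.connection.commit()

    def total_for(self, customer_token):
        row = self.connection.execute(
            "SELECT SUM(total_cents) FROM invoices WHERE customer_token = ?",
            (customer_token,),
        ).fetchone()
        # SUM over zero rows yields NULL, which sqlite3 returns as None.
        return row[0] or 0


def test_invoice_store_round_trips_through_a_real_database():
    # Arrange: a real SQLite engine, not a fake, though it lives in memory.
    connection = sqlite3.connect(":memory:")
    store = InvoiceStore(connection)

    # Act: write two invoices through real SQL.
    store.save("cust_42", 14169)
    store.save("cust_42", 5000)

    # Assert: the values read back match what was written.
    assert store.total_for("cust_42") == 19169
    assert store.total_for("cust_unknown") == 0
    connection.close()
```

The test targets the seam between Python code and SQL: a typo in a column name or a wrong placeholder would fail here, while a unit test with a faked store would never notice it.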
Writing Unit Tests for Ledgerly’s Pricing Functions
Ledgerly’s team installs pytest, the most widely used Python testing framework, then writes one test file per module. Each test follows the Arrange-Act-Assert pattern: arrange the inputs, act by calling the function, and assert the result matches what is expected.
Here are the five pricing functions from Lesson 1, unchanged, together with the calculate_invoice_total function that chains them, saved in pricing.py.
# pricing.py

TIER_DISCOUNTS = {"gold": 0.15, "silver": 0.05, "bronze": 0.0}
TAX_RATE = 0.08
LOYALTY_DISCOUNT = 0.02
LATE_FEE_RATE = 0.05
LATE_FEE_THRESHOLD_DAYS = 30


def calculate_subtotal(line_items):
    """Sum price times quantity across every line item."""
    return sum(item["price"] * item["quantity"] for item in line_items)


def apply_tier_discount(subtotal, customer_tier):
    """Reduce the subtotal based on the customer's plan tier."""
    discount_rate = TIER_DISCOUNTS.get(customer_tier, 0.0)
    return subtotal * (1 - discount_rate)


def apply_loyalty_discount(amount, is_loyalty_member):
    """Apply an extra discount for enrolled loyalty members."""
    if is_loyalty_member:
        return amount * (1 - LOYALTY_DISCOUNT)
    return amount


def add_tax(amount):
    """Add sales tax on top of the discounted amount."""
    return amount * (1 + TAX_RATE)


def add_late_fee(amount, days_overdue):
    """Add a late fee only when the invoice is overdue past the threshold."""
    if days_overdue > LATE_FEE_THRESHOLD_DAYS:
        return amount * (1 + LATE_FEE_RATE)
    return amount


def calculate_invoice_total(line_items, customer_tier, is_loyalty_member, days_overdue):
    """Orchestrate every pricing step to produce one final invoice total."""
    subtotal = calculate_subtotal(line_items)
    discounted = apply_tier_discount(subtotal, customer_tier)
    discounted = apply_loyalty_discount(discounted, is_loyalty_member)
    with_tax = add_tax(discounted)
    final_total = add_late_fee(with_tax, days_overdue)
    return round(final_total, 2)

Now the test file. Each test arranges a small, specific input, calls exactly one function, and asserts one outcome. pytest.approx handles floating-point comparisons, since 0.1 + 0.2 in Python does not equal exactly 0.3.
# test_pricing.py
import pytest

from pricing import (
    calculate_subtotal,
    apply_tier_discount,
    apply_loyalty_discount,
    add_tax,
    add_late_fee,
    calculate_invoice_total,
)


def test_calculate_subtotal_sums_price_times_quantity():
    line_items = [{"price": 40, "quantity": 3}, {"price": 15, "quantity": 2}]
    assert calculate_subtotal(line_items) == 150


def test_apply_tier_discount_for_gold_customer():
    assert apply_tier_discount(150, "gold") == pytest.approx(127.5)


def test_apply_tier_discount_for_unknown_tier_applies_no_discount():
    assert apply_tier_discount(150, "platinum") == 150


def test_apply_loyalty_discount_for_member():
    assert apply_loyalty_discount(100, is_loyalty_member=True) == pytest.approx(98.0)


def test_apply_loyalty_discount_for_non_member():
    assert apply_loyalty_discount(100, is_loyalty_member=False) == 100


def test_add_tax_adds_eight_percent():
    assert add_tax(100) == pytest.approx(108.0)


def test_add_late_fee_when_overdue():
    assert add_late_fee(100, days_overdue=45) == pytest.approx(105.0)


def test_add_late_fee_when_not_overdue():
    assert add_late_fee(100, days_overdue=10) == 100


def test_calculate_invoice_total_matches_full_pipeline():
    line_items = [{"price": 40, "quantity": 3}, {"price": 15, "quantity": 2}]
    total = calculate_invoice_total(line_items, "gold", True, 45)
    assert total == 141.69

Running pytest -v on this file produces the following, run for real against the code above:
test_pricing.py::test_calculate_subtotal_sums_price_times_quantity PASSED
test_pricing.py::test_apply_tier_discount_for_gold_customer PASSED
test_pricing.py::test_apply_tier_discount_for_unknown_tier_applies_no_discount PASSED
test_pricing.py::test_apply_loyalty_discount_for_member PASSED
test_pricing.py::test_apply_loyalty_discount_for_non_member PASSED
test_pricing.py::test_add_tax_adds_eight_percent PASSED
test_pricing.py::test_add_late_fee_when_overdue PASSED
test_pricing.py::test_add_late_fee_when_not_overdue PASSED
test_pricing.py::test_calculate_invoice_total_matches_full_pipeline PASSED
9 passed in 0.02s

Nine tests ran in two hundredths of a second. Testing each small function separately also means a failure points at the exact broken piece: if test_add_late_fee_when_overdue fails, the bug is in add_late_fee, not somewhere in a 60-line function that also handles discounts and tax.
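The tests above compress Arrange-Act-Assert into one or two lines each. Spelled out with one comment per step, a test might look like the sketch below. To keep it runnable without installing anything, it copies apply_tier_discount inline and uses math.isclose from the standard library for the tolerant comparison; in a pytest suite, pytest.approx plays that role. The silver-tier test is an extra example, not one of the nine tests in this lesson.

```python
import math

TIER_DISCOUNTS = {"gold": 0.15, "silver": 0.05, "bronze": 0.0}


def apply_tier_discount(subtotal, customer_tier):
    """Copied from pricing.py so this sketch is self-contained."""
    discount_rate = TIER_DISCOUNTS.get(customer_tier, 0.0)
    return subtotal * (1 - discount_rate)


def test_apply_tier_discount_for_silver_customer():
    # Arrange: pick a small, specific input.
    subtotal = 200
    customer_tier = "silver"

    # Act: call exactly one function.
    discounted = apply_tier_discount(subtotal, customer_tier)

    # Assert: check one outcome, with a tolerance for float rounding.
    assert math.isclose(discounted, 190.0)


# Why a tolerance is needed: exact equality on floats can surprise you.
print(0.1 + 0.2 == 0.3)              # prints False
print(math.isclose(0.1 + 0.2, 0.3))  # prints True
```

Keeping the three steps visually separate makes it easier to see at a glance which input produced which expectation, especially once a test needs more than one line of setup.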
Test Doubles: Faking the Payment Gateway
BillingService needs to charge a customer’s card once an invoice total is calculated. A real payment gateway sends a request to an external service over the network, which is slow, costs money per test run, and can fail for reasons that have nothing to do with Ledgerly’s code, such as the payment provider being briefly unreachable.
A test double is a stand-in object used only during testing, built to behave enough like the real dependency that the code under test cannot tell the difference. This lesson uses a fake: an object with a real, working implementation, just a simplified one with no actual network call. A fake differs from a stub, which only returns pre-set canned answers, and from a mock, which additionally records and verifies exactly how it was called.
# payment_gateway.py
from pricing import calculate_invoice_total


class PaymentGateway:
    """Defines what any payment gateway must do: charge a customer's card."""

    def charge(self, amount_cents, customer_token):
        raise NotImplementedError


class FakePaymentGateway(PaymentGateway):
    """A test double standing in for a real gateway. No network call happens here."""

    def __init__(self):
        self.charges = []

    def charge(self, amount_cents, customer_token):
        charge_id = f"fake_ch_{len(self.charges)}"
        self.charges.append((charge_id, amount_cents, customer_token))
        return {"charge_id": charge_id, "status": "succeeded"}


class BillingService:
    """Calculates an invoice total, then collects payment through a gateway."""

    def __init__(self, gateway):
        self.gateway = gateway

    def collect_payment(
        self, line_items, customer_tier, is_loyalty_member, days_overdue, customer_token
    ):
        total = calculate_invoice_total(
            line_items, customer_tier, is_loyalty_member, days_overdue
        )
        amount_cents = round(total * 100)
        return self.gateway.charge(amount_cents, customer_token)

BillingService only calls self.gateway.charge(...). It never imports a real payment provider directly, so any object with a matching charge method works, real or fake. The test below builds a BillingService with the fake gateway, then checks both the returned result and the exact amount the fake gateway recorded.
# test_billing.py
from payment_gateway import BillingService, FakePaymentGateway


def test_collect_payment_charges_the_calculated_total():
    gateway = FakePaymentGateway()
    billing = BillingService(gateway)
    line_items = [{"price": 40, "quantity": 3}, {"price": 15, "quantity": 2}]

    result = billing.collect_payment(line_items, "gold", True, 45, "cust_42")

    assert result["status"] == "succeeded"
    assert len(gateway.charges) == 1
    charge_id, amount_cents, token = gateway.charges[0]
    assert amount_cents == 14169
    assert token == "cust_42"

Run for real:
test_billing.py::test_collect_payment_charges_the_calculated_total PASSED
1 passed in 0.02s

The total, 141.69, matches the pipeline test from the previous section, converted to 14169 cents. No network request happened, no real payment provider account was needed, and the test still verified that BillingService computed the right amount and passed it to the gateway correctly.
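This section contrasted the fake with two other kinds of test double, the stub and the mock. For comparison, here is a rough sketch of both. The collect function is a hypothetical stand-in for the last step of BillingService.collect_payment, and StubGateway is invented for this example. The mock uses Mock from Python's standard unittest.mock module.

```python
from unittest.mock import Mock


def collect(gateway, amount_cents, customer_token):
    """Hypothetical stand-in for the final step of BillingService.collect_payment."""
    return gateway.charge(amount_cents, customer_token)


class StubGateway:
    """Stub: returns the same canned answer every time and remembers nothing."""

    def charge(self, amount_cents, customer_token):
        return {"charge_id": "stub_ch_0", "status": "succeeded"}


def test_with_stub_checks_only_the_result():
    result = collect(StubGateway(), 14169, "cust_42")
    assert result["status"] == "succeeded"


def test_with_mock_also_verifies_the_call():
    gateway = Mock()
    gateway.charge.return_value = {"charge_id": "mock_ch_0", "status": "succeeded"}

    result = collect(gateway, 14169, "cust_42")

    assert result["status"] == "succeeded"
    # The mock recorded the call, so the test can verify its exact arguments.
    gateway.charge.assert_called_once_with(14169, "cust_42")
```

The stub can confirm only that the calling code handles a successful response. The mock additionally fails the test if charge was called with the wrong amount, the wrong token, or more than once. FakePaymentGateway sits between the two: it has working logic of its own and exposes what it recorded through its charges list.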
One Broad End-to-End Test vs. Several Unit Tests
To see the testing pyramid’s trade-off directly, compare the nine pricing unit tests and one billing unit test above against a single end-to-end test that simulates a full customer flow: sign up, create an invoice, charge the card, each step standing in for a real network call.
# e2e_flow.py
import time


def run_signup_and_first_invoice_flow():
    """Simulates a full browser-driven flow: sign up, create an invoice, charge a card."""
    time.sleep(0.2)  # stand-in for a real signup request over the network
    time.sleep(0.2)  # stand-in for generating and saving the invoice
    time.sleep(0.2)  # stand-in for charging the payment gateway
    return {"signed_up": True, "invoice_created": True, "charged": True}

# test_e2e_flow.py
from e2e_flow import run_signup_and_first_invoice_flow


def test_new_customer_can_sign_up_and_pay_first_invoice():
    """One broad test covering signup, invoice creation, and payment together."""
    result = run_signup_and_first_invoice_flow()
    assert result["signed_up"] is True
    assert result["invoice_created"] is True
    assert result["charged"] is True

Running all eleven tests together, with --durations=0 to show how long each one took, produces this real output:
test_pricing.py::test_calculate_subtotal_sums_price_times_quantity PASSED
test_pricing.py::test_apply_tier_discount_for_gold_customer PASSED
test_pricing.py::test_apply_tier_discount_for_unknown_tier_applies_no_discount PASSED
test_pricing.py::test_apply_loyalty_discount_for_member PASSED
test_pricing.py::test_apply_loyalty_discount_for_non_member PASSED
test_pricing.py::test_add_tax_adds_eight_percent PASSED
test_pricing.py::test_add_late_fee_when_overdue PASSED
test_pricing.py::test_add_late_fee_when_not_overdue PASSED
test_pricing.py::test_calculate_invoice_total_matches_full_pipeline PASSED
test_billing.py::test_collect_payment_charges_the_calculated_total PASSED
test_e2e_flow.py::test_new_customer_can_sign_up_and_pay_first_invoice PASSED
============================== slowest durations ===============================
0.61s call test_e2e_flow.py::test_new_customer_can_sign_up_and_pay_first_invoice
(32 durations < 0.005s hidden. Use -vv to show these durations.)
11 passed in 0.64s

One end-to-end test took 0.61 seconds. All ten unit tests combined took less than 0.03 seconds, each one individually too fast to even show in the default report. This is not a special case; it is what the testing pyramid predicts. The end-to-end test is also more brittle: if run_signup_and_first_invoice_flow fails, a failing assertion shows which step's flag came back false, but not which function or network call underneath that step was the actual cause. Each of the smaller unit tests, by contrast, fails with the name of the exact function that broke.
None of this means end-to-end tests are worthless. Ledgerly still keeps one or two, to catch problems that only appear when real components are wired together. It means a whole test suite built only out of end-to-end tests would be slow to run and slow to debug, which is exactly why the pyramid keeps them few and keeps unit tests plentiful.
What Test Coverage Means, and Where It Falls Short
Test coverage is the percentage of a file's statements that actually ran while the test suite executed. Running pytest with the --cov option, which the pytest-cov plugin provides, against pricing.py and the nine tests from earlier produces this real report:
Name         Stmts   Miss  Cover   Missing
------------------------------------------
pricing.py      27      0   100%
------------------------------------------
TOTAL           27      0   100%

9 passed in 0.04s

Every line in pricing.py executed during the test run, so coverage reports a clean 100%. That number answers only one question: did each line run at least once? It says nothing about whether every meaningful input was tried. Here is a real example of what 100% coverage misses.
from pricing import calculate_subtotal

bad_line_items = [{"price": 40, "quantity": -3}]
print(calculate_subtotal(bad_line_items))

Output:

-120

calculate_subtotal already has 100% line coverage from the tests above, because its one line of logic ran in every test that called it. None of those tests, however, ever passed a negative quantity, so nothing caught that calculate_subtotal happily returns a negative subtotal for it, a value that should never reach a real invoice. Coverage tells you a line executed; it does not tell you every input worth testing was tried against that line.
Ledgerly’s team treats coverage as a tool for finding untested lines, not as a target to chase. A coverage report that flags add_late_fee as untested is a useful prompt to write a test for it. A team that adds a test with no real assertion just to push a percentage number higher gains nothing, since a passing test with no meaningful check catches no bugs at all.
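The difference between a test that only raises the coverage number and one that checks behavior can be sketched as follows. calculate_subtotal_checked is a hypothetical stricter variant that rejects negative quantities. The lesson does not say how Ledgerly wants to handle that input, so treat the ValueError as one possible choice rather than the team's actual fix. In a pytest suite, pytest.raises would be the usual way to write the second test; plain try/except is used here to keep the sketch dependency-free.

```python
def calculate_subtotal(line_items):
    """Copied from pricing.py so this sketch is self-contained."""
    return sum(item["price"] * item["quantity"] for item in line_items)


def calculate_subtotal_checked(line_items):
    """Hypothetical variant: refuse negative quantities before summing."""
    for item in line_items:
        if item["quantity"] < 0:
            raise ValueError(f"quantity must not be negative: {item['quantity']}")
    return calculate_subtotal(line_items)


def test_runs_the_line_but_checks_nothing():
    # Executes calculate_subtotal, so coverage counts its line as run,
    # yet no wrong return value could ever make this test fail.
    calculate_subtotal([{"price": 40, "quantity": -3}])


def test_negative_quantity_is_rejected():
    # Pins down the missing input: a negative quantity must raise.
    try:
        calculate_subtotal_checked([{"price": 40, "quantity": -3}])
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative quantity")
```

Both tests pass and both contribute equally to the coverage percentage, but only the second one would fail if the validation were removed.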
Practice Exercises
Exercise 1: Write a unit test for a new discount rule
Ledgerly adds a new tier, "platinum", to TIER_DISCOUNTS with a 20% discount. Write a unit test, following the Arrange-Act-Assert pattern used in this lesson, that checks apply_tier_discount(200, "platinum") returns the correct discounted amount.
Hint
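Arrange a subtotal of 200 and a tier of "platinum". Act by calling apply_tier_discount(200, "platinum"). Assert the result equals pytest.approx(160.0), since a 20% discount on 200 leaves 160. The test looks almost identical to test_apply_tier_discount_for_gold_customer, just with different numbers, which is exactly what a good unit test should look like. Note that once "platinum" is a real tier, test_apply_tier_discount_for_unknown_tier_applies_no_discount will start failing, because it uses "platinum" as its unknown tier; switch that test to a tier name that is not in TIER_DISCOUNTS.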
Exercise 2: Add a second fake gateway behavior
Extend FakePaymentGateway with a mode where it simulates a declined card: if customer_token equals "cust_declined", charge() should return {"charge_id": None, "status": "declined"} instead of succeeding. Then write a test using this fake to check that BillingService.collect_payment correctly surfaces "status": "declined" for that customer.
Hint
Add an if customer_token == "cust_declined": return {"charge_id": None, "status": "declined"} check at the top of charge(), before the normal success path. The test then calls billing.collect_payment(line_items, "gold", True, 45, "cust_declined") and asserts result["status"] == "declined". This lets you test a failure path without ever touching a real payment network, which is exactly why fakes are useful for testing error handling.
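One possible shape for the Exercise 2 solution is sketched below. To keep the block self-contained, BillingService.collect_payment here takes a precomputed total instead of line items, which is a simplification of the lesson's version. The choice not to record declined attempts in charges is also an assumption; the exercise does not specify it.

```python
class FakePaymentGateway:
    """The lesson's fake gateway, extended with a declined-card path."""

    def __init__(self):
        self.charges = []

    def charge(self, amount_cents, customer_token):
        # New branch: this one token always simulates a declined card.
        if customer_token == "cust_declined":
            return {"charge_id": None, "status": "declined"}
        charge_id = f"fake_ch_{len(self.charges)}"
        self.charges.append((charge_id, amount_cents, customer_token))
        return {"charge_id": charge_id, "status": "succeeded"}


class BillingService:
    """Simplified for this sketch: takes a precomputed total, not line items."""

    def __init__(self, gateway):
        self.gateway = gateway

    def collect_payment(self, total, customer_token):
        return self.gateway.charge(round(total * 100), customer_token)


def test_collect_payment_surfaces_a_declined_card():
    gateway = FakePaymentGateway()
    billing = BillingService(gateway)

    result = billing.collect_payment(141.69, "cust_declined")

    assert result["status"] == "declined"
    assert result["charge_id"] is None
    assert gateway.charges == []  # assumption: declined attempts are not recorded
```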
Exercise 3: Decide where a new test belongs in the pyramid
A teammate wants to add one test that creates a real customer in Ledgerly’s actual database, generates a real invoice, and charges a real test-mode Stripe account, all in one test function. Where does this test belong in the testing pyramid, and what trade-off should the team accept by adding it?
Hint
This is an end-to-end test, sitting at the top of the pyramid, since it exercises the customer, invoice, and payment layers together through the real system. The team should expect it to run much slower than the unit tests in this lesson, likely slower than even the 0.61 seconds that the simulated test_new_customer_can_sign_up_and_pay_first_invoice took, and to be more brittle, since a failure could come from any of the three layers. The right move is to keep this as one of only a few such tests, while the individual pieces, like pricing and gateway charging, stay covered by fast unit tests.
Summary
Automated tests replace slow, error-prone manual checks with a program that verifies behavior in milliseconds and can run before every change. The testing pyramid guides how many tests to write at each level: many fast unit tests like the ones written for calculate_subtotal, apply_tier_discount, and the rest of Ledgerly’s pricing functions, some slower integration tests for real component interactions, and only a few end-to-end tests for complete workflows. This lesson measured that trade-off directly: ten unit tests ran in under three hundredths of a second combined, while one simulated end-to-end signup-to-payment flow took over half a second on its own. A FakePaymentGateway test double let BillingService be tested without any real network call, returning controlled, predictable responses instead. Finally, test coverage measures which lines ran, not which inputs were tried, so a function can reach 100% coverage and still hide a bug, like calculate_subtotal silently accepting a negative quantity.
Key Concepts
- Automated test — a program that runs code and checks its result against an expected value, without manual verification.
- Testing pyramid — many fast, isolated unit tests, some integration tests, and few slow end-to-end tests.
- Arrange-Act-Assert — the standard unit test structure: set up inputs, call the code, check the result.
- Test double — a stand-in object, such as a fake, stub, or mock, used in place of a real dependency during a test.
- Test coverage — the percentage of code lines executed during a test run; it measures what ran, not what was verified.
Why This Matters
Ledgerly’s pricing logic touches real money, and a silent regression in add_late_fee or apply_tier_discount could overcharge or undercharge a real customer without anyone noticing until a complaint arrives. The tests in this lesson turn that risk into something checked automatically, in a fraction of a second, every time the code changes. The FakePaymentGateway test double means Ledgerly’s three-person team can test billing logic hundreds of times a day without ever touching a real payment account. Understanding the testing pyramid keeps that test suite fast as it grows, so testing remains something the team does constantly, not something they dread and skip.
Next Steps
Lesson 3: Behaviour-Driven Development
Learn how BDD turns plain-language examples into executable specifications that both developers and non-technical stakeholders can read.
Continue Building Your Skills
You can now write pytest unit tests using Arrange-Act-Assert, replace a real dependency like a payment gateway with a fake test double, and explain why the testing pyramid favors many fast unit tests over a few slow end-to-end ones. The next lesson builds on this foundation with behaviour-driven development, writing tests as plain-language examples that describe Ledgerly’s features in a form both developers and non-technical teammates can read and agree on.