Lesson 7 - Boolean Filtering in Pandas

Powerful Data Filtering

You learned Boolean indexing in NumPy. Now you will apply these same concepts to DataFrames—filtering rows based on conditions using labeled columns.

By the end of this lesson, you will be able to:

Create Boolean masks on DataFrame columns
Filter rows based on single conditions
Combine multiple conditions with & (AND), | (OR), ~ (NOT)
Use .isin() for membership testing
Filter and select specific columns simultaneously
Count rows that meet conditions

Boolean filtering is the foundation of data analysis—selecting exactly the rows you need based on your criteria. Let’s master it.

Understanding Boolean Filtering

Boolean filtering works exactly like NumPy arrays:

Create a Boolean mask (True/False for each row)
Use the mask to filter the DataFrame

The difference: Instead of numeric positions, you reference columns by name.

Let’s create sample data:

import pandas as pd
import numpy as np

# Create companies dataset
companies = pd.DataFrame({
    'company': ['TechCorp Global', 'FreshMart Inc', 'AutoDrive Motors', 'FirstBank Holdings', 'PowerGen Energy',
                'MediPharm Solutions', 'RetailHub Ltd', 'SkyWings Airlines', 'SteelCore Industries', 'NetLink Telecom',
                'LogiTrans Services', 'SolarNova Energy', 'SecureLife Insurance'],
    'sector': ['Technology', 'Food', 'Automotive', 'Financials', 'Energy',
               'Healthcare', 'Retail', 'Transportation', 'Materials', 'Technology',
               'Transportation', 'Energy', 'Financials'],
    'revenues': [125000, 89000, 156000, 234000, 178000,
                 98000, 112000, 187000, 145000, 165000,
                 92000, 156000, 201000],
    'profits': [12000, 8500, -3000, 45000, 23000,
                15000, 9800, 21000, 18000, 28000,
                -5000, 31000, 38000],
    'employees': [1200, 890, 2300, 5600, 3400,
                  2100, 4500, 8900, 3200, 6700,
                  1800, 980, 4200],
    'country': ['USA', 'USA', 'USA', 'UK', 'Germany',
                'USA', 'UK', 'Germany', 'USA', 'UK',
                'Germany', 'UK', 'USA']
})

print(f"Shape: {companies.shape}")
companies.head()

Single Condition Filtering

Filter DataFrame rows based on one condition.

Creating a Boolean Mask

# Create Boolean mask: Which companies are from USA?
is_iran = companies['country'] == 'USA'

print("Boolean mask:")
print(is_iran)
print(f"\nType: {type(is_iran)}")

Output:

Boolean mask:
0      True
1      True
2      True
3     False
4     False
...
Name: country, dtype: bool

Type: <class 'pandas.core.series.Series'>

A Series of True/False values, one for each row.

Applying the Mask

# Apply the mask to filter DataFrame
iran_companies = companies[is_iran]

print(f"Iranian companies: {iran_companies.shape[0]}")
iran_companies

Only rows where the mask is True are kept.

Shorthand Syntax

# More common: combine creation and application
iran_companies = companies[companies['country'] == 'USA']

print("Iranian companies (shorthand):")
print(iran_companies[['company', 'country', 'revenues']])

This is the typical way to filter—create and apply the mask in one step.

More Examples

# Companies with revenues > $150,000
high_revenue = companies[companies['revenues'] > 150000]

print(f"High revenue companies: {high_revenue.shape[0]}")
high_revenue[['company', 'revenues']]

# Unprofitable companies (negative profits)
unprofitable = companies[companies['profits'] < 0]

print("Unprofitable companies:")
unprofitable[['company', 'profits']]

# Technology sector companies
tech_companies = companies[companies['sector'] == 'Technology']

print(f"Technology companies: {tech_companies.shape[0]}")
tech_companies[['company', 'sector']]

Counting Matches

# Count how many meet the condition
num_iran = (companies['country'] == 'USA').sum()
num_profitable = (companies['profits'] > 0).sum()

print(f"Iranian companies: {num_iran}")
print(f"Profitable companies: {num_profitable}")

Output:

Iranian companies: 6
Profitable companies: 11

True counts as 1, False counts as 0, so .sum() gives the count.

Multiple Conditions

Combine conditions using logical operators:

& - AND (both conditions must be True)
| - OR (at least one condition must be True)
~ - NOT (negate/reverse the condition)

Important Syntax Rule

Each condition must be in parentheses () when combining with &, |, or ~!

AND Operator (&)

# Iranian companies that are profitable
iran_profitable = companies[
    (companies['country'] == 'USA') &
    (companies['profits'] > 0)
]

print("Iranian AND profitable companies:")
iran_profitable[['company', 'country', 'profits']]

Both conditions must be True.

# Large profitable companies (revenues > $100k AND profits > $20k)
large_profitable = companies[
    (companies['revenues'] > 100000) &
    (companies['profits'] > 20000)
]

print("Large and highly profitable:")
large_profitable[['company', 'revenues', 'profits']]

OR Operator (|)

# Technology OR Energy companies
tech_or_energy = companies[
    (companies['sector'] == 'Technology') |
    (companies['sector'] == 'Energy')
]

print("Technology OR Energy companies:")
tech_or_energy[['company', 'sector']]

At least one condition must be True.

NOT Operator (~)

# Companies NOT from USA
not_iran = companies[~(companies['country'] == 'USA')]

print("Non-Iranian companies:")
not_iran[['company', 'country']]

The ~ reverses the Boolean mask.

Combining Multiple Operators

# Complex: UK companies in Technology or Energy with profits > $10k
uae_tech_energy_profitable = companies[
    (companies['country'] == 'UK') &
    ((companies['sector'] == 'Technology') | (companies['sector'] == 'Energy')) &
    (companies['profits'] > 10000)
]

print("UK tech/energy companies with profits > $10k:")
uae_tech_energy_profitable[['company', 'country', 'sector', 'profits']]

Notice the extra parentheses around the OR condition.

Common Mistake

# WRONG - Missing parentheses
# df[df['col1'] > 5 & df['col2'] < 10]  # ERROR!

# CORRECT - Each condition in parentheses
df[(df['col1'] > 5) & (df['col2'] < 10)]

Without parentheses, Python’s operator precedence causes errors.

Filtering with .loc[]

Combine row filtering with column selection using .loc[].

Filter Rows, Select Columns

# Iranian companies: show only company, revenues, and profits
iran_financials = companies.loc[
    companies['country'] == 'USA',
    ['company', 'revenues', 'profits']
]

print("Iranian companies financials:")
iran_financials

First argument: row filter (Boolean mask) Second argument: columns to select

# Technology companies: show sector, revenues, employees
tech_subset = companies.loc[
    companies['sector'] == 'Technology',
    ['sector', 'revenues', 'employees']
]

print("Technology companies subset:")
tech_subset

Multiple Conditions with Column Selection

# Profitable companies with > 2000 employees: show key metrics
large_profitable_subset = companies.loc[
    (companies['profits'] > 0) & (companies['employees'] > 2000),
    ['company', 'sector', 'profits', 'employees']
]

print("Large profitable companies - key metrics:")
large_profitable_subset

Helper Methods for Filtering

Pandas provides convenient methods for common filtering patterns.

.isin() for Multiple Values

# Companies from UK or Germany
gulf_countries = companies[companies['country'].isin(['UK', 'Germany'])]

print("Companies from UK or Germany:")
gulf_countries[['company', 'country']]

Much cleaner than:

# Equivalent but verbose
companies[(companies['country'] == 'UK') | (companies['country'] == 'Germany')]

NOT isin()

# Companies NOT in Technology or Retail
not_tech_retail = companies[~companies['sector'].isin(['Technology', 'Retail'])]

print("Companies not in Tech or Retail:")
not_tech_retail[['company', 'sector']]

.between() for Ranges

# Companies with revenues between $100k and $180k
mid_revenue = companies[companies['revenues'].between(100000, 180000)]

print("Mid-revenue companies ($100k-$180k):")
mid_revenue[['company', 'revenues']]

Cleaner than:

# Equivalent but more verbose
companies[(companies['revenues'] >= 100000) & (companies['revenues'] <= 180000)]

Useful Filtering Methods

Method	Description	Example
`.isin([list])`	Value is in list	`df[df['col'].isin(['A', 'B'])]`
`~.isin([list])`	Value NOT in list	`df[~df['col'].isin(['A', 'B'])]`
`.between(a, b)`	Value in range [a, b]	`df[df['col'].between(10, 20)]`
`.str.contains()`	String contains pattern	`df[df['col'].str.contains('tech')]`
`.str.startswith()`	String starts with	`df[df['col'].str.startswith('A')]`

Practice Exercises

Apply Boolean filtering with these exercises.

Exercise 1: Simple Filtering

Find all companies with more than 3000 employees
Find all companies in the Financials sector
Find all companies with revenues less than $100,000

# Your code here

Exercise 2: Multiple Conditions

Find Iranian companies with profits > $10,000
Find companies in Technology OR Healthcare sectors
Find companies with revenues > $150k AND employees < 5000

# Your code here

Hint

Remember: Each condition needs parentheses when using & or |!

Exercise 3: Using .loc[] with Filtering

For UK companies, show only company, sector, and profits
For unprofitable companies, show company and profits columns
For companies with > 4000 employees, show company, employees, and revenues

# Your code here

Exercise 4: Challenge

Find companies from UK or Germany that are profitable
Find companies NOT in USA with revenues between $90k and $200k
Count how many companies have negative profits

# Your code here

Summary

You now master Boolean filtering in pandas. Let’s review the key concepts.

Key Concepts

Boolean Filtering Process

Create Boolean mask with comparison
Apply mask to DataFrame
Works exactly like NumPy arrays

Single Conditions

Use comparison operators: ==, !=, >, <, >=, <=
Returns True/False for each row
Filter: df[condition]

Multiple Conditions

& for AND (both True)
| for OR (at least one True)
~ for NOT (reverse)
Each condition needs parentheses!

Helper Methods

.isin() for multiple values
.between() for ranges
Cleaner than complex conditions

Combine with Selection

Use .loc[condition, columns]
Filter rows and select columns together

Syntax Reference

# Simple filter
df[df['col'] > 100]

# AND condition
df[(df['col1'] > 100) & (df['col2'] == 'A')]

# OR condition
df[(df['col1'] > 100) | (df['col2'] > 200)]

# NOT condition
df[~(df['col'] == 'A')]

# Filter + select columns
df.loc[df['col'] > 100, ['col1', 'col2']]

# Multiple values
df[df['col'].isin(['A', 'B', 'C'])]

# Range
df[df['col'].between(10, 20)]

# Count matches
(df['col'] > 100).sum()

Comparison Operators

Operator	Meaning	Example
`==`	Equal	`df['col'] == 5`
`!=`	Not equal	`df['col'] != 5`
`>`	Greater than	`df['col'] > 5`
`<`	Less than	`df['col'] < 5`
`>=`	Greater or equal	`df['col'] >= 5`
`<=`	Less or equal	`df['col'] <= 5`

Logical Operators

Operator	Meaning	Example
`&`	AND (both True)	`(cond1) & (cond2)`
`	`	OR (one True)
`~`	NOT (reverse)	`~(condition)`

Important Reminders

Parentheses required: Always wrap each condition when using &, |, ~
Column names: Reference columns by name, not position
Boolean Series: Result of comparison is a Series of True/False
Count with sum: True=1, False=0, so .sum() counts matches
Use .loc[]: Combine filtering with column selection
Helper methods: .isin() and .between() simplify common patterns

Next Steps

You can now filter DataFrames based on any conditions. In the next lesson, you will learn to sort data and find top/bottom values—essential for ranking and organizing results.

Continue to Lesson 8 - Sorting and Ranking

Learn to order data and identify top performers

Back to Lesson 6 - DateTime Fundamentals

Review working with dates and times

Master Data Selection

Boolean filtering is your most-used data selection technique. Combined with NumPy’s vectorization and pandas’ labeled columns, you can express complex filtering logic clearly and efficiently.

Use Boolean filtering to ask questions of your data and get precise answers!

Lesson 6 - DateTime Fundamentals

Lesson 8 - Sorting and Ranking

Lesson 7 - Boolean Filtering in Pandas

Powerful Data Filtering#

Understanding Boolean Filtering#

Single Condition Filtering#

Creating a Boolean Mask#

Applying the Mask#

Shorthand Syntax#

More Examples#

Counting Matches#

Multiple Conditions#

AND Operator (&)#

OR Operator (|)#

NOT Operator (~)#

Combining Multiple Operators#

Common Mistake#

Filtering with .loc[]#

Filter Rows, Select Columns#

Multiple Conditions with Column Selection#

Helper Methods for Filtering#

.isin() for Multiple Values#

NOT isin()#

.between() for Ranges#

Useful Filtering Methods#

Practice Exercises#

Exercise 1: Simple Filtering#

Exercise 2: Multiple Conditions#

Exercise 3: Using .loc[] with Filtering#

Exercise 4: Challenge#

Summary#

Key Concepts#

Syntax Reference#

Comparison Operators#

Logical Operators#

Important Reminders#

Next Steps#