November 12, 2025 in Data Analysis, Python by DataTweets · 7 minute read
Avoid these 7 common Pandas pitfalls that frustrate beginners. From SettingWithCopyWarning to inplace operations, learn practical solutions with real code examples.
If you’re learning Pandas for data analysis, you’ve probably run into some confusing behavior that made you scratch your head. You’re not alone! Pandas is incredibly powerful, but it has some quirks that trip up even experienced programmers who are new to the library.
In this guide, we’ll walk through 7 common Pandas gotchas that beginners encounter, explain why they happen, and show you how to avoid them. By the end, you’ll write cleaner, more predictable Pandas code.
1. The SettingWithCopyWarning

This is probably the most confusing warning you’ll see when starting with Pandas:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
subset = df[df['A'] > 1]
subset['C'] = [10, 20] # ⚠️ SettingWithCopyWarning!

Warning: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
When you filter or slice a DataFrame, Pandas sometimes returns a view (reference to original data) and sometimes returns a copy. You can’t easily tell which one you got, so modifying it might not work as expected.
Use .copy() to explicitly create a copy, or .loc[] to modify the original:
# Solution 1: Explicit copy
subset = df[df['A'] > 1].copy()
subset['C'] = [10, 20] # ✅ No warning
# Solution 2: Modify original with loc
df.loc[df['A'] > 1, 'C'] = [10, 20] # ✅ No warning

Rule of thumb: If you’re creating a subset to modify, use .copy(). If you want to modify the original DataFrame, use .loc[].
2. Chained Indexing Assignments

This looks logical but often fails:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df[df['A'] > 1]['B'] = 99 # ❌ Doesn't update df!
print(df)
# A B
# 0 1 4
# 1 2 5 ← Still 5, not 99!
# 2 3 6 ← Still 6, not 99!

Chained indexing like df[condition]['column'] creates a temporary DataFrame that gets thrown away. Your assignment modifies the temporary copy, not the original.
Always use .loc[] for assignments:
df.loc[df['A'] > 1, 'B'] = 99 # ✅ Works!
print(df)
# A B
# 0 1 4
# 1 2 99 ← Updated!
# 2 3 99 ← Updated!

Key takeaway: For assignments, use .loc[row_indexer, col_indexer] instead of chaining brackets.
3. The inplace=True Trap

Many beginners think inplace=True makes operations faster:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
result = df.drop('B', axis=1, inplace=True)
print(result) # None ← Wait, what?

When you use inplace=True, the method modifies the DataFrame and returns None. Many beginners expect it to return the modified DataFrame.
Avoid inplace=True in most cases. It’s not faster and makes code harder to debug:
# ✅ Better: Assign the result
df = df.drop('B', axis=1)
# Or if you want to keep original
df_new = df.drop('B', axis=1)

Why avoid inplace? It returns None (which breaks method chaining), it’s not faster, and it makes debugging harder.
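Here’s a minimal sketch of why the None return bites; the rename step is just an illustration, not part of the original example:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# ✅ Without inplace, each call returns a DataFrame, so calls chain naturally
result = df.drop('B', axis=1).rename(columns={'A': 'a'})
# ❌ With inplace=True, drop() returns None, so the chained rename() raises
# AttributeError: 'NoneType' object has no attribute 'rename'
# df.drop('B', axis=1, inplace=True).rename(columns={'A': 'a'})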
4. Looping Instead of Vectorizing

Using Python loops when Pandas has vectorized operations:
import pandas as pd
import numpy as np
df = pd.DataFrame({'values': range(10000)})
# ❌ Slow: Looping
result = []
for val in df['values']:
    result.append(val * 2)
df['doubled'] = result

Loops in Python are 10-100x slower than vectorized operations. Pandas is built on NumPy for speed.
Use vectorized operations:
# ✅ Fast: Vectorized
df['doubled'] = df['values'] * 2
# ⚠️ apply() works for complex element-wise logic, but it still loops in Python under the hood
df['processed'] = df['values'].apply(lambda x: x * 2 if x > 5000 else x)
# ✅ Truly vectorized version of the same logic
df['processed'] = np.where(df['values'] > 5000, df['values'] * 2, df['values'])

Speed comparison:
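You can measure the gap yourself with timeit; here is a minimal sketch (exact numbers vary by machine and data size):

import timeit
import pandas as pd

df = pd.DataFrame({'values': range(10000)})

def loop_version():
    result = []
    for val in df['values']:
        result.append(val * 2)
    return result

def vectorized_version():
    return df['values'] * 2

# The vectorized version typically wins by one to two orders of magnitude
print('loop:      ', timeit.timeit(loop_version, number=100))
print('vectorized:', timeit.timeit(vectorized_version, number=100))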
5. Unexpected Index After Filtering

After filtering or sorting, index numbers become non-sequential:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
df_filtered = df[df['A'] > 2]
print(df_filtered)
# A
# 2 3 ← Index starts at 2!
# 3 4
# 4 5
df_filtered.loc[0, 'A'] # ❌ KeyError: 0 is not in the index!

Filtering preserves the original index. The index isn’t automatically renumbered.
Decide what you want to do with the index:
# Option 1: Reset index (most common)
df_filtered = df[df['A'] > 2].reset_index(drop=True)
print(df_filtered)
# A
# 0 3 ← Index reset to 0
# 1 4
# 2 5
# Option 2: Use iloc for positional access
df_filtered.iloc[0] # ✅ Works regardless of index
# Option 3: Keep original index for merging
df_filtered = df[df['A'] > 2] # Keep index for later merge

When to reset: reset the index when you only care about row positions and don’t need the original labels; keep it when you plan to merge, join, or assign results back to the original DataFrame.
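As a small sketch of why the kept index matters (the A_doubled column is hypothetical), assignment aligns on index labels, so results land on the right rows:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
df_filtered = df[df['A'] > 2] # index stays [2, 3, 4]
df['A_doubled'] = df_filtered['A'] * 2 # aligns on the index; unmatched rows get NaN
print(df)
#    A  A_doubled
# 0  1        NaN
# 1  2        NaN
# 2  3        6.0
# 3  4        8.0
# 4  5       10.0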
6. Modifying a DataFrame While Iterating

Changing a DataFrame while looping over it:
df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
for idx, row in df.iterrows():
    if row['A'] > 2:
        df.drop(idx, inplace=True) # ❌ Dangerous!

This can cause unexpected behavior or skip rows.
Modifying a collection while iterating over it confuses the iterator. Some rows get skipped.
Create a boolean mask and apply it once:
# ✅ Better: Use boolean indexing
df = df[df['A'] <= 2]
# Or for more complex logic
mask = df['A'].apply(some_complex_condition) # some_complex_condition is a placeholder for your own check
df = df[mask]

Rule: Never modify a DataFrame’s structure (add/remove rows) while iterating. Use boolean masks instead.
7. Mishandling Missing Values (NaN)

Missing values cause unexpected results:
df = pd.DataFrame({'A': [1, 2, None, 4]})
print(df['A'].sum()) # 7.0 ← NaN is ignored
print(df['A'].mean()) # 2.333... ← Mean of 1, 2, 4 (ignores NaN)
print(df['A'] == None) # ❌ Doesn't work as expected!

Pandas treats NaN (Not a Number) specially. Most operations skip NaN values by default. You also can’t detect missing values with == None: the comparison returns False for every element, because NaN isn’t equal to anything, not even itself.
Be explicit about handling missing data:
import numpy as np
# ✅ Check for missing values correctly
df['A'].isna() # Returns True/False for each value
df['A'].notna() # Opposite of isna()
# ✅ Count missing values
df['A'].isna().sum() # Number of NaN values
# ✅ Fill missing values
df['A'].fillna(0) # Replace NaN with 0
df['A'].fillna(df['A'].mean()) # Replace with mean
# ✅ Drop rows with missing values
df.dropna() # Drop any row with NaN
df.dropna(subset=['A']) # Drop only if 'A' is NaN
# ✅ Keep only rows with missing values
df[df['A'].isna()]

Best practice: Always check for missing data early with df.info() or df.isna().sum().
Bonus Tip: Chain Methods for Cleaner Code

Instead of this:
df = pd.read_csv('data.csv')
df = df[df['age'] > 18]
df = df.dropna()
df = df.sort_values('name')

Write this:
df = (pd.read_csv('data.csv')
    .query('age > 18')
    .dropna()
    .sort_values('name'))

Bonus Tip: Explore Your Data First

Always start with:
df.info() # Column types and missing values
df.describe() # Statistical summary
df.head() # First 5 rows
df.sample(10) # Random 10 rows

Bonus Tip: Use .loc[] and .iloc[] Explicitly

.loc[] - Label-based indexing (uses index labels)
.iloc[] - Position-based indexing (uses integer positions)

df.loc[0, 'A'] # Row with index label 0, column 'A'
df.iloc[0, 0] # First row, first column (position)
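The difference matters as soon as the index is no longer 0, 1, 2, …; a quick sketch:

import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30, 40]})
df_filtered = df[df['A'] > 20] # index is now [2, 3]
print(df_filtered.iloc[0]) # ✅ first row by position → A = 30
print(df_filtered.loc[2]) # ✅ row whose index label is 2 → A = 30
# df_filtered.loc[0] # ❌ KeyError: label 0 no longer exists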
Quick recap:
✅ Always use .copy() when creating subsets you’ll modify
✅ Use .loc[] for assignments, never chained indexing
✅ Avoid inplace=True - assign results instead
✅ Think vectorized - avoid loops when possible
✅ Reset index after filtering if you don’t need original index
✅ Use boolean masks instead of modifying during iteration
✅ Check for NaN with .isna(), not == None
Now that you know these common pitfalls, you’re ready to write more robust Pandas code! Want to dive deeper?
Have you encountered other Pandas gotchas? Share them on GitHub and help other learners!
About the Author: This post is brought to you by DataTweets, where we teach data analytics through practical, hands-on Python courses. All our courses are free and open-source.