Real datasets are never as clean as tutorial datasets. This guide builds a detect-decide-fix workflow for pandas, then applies it to a real, freely licensed museum collection dataset, complete with missing values, disguised placeholders, inconsistent text, duplicates, and messy dates.
July 3, 2026 in Python, Data Analysis by Mehdi Lotfinejad · 12 minutes
Messy ticket subjects, log lines, and free-text fields all hide structured data. This guide builds a pattern-then-question mental model for Python's re module, then works through groups, findall, sub, and re.compile on a support-ticket inbox you can reproduce yourself.
July 3, 2026 in Python, Data Analysis by Mehdi Lotfinejad · 11 minutes