ftfy (Fixes Text For You)
What it is: A Python library that repairs broken Unicode and text-encoding errors. It detects and fixes mojibake, garbled smart quotes, and other corruption caused by encoding mix-ups.
What It Does Best
Fixes mojibake. Turns "CafÃ©" back into "Café" and "donâ€™t" into "don't". It detects which encoding mistake produced the garbled text and reverses it.
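To make the mechanism concrete, here is a minimal standard-library sketch of how this kind of mojibake arises and how one layer of it can be undone. It does not use ftfy and is not ftfy's implementation. The helper names garble and ungarble are hypothetical, and the sketch assumes the specific mistake is UTF-8 bytes decoded as Windows-1252. ftfy's added value is working out which mistake happened, which this sketch does not attempt.

```python
def garble(text: str) -> str:
    """Simulate the mistake: UTF-8 bytes wrongly decoded as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def ungarble(text: str) -> str:
    """Reverse one layer of that specific mistake (assumed, not detected)."""
    return text.encode("cp1252").decode("utf-8")


original = "Café, don’t"
broken = garble(original)

print(broken)            # CafÃ©, donâ€™t
print(ungarble(broken))  # Café, don’t
```

Note that ungarble only works when you already know the wrong encoding and the text really is broken. Applied to clean text containing non-ASCII characters, it would raise an error or corrupt the text.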
Normalizes Unicode. Handles multiple representations of the same character, so visually identical strings compare as equal. Removes invisible control characters that break string matching.
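The problem being solved can be illustrated with the standard library alone. The sketch below is an approximation under stated assumptions, not ftfy's code: it uses NFC normalization and strips characters in Unicode category Cc (keeping tab, newline, and carriage return), which is not necessarily the exact set ftfy removes. The function name normalize is hypothetical.

```python
import unicodedata

composed = "caf\u00e9"     # "é" as a single code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent

# The two strings look identical on screen but do not compare equal.
print(composed == decomposed)  # False


def normalize(text: str) -> str:
    """Apply NFC, then drop control characters other than tab/newline/CR."""
    nfc = unicodedata.normalize("NFC", text)
    return "".join(
        ch for ch in nfc
        if ch in "\t\n\r" or unicodedata.category(ch) != "Cc"
    )


print(normalize(composed) == normalize(decomposed))  # True
print(normalize("user\x00name") == "username")       # True
```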
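Smart defaults. A single call to fix_text() handles most common text problems. It is designed to be conservative, leaving text unchanged when there is no clear sign of an encoding error, so it rarely over-corrects or introduces new problems.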
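A minimal usage sketch of the fix_text() call mentioned above, assuming ftfy is installed (pip install ftfy). The wrapper name clean is hypothetical, and the guarded import is only there so the snippet still runs as a no-op when ftfy is missing. The broken samples are built in code by decoding UTF-8 bytes as Windows-1252 rather than typed by hand.

```python
# ftfy is a third-party package; without it, clean() returns text unchanged.
try:
    import ftfy
except ImportError:
    ftfy = None


def clean(text: str) -> str:
    """Hypothetical wrapper: repair text using ftfy's default settings."""
    if ftfy is None:
        return text  # ftfy not installed: leave the text untouched
    return ftfy.fix_text(text)


# Simulate mojibake: UTF-8 bytes wrongly decoded as Windows-1252.
samples = [s.encode("utf-8").decode("cp1252") for s in ("Café", "don’t")]
samples.append("Already clean text")

for sample in samples:
    print(repr(sample), "->", repr(clean(sample)))
```

Printing the before and after side by side is a cheap way to review what changed before writing results back to a database or file.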
Pricing
Free. Open source, Apache license.
When to Use It
✅ Scraping data from the web with mixed encodings
✅ Legacy databases with encoding issues
✅ User-submitted text with copy-paste artifacts
✅ Files exported from Excel or other tools
When NOT to Use It
❌ Text already clean and properly encoded
❌ Need language-specific text processing (use spaCy)
❌ Processing data that shouldn't be modified
Bottom line: Solves a specific problem very well. When you see weird characters in your text data, ftfy is the first tool to reach for. One function call fixes most encoding mix-ups. Keep it in your toolkit.