Train/Test Split
Train/Test Split
Never evaluate on training data! Splitting data properly is essential for honest evaluation.
Why Split Data?
Loading Python Playground...
The Standard Split
Loading Python Playground...
Shuffling and Stratification
Loading Python Playground...
Key Takeaways
- Always split data BEFORE any preprocessing that uses y
- Use 80/20 or 70/30 split for most cases
- Shuffle to avoid ordering bias
- Stratify for classification with imbalanced classes
- Set random_state for reproducibility
Quiz
Question 1 of 520% Complete
0 of 5 questions answered

