2026-04-16 11:21 Tags:


🌳 1. What is pruning (intuitive)?

👉 Pruning = cutting unnecessary branches of the tree

Think:

A decision tree is like a person who overthinks
→ keeps asking more and more specific questions
→ eventually the questions become ridiculous

Example:

IF age > 65
  AND SpO2 < 94
  AND heart rate = 103
  AND day = Tuesday

😅 That last condition is probably useless.

👉 Pruning = remove those useless splits


🧠 2. Why do we need pruning?

Because of overfitting.

Without pruning:

  • Tree memorizes training data

  • Performs badly on new data


Mental model

| Tree type | Behavior |
| --- | --- |
| Too big ❌ | memorizes noise |
| Pruned ✅ | captures real patterns |
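A minimal sketch of the overfitting problem (the dataset and split are my choice here, any built-in sklearn dataset would do):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# breast_cancer is just a convenient built-in tabular dataset
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# No limits: the tree grows until every leaf is pure
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Near-perfect on training data (it memorized it)...
print("train:", tree.score(X_tr, y_tr))
# ...noticeably worse on data it has never seen
print("test: ", tree.score(X_te, y_te))
```

The gap between train and test accuracy is exactly what pruning tries to shrink.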

✂️ 3. Two types of pruning (very important)

(1) Pre-pruning (early stopping)

👉 Stop the tree from growing too much

You already used this without realizing:

max_depth=5
min_samples_leaf=10

These are pre-pruning controls


Think of it like:

“Don’t let the tree become too detailed in the first place”
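A quick sketch of what those two parameters do in practice (dataset choice is arbitrary):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# No limits: grows until every leaf is pure
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Pre-pruning: the same controls mentioned above stop growth early
small = DecisionTreeClassifier(
    max_depth=5, min_samples_leaf=10, random_state=0
).fit(X_tr, y_tr)

print(full.get_depth(), full.get_n_leaves())    # bigger
print(small.get_depth(), small.get_n_leaves())  # smaller, capped at depth 5
```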


(2) Post-pruning (the real “pruning”)

👉 Grow full tree first → then cut it back

This is the more “theoretically correct” way.


🔪 4. Core idea of post-pruning

👉 Ask:

“If I remove this split, does performance get worse?”

If NOT → remove it


Example

Before:

Split A
 ├── Split B
 │     ├── Leaf (pure)
 │     └── Leaf (pure)

After pruning:

Split A
 └── Leaf (still good enough)

👉 If removing B doesn’t hurt much → cut it
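In sklearn this grow-then-cut step is driven by `ccp_alpha` (explained in the next section). A small sketch, assuming the same toy dataset as before; the alpha value 0.01 is just an illustrative pick:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: grow the full tree
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Step 2: grow the same tree, then cut back the splits that
# don't justify their complexity (ccp_alpha > 0 enables this)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

print(full.tree_.node_count, pruned.tree_.node_count)  # pruned has fewer nodes
```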


⚖️ 5. Cost-complexity pruning (the only concept you need)

In sklearn, pruning is based on:

Balance between:

  • model accuracy

  • model simplicity


Formula intuition (no need to memorize)

Total Loss = Error + α × Tree Complexity

  • Error ↓ → better fit

  • Complexity ↑ → more splits

👉 α (ccp_alpha) controls the trade-off


Key idea:

| α value | Effect |
| --- | --- |
| small α | big tree |
| large α | more pruning |
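sklearn can even list every α value worth trying, via `cost_complexity_pruning_path`. A sketch of picking the best one on held-out data (dataset and split are placeholders):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The "effective alphas" at which the pruned tree changes
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# One tree per alpha; keep the one that scores best on held-out data
trees = [DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X_tr, y_tr)
         for a in path.ccp_alphas]
best = max(trees, key=lambda t: t.score(X_te, y_te))

print(best.ccp_alpha, best.tree_.node_count)
```

Note that the largest alpha in the path prunes everything away, leaving a single-node tree, so the useful values sit somewhere in the middle.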