Notes
Slide Show
Outline
1

Breeding Decision Trees Using Evolutionary Techniques

  • Papagelis Athanasios  -  Kalles Dimitrios
    Computer Technology Institute & AHEAD RM
2
Introduction
  • We use GAs to evolve simple and accurate binary decision trees
  • Simple genetic operators over tree structures
  • Experiments with UCI datasets
    • very good size
    • competitive accuracy results
  • Experiments with synthetic datasets
    • Superior accuracy results
3
Current tree induction algorithms…
  • .. Use greedy heuristics
    • To guide search during tree building
    • To prune the resulting trees
  • Fast implementations
  • Accurate results on widely used benchmark datasets (like UCI datasets)
  • Optimal results ?
    • No
  • Good for real world problems?
    • There are not many real world datasets available for research.

4
More on greedy heuristics
  • They can quickly guide us to desired solutions
  • On the other hand they can substantially deviate from optimal
  • WHY?
    • They are very strict
    • Which means that they are VERY GOOD just for a limited problem space
5
Why GAs should work ?
  • GAs are not
    • Hill climbers
      • Blind on complex search spaces
    • Exhaustive searchers
      • Extremely expensive
  •  They are …
    • Beam searchers
      • They balance between time needed and space searched
  • Application on bigger problem space
    • Good results for much more problems
    • No need to tune or derive new algorithms
6
Another way to see it..
  • Biases
  • Preference bias
    • Characteristics of output
      • We should choose about it
        • e.g small trees
  • Procedural bias
    • How we will search?
      • We should not choose about it
      • Unfortunately we have to:
        • Greedy heuristics make strong hypotheses about search space
        • GAs make weak hypotheses about search space

7
The real world question…
  • Are there datasets where hill-climbing techniques are really inadequate ?
    • e.g unnecessarily big – misguiding output
  • Yes there are…
    • Conditionally dependent attributes
      • e.g XOR
    • Irrelevant attributes
      • Many solutions that use GAs as a preprocessor so as to select adequate attributes
  • Direct genetic search can be proven more efficient for those datasets
8
The proposed solution
  • Select the desired decision tree characteristics (e.g small size)
  • Adopt a decision tree representation with appropriate genetic operators
  • Create an appropriate fitness function
  • Produce a representative initial population
  • Evolve for as long as you wish!


9
Initialization procedure
  • Population of minimum decision trees
    • Simple and fast
  • Choose a random value as test value
  • Choose two random classes as leaves
10
Genetic operators
11
Payoff function
  • Balance between accuracy and size


12
Advanced System Characteristics
  •  Scalled payoff function (Goldberg, 1989)
  •  Alternative crossovers
    • Evolution towards fit subtrees
      • Accurate subtrees had less chance to be used for crossover or mutation.
  • Limited Error Fitness (LEF) (Gathercole & Ross, 1997)
    • significant CPU timesavings and insignificant accuracy loses
13
Second Layer GA
  • Test the effectiveness of all those components
    • coded information about the mutation/crossover rates and different heuristics as well as a number of other optimizing parameters
  • Most recurring results:
    • mutation rate 0.005
    • crossover rate 0.93
    • use a crowding avoidance technique
    • Alternative crossover/mutation techniques did not produce better results than basic crossover/mutation
14
Search space / Induction costs
  • 10 leaves,6 values,2 classes
    • Search space >50,173,704,142,848 (HUGE!)
  • Greedy feature selection
    • O(ak) a=attributes,k=instances (Quinlan 1986)
    • O(a2k2) one level lookahead (Murthy and Salzberg, 1995)
    • O(adkd) for d-1 levels of lookahead
  • Proposed heuristic
    • O(gen* k2*a).
  • Extended heuristic
    • O(gen*k*a)




15
How it works? An example (a)
  • An artificial dataset with eight rules (26 possible value, three classes)
  •  First two activation-rules as below:
      • (15.0 %) c1 ß A=(a or b or t) & B=(a or h or q or x)
      • (14.0%) c1 ß B=(f or l or s or w) & C=(c or e or f or k)
    • Huge Search Space !!!

16
How it works? An example (b)
17
Illustration of greedy heuristics problem
  • An example dataset (XOR over A1&A2)


18
C4.5 result tree
19
More experiments towards this direction
20
Results for artificial datasets
21
Results for UCI datasets
22
C4.5 / OneR deficiencies
  • Similar preference biases
    • Accurate, small decision trees
      • This is acceptable
  • Not optimized procedural biases
    • Emphasis on accuracy (C4.5)
      • Not optimized tree’s size
    • Emphasis on size (OneR)
      • Trivial search policy
    • Pruning as a greedy heuristic has similar disadvantages


23
Future work
  • Minimize evolution time
    • crossover/mutation operators change the tree from a node downwards
    • we can classify only the instances that belong to the changed-node’s subtree.
    • But we need to maintain more node statistics


24
Future work (2)
  • Choose the output class using a majority vote over the produced tree forest (experts voting)
  • Pruning is a greedy heuristic
    • A GA’s pruning?