|
1
|
- Papagelis Athanasios - Kalles Dimitrios
Computer Technology Institute & AHEAD RM
|
|
2
|
- We use GAs to evolve simple and accurate binary decision trees
- Simple genetic operators over tree structures
- Experiments with UCI datasets
- very good size
- competitive accuracy results
- Experiments with synthetic datasets
- Superior accuracy results
|
|
3
|
- .. Use greedy heuristics
- To guide search during tree building
- To prune the resulting trees
- Fast implementations
- Accurate results on widely used benchmark datasets (like UCI datasets)
- Optimal results ?
- Good for real world problems?
- There are not many real world datasets available for research.
|
|
4
|
- They can quickly guide us to desired solutions
- On the other hand they can substantially deviate from optimal
- WHY?
- They are very strict
- Which means that they are VERY GOOD just for a limited problem space
|
|
5
|
- GAs are not
- Hill climbers
- Blind on complex search spaces
- Exhaustive searchers
- They are …
- Beam searchers
- They balance between time needed and space searched
- Application on bigger problem space
- Good results for much more problems
- No need to tune or derive new algorithms
|
|
6
|
- Biases
- Preference bias
- Characteristics of output
- We should choose about it
- Procedural bias
- How we will search?
- We should not choose about it
- Unfortunately we have to:
- Greedy heuristics make strong hypotheses about search space
- GAs make weak hypotheses about search space
|
|
7
|
- Are there datasets where hill-climbing techniques are really inadequate
?
- e.g unnecessarily big – misguiding output
- Yes there are…
- Conditionally dependent attributes
- Irrelevant attributes
- Many solutions that use GAs as a preprocessor so as to select adequate
attributes
- Direct genetic search can be proven more efficient for those datasets
|
|
8
|
- Select the desired decision tree characteristics (e.g small size)
- Adopt a decision tree representation with appropriate genetic operators
- Create an appropriate fitness function
- Produce a representative initial population
- Evolve for as long as you wish!
|
|
9
|
- Population of minimum decision trees
- Choose a random value as test value
- Choose two random classes as leaves
|
|
10
|
|
|
11
|
- Balance between accuracy and size
|
|
12
|
- Scalled payoff function (Goldberg,
1989)
- Alternative crossovers
- Evolution towards fit subtrees
- Accurate subtrees had less chance to be used for crossover or
mutation.
- Limited Error Fitness (LEF) (Gathercole & Ross, 1997)
- significant CPU timesavings and insignificant accuracy loses
|
|
13
|
- Test the effectiveness of all those components
- coded information about the mutation/crossover rates and different
heuristics as well as a number of other optimizing parameters
- Most recurring results:
- mutation rate 0.005
- crossover rate 0.93
- use a crowding avoidance technique
- Alternative crossover/mutation techniques did not produce better
results than basic crossover/mutation
|
|
14
|
- 10 leaves,6 values,2 classes
- Search space >50,173,704,142,848 (HUGE!)
- Greedy feature selection
- O(ak) a=attributes,k=instances (Quinlan 1986)
- O(a2k2) one level lookahead (Murthy and Salzberg,
1995)
- O(adkd) for d-1 levels of lookahead
- Proposed heuristic
- Extended heuristic
|
|
15
|
- An artificial dataset with eight rules (26 possible value, three
classes)
- First two activation-rules as
below:
- (15.0 %) c1 ß A=(a
or b or t) & B=(a or h or q or x)
- (14.0%) c1 ß B=(f
or l or s or w) & C=(c or e or f or k)
- Huge Search Space !!!
|
|
16
|
|
|
17
|
- An example dataset (XOR over A1&A2)
|
|
18
|
|
|
19
|
|
|
20
|
|
|
21
|
|
|
22
|
- Similar preference biases
- Accurate, small decision trees
- Not optimized procedural biases
- Emphasis on accuracy (C4.5)
- Not optimized tree’s size
- Emphasis on size (OneR)
- Pruning as a greedy heuristic has similar disadvantages
|
|
23
|
- Minimize evolution time
- crossover/mutation operators change the tree from a node downwards
- we can classify only the instances that belong to the changed-node’s
subtree.
- But we need to maintain more node statistics
|
|
24
|
- Choose the output class using a majority vote over the produced tree
forest (experts voting)
- Pruning is a greedy heuristic
|