Strength or Accuracy: Credit Assignment in Learning Classifier Systems

by Tim Kovacs

Paperback (Softcover reprint of the original 1st ed. 2004)


Product Details

ISBN-13: 9781447110583
Publisher: Springer London
Publication date: 10/04/2012
Series: Distinguished Dissertations
Edition description: Softcover reprint of the original 1st ed. 2004
Pages: 307
Product dimensions: 6.10(w) x 9.25(h) x 0.03(d)

Table of Contents

1 Introduction.- 1.1 Two Example Machine Learning Tasks.- 1.2 Types of Task.- 1.2.1 Supervised and Reinforcement Learning.- 1.2.2 Sequential and Non-sequential Decision Tasks.- 1.3 Two Challenges for Classifier Systems.- 1.3.1 Problem 1: Learning a Policy from Reinforcement.- 1.3.2 Problem 2: Generalisation.- 1.4 Solution Methods.- 1.4.1 Method 1: Reinforcement Learning Algorithms.- 1.4.2 Method 2: Evolutionary Algorithms.- 1.5 Learning Classifier Systems.- 1.5.1 The Tripartite LCS Structure.- 1.5.2 LCS = Policy Learning + Generalisation.- 1.5.3 Credit Assignment in Classifier Systems.- 1.5.4 Strength and Accuracy-based Classifier Systems.- 1.6 About the Book.- 1.6.1 Why Compare Strength and Accuracy?.- 1.6.2 Are LCS EC- or RL-based?.- 1.6.3 Moving in Design Space.- 1.7 Structure of the Book.
2 Learning Classifier Systems.- 2.1 Types of Classifier Systems.- 2.1.1 Michigan and Pittsburgh LCS.- 2.1.2 XCS and Traditional LCS?.- 2.2 Representing Rules.- 2.2.1 The Standard Ternary Language.- 2.2.2 Other Representations.- 2.2.3 Summary of Rule Representation.- 2.2.4 Notation for Rules.- 2.3 XCS.- 2.3.1 Wilson's Motivation for XCS.- 2.3.2 Overview of XCS.- 2.3.3 Wilson's Explore/Exploit Framework.- 2.3.4 The Performance System.- The XCS Performance System Algorithm.- The Match Set and Prediction Array.- Action Selection.- Experience-weighting of System Prediction.- 2.3.5 The Credit Assignment System.- The MAM Technique.- The Credit Assignment Algorithm.- Sequential and Non-sequential Updates.- Parameter Update Order.- XCS Parameter Updates.- 2.3.6 The Rule Discovery System.- Random Initial Populations.- Covering.- The Niche Genetic Algorithm.- Alternative Mutation Schemes.- Triggering the Niche GA.- Deletion of Rules.- Classifier Parameter Initialisation.- Subsumption Deletion.- 2.4 SB-XCS.- 2.4.1 Specification of SB-XCS.- 2.4.2 Comparison of SB-XCS and Other Strength LCS.- 2.5 Initial Tests of XCS and SB-XCS.- 2.5.1 The 6 Multiplexer.- 2.5.2 Woods2.- 2.6 Summary.
3 How Strength and Accuracy Differ.- 3.1 Thinking about Complex Systems.- 3.2 Holland's Rationale for CS-1 and his Later LCS.- 3.2.1 Schema Theory.- 3.2.2 The Bucket Brigade.- 3.2.3 Schema Theory + Bucket Brigade = Adaptation.- 3.3 Wilson's Rationale for XCS.- 3.3.1 A Bias towards Accurate Rules.- 3.3.2 A Bias towards General Rules.- 3.3.3 Complete Maps.- 3.3.4 Summary.- 3.4 A Rationale for SB-XCS.- 3.5 Analysis of Populations Evolved by XCS and SB-XCS.- 3.5.1 SB-XCS.- 3.5.2 XCS.- 3.5.3 Learning Rate.- 3.6 Different Goals, Different Representations.- 3.6.1 Default Hierarchies.- 3.6.2 Partial and Best Action Maps.- 3.6.3 Complete Maps.- 3.6.4 What do XCS and SB-XCS Really Learn?.- 3.7 Complete and Partial Maps Compared.- 3.7.1 Advantages of Partial Maps.- 3.7.2 Disadvantages of Partial Maps.- 3.7.3 Complete Maps and Strength.- 3.7.4 Contrasting Complete and Partial Maps in RL Terminology.- 3.7.5 Summary of Comparison.- 3.8 Ability to Express Generalisations.- 3.8.1 Mapping Policies and Mapping Value Functions.- 3.8.2 Adapting the Accuracy Criterion.- 3.8.3 XCS-hard and SB-XCS-easy Functions.- 3.8.4 Summary of Generalisation and Efficiency.- 3.9 Summary.
4 What Should a Classifier System Learn?.- 4.1 Representing Boolean Functions.- 4.1.1 Truth Tables.- 4.1.2 On-sets and Off-sets.- 4.1.3 Sigma Notation.- 4.1.4 Disjunctive Normal Form.- 4.1.5 Representing Functions with Sets of Rules.- 4.2 How Should a Classifier System Represent a Solution?.- 4.2.1 The Value of a Single Rule.- 4.2.2 The Value of a Set of Rules.- 4.2.3 Complete and Correct Representations.- 4.2.4 Minimal Representations.- 4.2.5 Non-overlapping Representations.- 4.2.6 Why XCS Prefers Non-overlapping Populations.- 4.2.7 Should we Prefer Non-overlapping Populations?.- 4.2.8 Optimal Rule Sets: [O]s.- 4.2.9 Conflicting Rules.- 4.2.10 Representation in XCS.- 4.3 How Should We Measure Performance?.- 4.3.1 Measures of Performance.- 4.3.2 Measures of Population State.- 4.3.3 Measuring Performance and Measuring State.- 4.3.4 New Population State Metrics.- 4.3.5 Testing XCS with %[PI].- 4.3.6 Testing XCS with %[m-DNF].- 4.3.7 Summary of Metrics and Properties.- 4.4 Summary.
5 Prospects for Adaptation.- 5.1 Known Problems with Strength LCS.- 5.2 Methodology for Rule Type Analysis.- 5.3 Analysis of Rule Types.- 5.3.1 Correct and Incorrect Actions.- 5.3.2 Overgeneral Rules.- 5.3.3 Strong Overgeneral Rules.- 5.3.4 Fit Overgeneral Rules.- 5.3.5 Parallel Definitions of Strength and Fitness.- 5.4 When are Strong and Fit Overgenerals Possible?.- 5.4.1 Biases in the Reward Function are Relevant.- 5.4.2 Competition for Action Selection.- 5.4.3 Competition for Reproduction.- 5.5 Strong Overgenerals in XCS.- 5.5.1 Biases between Actions do not Produce Strong Overgenerals.- 5.5.2 Some Properties of Accuracy-based Fitness.- 5.6 Strong Overgenerals in SB-XCS.- 5.6.1 When are Strong Overgenerals Impossible in SB-XCS?.- 5.6.2 What Makes Strong Overgenerals Possible in SB-XCS?.- 5.7 Fit Overgenerals and the Survival of Rules under the GA.- 5.7.1 Comparison on an Unbiased Reward Function.- 5.7.2 Comparison on a Biased Reward Function.- 5.7.3 Discussion.- 5.8 Designing Strong and Fit Overgenerals for XCS.- 5.8.1 Biased Variance Functions.- 5.8.2 Empirical Results.- 5.8.3 Avoiding Fit Overgenerals.- 5.8.4 SB-XCS and Biased Variance Functions.- 5.9 Strong and Fit Undergeneral Rules.- 5.10 Why Bias the Reward Function?.- 5.10.1 Some State-actions are more Important than Others.- 5.10.2 A Rule Allocation Bias can Focus Resources.- 5.11 Rule Allocation Reconsidered.- 5.11.1 Knowing What Not to Do.- 5.11.2 Managing Exploration.- 5.11.3 Complete and Partial Maps Revisited.- 5.11.4 Alternatives to Biasing the Reward Function.- 5.11.5 Can SB-XCS Avoid Strong and Fit Overgenerals?.- 5.12 Sequential Tasks.- 5.12.1 The Need to Pass Values Back.- 5.12.2 The Need for Discounting.- 5.12.3 How Q-functions become Biased.- 5.12.4 Examples.- 5.12.5 Woods2 Revisited.- 5.12.6 When Will the Value Function be Unbiased?.- 5.13 What Tasks can we Solve with SB-XCS?.- 5.14 Extensions.- 5.14.1 Fitness Sharing.- 5.14.2 Other Factors Contributing to Strong Overgenerals.- 5.14.3 Qualitative and Quantitative Approaches.- 5.15 Summary.
6 Classifier Systems and Q-learning.- 6.1 Classifier Systems and Q-learning.- 6.1.1 Q-learning in Classifier Systems.- 6.1.2 Is it Really Q-learning?.- 6.1.3 XCS is a Proper Generalisation of Tabular Q-learning.- 6.1.4 Summary.- 6.2 The GA-view and RL-view Revisited.- 6.2.1 How SB-XCS Determines Policies.- 6.2.2 How XCS Determines Policies.- 6.2.3 Three Approaches to Determining a Policy.- 6.2.4 The GA-view and the RL-view.- 6.2.5 Combining Evolution and Q-learning.- 6.3 XCS is Closer to Tabular Q-learning than to SB-XCS.- 6.4 Summary.
7 Conclusion.- 7.1 The Capacities of Various Types of LCS.- 7.2 Contributions.- 7.3 The Take-home Message.- 7.4 Open Problems and Future Work.- 7.4.1 Fitness Sharing and Strength-based Fitness.- 7.4.2 Further Study of Accuracy-based Fitness.- 7.5 Concluding Remarks.- 7.5.1 The Moral of the Story: The Need for a Complex Systems Design Methodology.- 7.5.2 Classifier Systems and Reinforcement Learning.- 7.5.3 The Future.
Appendices.
A Evaluation of Macroclassifiers.
B Example XCS Cycle.- B.1 The Performance System Algorithm.- B.2 The Credit Assignment Algorithm.- B.3 The Rule Discovery Algorithm.
C Learning from Reinforcement.- C.1 Three Learning Paradigms.- C.1.1 Supervised Learning.- C.1.2 Reinforcement Learning.- C.1.3 Unsupervised Learning.- C.2 The Explore/Exploit Dilemma: a Feature of RL.- C.3 Sequential and Non-sequential Tasks.- C.3.1 Immediate Reward and Long-term Value.- C.3.2 Sequential Decisions Imply RL.- C.3.3 Episodic and Continuing Tasks.- C.4 The Agent's Goal: Maximising Return.- C.4.1 Return and Reward.- C.4.2 Sequential Formulations of Return.- C.5 Formalising RL Tasks.- C.5.1 Environment.- C.5.2 Learning Agent.- C.5.3 Agent-environment Interaction.- C.6 Summary.
D Generalisation Problems.- D.1 Why Generalise?.- D.1.1 The Curse of Dimensionality.- D.1.2 The Need for Generalisation.- D.2 Generalisation in RL.- D.2.1 Generalising Over Policies and Value Functions.- D.3 State Aggregation.- D.4 State Space and Generalisation Space.- D.5 Summary.
E Value Estimation Algorithms.- E.1 The Value of State-actions.- E.2 Non-sequential RL: Estimating Reward Functions.- E.2.1 The Value of State-actions in Non-sequential Tasks.- E.3 Estimating Expectations with Sample Averages.- E.3.1 Incremental Updates.- E.3.2 A General Form of Incremental Update.- E.3.3 Setting StepSize in Incremental Updates.- E.3.4 A Prediction Algorithm for Non-sequential RL.- E.4 Sequential RL: Estimating Long-term Value Functions.- E.4.1 Long-term Value Functions.- E.4.2 The Value of State-actions in Sequential Tasks.- E.4.3 The Value of a Policy.- E.4.4 Estimating Values with Monte Carlo Methods.- E.4.5 Estimating Values with Temporal Difference Methods.- E.4.6 Russell and Norvig's Maze: A Sequential RL Task.- E.4.7 Summary of Sequential RL.- E.5 State Aggregation.- E.5.1 Fixed and Adaptive Aggregation Schemes.- E.5.2 The Value of Aggregations I: Return.- E.5.3 The Value of Aggregations II: Predictive Utility.- E.6 Storing Value Estimates.- E.6.1 Storing Estimates of Aggregations.- E.6.2 Sparse Estimators, Models and Search.- E.6.3 Function Approximators.- E.7 Summary.
F Generalised Policy Iteration Algorithms.- F.1 Policy Improvement.- F.2 Optimal Policies.- F.3 Generalised Policy Iteration.- F.3.1 How Well must we Evaluate a Policy?.- F.3.2 Convergence Properties of GPI Control Algorithms.- F.3.3 Initialising Value Functions.- F.3.4 What Characterises GPI Algorithms?.- F.4 State-value Functions.- F.5 Summary.
G Evolutionary Algorithms.- G.1 Evolution.- G.2 Elements of EAs.- G.2.1 A Generic EA.- G.2.2 Population-based Search.- G.2.3 Fitness Functions.- G.2.4 Probabilistic Selection of Parents.- G.2.5 Genetic Operators.- G.2.6 Replacement.- G.3 EAs as Search.- G.3.1 Local and Global Optima.- G.4 The Generalisation Problem.- G.5 Niching and Mating Restriction.- G.5.1 Fitness Sharing.- G.5.2 Crowding.- G.5.3 Mating Restriction.- G.6 RL with EAs.- G.6.1 Non-associative RL with an EA.- G.6.2 Associative RL with an EA.- G.6.3 Sequential RL with an EA.- G.7 Comparing GPI and EA Methods for RL.- G.7.1 Similarities between GPI and EA Methods.- G.8 Summary.
H The Origins of Sarsa.- H.1 Modified Connectionist Q-learning.- H.2 ZCS's Implicit Bucket Brigade.- H.3 Who Invented Sarsa?.
I Notation.
References.
