Strength or Accuracy: Credit Assignment in Learning Classifier Systems

by Tim Kovacs

Paperback (Softcover reprint of the original 1st ed. 2004)

$199.00

Product Details

ISBN-13: 9781447110583
Publisher: Springer London
Publication date: 10/04/2012
Series: Distinguished Dissertations
Edition description: Softcover reprint of the original 1st ed. 2004
Pages: 307
Product dimensions: 6.10(w) x 9.25(h) x 0.03(d)

Table of Contents

1 Introduction.- 1.1 Two Example Machine Learning Tasks.- 1.2 Types of Task.- 1.2.1 Supervised and Reinforcement Learning.- 1.2.2 Sequential and Non-sequential Decision Tasks.- 1.3 Two Challenges for Classifier Systems.- 1.3.1 Problem 1: Learning a Policy from Reinforcement.- 1.3.2 Problem 2: Generalisation.- 1.4 Solution Methods.- 1.4.1 Method 1: Reinforcement Learning Algorithms.- 1.4.2 Method 2: Evolutionary Algorithms.- 1.5 Learning Classifier Systems.- 1.5.1 The Tripartite LCS Structure.- 1.5.2 LCS = Policy Learning + Generalisation.- 1.5.3 Credit Assignment in Classifier Systems.- 1.5.4 Strength and Accuracy-based Classifier Systems.- 1.6 About the Book.- 1.6.1 Why Compare Strength and Accuracy.- 1.6.2 Are LCS EC- or RL-based.- 1.6.3 Moving in Design Space.- 1.7 Structure of the Book.

2 Learning Classifier Systems.- 2.1 Types of Classifier Systems.- 2.1.1 Michigan and Pittsburgh LCS.- 2.1.2 XCS and Traditional LCS?.- 2.2 Representing Rules.- 2.2.1 The Standard Ternary Language.- 2.2.2 Other Representations.- 2.2.3 Summary of Rule Representation.- 2.2.4 Notation for Rules.- 2.3 XCS.- 2.3.1 Wilson’s Motivation for XCS.- 2.3.2 Overview of XCS.- 2.3.3 Wilson’s Explore/Exploit Framework.- 2.3.4 The Performance System.- 2.3.4.1 The XCS Performance System Algorithm.- 2.3.4.2 The Match Set and Prediction Array.- 2.3.4.3 Action Selection.- 2.3.4.4 Experience-weighting of System Prediction.- 2.3.5 The Credit Assignment System.- 2.3.5.1 The MAM Technique.- 2.3.5.2 The Credit Assignment Algorithm.- 2.3.5.3 Sequential and Non-sequential Updates.- 2.3.5.4 Parameter Update Order.- 2.3.5.5 XCS Parameter Updates.- 2.3.6 The Rule Discovery System.- 2.3.6.1 Random Initial Populations.- 2.3.6.2 Covering.- 2.3.6.3 The Niche Genetic Algorithm.- 2.3.6.4 Alternative Mutation Schemes.- 2.3.6.5 Triggering the Niche GA.- 2.3.6.6 Deletion of Rules.- 2.3.6.7 Classifier Parameter Initialisation.- 2.3.6.8 Subsumption Deletion.- 2.4 SB-XCS.- 2.4.1 Specification of SB-XCS.- 2.4.2 Comparison of SB-XCS and Other Strength LCS.- 2.5 Initial Tests of XCS and SB-XCS.- 2.5.1 The 6 Multiplexer.- 2.5.2 Woods2.- 2.6 Summary.

3 How Strength and Accuracy Differ.- 3.1 Thinking about Complex Systems.- 3.2 Holland’s Rationale for CS-1 and his Later LCS.- 3.2.1 Schema Theory.- 3.2.2 The Bucket Brigade.- 3.2.3 Schema Theory + Bucket Brigade = Adaptation.- 3.3 Wilson’s Rationale for XCS.- 3.3.1 A Bias towards Accurate Rules.- 3.3.2 A Bias towards General Rules.- 3.3.3 Complete Maps.- 3.3.4 Summary.- 3.4 A Rationale for SB-XCS.- 3.5 Analysis of Populations Evolved by XCS and SB-XCS.- 3.5.1 SB-XCS.- 3.5.2 XCS.- 3.5.3 Learning Rate.- 3.6 Different Goals, Different Representations.- 3.6.1 Default Hierarchies.- 3.6.2 Partial and Best Action Maps.- 3.6.3 Complete Maps.- 3.6.4 What do XCS and SB-XCS Really Learn?.- 3.7 Complete and Partial Maps Compared.- 3.7.1 Advantages of Partial Maps.- 3.7.2 Disadvantages of Partial Maps.- 3.7.3 Complete Maps and Strength.- 3.7.4 Contrasting Complete and Partial Maps in RL Terminology.- 3.7.5 Summary of Comparison.- 3.8 Ability to Express Generalisations.- 3.8.1 Mapping Policies and Mapping Value Functions.- 3.8.2 Adapting the Accuracy Criterion.- 3.8.3 XCS-hard and SB-XCS-easy Functions.- 3.8.4 Summary of Generalisation and Efficiency.- 3.9 Summary.

4 What Should a Classifier System Learn?.- 4.1 Representing Boolean Functions.- 4.1.1 Truth Tables.- 4.1.2 On-sets and Off-sets.- 4.1.3 Sigma Notation.- 4.1.4 Disjunctive Normal Form.- 4.1.5 Representing Functions with Sets of Rules.- 4.2 How Should a Classifier System Represent a Solution?.- 4.2.1 The Value of a Single Rule.- 4.2.2 The Value of a Set of Rules.- 4.2.3 Complete and Correct Representations.- 4.2.4 Minimal Representations.- 4.2.5 Non-overlapping Representations.- 4.2.6 Why XCS Prefers Non-overlapping Populations.- 4.2.7 Should we Prefer Non-overlapping Populations?.- 4.2.8 Optimal Rule Sets: [O]s.- 4.2.9 Conflicting Rules.- 4.2.10 Representation in XCS.- 4.3 How Should We Measure Performance?.- 4.3.1 Measures of Performance.- 4.3.2 Measures of Population State.- 4.3.3 Measuring Performance and Measuring State.- 4.3.4 New Population State Metrics.- 4.3.5 Testing XCS with %[PI].- 4.3.6 Testing XCS with %[m-DNF].- 4.3.7 Summary of Metrics and Properties.- 4.4 Summary.

5 Prospects for Adaptation.- 5.1 Known Problems with Strength LCS.- 5.2 Methodology for Rule Type Analysis.- 5.3 Analysis of Rule Types.- 5.3.1 Correct and Incorrect Actions.- 5.3.2 Overgeneral Rules.- 5.3.3 Strong Overgeneral Rules.- 5.3.4 Fit Overgeneral Rules.- 5.3.5 Parallel Definitions of Strength and Fitness.- 5.4 When are Strong and Fit Overgenerals Possible?.- 5.4.1 Biases in the Reward Function are Relevant.- 5.4.2 Competition for Action Selection.- 5.4.3 Competition for Reproduction.- 5.5 Strong Overgenerals in XCS.- 5.5.1 Biases between Actions do not Produce Strong Overgenerals.- 5.5.2 Some Properties of Accuracy-based Fitness.- 5.6 Strong Overgenerals in SB-XCS.- 5.6.1 When are Strong Overgenerals Impossible in SB-XCS?.- 5.6.2 What Makes Strong Overgenerals Possible in SB-XCS?.- 5.7 Fit Overgenerals and the Survival of Rules under the GA.- 5.7.1 Comparison on an Unbiased Reward Function.- 5.7.2 Comparison on a Biased Reward Function.- 5.7.3 Discussion.- 5.8 Designing Strong and Fit Overgenerals for XCS.- 5.8.1 Biased Variance Functions.- 5.8.2 Empirical Results.- 5.8.3 Avoiding Fit Overgenerals.- 5.8.4 SB-XCS and Biased Variance Functions.- 5.9 Strong and Fit Undergeneral Rules.- 5.10 Why Bias the Reward Function.- 5.10.1 Some State-actions are more Important than Others.- 5.10.2 A Rule Allocation Bias can Focus Resources.- 5.11 Rule Allocation Reconsidered.- 5.11.1 Knowing What Not to Do.- 5.11.2 Managing Exploration.- 5.11.3 Complete and Partial Maps Revisited.- 5.11.4 Alternatives to Biasing the Reward Function.- 5.11.5 Can SB-XCS Avoid Strong and Fit Overgenerals?.- 5.12 Sequential Tasks.- 5.12.1 The Need to Pass Values Back.- 5.12.2 The Need for Discounting.- 5.12.3 How Q-functions become Biased.- 5.12.4 Examples.- 5.12.5 Woods2 Revisited.- 5.12.6 When Will the Value Function be Unbiased?.- 5.13 What Tasks can we Solve with SB-XCS?.- 5.14 Extensions.- 5.14.1 Fitness Sharing.- 5.14.2 Other Factors Contributing to Strong Overgenerals.- 5.14.3 Qualitative and Quantitative Approaches.- 5.15 Summary.

6 Classifier Systems and Q-learning.- 6.1 Classifier Systems and Q-learning.- 6.1.1 Q-learning in Classifier Systems.- 6.1.2 Is it Really Q-learning?.- 6.1.3 XCS is a Proper Generalisation of Tabular Q-learning.- 6.1.4 Summary.- 6.2 The GA-view and RL-view Revisited.- 6.2.1 How SB-XCS Determines Policies.- 6.2.2 How XCS Determines Policies.- 6.2.3 Three Approaches to Determining a Policy.- 6.2.4 The GA-view and the RL-view.- 6.2.5 Combining Evolution and Q-learning.- 6.3 XCS is Closer to Tabular Q-learning than to SB-XCS.- 6.4 Summary.

7 Conclusion.- 7.1 The Capacities of Various Types of LCS.- 7.2 Contributions.- 7.3 The Take-home Message.- 7.4 Open Problems and Future Work.- 7.4.1 Fitness Sharing and Strength-based Fitness.- 7.4.2 Further Study of Accuracy-based Fitness.- 7.5 Concluding Remarks.- 7.5.1 The Moral of the Story: The Need for a Complex Systems Design Methodology.- 7.5.2 Classifier Systems and Reinforcement Learning.- 7.5.3 The Future.

Appendices.

A Evaluation of Macroclassifiers.

B Example XCS Cycle.- B.1 The Performance System Algorithm.- B.2 The Credit Assignment Algorithm.- B.3 The Rule Discovery Algorithm.

C Learning from Reinforcement.- C.1 Three Learning Paradigms.- C.1.1 Supervised Learning.- C.1.2 Reinforcement Learning.- C.1.3 Unsupervised Learning.- C.2 The Explore/Exploit Dilemma: a Feature of RL.- C.3 Sequential and Non-sequential Tasks.- C.3.1 Immediate Reward and Long-term Value.- C.3.2 Sequential Decisions Imply RL.- C.3.3 Episodic and Continuing Tasks.- C.4 The Agent’s Goal: Maximising Return.- C.4.1 Return and Reward.- C.4.2 Sequential Formulations of Return.- C.5 Formalising RL Tasks.- C.5.1 Environment.- C.5.2 Learning Agent.- C.5.3 Agent-environment Interaction.- C.6 Summary.

D Generalisation Problems.- D.1 Why Generalise?.- D.1.1 The Curse of Dimensionality.- D.1.2 The Need for Generalisation.- D.2 Generalisation in RL.- D.2.1 Generalising Over Policies and Value Functions.- D.3 State Aggregation.- D.4 State Space and Generalisation Space.- D.5 Summary.

E Value Estimation Algorithms.- E.1 The Value of State-actions.- E.2 Non-sequential RL: Estimating Reward Functions.- E.2.1 The Value of State-actions in Non-sequential Tasks.- E.3 Estimating Expectations with Sample Averages.- E.3.1 Incremental Updates.- E.3.2 A General Form of Incremental Update.- E.3.3 Setting StepSize in Incremental Updates.- E.3.4 A Prediction Algorithm for Non-sequential RL.- E.4 Sequential RL: Estimating Long-term Value Functions.- E.4.1 Long-term Value Functions.- E.4.2 The Value of State-actions in Sequential Tasks.- E.4.3 The Value of a Policy.- E.4.4 Estimating Values with Monte Carlo Methods.- E.4.5 Estimating Values with Temporal Difference Methods.- E.4.6 Russell and Norvig’s Maze: A Sequential RL Task.- E.4.7 Summary of Sequential RL.- E.5 State Aggregation.- E.5.1 Fixed and Adaptive Aggregation Schemes.- E.5.2 The Value of Aggregations I: Return.- E.5.3 The Value of Aggregations II: Predictive Utility.- E.6 Storing Value Estimates.- E.6.1 Storing Estimates of Aggregations.- E.6.2 Sparse Estimators, Models and Search.- E.6.3 Function Approximators.- E.7 Summary.

F Generalised Policy Iteration Algorithms.- F.1 Policy Improvement.- F.2 Optimal Policies.- F.3 Generalised Policy Iteration.- F.3.1 How Well must we Evaluate a Policy?.- F.3.2 Convergence Properties of GPI Control Algorithms.- F.3.3 Initialising Value Functions.- F.3.4 What Characterises GPI Algorithms?.- F.4 State-value Functions.- F.5 Summary.

G Evolutionary Algorithms.- G.1 Evolution.- G.2 Elements of EAs.- G.2.1 A Generic EA.- G.2.2 Population-based Search.- G.2.3 Fitness Functions.- G.2.4 Probabilistic Selection of Parents.- G.2.5 Genetic Operators.- G.2.6 Replacement.- G.3 EAs as Search.- G.3.1 Local and Global Optima.- G.4 The Generalisation Problem.- G.5 Niching and Mating Restriction.- G.5.1 Fitness Sharing.- G.5.2 Crowding.- G.5.3 Mating Restriction.- G.6 RL with EAs.- G.6.1 Non-associative RL with an EA.- G.6.2 Associative RL with an EA.- G.6.3 Sequential RL with an EA.- G.7 Comparing GPI and EA Methods for RL.- G.7.1 Similarities between GPI and EA Methods.- G.8 Summary.

H The Origins of Sarsa.- H.1 Modified Connectionist Q-learning.- H.2 ZCS’s Implicit Bucket Brigade.- H.3 Who Invented Sarsa?.

I Notation.

References.
