1 | Should Machines Learn How to Play Games? | 1 |
1.1 | Motivation | 1 |
1.2 | Limitations of search | 4 |
1.3 | Merits of learning | 8 |
1.4 | Bon Voyage! | 10 |
2 | Machine Learning in Games: A Survey | 11 |
2.1 | Samuel's Legacy | 11 |
2.1.1 | Machine Learning | 13 |
2.1.2 | Game Playing | 13 |
2.1.3 | Chapter overview | 14 |
2.2 | Book Learning | 15 |
2.2.1 | Learning to choose opening variations | 15 |
2.2.2 | Learning to extend the opening book | 16 |
2.2.3 | Learning from mistakes | 17 |
2.2.4 | Learning from simulation | 19 |
2.3 | Learning Search Control | 19 |
2.4 | Evaluation Function Tuning | 21 |
2.4.1 | Supervised learning | 22 |
2.4.2 | Comparison training | 24 |
2.4.3 | Reinforcement learning | 28 |
2.4.4 | Temporal-difference learning | 30 |
2.4.5 | Issues for evaluation function learning | 33 |
2.5 | Learning Patterns and Plans | 41 |
2.5.1 | Advice-taking | 42 |
2.5.2 | Cognitive models | 43 |
2.5.3 | Pattern-based learning systems | 46 |
2.5.4 | Explanation-based learning | 48 |
2.5.5 | Pattern induction | 50 |
2.5.6 | Learning playing strategies | 53 |
2.6 | Opponent Modeling | 55 |
2.7 | Conclusion | 58 |
3 | Human Learning in Game Playing | 61 |
3.1 | Introduction | 61 |
3.2 | Research on memory and perception | 63 |
3.2.1 | Memory | 64 |
3.2.2 | Perception | 67 |
3.2.3 | Modeling experts' perception and memory: The chunking and template theories | 68 |
3.3 | Research on problem solving | 70 |
3.3.1 | De Groot's results | 71 |
3.3.2 | Theories and computer models of problem solving | 72 |
3.4 | Empirical studies of learning | 75 |
3.4.1 | Short-range learning | 76 |
3.4.2 | Medium-range learning | 77 |
3.4.3 | Long-range learning | 78 |
3.5 | Human and machine learning | 79 |
3.5.1 | How has human learning informed machine learning? | 79 |
3.5.2 | What does machine learning tell us about human learning? | 79 |
3.6 | Conclusions | 80 |
4 | Toward Opening Book Learning | 81 |
4.1 | Introduction | 81 |
4.2 | Basic Requirements | 82 |
4.3 | Choosing Book Moves | 83 |
4.4 | Book Extension | 84 |
4.5 | Implementation Aspects | 86 |
4.6 | Discussion and Enhancements | 87 |
4.7 | Outlook | 88 |
5 | Reinforcement Learning and Chess | 91 |
5.1 | Introduction | 91 |
5.2 | KnightCap | 94 |
5.2.1 | Board representation | 94 |
5.2.2 | Search algorithm | 95 |
5.2.3 | Null moves | 95 |
5.2.4 | Search extensions | 96 |
5.2.5 | Asymmetries | 96 |
5.2.6 | Transposition tables | 96 |
5.2.7 | Move ordering | 97 |
5.2.8 | Parallel search | 97 |
5.2.9 | Evaluation function | 97 |
5.2.10 | Modification for TDLeaf(λ) | 98 |
5.3 | The TD(λ) algorithm applied to games | 100 |
5.4 | Minimax Search and TD(λ) | 102 |
5.5 | TDLeaf(λ) and Chess | 105 |
5.5.1 | Experiments with KnightCap | 105 |
5.5.2 | Discussion | 111 |
5.6 | Experiment with Backgammon | 113 |
5.6.1 | LGammon | 113 |
5.6.2 | Experiment with LGammon | 114 |
5.7 | Future Work | 115 |
5.8 | Conclusion | 115 |
6 | Comparison Training of Chess Evaluation Functions | 117 |
6.1 | Introduction | 117 |
6.2 | Comparison Training for Arbitrary-Depth Searches | 119 |
6.3 | Tuning the SCP evaluation function | 120 |
6.3.1 | Experimental details | 121 |
6.3.2 | Simple 1-ply training | 122 |
6.3.3 | Training with 1-ply search plus expansions | 123 |
6.4 | Tuning Deep Blue's evaluation function | 125 |
6.4.1 | Modified training algorithm | 126 |
6.4.2 | Effect on the Kasparov-Deep Blue rematch | 127 |
6.5 | Discussion | 129 |
7 | Feature Construction for Game Playing | 131 |
7.1 | Introduction | 131 |
7.2 | Evaluation Functions | 132 |
7.3 | Feature Overlap | 133 |
7.4 | Constructing Overlapping Features | 134 |
7.4.1 | Parameter tuning | 134 |
7.4.2 | Higher order expansion | 136 |
7.4.3 | Quasi-random methods | 138 |
7.4.4 | Knowledge derivation | 138 |
7.5 | Directions for Constructing Overlapping Features | 139 |
7.5.1 | Layered learning | 139 |
7.5.2 | Compression | 142 |
7.5.3 | Teachable systems | 143 |
7.6 | Discussion | 151 |
8 | Learning to Play Expertly: A Tutorial on Hoyle | 153 |
8.1 | Introduction | 153 |
8.2 | A Game-Playing Vocabulary | 154 |
8.3 | Underlying Principles | 156 |
8.3.1 | Useful knowledge | 157 |
8.3.2 | The Advisors | 159 |
8.3.3 | The architecture | 160 |
8.3.4 | Weight learning for voting | 164 |
8.4 | Perceptual Enhancement | 166 |
8.4.1 | Patterns | 167 |
8.4.2 | Zones | 168 |
8.5 | An Empirical Framework | 172 |
8.6 | Results | 173 |
8.7 | Conclusion: Why Hoyle Works | 176 |
9 | Acquisition of Go Knowledge from Game Records | 179 |
9.1 | Introduction | 179 |
9.1.1 | Purpose | 179 |
9.1.2 | Classification of Go Knowledge | 180 |
9.1.3 | Two Approaches | 181 |
9.2 | Rules of Go | 181 |
9.3 | A Deductive Approach | 183 |
9.3.1 | System | 183 |
9.3.2 | Rule Acquisition | 186 |
9.4 | An Evolutionary Approach | 190 |
9.4.1 | Algorithm | 191 |
9.4.2 | Application to Tsume-Go | 196 |
9.5 | Conclusions | 202 |
10 | Honte, a Go-Playing Program Using Neural Nets | 205 |
10.1 | Introduction | 205 |
10.1.1 | Rules | 206 |
10.1.2 | Strength of programs | 207 |
10.2 | General Approach in Honte | 210 |
10.3 | Joseki Library | 212 |
10.4 | Shape Evaluating Neural Net | 212 |
10.5 | Alpha-Beta Search | 215 |
10.6 | Influence | 218 |
10.7 | Neural Nets Evaluating Safety and Territory | 218 |
10.8 | Evaluation of Honte | 220 |
10.9 | Conclusions | 223 |
11 | Learning to Play Strong Poker | 225 |
11.1 | Introduction | 225 |
11.2 | Texas Hold'em | 228 |
11.3 | Requirements for a World-Class Poker Player | 229 |
11.4 | Loki's Architecture | 231 |
11.5 | Implicit Learning | 233 |
11.6 | Explicit Learning | 236 |
11.7 | Experiments | 238 |
11.8 | Ongoing Research | 240 |
11.9 | Conclusions | 242 |
| Bibliography | 243 |
| Contributors | 269 |
| Person Index | 275 |
| Subject Index | 281 |