Poster A18 in Poster Session A - Tuesday, August 6, 2024, 4:15 – 6:15 pm, Johnson Ice Rink

Limitations in planning abilities in AlphaZero

Daisy Lin¹, Brenden Lake¹, Wei Ji Ma¹; ¹New York University

AlphaZero, a deep reinforcement learning algorithm, has achieved superhuman performance in complex games such as chess and Go. However, whether its strategic planning ability extends beyond winning games remains unclear. We investigated this question using 4-in-a-row, a game used to study human planning, by analyzing AlphaZero's feature learning and puzzle-solving abilities. Despite strong gameplay, AlphaZero failed 45% of the puzzles. Feature analysis revealed limitations in the knowledge it acquired during self-play. Incorporating human-inspired features into its policy and value outputs led to a 13% improvement in puzzle-solving accuracy. Our findings highlight the potential for human insights to enhance AI strategic planning beyond self-play.

Keywords: Explainable AI, Deep Reinforcement Learning, Human-inspired AI, Planning

