Reinforcement learning in a Multi-agent Framework for Pedestrian Simulation
This thesis proposes a new approach to pedestrian simulation based on machine learning techniques. Specifically, it proposes using reinforcement learning to build a decision-making module for pedestrian navigation. The thesis presents a multi-agent framework in which each agent is an embodied 3D agent calibrated with human features. The virtual world is also three-dimensional and contains objects such as walls and doors. The agents perceive their local neighborhood (objects and other agents) and learn to move through this virtual world towards a place inside the environment. The thesis studies different algorithmic approaches based on reinforcement learning and analyzes the results in different scenarios. These scenarios are classic, well-studied situations in the field of pedestrian modelling and simulation (bottlenecks, crossings inside a narrow corridor, ...). The results show that the approach successfully solves the navigational problems. In addition, emergent collective behaviors appear, such as arch-like grouping around an exit in the bottleneck problem and lane formation in the corridor-crossing scenario.

The work opens a new line of research in pedestrian simulation, which offers the following advantages:
- The behavioral design is carried out by the learning process rather than coded by humans.
- The agents independently learn different behaviors according to their personal experiences and interactions with the 3D world.
- The learned decision-making module is computationally efficient (because the learned behavior is stored in the form of a table or a linear function approximator).

The approach also has limitations:
- The learned behaviors cannot be edited directly, which makes implementing authoring tools non-trivial.
- The quality of the learned behaviors is not homogeneous: some agents learn their task very well, but others do not.
- The learning process is not controllable in terms of when and what is learned at each moment.
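
To illustrate what storing a learned behavior "in the form of a table" can look like, the following is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm. It is not the thesis's implementation: the abstract does not state which algorithms are used, and the one-dimensional corridor, the reward values, the function names (`train_corridor_agent`, `greedy_action`) and all parameter values are assumptions chosen only for illustration.

```python
import random


def train_corridor_agent(n_cells=6, episodes=500, alpha=0.5, gamma=0.9,
                         epsilon=0.1, seed=0):
    """Tabular Q-learning for one agent in a hypothetical 1D corridor.

    The agent starts in cell 0 and must reach the last cell (the "exit").
    All sizes, rewards and hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    goal = n_cells - 1
    moves = (-1, +1)  # action 0 = step left, action 1 = step right
    # The learned behavior is this table: one row per cell, one value per action.
    q = [[0.0, 0.0] for _ in range(n_cells)]

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(2)  # explore
            else:
                best = max(q[state])  # exploit, breaking ties at random
                action = rng.choice(
                    [a for a, v in enumerate(q[state]) if v == best])

            next_state = min(max(state + moves[action], 0), goal)
            reached_goal = next_state == goal
            reward = 1.0 if reached_goal else -0.01  # small cost per step

            # Q-learning update: move the estimate towards the bootstrapped target.
            target = reward if reached_goal else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if reached_goal:
                break
    return q


def greedy_action(q, state):
    """Decision step after learning: a single table lookup."""
    return max(range(len(q[state])), key=lambda a: q[state][a])


if __name__ == "__main__":
    table = train_corridor_agent()
    print([greedy_action(table, s) for s in range(len(table) - 1)])
```

Once training is finished, choosing an action requires only a lookup in `q`, which is the sense in which a tabular decision-making module is cheap at simulation time. The same sketch also hints at the first limitation listed above: the behavior is encoded in numeric table entries rather than in rules a designer could edit directly.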