Universiti Teknologi Malaysia Institutional Repository

Goal-seeking navigation based on multi-agent reinforcement learning approach

Abdul Jalil, Abdul Muizz (2022) Goal-seeking navigation based on multi-agent reinforcement learning approach. Masters thesis, Universiti Teknologi Malaysia, Faculty of Engineering - School of Electrical Engineering.

PDF (425kB)

Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

Mobile robotics has been applied in many fields and has had an impact on many industries. Most modern industries depend on mobile robots in applications ranging from indoor to outdoor, such as robot vacuums and delivery robots. The most important aspect of a mobile robot is its navigation algorithms, which allow the robot to move through a given terrain to reach a desired state. The algorithms that have contributed the most are SLAM (simultaneous localisation and mapping) and path planning. SLAM is mostly used to estimate feedback states such as the robot's localisation and a map of its surroundings, whereas path planning mainly plans the robot's motion from the estimated state. RL (reinforcement learning) researchers have been studying RL for robot navigation, focusing on areas such as motion planning and perception estimation. Breakthroughs over recent decades have been closing the gap between RL and control systems: the two initially developed from different areas and in different directions, but much of their recent development has converged into the same field. What makes them different is how they approach the problems; the problems themselves are the same. This study explores a multi-agent approach to DRL (deep RL), specifically training a single policy in a multi-agent environment: because a robotics simulator can run many robot models at once, it is unnecessary to run many simulators to train the policy in a batch. RL learns through a reward signal, which plays a role similar to the cost function in a control system; the policy tries to maximise the expected return of a trajectory. The scope of this study covers mainly indoor navigation and motion planning. The toolkits used in the study are stable-baselines3, the Gazebo simulator, and OpenAI Gym.
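To illustrate the single-simulator batching described in the abstract, here is a minimal sketch under stated assumptions; it is not the thesis's environment. A hypothetical `MultiRobotSim` hosts several 2-D point robots in one simulator, a hypothetical `shared_policy` (a fixed proportional rule standing in for a trained network) maps the whole batch of observations to actions in a single call, and `discounted_return` computes the return G_0 = Σ_t γ^t r_t that an RL policy tries to maximise. Every tick of the one simulator yields one transition per robot.

```python
import math
import random


class MultiRobotSim:
    """Hypothetical stand-in for ONE simulator hosting several point robots.

    Not the thesis environment: each robot is a 2-D point that moves with the
    commanded (vx, vy) velocity and has its own randomly placed goal.
    """

    def __init__(self, n_robots, dt=0.1, seed=0):
        rng = random.Random(seed)
        self.dt = dt
        self.pos = [[0.0, 0.0] for _ in range(n_robots)]
        self.goal = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_robots)]

    def observe(self):
        # One row per robot: the vector from the robot to its goal.
        return [[g[0] - p[0], g[1] - p[1]] for p, g in zip(self.pos, self.goal)]

    def step(self, actions):
        # Every robot advances in the same simulator tick.
        before = [math.hypot(*o) for o in self.observe()]
        for p, a in zip(self.pos, actions):
            p[0] += a[0] * self.dt
            p[1] += a[1] * self.dt
        obs = self.observe()
        # Reward: progress towards the goal made during this tick.
        rewards = [b - math.hypot(*o) for b, o in zip(before, obs)]
        return obs, rewards


def shared_policy(batch_obs, gain=1.0, v_max=0.2):
    """Hypothetical single policy evaluated on the whole batch of observations.

    Each velocity component is clipped to the (assumed) bound v_max.
    """
    return [[max(-v_max, min(v_max, gain * c)) for c in o] for o in batch_obs]


def discounted_return(rewards, gamma=0.99):
    """G_0 = sum_t gamma**t * r_t, accumulated backwards for one robot."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


sim = MultiRobotSim(n_robots=4)
history = [[] for _ in range(4)]  # per-robot reward trajectories
for _ in range(50):
    actions = shared_policy(sim.observe())  # one policy, a batch of 4 inputs
    _, rewards = sim.step(actions)
    for traj, r in zip(history, rewards):
        traj.append(r)

returns = [discounted_return(traj) for traj in history]
print(len(history) * len(history[0]), "transitions from one simulator")
print([round(g, 3) for g in returns])
```

With four robots and 50 ticks, the single simulator yields 200 transitions for the shared policy, which is the kind of batch a DRL algorithm learns from without launching extra simulator instances.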
The robot used in the simulation is the TurtleBot3 Burger, chosen because it does not require a stability controller and has the fewest velocity commands. Its odometry, IMU (inertial measurement unit), and laser sensor provide the feedback observation states. A UKF (unscented Kalman filter) is used for state estimation and to make use of all available feedback states. Although desired states are usually not part of the observation states, in DRL they are. Given the nature of robotics simulation, it is possible to run a single simulator and still train the policy in batches. Since no existing RL environment was designed for this study, a multi-agent environment was implemented to meet the study objectives. The policy network was constructed from an LSTM (long short-term memory) network and an MLP (multilayer perceptron), serving as the feature extractor and decoder. The integration of the DRL algorithms with ROS (Robot Operating System) was able to train the policy and communicate across these various connections, but for numerous reasons it did not achieve comparable results in reaching the goal position and avoiding obstacles. Because ROS is a peer-to-peer system, future work could use other DRL libraries with ROS, such as RLlib (Industry-Grade RL), which supports distributed training for DRL.
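In DRL the goal is typically supplied to the policy as part of the observation. The sketch below shows one common way to do this for a differential-drive robot; it is an assumption, not the thesis's exact observation layout. Hypothetical helpers `goal_in_robot_frame` and `build_observation` concatenate normalised laser ranges, the current linear and angular velocities, and the goal expressed as distance and heading error from the robot's pose. In the pipeline described in the abstract the pose would come from the UKF estimate; here it is simply passed in. The default `range_max=3.5` (metres) is an assumed laser range.

```python
import math


def goal_in_robot_frame(pose, goal):
    """Distance and heading error from pose (x, y, yaw) to goal (x, y)."""
    x, y, yaw = pose
    dx, dy = goal[0] - x, goal[1] - y
    heading_error = math.atan2(dy, dx) - yaw
    # Wrap to [-pi, pi] so the policy never sees a 2*pi jump.
    heading_error = math.atan2(math.sin(heading_error), math.cos(heading_error))
    return math.hypot(dx, dy), heading_error


def build_observation(pose, goal, scan, v, w, range_max=3.5):
    """Hypothetical DRL observation: laser ranges, velocities, and the goal."""
    # Replace inf/NaN returns with the maximum range, then normalise to [0, 1].
    ranges = [min(r, range_max) if math.isfinite(r) else range_max for r in scan]
    ranges = [r / range_max for r in ranges]
    dist, heading_error = goal_in_robot_frame(pose, goal)
    # The desired state (goal) becomes part of what the policy observes.
    return ranges + [v, w, dist, heading_error]


obs = build_observation(pose=(0.0, 0.0, 0.0), goal=(1.0, 1.0),
                        scan=[0.5, float("inf"), 2.0], v=0.1, w=0.0)
print(obs)
```

Expressing the goal relative to the robot, rather than as absolute coordinates, lets the same policy generalise across goal positions; wrapping the heading error avoids a discontinuity when the goal passes behind the robot.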

Item Type: Thesis (Masters)
Uncontrolled Keywords: SLAM (simultaneous localisation and mapping), RL (reinforcement learning)
Subjects: T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions: Faculty of Engineering - School of Electrical
ID Code: 99333
Deposited By: Yanti Mohd Shah
Deposited On: 22 Feb 2023 08:24
Last Modified: 22 Feb 2023 08:24
