Universiti Teknologi Malaysia Institutional Repository

Optimizing exploration parameter in dueling deep Q-networks for complex gaming environment

Khan, Muhammad Shehryar (2019) Optimizing exploration parameter in dueling deep Q-networks for complex gaming environment. Masters thesis, Universiti Teknologi Malaysia.

Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

Reinforcement learning (RL) is used to solve a wide range of tasks. Complex environments are a recent challenge for RL, in which an agent interacts with its surroundings and learns to solve the task at hand. Solving a complex environment efficiently with an RL agent requires many parameters to be taken into account. Every action the agent takes has a consequence, expressed as a reward. Based on the rewards it receives, the agent develops a policy for solving the environment, generally one that maximizes reward. To arrive at an optimal policy, the agent relies on an exploration strategy. RL architectures therefore depend on both the policy and the exploration strategy of the agent to solve the environment efficiently. This research has two parts. First, a deep RL architecture, the Dueling Deep Q-Network (Dueling DQN), is optimized by improving its exploration strategy: the Dueling DQN is combined with a recent exploration technique, curiosity-driven intrinsic motivation, and the performance of the resulting Curious Dueling DQN is compared with that of the existing Dueling DQN. Second, the Curious Dueling DQN is validated against the Noisy Dueling DQN, which combines the Dueling DQN with another recent exploration strategy, Noisy Nets, thereby identifying the better of the two exploration strategies. Both solutions are evaluated in the Super Mario Bros environment on mean score and estimation loss. The proposed model improves the mean score threefold, while the loss increases by 28%.
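
The abstract does not reproduce the thesis's implementation, so the following is only a minimal sketch of the two ideas it combines, written under stated assumptions. It uses the commonly published mean-subtracted form of the dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean(A), and a curiosity bonus computed as the squared prediction error of a forward model. All function names, the `scale` factor, and the `beta` weighting are hypothetical choices for illustration and are not taken from the thesis.

```python
from typing import List, Sequence


def dueling_q_values(state_value: float, advantages: Sequence[float]) -> List[float]:
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    Uses the mean-subtracted aggregation: Q = V + A - mean(A).
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def curiosity_bonus(
    predicted_next: Sequence[float],
    actual_next: Sequence[float],
    scale: float = 0.5,  # assumed scaling factor, not from the thesis
) -> float:
    """Intrinsic reward: scaled squared error between a forward model's
    predicted next-state features and the features actually observed."""
    return scale * sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))


def total_reward(extrinsic: float, intrinsic: float, beta: float = 0.1) -> float:
    """Blend the environment reward with the curiosity bonus.

    beta is an assumed weighting; the thesis's value is not stated in the abstract.
    """
    return extrinsic + beta * intrinsic


if __name__ == "__main__":
    q = dueling_q_values(2.0, [1.0, 2.0, 3.0])
    bonus = curiosity_bonus([1.0, 0.0], [0.0, 0.0])
    print(q, bonus, total_reward(1.0, bonus))
```

In this sketch a poorly predicted transition yields a larger bonus, which nudges the agent toward unfamiliar states. A Noisy Nets variant would instead inject learned parametric noise into the network weights and leave the reward unchanged.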

Item Type: Thesis (Masters)
Uncontrolled Keywords: Reinforcement Learning, Reward Functions
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 96670
Deposited By: Narimah Nawil
Deposited On: 15 Aug 2022 08:30
Last Modified: 15 Aug 2022 08:30
