Universiti Teknologi Malaysia Institutional Repository

Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters

Tao, Hai and Jawad, Ali H. and Shather, A. H. and Al-Khafaji, Zainab and A. Rashid, Tarik and Ali, Mumtaz and Al-Ansari, Nadhir and Marhoon, Haydar Abdulameer and Shahid, Shamsuddin and Yaseen, Zaher Mundher (2023) Machine learning algorithms for high-resolution prediction of spatiotemporal distribution of air pollution from meteorological and soil parameters. Environment International, 175 (NA). pp. 1-17. ISSN 0160-4120

Official URL: http://dx.doi.org/10.1016/j.envint.2023.107931

Abstract

This study uses machine learning (ML) models for high-resolution (0.1°×0.1°) prediction of the concentration of fine particulate matter (PM2.5), the air pollutant most harmful to human health, from meteorological and soil data. Iraq was taken as the study area to implement the method. Different lags and changing patterns of four European Reanalysis (ERA5) meteorological variables (rainfall, mean temperature, wind speed and relative humidity) and one soil parameter (soil moisture) were considered, and a suitable set of predictors was selected from them using a non-greedy algorithm known as simulated annealing (SA). The selected predictors were used to simulate the temporal and spatial variability of PM2.5 concentration over Iraq during the early summer (May-July), the most polluted months, using three advanced ML models: extremely randomized trees (ERT), stochastic gradient descent backpropagation (SGD-BP) and long short-term memory (LSTM) integrated with a Bayesian optimizer. The spatial distribution of the annual average PM2.5 revealed that the population of the whole of Iraq is exposed to a pollution level above the standard limit. The changes in temperature and soil moisture and the mean wind speed and humidity of the month before the early summer can predict the temporal and spatial variability of PM2.5 over Iraq during May-July. Results revealed the higher performance of LSTM, with a normalized root-mean-square error and Kling-Gupta efficiency of 13.4% and 0.89, compared to 16.02% and 0.81 for SGD-BP and 17.9% and 0.74 for ERT. The LSTM could also reconstruct the observed spatial distribution of PM2.5 with MapCurve and Cramer's V values of 0.95 and 0.91, compared to 0.9 and 0.86 for SGD-BP and 0.83 and 0.76 for ERT. The study provides a methodology for forecasting the spatial variability of PM2.5 concentration at high resolution during the peak pollution months from freely available data, which can be replicated in other regions to generate high-resolution PM2.5 forecasting maps.
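
For readers unfamiliar with the two accuracy metrics quoted in the abstract, the following is a minimal sketch of how they are commonly computed. It is not taken from the article. It assumes the original Kling-Gupta efficiency formulation (correlation, variability ratio and bias ratio) and assumes the RMSE is normalized by the mean of the observations; the abstract does not state which normalization or KGE variant the authors used. The function names and sample values are hypothetical.

```python
import math


def nrmse_percent(obs, sim):
    """RMSE as a percentage of the observed mean.

    Assumption: normalization by the mean of the observations; other
    conventions divide by the range or the standard deviation instead.
    """
    n = len(obs)
    rmse = math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / n)
    return 100.0 * rmse / (sum(obs) / n)


def kling_gupta_efficiency(obs, sim):
    """KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2).

    r     : Pearson correlation between simulated and observed values
    alpha : ratio of simulated to observed standard deviation
    beta  : ratio of simulated to observed mean
    """
    n = len(obs)
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_s = sum((s - mean_s) ** 2 for s in sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim)) / n
    r = cov / math.sqrt(var_o * var_s)
    alpha = math.sqrt(var_s / var_o)
    beta = mean_s / mean_o
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical PM2.5 values, for illustration only (not data from the study).
observed = [42.0, 55.0, 61.0, 48.0, 70.0]
predicted = [40.0, 57.0, 58.0, 50.0, 66.0]
print(f"NRMSE = {nrmse_percent(observed, predicted):.1f}%")
print(f"KGE   = {kling_gupta_efficiency(observed, predicted):.2f}")
```

Under these definitions a perfect prediction gives an NRMSE of 0% and a KGE of 1, so lower NRMSE and higher KGE both indicate a better model, which is the sense in which the abstract ranks LSTM above SGD-BP and ERT.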

Item Type: Article
Uncontrolled Keywords: Air quality prediction, Arid climate, Machine learning, PM2.5 concentration, Simulated annealing
Subjects: T Technology > TA Engineering (General). Civil engineering (General)
Divisions: Civil Engineering
ID Code: 106853
Deposited By: Widya Wahid
Deposited On: 04 Aug 2024 06:59
Last Modified: 04 Aug 2024 06:59
