# General distance formula estimation of population total for unequal probability sampling designs with auxiliary variables

Ibrahim, Ibrahim Elabid (2021) General distance formula estimation of population total for unequal probability sampling designs with auxiliary variables. PhD thesis, Universiti Teknologi Malaysia.


## Abstract

Sampling is a process or technique for obtaining statistical information about a finite population by selecting a representative sample from that population using an appropriate sampling design. In the process, the required information about the units in the sample is measured, and inferences are drawn about unknown population parameters such as means, totals and proportions. This study focuses on estimating an unknown population total for one target variable using single or multiple auxiliary variables correlated with the target variable. It also explores two classical estimators, namely the ratio estimator and the linear regression estimator, which are used as alternatives to the Horvitz-Thompson estimator to estimate an unknown population total in the presence of a single auxiliary variable. The two estimators were compared both theoretically and empirically, based on the sample size and the correlation coefficient between the target variable and the auxiliary variable. The empirical study, using a secondary data set, shows that for small and medium sample sizes the linear regression estimator is more efficient than the ratio estimator when the correlation coefficient of the two variables is positive. For large sample sizes, there are no significant differences between the two estimators. Also, the variance of both estimators decreases as the sample size increases. In contrast, if the correlation coefficient is negative, any increase in the sample size leads to a significant decrease in the variance estimate of the linear regression estimator. Meanwhile, for the ratio estimator, the variance decreases as the sample size is increased considerably.
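To make the three estimators named above concrete, the following is a minimal sketch rather than the thesis's own code. It assumes simple random sampling without replacement, so every inclusion probability equals n/N, and a synthetic population. The function names `ht_total`, `ratio_total` and `regression_total` are hypothetical and chosen only for illustration.

```python
import random


def ht_total(y, pi):
    """Horvitz-Thompson estimate: sum of y_k / pi_k over the sampled units."""
    return sum(yk / pk for yk, pk in zip(y, pi))


def ratio_total(y, x, pi, x_total):
    """Ratio estimate: known total of x times the ratio of the two HT totals."""
    return x_total * ht_total(y, pi) / ht_total(x, pi)


def regression_total(y, x, pi, x_total):
    """Linear regression estimate: HT total of y plus a slope-based correction.

    Assumes equal inclusion probabilities (simple random sampling).
    """
    d = [1.0 / pk for pk in pi]  # design weights
    d_sum = sum(d)
    x_bar = sum(dk * xk for dk, xk in zip(d, x)) / d_sum
    y_bar = sum(dk * yk for dk, yk in zip(d, y)) / d_sum
    sxy = sum(dk * (xk - x_bar) * (yk - y_bar) for dk, xk, yk in zip(d, x, y))
    sxx = sum(dk * (xk - x_bar) ** 2 for dk, xk in zip(d, x))
    slope = sxy / sxx
    return ht_total(y, pi) + slope * (x_total - ht_total(x, pi))


# Synthetic population (an assumption for illustration, not the thesis data):
# y is positively correlated with the auxiliary variable x.
rng = random.Random(0)
N, n = 1000, 50
pop_x = [rng.uniform(10, 100) for _ in range(N)]
pop_y = [5 + 2 * xk + rng.gauss(0, 10) for xk in pop_x]

sample_idx = rng.sample(range(N), n)  # simple random sample without replacement
xs = [pop_x[i] for i in sample_idx]
ys = [pop_y[i] for i in sample_idx]
pi = [n / N] * n  # equal inclusion probabilities under this design

print("true total      :", round(sum(pop_y), 1))
print("Horvitz-Thompson:", round(ht_total(ys, pi), 1))
print("ratio           :", round(ratio_total(ys, xs, pi, sum(pop_x)), 1))
print("regression      :", round(regression_total(ys, xs, pi, sum(pop_x)), 1))
```

Both alternative estimators need the population total of the auxiliary variable, which is why `x_total` is passed in as known. Repeating the draw many times and comparing the spread of the three estimates would give a rough empirical comparison of the kind described above.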
The simulation study showed that when the variable of interest has a strong negative correlation with the auxiliary variable, the linear regression estimator provides an efficient estimate of the unknown population total relative to the ratio estimator, irrespective of the sample size. If, however, the correlation coefficient between the variable of interest and the auxiliary variable is positive and within the range [0.75, 1], both estimators give a better estimate of the population total than the conventional estimators, although the estimate obtained by the linear regression estimator is slightly more efficient than that of the ratio estimator. The most important idea in estimation by minimum distance measures is the quantification of the degree of closeness between two data sets, such as the sample data and a parametric distribution that depends on an unknown parameter. A general distance formula is suggested in this research, based on the concept of the power divergence function, as an alternative to the distance function used by Deville and Särndal to measure the degree of closeness between the calibrated weights (new weights) and the classical design weights in the Horvitz-Thompson estimator. Derivation of the proposed general distance formula involved adding a further constraint to the calibration equation constraints, relating the sum of the classical sample design weights to the sum of the sample calibrated weights. To generate a variety of distance measures, the proposed formula was used to obtain a set of new weights, based on the inverse functions created by the formula, from which new estimators of the unknown population total could be constructed. Finally, the problems associated with the calibrated weights produced by some distance measures, such as unrealistic or extreme weights, are examined; such weights lead to inaccurate estimates when they are used in place of the design weights.
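The calibration idea can be illustrated with the simplest standard case, the chi-square (linear) distance from the Deville and Särndal framework, which has a closed-form solution. This sketch is not the general distance formula proposed in the thesis. It assumes a single auxiliary variable and imposes two constraints: the calibrated weights must reproduce the known total of x, and their sum must equal the sum of the design weights, following the additional constraint described above. The function name `calibrate_chi_square` and the toy numbers are hypothetical.

```python
def calibrate_chi_square(d, x, x_total):
    """Calibrated weights minimising sum((w_k - d_k)**2 / (2 * d_k)).

    Constraints: sum(w_k) == sum(d_k) and sum(w_k * x_k) == x_total.
    The minimiser has the form w_k = d_k * (1 + lam0 + lam1 * x_k), where
    (lam0, lam1) solve a 2x2 linear system built from the constraints.
    """
    s0 = sum(d)
    s1 = sum(dk * xk for dk, xk in zip(d, x))
    s2 = sum(dk * xk * xk for dk, xk in zip(d, x))
    det = s0 * s2 - s1 * s1
    gap = x_total - s1  # calibration gap of the Horvitz-Thompson estimate of x
    lam1 = s0 * gap / det
    lam0 = -s1 * gap / det
    return [dk * (1.0 + lam0 + lam1 * xk) for dk, xk in zip(d, x)]


def calibrated_total(w, y):
    """Calibration estimate of the population total: sum of w_k * y_k."""
    return sum(wk * yk for wk, yk in zip(w, y))


# Toy sample (illustrative values only): four units, design weight 10 each.
d = [10.0, 10.0, 10.0, 10.0]
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 8.0]

# Moderate calibration gap: the weights stay positive.
w_ok = calibrate_chi_square(d, x, x_total=120.0)
print("weights:", [round(wk, 3) for wk in w_ok])
print("calibrated total of y:", round(calibrated_total(w_ok, y), 3))

# Larger gap: the linear method can return a negative weight, one form of
# the unrealistic or extreme weights mentioned in the abstract.
w_bad = calibrate_chi_square(d, x, x_total=160.0)
print("weights:", [round(wk, 3) for wk in w_bad])
print("smallest weight:", round(min(w_bad), 3))
```

Other distance measures, such as the multiplicative (raking) distance, keep the weights positive but generally require an iterative solver for the multipliers instead of the closed form used here.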

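- Item Type: Thesis (PhD)
- Uncontrolled Keywords: sampling, sampling design, ratio estimator
- Subjects: Q Science > QA Mathematics
- Divisions: Science
- ID Code: 101825
- Deposited By: Narimah Nawil
- Deposited On: 10 Jul 2023 09:32
- Last Modified: 10 Jul 2023 09:32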
