Term
Goal of domain estimation |
|
Definition
To estimate and compare subpopulation (i.e. domain) parameters. |
|
|
Term
When to use domain estimation |
|
Definition
When we want to estimate and compare subpopulation parameters, but the sample comes from an SRSWOR design that was not specifically designed to estimate parameters for the domains. |
|
|
Term
T/F Population sizes must be known for domain estimation |
|
Definition
False. The population sizes may be unknown. |
|
|
Term
Is sample size of a domain in domain estimation fixed or random? |
|
Definition
Random, unlike stratification, where the allocation is fixed. We don't know nd until after the data have been collected, and the value of nd changes from sample to sample. |
|
|
Term
T/F The value of nd is the same from sample to sample in domain estimation. |
|
Definition
False. The value of nd changes from sample to sample in domain estimation. |
|
|
Term
Is the total sample size in domain estimation fixed or random? |
|
Definition
Fixed. Under SRSWOR the total sample size n is set by the design; only the domain sample sizes nd are random. |
|
|
Term
Ud = what in domain estimation? |
|
Definition
index set for population domain |
|
|
Term
Ad = what in domain estimation? |
|
Definition
index set for sample domain |
|
|
Term
Review the population parameter formulas for domain estimation on slide 9 of week 11 |
|
Definition
|
|
Term
T/F Domain estimation is a good estimation when Nd is known. |
|
Definition
False
If Nd were known, then we would want to use STS (stratification) instead. |
|
|
Term
Variables u and x in domain estimation |
|
Definition
Numerator variable u: ui = yi if i is in Ud, 0 otherwise (slide 12, week 11).
Denominator variable x: indicator of domain membership, xi = 1 if i is in Ud, 0 otherwise. |
|
|
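A minimal Python sketch of how the u and x variables combine: the domain mean is estimated as the sample total of u divided by the sample total of x. The function name and the toy data are illustrative assumptions, not taken from the course slides.

```python
def domain_mean(y, in_domain):
    """Estimate a domain mean as sum(u) / sum(x) from an SRS."""
    # u_i = y_i if unit i is in the domain, else 0 (numerator variable)
    u = [yi if d else 0.0 for yi, d in zip(y, in_domain)]
    # x_i = 1 if unit i is in the domain, else 0 (denominator variable)
    x = [1 if d else 0 for d in in_domain]
    n_d = sum(x)  # domain sample size; random, known only after sampling
    if n_d == 0:
        raise ValueError("no sampled units fall in the domain")
    return sum(u) / n_d


# Hypothetical sample of n = 5 units, three of which fall in the domain.
y = [4.0, 7.0, 5.0, 10.0, 6.0]
in_domain = [True, False, True, False, True]
ybar_d = domain_mean(y, in_domain)  # (4 + 5 + 6) / 3
```

Because n_d is the sum of the x values, it changes from sample to sample, which is why the domain mean behaves like a ratio estimator.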
Term
*Study all formulas on formula sheet! Know what is what!* |
|
Definition
|
|
Term
What are the null and alternative hypotheses when testing whether two domain population means are equal? |
|
Definition
H0: ybarU1 = ybarU2
H1: ybarU1 ≠ ybarU2
Equivalently:
H0: ybarU1 - ybarU2 = 0
H1: ybarU1 - ybarU2 ≠ 0 |
|
|
Term
Formula for z-test statistic: (need to know???) |
|
Definition
z = (ybar1 - ybar2)/sqrt(V(ybar1)+V(ybar2))
Reject H0 if abs(z) > zalpha/2 |
|
|
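The z-test for two domain means can be sketched as follows. This is only an illustration under stated assumptions: v1 and v2 stand for the estimated variances of the two domain means, the formula has no covariance term (as on the card), and the function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def domain_z_test(ybar1, ybar2, v1, v2, alpha=0.05):
    """Return (z, reject) for H0: the two domain means are equal."""
    # z = (ybar1 - ybar2) / sqrt(V(ybar1) + V(ybar2))
    z = (ybar1 - ybar2) / math.sqrt(v1 + v2)
    # Two-sided critical value z_{alpha/2} from the standard normal
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, abs(z) > z_crit


# Hypothetical estimates: means 12 and 10, each with variance 0.5.
z, reject = domain_z_test(12.0, 10.0, 0.5, 0.5)
```

With these inputs the denominator is sqrt(1.0), so z equals the raw difference in means.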
Term
What is the formula for calculating a confidence interval? |
|
Definition
|
|
Term
Impact of nonresponse (2) |
|
Definition
Potential bias
Loss of precision |
|
|
Term
Strategies to reduce nonresponse (NR) |
|
Definition
Design phase.
After data collection: call-backs, post-stratification, imputation |
|
|
Term
Which type of nonresponse is each of the three post-data-collection strategies usually used to fix? |
|
Definition
Call-backs fix both unit and item non-response.
Post-stratification - unit non-response primarily.
Imputation - item non-response primarily. |
|
|
Term
Formula for response rate |
|
Definition
nR/n
where nR = realized sample size |
|
|
Term
Nonresponse framework and population parameters |
|
Definition
|
|
Term
Nonresponse sample framework. Graphic of N, M, R, NH, NR, nM, nR, n |
|
Definition
|
|
Term
When does nonresponse bias occur? |
|
Definition
Occurs when differences exist between the population mean of y for the nonresponding subpopulation ybarMU and the population mean of y for the responding subpopulation ybarRU |
|
|
Term
What does the magnitude of nonresponse bias depend on? |
|
Definition
Differences between population means
Nonresponse rate |
|
|
Term
How does nonresponse reduce precision and how can you remedy this? |
|
Definition
Sample size reductions due to NR affect precision by increasing variances.
Remedy by anticipating and designing for NR sample size attrition
Method: divide the desired target sample size (n) by the anticipated proportion of respondents (R = NR/N). Formula: n/R |
|
|
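The n/R adjustment is simple arithmetic; a small sketch follows. Rounding up to a whole number of units is an assumption added here, and the function name and figures are hypothetical.

```python
import math


def inflate_sample_size(n_target, response_rate):
    """Number of units to draw so that about n_target respond."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    # n / R, rounded up because we cannot sample a fraction of a unit
    return math.ceil(n_target / response_rate)


# Hypothetical: we want 300 respondents and expect 75% to respond.
n_to_draw = inflate_sample_size(300, 0.75)
```

This protects precision only; it does nothing about nonresponse bias, which depends on how respondents differ from nonrespondents.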
Term
What is the best strategy for addressing nonresponse bias? |
|
Definition
Design survey to prevent NR |
|
|
Term
Using data from call-backs of NR cases to adjust for bias - steps in process. |
|
Definition
Select a sample from the nonrespondents to the survey.
Collect data from contacted nonrespondents.
Use these data to estimate population mean for nonrespondents ybarMU.
Estimate population mean for whole population ybarU with a weighted combination of respondent sample mean and nonrespondent sample mean. |
|
|
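A sketch of the final call-back step. The card does not state the weights, so this assumes the common choice of weighting each mean by its share of the original sample (nR/n for respondents, nM/n for nonrespondents); the function name and numbers are hypothetical.

```python
def callback_mean(ybar_r, n_r, ybar_m, n_m):
    """Weighted combination of respondent and nonrespondent means.

    ybar_r: mean of the n_r original respondents
    ybar_m: mean from the call-back subsample of nonrespondents
    n_m:    number of nonrespondents in the original sample
    Assumed weights: n_r / n and n_m / n, with n = n_r + n_m.
    """
    n = n_r + n_m
    return (n_r * ybar_r + n_m * ybar_m) / n


# Hypothetical: 60 respondents (mean 10), 40 nonrespondents (call-back mean 5).
ybar_hat = callback_mean(ybar_r=10.0, n_r=60, ybar_m=5.0, n_m=40)
```

Using the respondent mean alone would give 10 here, which shows how ignoring nonrespondents can bias the estimate when the two groups differ.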
Term
Is the estimator of population mean using the callback method to deal with nonresponse biased or unbiased? |
|
Definition
|
|
Term
Post-stratification as a remedy for nonresponse - steps |
|
Definition
Divide population into H mutually exclusive and exhaustive post-strata.
For each post-stratum: Know post-stratum sizes Nh, estimate characteristics of post-strata, use post-stratum sample mean to estimate post-stratum population mean, pool post-stratum estimates using, for example, a weighted mean of the post-stratum estimates.
|
|
|
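The pooling step for post-stratification can be sketched as a weighted mean of post-stratum respondent means with weights Nh/N. The dictionary layout, function name, and data below are illustrative assumptions.

```python
from statistics import mean


def poststratified_mean(respondent_y, stratum_sizes):
    """Pool post-stratum respondent means using known sizes N_h."""
    N = sum(stratum_sizes.values())
    # Each post-stratum mean is weighted by its population share N_h / N
    weighted_sum = sum(
        stratum_sizes[h] * mean(ys) for h, ys in respondent_y.items()
    )
    return weighted_sum / N


# Hypothetical respondent data in two post-strata with known sizes.
respondent_y = {"a": [2.0, 4.0], "b": [10.0, 20.0]}
stratum_sizes = {"a": 600, "b": 400}
ybar_post = poststratified_mean(respondent_y, stratum_sizes)
```

Here the post-stratum means are 3 and 15, so the pooled estimate is (600 * 3 + 400 * 15) / 1000.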
Term
What is the formula for the sampling weight in post-stratification design? |
|
Definition
|
|
Term
Assumptions in post-stratification adjustment. |
|
Definition
Distribution of y is approximately equal for responding portion of post-stratum population and nonresponding portion of post-stratum population. |
|
|
Term
What does imputation as a strategy for dealing with nonresponse do? |
|
Definition
A statistical method for "filling in" or "predicting" missing values.
Impute values so that they represent the distribution of the response variable with missing data (y).
Impute values using a method that supports estimation of the variance associated with the random components of the imputation process. |
|
|
Term
Types of imputation |
|
Definition
Deductive imputation
Cell mean imputation
hot-deck imputation (random)
regression imputation
multiple imputation |
|
|
Term
Deductive imputation |
|
Definition
common method, rarely implementable
Use a deterministic rule to assign a value (e.g. crime victim: no implies violent crime victim: no)
There must be sufficient information to identify the missing value with a high degree of certainty.
Relatively uncommon, especially with use of computer-assisted survey instruments, where checks for these relationships are embedded in the computer-based questionnaire. |
|
|
Term
Cell mean imputation |
|
Definition
Avoid: leads to incorrect distribution of y in dataset.
Divide responding units into imputation classes.
Within a given imputation class: calculate the average value for available item data in class, fill in missing value for nonresponding unit with average value.
Retains mean estimate for an imputation class. Underestimates variance within an imputation class, which misrepresents distribution of y.
|
|
|
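To make the mechanics of cell mean imputation concrete, here is a minimal sketch. Missing values are represented by None and the class labels are hypothetical; the sketch assumes every imputation class has at least one respondent.

```python
from statistics import mean


def cell_mean_impute(values, classes):
    """Replace each None with the respondent mean of its imputation class."""
    observed = {}
    for v, c in zip(values, classes):
        if v is not None:
            observed.setdefault(c, []).append(v)
    # Assumes each class has at least one observed value (else KeyError)
    class_means = {c: mean(vs) for c, vs in observed.items()}
    return [class_means[c] if v is None else v
            for v, c in zip(values, classes)]


# Hypothetical data: two imputation classes, one missing value in each.
values = [10.0, None, 14.0, 3.0, None, 5.0]
classes = ["a", "a", "a", "b", "b", "b"]
imputed = cell_mean_impute(values, classes)
```

Every imputed value sits exactly at its class mean, which is why the class mean is preserved while the within-class variance is understated.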
Term
Hot-deck imputation |
|
Definition
Most common and generally applicable.
May apply within groups of respondents (auxiliary info).
Divide responding units into imputation classes. Within a given imputation class: randomly select a donor from responding units in class, filling in missing value for nonresponding unit with value from donor unit.
Retains variation in individual values; can impute many variables from the same donor; several variations of the method exist |
|
|
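A minimal sketch of random hot-deck imputation within imputation classes. Missing values are None, the seed and data are hypothetical, and the sketch assumes every class contains at least one responding donor.

```python
import random


def hot_deck_impute(values, classes, rng):
    """Fill each None with the value of a random donor from the same class."""
    donors = {}
    for v, c in zip(values, classes):
        if v is not None:
            donors.setdefault(c, []).append(v)
    # Assumes each class has at least one donor (else KeyError)
    return [rng.choice(donors[c]) if v is None else v
            for v, c in zip(values, classes)]


# Hypothetical data: two imputation classes, one missing value in each.
hd_values = [10.0, None, 14.0, 3.0, None, 5.0]
hd_classes = ["a", "a", "a", "b", "b", "b"]
hd_imputed = hot_deck_impute(hd_values, hd_classes, random.Random(0))
```

Because each filled-in value is an actual observed value from a donor, the spread of individual values is retained rather than collapsed to the class mean.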
Term
Regression imputation |
|
Definition
Uses model to incorporate auxiliary information, between hot-deck and cell mean imputation methods.
Use a regression model to relate covariate(s) to variable with missing data.
Estimate regression parameters with data from responding units, fill in missing value with predicted value.
Useful if a strong relationship exists that provides a better predicted value for the missing data, form of (conditional) mean imputation, requires separate model for each variable with missing data. |
|
|
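A sketch of regression imputation with a single covariate, using ordinary least squares fitted on responding units only. The one-covariate linear model, the function name, and the data are assumptions made for illustration.

```python
def regression_impute(x, y):
    """Fill each missing y (None) with the fitted value from y ~ a + b*x."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    xbar = sum(xi for xi, _ in pairs) / n
    ybar = sum(yi for _, yi in pairs) / n
    # Least-squares slope and intercept from responding units only
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in pairs)
    sxx = sum((xi - xbar) ** 2 for xi, _ in pairs)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return [intercept + slope * xi if yi is None else yi
            for xi, yi in zip(x, y)]


# Hypothetical data in which y = 2x exactly among respondents.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, None, 8.0]
y_imputed = regression_impute(x, y)
```

The imputed value is a conditional mean given x, so, like cell mean imputation, it adds no residual variation of its own.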
Term
Multiple imputation |
|
Definition
Accounting for variation due to imputation process.
Decide on an imputation model, impute m>1 values for each missing data item, result is m (different) data sets with no missing values.
Variation in estimates across data sets provides an estimate of the variability associated with the imputation process, analysis is more complex. |
|
|
Term
Cluster sample definition |
|
Definition
A cluster sample is a probability sample in which a sampling unit is a cluster.
We will no longer assume SU = element |
|
|
Term
Steps in 1-stage cluster sampling |
|
Definition
Divide the population (of K elements) into N total clusters.
Take a sample of n clusters.
|
|
|
Term
Comparing 1-stage CS and STS |
|
Definition
1-stage CS: A block of cells is a cluster, SU is a cluster, don't sample from every cluster.
STS: A block of cells is a stratum, SU is an element, sample from every stratum. |
|
|
Term
Why use cluster sampling? |
|
Definition
May not have a list of elements for a frame, but a list of clusters may be available.
May be cheaper to conduct the study if elements are clustered. |
|
|
Term
Reasons that cluster sampling usually leads to less precise estimates. |
|
Definition
Elements within clusters tend to be correlated due to exposure to similar conditions.
We get less information than if we observe the same number of unrelated elements. |
|
|
Term
Ways to define clusters for improved precision. |
|
Definition
Define clusters for which within-cluster variation is high (rarely possible).
Define clusters that are relatively small. |
|
|
Term
Notation for cluster sampling (i, j, N, n, Mi, K) |
|
Definition
i = index for cluster i
i, j = index for element j in cluster i
N = number of clusters in population
n = number of sampled clusters
Mi = number of elements in cluster i
K = number of elements in population (K = sum of Mi over all N clusters) |
|
|
Term
Weight in CSE1 and is it self-weighting? |
|
Definition
N/n
Yes, it is self-weighting. |
|
|
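With a constant weight of N/n, the CSE1 estimate of the population total is the weight times the sum of the sampled cluster totals. A minimal sketch, with a hypothetical function name and data:

```python
def cse1_total(cluster_totals, N):
    """Estimate the population total from a 1-stage SRS of clusters."""
    n = len(cluster_totals)
    weight = N / n  # same weight for every element: self-weighting
    return weight * sum(cluster_totals)


# Hypothetical: n = 2 clusters sampled out of N = 10, totals 20 and 30.
t_hat = cse1_total([20.0, 30.0], N=10)
```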
Term
What is the weight formula for CSE2 and is it self weighting? |
|
Definition
(N/n)*(Mi/mi)
It is not always self-weighting. |
|
|
Term
Cluster population mean and within-cluster variance formulas (not on sheet). |
|
Definition
ybariU = tiU/Mi
Si^2 = 1/(Mi - 1) * sum over j of (yij - ybariU)^2 |
|
|
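For a single fully observed cluster, both quantities can be computed with the standard library; `statistics.variance` uses the (Mi - 1) divisor that appears in the formula. The data are hypothetical.

```python
from statistics import mean, variance

# Hypothetical cluster i with M_i = 3 elements, all observed.
y_i = [2.0, 4.0, 6.0]

ybar_i = mean(y_i)   # t_iU / M_i = 12 / 3
s2_i = variance(y_i)  # sum of squared deviations / (M_i - 1) = 8 / 2
```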
Term
What is the weight formula for CSU1 and is it self-weighting? |
|
Definition
Qi/(n*ψi) = Qi*K/(n*Mi)
Not always self weighting (??) |
|
|
Term
What is the weight formula for CSU2 and is it self-weighting? |
|
Definition
K/(n*mi)
Not always self weighting depending on mi (??) |
|
|
Term
An element data set (cluster design) will have columns for at least what variables? |
|
Definition
Cluster id (i)
Element id within cluster (j)
Variable (yij) |
|
|
Term
A cluster data set will have columns for at least what variables? |
|
Definition
Cluster id (i)
Cluster total under 1-stage CS (tiU)
Cluster mean under 1-stage CS (ybariU)
Within-cluster variance under 1-stage CS (si^2) |
|
|
Term
Biased (ratio) estimation for CSE1 |
|
Definition
Usually ti (cluster total) is positively correlated with Mi (cluster size)
No intercept
Mapping the ratio-estimation notation of chapter 3 to chapter 5: yi (variable of interest) = ti (cluster total), xi (auxiliary info) = Mi (cluster size) |
|
|
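With yi = ti and xi = Mi, the ratio estimator of the population mean is the sum of sampled cluster totals divided by the sum of sampled cluster sizes. The sketch below also scales by K for a total, which assumes K is known; function names and data are hypothetical.

```python
def cse1_ratio_mean(cluster_totals, cluster_sizes):
    """Ratio estimate of the mean per element: sum(t_i) / sum(M_i)."""
    return sum(cluster_totals) / sum(cluster_sizes)


def cse1_ratio_total(cluster_totals, cluster_sizes, K):
    """Ratio estimate of the total, assuming K (population size) is known."""
    return K * cse1_ratio_mean(cluster_totals, cluster_sizes)


# Hypothetical sample of two clusters: totals 20 and 30, sizes 4 and 6.
ybar_r = cse1_ratio_mean([20.0, 30.0], [4, 6])
t_r = cse1_ratio_total([20.0, 30.0], [4, 6], K=50)
```

In this toy sample ti is exactly proportional to Mi, which is the situation where the ratio estimator is most precise.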
Term
What is MbarU in cluster sampling? |
|
Definition
The average cluster size for population
If unknown, it can be estimated with the sample mean of cluster sizes: Mbars = (1/n)*sum(Mi) over the sampled clusters |
|
|
Term
2-stage cluster sampling with equal selection probabilities (CSE2) overview |
|
Definition
Stage 1: Select clusters. SRSWOR of n PSUs from population of N PSUs.
Stage 2: Select elements within each sampled cluster. SRSWOR of mi SSUs from Mi elements in PSU i sampled in stage 1.
First stage sampling unit is a primary sampling unit (PSU) = cluster.
Second stage sampling unit is a secondary sampling unit (SSU) = element
Only collect data on the SSUs that were sampled from the cluster. |
|
|
Term
Motivation for 2-stage cluster samples (instead of just 1-stage) |
|
Definition
Likely that elements in cluster will be correlated.
May be inefficient to observe all elements in a sampled PSU: the extra effort required to fully enumerate a PSU does not provide much extra information.
May be better to spend resources to sample many PSUs and a small number of SSUs per PSU. (Possible opposing force: study costs associated with going to many clusters) |
|
|
Term
The variance of t-hat_unb (the unbiased estimator of the total) has 2 components associated with the 2 sampling stages. What are these components? |
|
Definition
1. Variation among PSUs
2. Variation among SSUs within PSUs |
|
|
Term
T/F Equal probability at stage 1 plus equal probability in stage 2 given PSU i in 2-stage cluster sampling implies equal inclusion probability for an element. |
|
Definition
False
It does NOT imply equal inclusion probability for an element (unconditional probability for element)
slide 65 of week 13 |
|
|
Term
When to use unbiased estimation versus ratio estimation in CSE2? |
|
Definition
Unbiased estimation - Use if you know K or N
e.g. N= total number of clutches or K = total number of eggs in Minnedosa, Manitoba
Ratio estimation - Only requires knowledge of Mi (e.g. number of eggs in clutch i), in addition to data collected |
|
|
Term
When will an unbiased estimator have poor precision in CSE2? |
|
Definition
When cluster sizes (Mi) are unequal, and
ti (cluster total) is roughly proportional to Mi (cluster size) |
|
|
Term
When will ratio estimation (biased) be precise in CSE2? |
|
Definition
When ti is roughly proportional to Mi (bigger cluster = larger ti)
This happens frequently in populations where cluster sizes (Mi) vary |
|
|
Term
Inclusion probabilities for an element under 2-stage cluster sampling using SRSWOR at each stage (CSE2) |
|
Definition
πi = Pr{cluster i in sample} = n/N
πj|i = Pr{element j in sample GIVEN cluster i in sample} = mi/Mi
πij = Pr{element j AND cluster i in sample} = πi * πj|i = (n/N)*(mi/Mi) = n*mi/(N*Mi) |
|
|
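The inclusion probability and the CSE2 sampling weight are reciprocals of each other, which a short sketch with exact fractions makes visible. The function name and the numbers are hypothetical.

```python
from fractions import Fraction


def cse2_inclusion_prob(n, N, m_i, M_i):
    """pi_ij = (n / N) * (m_i / M_i) for SRSWOR at both stages."""
    pi_i = Fraction(n, N)              # cluster i selected at stage 1
    pi_j_given_i = Fraction(m_i, M_i)  # element j selected given cluster i
    return pi_i * pi_j_given_i


# Hypothetical: 2 of 10 clusters sampled, then 5 of 20 elements in cluster i.
pi_ij = cse2_inclusion_prob(n=2, N=10, m_i=5, M_i=20)
weight = 1 / pi_ij  # equals (N / n) * (M_i / m_i)
```

If m_i/M_i differed between clusters, pi_ij and the weight would differ too, which is why equal probabilities at each stage do not by themselves give equal element inclusion probabilities.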
Term
CSE2 Self-weighting design |
|
Definition
Stage 1: Select n PSUs from N PSUs in pop using SRS
Stage 2: Choose mi proportional to Mi so that mi/Mi = 1/c is constant across clusters, then use SRS to select the mi elements. If this is achieved, the design is self-weighting.
The sample weight for SSU j in cluster i is then constant for all elements. In practice the weight may vary slightly, because mi must be a whole number and mi/Mi cannot always equal 1/c exactly for every cluster. |
|
|
Term
Why are self-weighting samples appealing? What is the caveat for variance estimation in self-weighting samples? |
|
Definition
They are appealing because you can use the simple mean estimator.
The caveat for variance estimation is that you get no break on the variance of the estimator: you must still use the proper variance estimation formula for the sample design. |
|
|
Term
Which designs are self-weighting? |
|
Definition
SRS
SYS
STS with proportional allocation
CSE1
CSE2 with mi proportional to Mi (i.e. Mi/mi = c for all clusters) |
|
|
Term
Why is there no ratio estimator for CSU designs? |
|
Definition
There is no ratio estimator because Mi has already been incorporated in the first stage |
|
|
Term
Why use unequal probability cluster samples? |
|
Definition
Use unequal selection probabilities to sample clusters to save costs and improve precision for a given budget. |
|
|
Term
How do you select clusters and elements in CSU2 design? |
|
Definition
Select clusters with PPSWR (stage 1)
Select elements with SRSWOR (stage 2) |
|
|
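The two selection stages of CSU2 can be sketched with the standard library: `random.choices` draws with replacement using weights, and `random.sample` draws without replacement. Drawing a fresh subsample each time a cluster is selected, and using a fixed m per cluster, are assumptions of this sketch; names and data are hypothetical.

```python
import random


def select_csu2(cluster_sizes, n, m, rng):
    """Stage 1: PPSWR of n clusters (size measure M_i).
    Stage 2: SRSWOR of m elements within each selected cluster."""
    cluster_ids = list(range(len(cluster_sizes)))
    # With-replacement draws, probability proportional to cluster size
    psus = rng.choices(cluster_ids, weights=cluster_sizes, k=n)
    sample = []
    for i in psus:
        # Without-replacement draw of element indices inside cluster i
        k = min(m, cluster_sizes[i])
        elements = rng.sample(range(cluster_sizes[i]), k)
        sample.append((i, elements))
    return sample


# Hypothetical population of 3 clusters with sizes 10, 50, and 40.
draws = select_csu2([10, 50, 40], n=4, m=3, rng=random.Random(42))
```

Because stage 1 is with replacement, the same cluster can appear more than once in `draws`; the count of appearances corresponds to Qi in the CSU weight formulas.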
Term
What is the size or importance measure xi in CSU design? |
|
Definition
Size or importance measure xi is Mi = number of elements or SSUs in PSU i |
|
|
Term
Selection probability for PSU i in CSU1 |
|
Definition
ψi = Mi/K |
|
|
Term
Is t-hat_ψ a biased or unbiased estimator of t? |
|
Definition
Unbiased
Variance estimator is also unbiased.
Also holds for population mean. |
|
|
Term
T/F In CSU2 the variance estimator captures both between and within cluster variance. |
|
Definition
True
Because we use WR sampling, variance estimator captures both between and within cluster variance.
This holds for the estimator for the population mean. |
|
|
Term
Is CSU2 self-weighting? |
|
Definition
Yes, if mi is constant across clusters |
|
|