Shared Flashcard Set

Details

Stat 421
Final Exam
73
Mathematics
Graduate
04/20/2012

Cards

Term
Goal of domain estimation
Definition
To estimate and compare subpopulation (i.e., domain) parameters.
Term
When to use domain estimation
Definition
When we want to estimate and compare subpopulation parameters, but the sample comes from a SRSWOR design that was not specifically designed to estimate parameters for the domains.
Term
T/F Population sizes must be known for domain estimation
Definition
False. The population sizes may be unknown.
Term
Is sample size of a domain in domain estimation fixed or random?
Definition
Random, unlike stratification, where the allocation is fixed. We don't know nd until after the data have been collected, and the value of nd changes from sample to sample.
Term
T/F The value of nd is the same from sample to sample in domain estimation.
Definition
False. The value of nd changes from sample to sample in domain estimation.
Term
Is the total sample size in domain estimation fixed or random?
Definition
fixed
Term
Ud = what in domain estimation?
Definition
index set for population domain
Term
A = what in domain estimation?
Definition
index set for sample domain
Term
look at population parameter formulas for domain estimation on slide nine of week 11
Definition
Term
T/F Domain estimation is a good estimation when Nd is known.
Definition

False

If Nd were known, then we would want to use STS (stratified sampling) instead.

Term
Variables u and x in domain estimation
Definition

Numerator variable ui = yi if i is in Ud, 0 otherwise (slide 12, week 11).

Denominator variable xi = indicator of domain membership: 1 if i is in Ud, 0 otherwise.
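
A minimal sketch of how u and x combine into a domain mean estimate, assuming an SRS and made-up data. The sample values, the domain labels "a" and "b", and the function name domain_mean are all hypothetical.

```python
# Hypothetical SRS of n = 6 units: (y value, domain label).
sample = [(10.0, "a"), (12.0, "b"), (8.0, "a"), (15.0, "b"), (9.0, "a"), (11.0, "a")]

def domain_mean(sample, domain):
    """Estimate a domain mean as sum(u) / sum(x)."""
    # u_i = y_i if unit i is in the domain, 0 otherwise (numerator variable)
    u = [y if d == domain else 0.0 for y, d in sample]
    # x_i = 1 if unit i is in the domain, 0 otherwise (denominator variable)
    x = [1 if d == domain else 0 for _, d in sample]
    n_d = sum(x)  # domain sample size: random, known only after sampling
    if n_d == 0:
        raise ValueError("no sampled units fall in this domain")
    return sum(u) / n_d

ybar_a = domain_mean(sample, "a")  # (10 + 8 + 9 + 11) / 4
ybar_b = domain_mean(sample, "b")  # (12 + 15) / 2
```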

Term
*Study all formulas on formula sheet! Know what is what!*
Definition
Term
What are the null and alternative hypotheses when testing whether two domain population means are equal?
Definition

H0: ybarU1 = ybarU2

H1: ybarU1 ≠ ybarU2

 

Equivalently:

H0: ybarU1 - ybarU2 = 0

H1: ybarU1 - ybarU2 ≠ 0

Term
Formula for z-test statistic:  (need to know???)
Definition

z = (ybar1 - ybar2)/sqrt(V(ybar1)+V(ybar2))

 

Reject H0 if abs(z) > zalpha/2
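
A small sketch of the z-test above, assuming the two domain mean estimates and their estimated variances are already available. The input numbers and the function name domain_z_test are hypothetical.

```python
import math
from statistics import NormalDist

def domain_z_test(ybar1, ybar2, v1, v2, alpha=0.05):
    """Two-sided z-test for equality of two domain means."""
    # z = (ybar1 - ybar2) / sqrt(V(ybar1) + V(ybar2))
    z = (ybar1 - ybar2) / math.sqrt(v1 + v2)
    # Critical value z_(alpha/2) from the standard normal distribution
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject H0 if |z| exceeds the critical value
    return z, z_crit, abs(z) > z_crit

# Made-up estimates and estimated variances for two domains
z, z_crit, reject = domain_z_test(9.5, 13.5, 0.8, 1.7)
```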

Term
What is the formula for calculating a confidence interval?
Definition
Θhat ± zalpha/2 × SE(Θhat)
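
A quick sketch of the interval formula, assuming a normal approximation. The estimate 9.5 and standard error 0.9 are made-up inputs, and confidence_interval is a hypothetical helper name.

```python
from statistics import NormalDist

def confidence_interval(estimate, se, alpha=0.05):
    """Return (lower, upper) for estimate +/- z_(alpha/2) * SE."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * se, estimate + z * se

lower, upper = confidence_interval(9.5, 0.9)
```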
Term
Impact of nonresponse (2)
Definition

Potential bias

Loss of precision

Term
Strategies to reduce nonresponse (NR)
Definition

Design phase.

After data collection: call-backs, post-stratification, imputation

Term
When using post data collection strategies (3) to reduce nonresponse, what types of nonresponse are each of the three usually used to fix?
Definition

Call-backs fix both unit and item non-response.

Post-stratification - unit non-response primarily.

Imputation - item non-response primarily.

Term
Two types of nonresponse
Definition

Unit 

Item

Term
Formula for response rate
Definition

nR/n

 where nR = realized sample size


Term
Nonresponse framework and population parameters
Definition
[image]
Term
Nonresponse sample framework.  Graphic of N, M, R, NH, NR, nM, nR, n
Definition
[image]
Term
Nonresponse bias
Definition
Occurs when differences exist between the population mean of y for the nonresponding subpopulation ybarMU and the population mean of y for the responding subpopulation ybarRU
Term
What does the magnitude of nonresponse bias depend on?
Definition

Differences between population means

Nonresponse rate

Term
How does nonresponse reduce precision and how can you remedy this?
Definition

Sample size reductions due to NR affect precision by increasing variances.

 

Remedy by anticipating and designing for NR sample size attrition

 

Method: divide the desired target sample size (n) by the anticipated proportion of respondents R (a guess at NR/N). Formula: n/R
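
A worked sketch of the n/R adjustment. The target of 300 respondents and the guessed response proportion of 0.75 are made-up numbers, the function name is hypothetical, and rounding up is an assumption.

```python
import math

def inflate_sample_size(n_target, response_rate):
    """Sample size to draw so that about n_target units respond."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    # n / R, rounded up to a whole number of units
    return math.ceil(n_target / response_rate)

n_to_sample = inflate_sample_size(300, 0.75)  # 300 / 0.75 = 400
```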

Term
What is the best strategy for addressing nonresponse bias?
Definition
Design survey to prevent NR
Term
Using data from call-backs of NR cases to adjust for bias - steps in process.
Definition

Select a sample from the nonrespondents to the survey.

Collect data from contacted nonrespondents.

Use these data to estimate population mean for nonrespondents ybarMU.

Estimate population mean for whole population ybarU with a weighted combination of respondent sample mean and nonrespondent sample mean.
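
A sketch of the last step, assuming the weights are the sample shares of respondents (nR/n) and nonrespondents (nM/n). The cards do not state the weights, so that choice is an assumption, and the counts and means are made up.

```python
def callback_mean(n_r, ybar_r, n_m, ybar_m):
    """Weighted combination of respondent and call-back sample means.

    n_r, n_m: numbers of respondents and nonrespondents in the original sample
    ybar_r:   sample mean of the respondents
    ybar_m:   sample mean from the call-back subsample of nonrespondents
    """
    n = n_r + n_m
    # Equivalent to (n_r / n) * ybar_r + (n_m / n) * ybar_m
    return (n_r * ybar_r + n_m * ybar_m) / n

ybar_hat = callback_mean(n_r=60, ybar_r=20.0, n_m=40, ybar_m=30.0)
```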

Term
Is the estimator of population mean using the callback method to deal with nonresponse biased or unbiased?
Definition
Unbiased
Term
Post-stratification as a remedy for nonresponse - steps
Definition

Divide population into H mutually exclusive and exhaustive post-strata.

For each post-stratum: know the post-stratum size Nh, estimate characteristics of the post-stratum, and use the post-stratum sample mean to estimate the post-stratum population mean. Then pool the post-stratum estimates using, for example, a weighted mean of the post-stratum estimates.

 

Term
What is the formula for the sampling weight in post-stratification design?
Definition
whj = Nh/nhR, where nhR = number of respondents in post-stratum h
Term
Assumptions in post-stratification adjustment.
Definition
Distribution of y is approximately equal for responding portion of post-stratum population and nonresponding portion of post-stratum population.
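
A sketch of a post-stratified mean under the assumption above, giving each respondent in post-stratum h the weight Nh divided by the number of respondents in h. The two post-strata, their sizes, and the responses are made up.

```python
# Hypothetical post-strata: known size N_h and the respondents' y values.
post_strata = {
    "h1": {"N_h": 600, "y_resp": [4.0, 6.0, 5.0]},
    "h2": {"N_h": 400, "y_resp": [10.0, 12.0]},
}

def poststratified_mean(post_strata):
    """Weighted mean of respondents using post-stratification weights."""
    N = sum(s["N_h"] for s in post_strata.values())
    weighted_total = 0.0
    for s in post_strata.values():
        n_hr = len(s["y_resp"])      # respondents in post-stratum h
        weight = s["N_h"] / n_hr     # sampling weight for each respondent in h
        weighted_total += weight * sum(s["y_resp"])
    return weighted_total / N

ybar_post = poststratified_mean(post_strata)  # (200*15 + 200*22) / 1000
```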
Term
What does imputation as a strategy for dealing with nonresponse do?
Definition

A statistical method for "filling in" or "predicting" missing values.

Impute values so that they represent the distribution of the response variable with missing data (y).

Impute values using a method that supports estimation of the variance associated with the random components of the imputation process.

Term
Imputation methods (5)
Definition

Deductive imputation

Cell mean imputation

hot-deck imputation (random)

regression imputation

multiple imputation

Term
Deductive Imputation
Definition

common method, rarely implementable

Use a deterministic rule to assign a value (e.g. crime victim: no implies violent crime victim: no)

 

There must be sufficient information to identify the missing value with a high degree of certainty.

Relatively uncommon, especially with use of computer-assisted survey instruments, where checks for these relationships are embedded in the computer-based questionnaire.

Term
Cell mean imputation
Definition

Avoid: leads to incorrect distribution of y in dataset.

Divide responding units into imputation classes.

Within a given imputation class: calculate the average value for available item data in class, fill in missing value for nonresponding unit with average value.

 

Retains mean estimate for an imputation class. Underestimates variance within an imputation class, which misrepresents distribution of y.
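
A sketch of cell mean imputation on made-up data, with None marking a missing item. The records, the class labels "A" and "B", and the function name are hypothetical. The last two lines show the within-class variance shrinking after imputation.

```python
from statistics import mean, variance

# (imputation class, y); None marks item nonresponse
records = [("A", 2.0), ("A", 4.0), ("A", None),
           ("B", 10.0), ("B", None), ("B", 14.0)]

def cell_mean_impute(records):
    """Fill each missing y with the mean of observed y in its class."""
    observed = {}
    for cls, y in records:
        if y is not None:
            observed.setdefault(cls, []).append(y)
    # A class with no respondents would raise KeyError below
    class_means = {cls: mean(values) for cls, values in observed.items()}
    return [(cls, class_means[cls] if y is None else y) for cls, y in records]

imputed = cell_mean_impute(records)
var_before = variance([y for cls, y in records if cls == "A" and y is not None])
var_after = variance([y for cls, y in imputed if cls == "A"])
```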

 

Term
Hot-deck imputation
Definition

Most common and generally applicable.

May apply within groups of respondents (auxiliary info).

Divide responding units into imputation classes.  Within a given imputation class: randomly select a donor from responding units in class, fill in missing value for nonresponding unit with value from donor unit.

 

Retains variation in individual values, can impute many variables from the same donor, variations of the method exist.
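
A sketch of random hot-deck imputation within imputation classes, using made-up data with None for a missing item. The fixed seed is only there to make the sketch repeatable, and the function name is hypothetical.

```python
import random

# (imputation class, y); None marks item nonresponse
records = [("A", 2.0), ("A", 4.0), ("A", None),
           ("B", 10.0), ("B", None), ("B", 14.0)]

def hot_deck_impute(records, seed=0):
    """Fill each missing y with the value of a random donor in its class."""
    rng = random.Random(seed)
    donors = {}
    for cls, y in records:
        if y is not None:
            donors.setdefault(cls, []).append(y)
    # Each nonrespondent receives the observed value of one randomly chosen donor
    return [(cls, rng.choice(donors[cls]) if y is None else y)
            for cls, y in records]

imputed = hot_deck_impute(records)
```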

Term
Regression imputation 
Definition

Uses model to incorporate auxiliary information, between hot-deck and cell mean imputation methods.

Use a regression model to relate covariate(s) to variable with missing data.

Estimate regression parameters with data from responding units, fill in missing value with predicted value.

 

Useful if a strong relationship exists that provides a better predicted value for the missing data, form of (conditional) mean imputation, requires separate model for each variable with missing data.
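
A sketch of regression imputation with one covariate, fitting a least-squares line on responding units and predicting the missing value. The data are made up (they lie exactly on y = 1 + 2x), and the function name is hypothetical.

```python
def regression_impute(x, y):
    """Fill missing y (None) with predictions from a fitted line y = b0 + b1*x."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    xbar = sum(xi for xi, _ in pairs) / n
    ybar = sum(yi for _, yi in pairs) / n
    # Least-squares slope and intercept from responding units only
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in pairs)
    sxx = sum((xi - xbar) ** 2 for xi, _ in pairs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return [b0 + b1 * xi if yi is None else yi for xi, yi in zip(x, y)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, None, 9.0, 11.0]
filled = regression_impute(x, y)
```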

Term
Multiple Imputation
Definition

Accounting for variation due to imputation process.

 

Decide on an imputation model, impute m>1 values for each missing data item, result is m (different) data sets with no missing values.

 

Variation in estimates across data sets provides an estimate of the variability associated with the imputation process, analysis is more complex.
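
A rough sketch of the idea, assuming a simple random hot-deck as the imputation model (a simplification chosen only for illustration; the cards do not name a model). It builds m completed data sets, estimates the mean in each, and uses the spread of those estimates as the imputation variability. Data and names are made up.

```python
import random
from statistics import mean, variance

def multiple_imputation_means(y, m=5, seed=1):
    """Return the mean estimate from each of m completed data sets."""
    rng = random.Random(seed)
    donors = [v for v in y if v is not None]
    estimates = []
    for _ in range(m):
        # One completed data set: every missing item gets a random donor value
        completed = [rng.choice(donors) if v is None else v for v in y]
        estimates.append(mean(completed))
    return estimates

y = [2.0, 4.0, None, 6.0, None, 8.0]
estimates = multiple_imputation_means(y, m=5)
mi_estimate = mean(estimates)       # pooled point estimate
between_var = variance(estimates)   # variation across the m data sets
```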

Term
Cluster sample definition
Definition

A cluster sample is a probability sample in which a sampling unit is a cluster.

 

We will no longer assume SU = element 

Term
Steps in 1-stage cluster sampling
Definition

Divide the population (of K elements) into N total clusters.

Take a sample of n clusters.

 

Term
Comparing 1-stage CS and STS
Definition

1-stage CS:  A block of cells is a cluster, SU is a cluster, don't sample from every cluster.

 

STS: A block of cells is a stratum, SU is an element, sample from every stratum.

Term
Why use cluster sampling?
Definition

May not have a list of elements for a frame, but a list of clusters may be available.

May be cheaper to conduct the study if elements are clustered.

Term
Reasons that cluster sampling usually leads to less precise estimates.
Definition

Elements within clusters tend to be correlated due to exposure to similar conditions.

We get less information than if we observe the same number of unrelated elements.

Term
Ways to define clusters for improved precision.
Definition

Define clusters for which within-cluster variation is high (rarely possible).

Define clusters that are relatively small.

Term
Notation for cluster sampling (i, j, N, n, Mi, K)
Definition

i = index for cluster i

i, j = index for element j in cluster i

N = total clusters in population

n = sampled clusters

Mi = number of elements in cluster i

K = number of elements in population (sum Mi)

Term
Weight in CSE1 and is it self-weighting?
Definition

N/n

Yes, it is self-weighting.

Term
What is the weight formula for CSE2 and is it self weighting?
Definition

(N/n)*(Mi/mi)

It is not always self-weighting.

Term
Cluster population mean and within-cluster variance formulas (not on sheet).
Definition

ybariU = tiU/Mi

Si^2 = [1/(Mi - 1)] * sum over j of (yij - ybariU)^2
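
A short sketch of these two formulas for a single fully observed cluster. The three y values are made up, and cluster_summary is a hypothetical helper name.

```python
def cluster_summary(y_i):
    """Return (cluster total, cluster mean, within-cluster variance)."""
    M_i = len(y_i)               # number of elements in the cluster
    t_i = sum(y_i)               # cluster total
    ybar_i = t_i / M_i           # cluster mean = total / size
    # Within-cluster variance with divisor M_i - 1 (needs M_i > 1)
    S2_i = sum((y - ybar_i) ** 2 for y in y_i) / (M_i - 1)
    return t_i, ybar_i, S2_i

t_i, ybar_i, S2_i = cluster_summary([2.0, 4.0, 6.0])
```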

Term
What is the weight formula for CSU1 and is it self-weighting?
Definition

Qi/(nψi) = QiK/(nMi)

 

Not always self weighting (??)

Term
What is the weight formula for CSU2 and is it self-weighting?
Definition

K/(n mi)

 

Not always self weighting depending on mi (??)

Term
An element data set (cluster design) will have columns for at least what variables?
Definition

Cluster id (i)

Element id within cluster (j)

Variable (yij)

Term
A cluster data set will have columns for at least what variables?
Definition

Cluster id (i)

Cluster total under 1-stage CS (tiU)

Cluster mean under 1-stage CS (ybariU)

Within-cluster variance under 1-stage CS (si2)

Term
Biased (ratio) estimation for CSE1
Definition

Usually ti (cluster total) is positively correlated with Mi (cluster size)

No intercept

 

Notation of chapter 3 versus notation of chapter 5 ratio:  yi (variable of interest) = ti (cluster total), xi (auxiliary info) = Mi (cluster size)

Term
What is MbarU in cluster sampling?
Definition

The average cluster size for population

 

If unknown, can estimate with sample mean of cluster sizes Mbars = (1/n)*sum(Mi)
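
A sketch contrasting the unbiased total estimate (weight N/n) with the ratio estimate of the mean (sum of ti over sum of Mi) for CSE1, plus the sample mean cluster size. The three sampled clusters and N = 12 are made up, and the function name is hypothetical.

```python
def cse1_estimates(clusters, N):
    """clusters: list of (M_i, t_i) for the n sampled clusters."""
    n = len(clusters)
    total_t = sum(t for _, t in clusters)
    total_M = sum(M for M, _ in clusters)
    t_hat_unb = (N / n) * total_t   # unbiased total: each cluster has weight N/n
    ybar_ratio = total_t / total_M  # ratio estimate of the mean per element
    Mbar_s = total_M / n            # sample mean of cluster sizes
    return t_hat_unb, ybar_ratio, Mbar_s

clusters = [(10, 50.0), (20, 110.0), (30, 140.0)]
t_hat_unb, ybar_ratio, Mbar_s = cse1_estimates(clusters, N=12)
```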

Term
2-stage cluster sampling with equal selection probabilities (CSE2) overview
Definition

Stage 1: Select clusters.  SRSWOR of n PSUs from population of N PSUs.

Stage 2: Select elements within each sampled cluster. SRSWOR of mi SSUs from Mi elements in PSU i sampled in stage 1.

 

First stage sampling unit is a primary sampling unit (PSU) = cluster.

Second stage sampling unit is a secondary sampling unit (SSU) = element

 

Only collect data on the SSUs that were sampled from the cluster.

Term
Motivation for 2-stage cluster samples (instead of just 1-stage)
Definition

Likely that elements in cluster will be correlated.

May be inefficient to observe all elements in a sampled PSU: the extra effort required to fully enumerate a PSU does not provide that much extra information.

 

May be better to spend resources to sample many PSUs and a small number of SSUs per PSU. (Possible opposing force: study costs associated with going to many clusters)

Term
The variance of t-hat(unb) has 2 components associated with the 2 sampling stages. What are these components?
Definition

1. Variation among PSUs

2. Variation among SSUs within PSUs

 

[image]

Term
T/F Equal probability at stage 1 plus equal probability in stage 2 given PSU i in 2-stage cluster sampling implies equal inclusion probability for an element.
Definition

False

It does NOT imply equal inclusion probability for an element (unconditional probability for element)

slide 65 of week 13

Term
When to use unbiased estimation versus ratio estimation in CSE2?
Definition

Unbiased estimation - Use if you know K or N

e.g. N= total number of clutches or K = total number of eggs in Minnedosa, Manitoba

 

Ratio estimation - Only requires knowledge of Mi (e.g. number of eggs in clutch i), in addition to data collected

Term
When will an unbiased estimator have poor precision in CSE2?
Definition

When cluster sizes (Mi) are unequal

ti (cluster total) is roughly proportional to Mi (cluster size)

Term
When will ratio estimation (biased) be precise in CSE2?
Definition

When ti is roughly proportional to Mi (bigger cluster = larger ti)

This happens frequently in pops where cluster sizes (Mi) vary

Term
Inclusion probabilities for an element under 2-stage cluster sampling using SRSWOR at each stage (CSE2)
Definition

πi = Pr{cluster i in sample} = n/N

πj|i = Pr{element j in sample GIVEN cluster i in sample} = mi/Mi

πij = Pr{element j AND cluster i in sample} = πi × πj|i = (n/N) × (mi/Mi) = (n mi)/(N Mi)
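
A small sketch of these probabilities using exact fractions. The design numbers (n = 5 of N = 50 clusters, 4 elements sampled per cluster) are made up. Two clusters of different sizes get different element inclusion probabilities even though each stage is an SRS.

```python
from fractions import Fraction

def cse2_inclusion_probability(n, N, m_i, M_i):
    """pi_ij = (n/N) * (m_i/M_i) for an element in cluster i."""
    pi_i = Fraction(n, N)              # cluster i in sample
    pi_j_given_i = Fraction(m_i, M_i)  # element j, given cluster i in sample
    return pi_i * pi_j_given_i

pi_small = cse2_inclusion_probability(5, 50, 4, 20)  # cluster with M_i = 20
pi_large = cse2_inclusion_probability(5, 50, 4, 40)  # cluster with M_i = 40
weight_small = 1 / pi_small  # equals (N/n) * (M_i/m_i)
weight_large = 1 / pi_large
```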

Term
CSE2 Self-weighting design
Definition

Stage 1: Select n PSUs from N PSUs in pop using SRS

Stage 2: Choose mi proportional to Mi so that mi/Mi is constant, use SRS to select sample - if this is achieved, then it is self-weighting

 

Sample weight for SSU j in cluster i is constant for all elements.  Weight may vary slightly in practice, however, because it may not be possible for mi/Mi to be equal to 1/c for all clusters.

Term
Why are self-weighting samples appealing?  What is the caveat for variance estimation in self-weighting samples?
Definition

They are appealing because you can use the simple mean estimator.

The caveat for variance estimation is that there is no break on the variance of the estimator: you must still use the proper variance estimation formula for the sample design.

Term
Self-weighting designs
Definition

SRS

SYS

STS with proportional allocation

CSE1

CSE2 with mi proportional to Mi or c = Mi/mi

Term
Why is there no ratio estimator for CSU designs?
Definition
There is no ratio estimator because Mi has already been incorporated in the first stage
Term
Why use unequal probability cluster samples?
Definition
Use unequal selection probabilities to sample clusters to save costs and improve precision for a given budget.
Term
How do you select clusters and elements in CSU2 design?
Definition

Select clusters with PPSWR (stage 1)

Select elements with SRSWOR (stage 2)

Term
What is the size or importance measure xi in CSU design?
Definition
Size or importance measure xi is Mi = number of elements or SSUs in PSU i
Term
Selection probability for PSU i in CSU1
Definition
ψi = Mi/K
Term
Is t-hat(ψ) a biased or unbiased estimator of t?
Definition

Unbiased

Variance estimator is also unbiased.

Also holds for population mean.
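
A sketch of the CSU1 total estimate, assuming the usual with-replacement form: average ti/ψi over the n draws with ψi = Mi/K, and take the variance estimate from the spread of those scaled totals. The formulas are not written on these cards, so treat both as assumptions. The draws and K = 100 are made up.

```python
def csu1_total(draws, K):
    """draws: list of (M_i, t_i), one entry per with-replacement draw."""
    n = len(draws)
    # Scale each cluster total by its selection probability psi_i = M_i / K
    scaled = [t_i / (M_i / K) for M_i, t_i in draws]
    t_hat = sum(scaled) / n
    # Variance estimate from the spread of the scaled totals (needs n > 1)
    var_hat = sum((s - t_hat) ** 2 for s in scaled) / (n * (n - 1))
    return t_hat, var_hat

draws = [(10, 50.0), (20, 110.0), (25, 120.0)]
t_hat_psi, var_hat = csu1_total(draws, K=100)
```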

Term
T/F In CSU2 the variance estimator captures both between and within cluster variance.
Definition

True

Because we use WR sampling, variance estimator captures both between and within cluster variance.

 

This holds for the estimator for the population mean.

Term
Is CSU self-weighting?
Definition
Yes, if mi is constant across clusters