Term
|
Definition
Number of Clusters in a Population |
|
|
Term
|
Definition
number of clusters selected from a simple random sample |
|
|
Term
|
Definition
number of elements in a cluster |
|
|
Term
|
Definition
the average cluster size for the sample
1/n*(∑m_i) |
|
|
Term
|
Definition
Number of elements in the population |
|
|
Term
|
Definition
average cluster size for the population
M/N |
|
|
Term
|
Definition
total observations for the i-th cluster
|
|
|
Term
|
Definition
n=(Nσ^2_r)/(ND+σ^2_r)
where σ^2_r is estimated by s^2_r and D=B^2/(4N^2)
s^2_r= (Σ(y_i-y_bar*m_i)^2)/(n-1) |
|
|
Term
τ, with N*y_bar_t
M not known |
|
Definition
n=(Nσ^2_t)/(ND+σ^2_t)
where σ^2_t is estimated by s^2_t and D=B^2/(4N^2)
s^2_t=(Σ(y_i-y_bar_t)^2)/(n-1)
|
|
|
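The two sample-size formulas above differ only in which variance estimate is plugged in. A minimal Python sketch, assuming hypothetical cluster totals y_i and cluster sizes m_i from a preliminary sample, and assuming the result is rounded up to a whole number of clusters (the cards do not say how to round). The function names are illustrative only:

```python
import math

def s2_r(y, m):
    """s^2_r = sum((y_i - y_bar*m_i)^2) / (n-1), with y_bar = sum(y_i) / sum(m_i)."""
    n = len(y)
    y_bar = sum(y) / sum(m)  # ratio-type mean per element
    return sum((yi - y_bar * mi) ** 2 for yi, mi in zip(y, m)) / (n - 1)

def s2_t(y):
    """s^2_t = sum((y_i - y_bar_t)^2) / (n-1), with y_bar_t the mean cluster total."""
    n = len(y)
    y_bar_t = sum(y) / n
    return sum((yi - y_bar_t) ** 2 for yi in y) / (n - 1)

def cluster_sample_size(N, sigma2, B):
    """n = N*sigma2 / (N*D + sigma2), where D = B^2 / (4 N^2)."""
    D = B ** 2 / (4 * N ** 2)
    return math.ceil(N * sigma2 / (N * D + sigma2))  # rounding up is an assumption

# Hypothetical preliminary sample of 3 clusters
y = [12, 18, 30]   # cluster totals
m = [1, 2, 3]      # cluster sizes

n_ratio = cluster_sample_size(N=100, sigma2=s2_r(y, m), B=20)  # M known
n_total = cluster_sample_size(N=100, sigma2=s2_t(y), B=20)     # M not known
print(n_ratio, n_total)
```

In this made-up data the cluster totals are nearly proportional to the cluster sizes, so s^2_r is much smaller than s^2_t and the first formula asks for fewer clusters.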
Term
|
Definition
If y and x are highly correlated (that is, if x contributes information for the prediction of y), the ratio estimator should be better than N*y_bar, which depends solely on y_bar.
It is most appropriate when the relationship between y and x is linear through the origin |
|
|
Term
|
Definition
total_y=(y_bar/x_bar)*(total_x)
or
total_y=((Σy_i)/(Σx_i))*(total_x)
or
R=(total_y)/(total_x) |
|
|
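A short Python sketch of the ratio estimator of a total given above, using made-up paired observations (y, x) and a hypothetical known population total for x:

```python
def ratio_estimate_total(y, x, total_x):
    """Return (R, total_y) with R = sum(y_i)/sum(x_i) and total_y = R * total_x."""
    r = sum(y) / sum(x)
    return r, r * total_x

# Hypothetical sample: y is roughly proportional to x
y = [2, 4, 6]
x = [1, 2, 3]
r, total_y = ratio_estimate_total(y, x, total_x=100)
print(r, total_y)
```

Because y_bar/x_bar and Σy_i/Σx_i are the same number, either form of the formula gives the same estimate.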
Term
|
Definition
Suppose we wish to estimate the average sugar content per orange in a large shipment. We could use the sample mean y_bar to estimate µ_y.
Used in the analysis of data from many important and practical surveys run by government, business, and academic researchers.
Examples: CPI, Current Population Survey, Nielsen Retail Index, forecasting |
|
|
Term
|
Definition
The line can be used to estimate the mean value of y for any value of x that we choose to substitute for x_i. In particular, the estimator µ_yL of µ_y is obtained by substituting µ_x for x_i.
Assumes the x values are fixed in advance and the y values are random variables.
|
|
|
Term
|
Definition
Fitting a straight line through a set of data pairs (x,y) by the least-squares method produces a line of the form
y_hat_i=a+b*x_i
where a is the y-intercept at x=0 and b is the slope of the regression line. The intercept is given by
a=y_bar-b*x_bar
Substituting this expression for a allows the equation for the regression line to be written as
y_hat_i=y_bar+b(x_i-x_bar)
|
|
|
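The regression line above can be sketched in Python as follows. The card does not give the formula for b, so this assumes the standard least-squares slope b = Σ(x_i-x_bar)(y_i-y_bar) / Σ(x_i-x_bar)^2. The data and the value of µ_x are hypothetical:

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx              # standard least-squares slope (assumed, see lead-in)
    a = y_bar - b * x_bar      # intercept, as on the card
    return a, b

def regression_estimate_mean(x, y, mu_x):
    """y_bar + b*(mu_x - x_bar): the fitted line evaluated at x = mu_x."""
    _, b = fit_line(x, y)
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    return y_bar + b * (mu_x - x_bar)

# Hypothetical data lying exactly on y = 1 + 2x
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
a, b = fit_line(x, y)
mu_yL = regression_estimate_mean(x, y, mu_x=3)
print(a, b, mu_yL)
```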
Term
|
Definition
Residual plots help reveal unusually large deviations or a pattern suggesting that the simple linear model is not a good one.
Residuals should simply be a random scattering of points about a horizontal line at 0.
The estimator of the total has the form N*µ_yL, which specifically requires knowledge of N. |
|
|
Term
|
Definition
- Adjusts the y_bar value up or down by an amount depending on the difference (µ_x-x_bar)
- Frequently works well when the x values are highly correlated with the y values and both are measured on the same scale.
- It is commonly employed in auditing procedures.
- The regression coefficient b is not computed.
- Commonly referred to as paired samples
|
|
|
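The bullets above describe shifting y_bar by (µ_x-x_bar) without computing b, which corresponds to the form y_bar + (µ_x-x_bar). A minimal Python sketch under that reading, using hypothetical audit data in which x is the book value and y is the audited value of each sampled item:

```python
def difference_estimate_mean(y, x, mu_x):
    """y_bar + (mu_x - x_bar); no regression coefficient is computed."""
    y_bar = sum(y) / len(y)
    x_bar = sum(x) / len(x)
    return y_bar + (mu_x - x_bar)

# Hypothetical paired sample: audited values (y) and book values (x)
y = [90, 110, 100]
x = [95, 105, 100]
mu_yD = difference_estimate_mean(y, x, mu_x=102)  # mu_x: known mean book value
print(mu_yD)
```

Here the sampled book values average 100 while the population book values average 102, so the estimate moves y_bar up by 2.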
Term
|
Definition
- Estimate the ratio of µ_y to µ_x within each stratum by R_i=y_bar_i/x_bar_i
- Form a weighted average of these separate estimates as a single estimate of the population ratio, namely R_SR=Σ(N_i/N)*R_i
- May have a larger bias because each stratum ratio estimate contributes to that bias
- Use it if the stratum sample sizes are large enough (say, 20 or so) that it does not have large biases and the variance approximations work adequately
|
|
|
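The weighted average R_SR=Σ(N_i/N)*R_i described above can be sketched in Python. The stratum sizes and stratum sample means are hypothetical, and the function name is illustrative:

```python
def separate_ratio(N_i, y_bar_i, x_bar_i):
    """R_SR = sum((N_i / N) * R_i), with R_i = y_bar_i / x_bar_i in each stratum."""
    N = sum(N_i)
    return sum((Ni / N) * (yb / xb) for Ni, yb, xb in zip(N_i, y_bar_i, x_bar_i))

# Two hypothetical strata
N_i = [25, 75]          # stratum sizes
y_bar_i = [10.0, 30.0]  # stratum sample means of y
x_bar_i = [5.0, 10.0]   # stratum sample means of x

R_SR = separate_ratio(N_i, y_bar_i, x_bar_i)  # 0.25*2 + 0.75*3
print(R_SR)
```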
Term
|
Definition
Instead of multiplying the single ratio by the population mean of x (i.e., µ_x*R_SR), we use ratio estimation separately for the mean of y in each stratum, then combine them into an estimate of the population mean of y.
Usually yields a more precise estimator than the simpler formula. |
|
|
Term
|
Definition
First estimate µ_y by the usual y_bar_st (stratified) and similarly estimate µ_x by x_bar_st (stratified). Then R_CR=y_bar_st/x_bar_st can be used as an estimator of µ_y/µ_x
|
|
|
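A Python sketch of R_CR=y_bar_st/x_bar_st as defined above, assuming the usual stratified mean y_bar_st=Σ(N_i/N)*y_bar_i (and likewise for x). The stratum figures are hypothetical:

```python
def stratified_mean(N_i, means):
    """Stratified mean: sum((N_i / N) * stratum mean)."""
    N = sum(N_i)
    return sum((Ni / N) * mb for Ni, mb in zip(N_i, means))

def combined_ratio(N_i, y_bar_i, x_bar_i):
    """R_CR = y_bar_st / x_bar_st."""
    return stratified_mean(N_i, y_bar_i) / stratified_mean(N_i, x_bar_i)

# Two hypothetical strata
N_i = [25, 75]
y_bar_i = [10.0, 30.0]
x_bar_i = [5.0, 10.0]

R_CR = combined_ratio(N_i, y_bar_i, x_bar_i)  # 25.0 / 8.75
print(R_CR)
```

With these numbers the within-stratum ratios are 2 and 3, so pooling the means first gives a different value than averaging the stratum ratios would.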
Term
Combined Ratio Estimator |
|
Definition
- Gives larger estimated variance
- If stratum sample sizes are very small, or if the within-stratum ratios are all approximately equal then use it.
|
|
|
Term
|
Definition
- A sample survey design that is widely used primarily because it simplifies the sample selection process.
- It is easier to perform in the field and hence is less subject to selection errors by fieldworkers than are either simple random samples or stratified random samples, especially if a good frame is not available.
- Pick a starting point and then take every kth element.
- Can provide greater information per unit cost than simple random sampling can for populations with certain patterns in the arrangement of elements.
|
|
|
Term
|
Definition
- Involves random selection of one element from the first k elements and then selection of every kth element thereafter.
- The procedure is easier to perform and usually less subject to interviewer error than is simple random sampling.
- Provides more information per unit cost than does simple random sampling.
- A systematic sample is generally spread more uniformly over the entire population and thus may provide more information about the population than an equivalent amount of data contained in a simple random sample.
|
|
|
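The selection rule above (a random start among the first k elements, then every kth element) can be sketched in Python. The function name and the list-based frame are assumptions made for illustration:

```python
import random

def systematic_sample(population, k, rng=None):
    """1-in-k systematic sample: random start in the first k elements, then every kth."""
    if not 1 <= k <= len(population):
        raise ValueError("k must be between 1 and the population size")
    rng = rng or random.Random()
    start = rng.randrange(k)       # 0-based index of the randomly chosen first element
    return population[start::k]    # that element and every kth one after it

# Hypothetical frame of 100 numbered elements
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, rng=random.Random(0))
print(sample)
```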
Term
|
Definition
To select n elements from a population of size N, k must be less than or equal to N/n (i.e., k≤N/n).
We cannot accurately choose k when the population size is unknown.
We can determine an approximate sample size n, but we must guess the value of k needed to achieve a sample size of n.
If too large a value of k is chosen, the required sample size n will not be obtained by using a 1-in-k systematic sample from the population. |
|
|
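The constraint k≤N/n can be checked numerically. The sketch below assumes N is known and takes k as the largest integer not exceeding N/n; the helper names and the figures N=1000, n=30 are hypothetical:

```python
def interval_for(N, n):
    """Largest integer k satisfying k <= N/n."""
    return N // n

def systematic_sample_sizes(N, k):
    """Sample size obtained for each possible random start 0..k-1."""
    return [len(range(start, N, k)) for start in range(k)]

N, n = 1000, 30
k = interval_for(N, n)

smallest_ok = min(systematic_sample_sizes(N, k))           # every start yields at least n
smallest_too_big = min(systematic_sample_sizes(N, k + 1))  # some starts fall short of n
print(k, smallest_ok, smallest_too_big)
```

With k one step too large, at least one possible starting point produces fewer than the required 30 elements, which is the failure described on the card.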
Term
|
Definition
if the elements of the population are in random order
|
|
|
Term
|
Definition
if the elements of a population have values that trend upward or downward when they are listed
|
|
|
Term
|
Definition
- May occur in an alphabetical listing of student grades on an exam, because there is generally no reason why students at the beginning of the alphabet should have lower or higher grades than those at the end.
|
|
|
Term
|
Definition
- sometimes occurs in chronological listings, such as a bank's listings of outstanding mortgage balances
- The older mortgages will tend to have smaller balances than the newer ones
|
|
|
Term
|
Definition
if the elements of a population have values that tend to cycle upward and downward in a regular pattern when listed.
|
|
|
Term
|
Definition
- may occur in the average daily sales volume for a chain of grocery stores.
- Daily sales are generally cyclical, with peak sales occurring toward the end of each week.
|
|
|
Term
|
Definition
the sample values will tend to be further apart numerically than in a simple random sample, making the within-sample correlation, ρ, negative. |
|
|
Term
|
Definition
Behaves, for all practical purposes, like a simple random sample.
the variance approximation using the formula from simple random sampling works well.
|
|
|
Term
|
Definition
This makes the variance of a systematic sample larger than that of a corresponding simple random sample, and use of the simple random sampling variance formula will produce an underestimate of the true sampling error. |
|
|
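The effect of list order on the variance of a systematic sample can be illustrated with a small exact calculation. This Python sketch uses two hypothetical populations of 20 values and enumerates all k possible 1-in-k samples; it assumes k divides N evenly and uses the simple random sampling variance (σ^2/n)*((N-n)/(N-1)) for comparison:

```python
from statistics import mean, pvariance

def systematic_mean_variance(values, k):
    """Exact variance of the sample mean over all k possible 1-in-k samples."""
    means = [mean(values[start::k]) for start in range(k)]
    return pvariance(means)

def srs_mean_variance(values, n):
    """Variance of the mean of a simple random sample of size n."""
    N = len(values)
    return (pvariance(values) / n) * (N - n) / (N - 1)

ordered = list(range(1, 21))   # values trend upward through the list
periodic = [1, 2, 3, 4] * 5    # values cycle with period 4

k, n = 4, 5                    # a 1-in-4 sample of 20 elements has 5 elements

print(systematic_mean_variance(ordered, k), srs_mean_variance(ordered, n))
print(systematic_mean_variance(periodic, k), srs_mean_variance(periodic, n))
```

In the trending list the systematic variance is smaller than the simple random sampling variance, while in the cyclical list, where the sampling interval matches the cycle length, it is larger.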
Term
|
Definition
A probability sample in which each sampling unit is a collection, or cluster, of elements |
|
|
Term
Cluster Sample- Advantages |
|
Definition
A good frame listing population elements either is not available or is very costly to obtain, but a frame listing clusters is easily obtained |
|
|
Term
|
Definition
The cost of obtaining observations increases as the distance separating the elements increases |
|
|