Term
Explain what it means when you can or cannot reject the null hypothesis. |
|
Definition
If the sample mean is close to the stated population mean, the null hypothesis is not rejected.
If the sample mean is far from the stated population mean, the null hypothesis is rejected.
How far is "far enough" to reject H0? The critical value of a test statistic creates a "line in the sand" for decision making - it answers the question of how far is far enough. |
|
|
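The "line in the sand" idea can be sketched in code. This is a minimal illustration, assuming a two-sided z-test with a known population standard deviation; the function name `z_test_two_sided` and the numbers are made up for the example and are not from the course material.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, critical_value, reject) for a two-sided z-test of H0: mu = mu0."""
    # How many standard errors the sample mean lies from the stated mean.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # The critical value is the cutoff that answers "how far is far enough".
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) > critical

# Hypothetical data: stated mean 50, sigma 5, sample of 36 observations.
far = z_test_two_sided(sample_mean=52.0, mu0=50.0, sigma=5.0, n=36)
close = z_test_two_sided(sample_mean=50.5, mu0=50.0, sigma=5.0, n=36)
print(far)    # z is about 2.4, beyond the cutoff of about 1.96, so H0 is rejected
print(close)  # z is about 0.6, inside the cutoff, so H0 is not rejected
```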
Term
Explain how to interpret a confidence interval. |
|
Definition
A 90% confidence level means that we would expect 90% of the interval estimates to include the population parameter. A 95% confidence level means that 95% of the intervals would include the parameter; and so on.
The confidence level describes the uncertainty associated with a sampling method. Suppose we used the same sampling method to select different samples and to compute a different interval estimate for each sample. Some interval estimates would include the true population parameter and some would not. |
|
|
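The repeated-sampling interpretation above can be checked with a small simulation. This is a sketch under stated assumptions: the population is normal with a known standard deviation, and the parameters (`mu`, `sigma`, `n`, `trials`) are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def coverage(confidence, mu=100.0, sigma=15.0, n=30, trials=2000, seed=1):
    """Fraction of simulated interval estimates that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample with the same sampling method each time.
        xbar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

print(coverage(0.90))  # should land near 0.90
print(coverage(0.95))  # should land near 0.95
```

Any single interval either contains the parameter or it does not; the confidence level describes the long-run share of intervals that do.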
Term
Type I Error |
|
Definition
Null hypothesis is true, but we reject it.
Considered a serious type of error.
The probability of a Type I error is α.
Called the level of significance of the test.
Set by researcher in advance.
A Type I error can only occur if H0 is true. |
|
|
Term
Type II Error |
|
Definition
Null hypothesis is false, but we don't reject it.
The probability of a Type II Error is β.
Type I and Type II errors cannot happen at the same time.
A Type II error can only occur if H0 is false. |
|
|
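Both error types can be estimated by simulation. The sketch below assumes a two-sided z-test with known sigma; `rejection_rate` and all of the numbers are hypothetical. When H0 is true, every rejection is a Type I error; when H0 is false, every non-rejection is a Type II error.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def rejection_rate(true_mu, mu0=50.0, sigma=5.0, n=25, alpha=0.05,
                   trials=2000, seed=7):
    """Share of simulated samples in which H0: mu = mu0 is rejected."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        xbar = mean(rng.gauss(true_mu, sigma) for _ in range(n))
        z = (xbar - mu0) / (sigma / sqrt(n))
        if abs(z) > critical:
            rejections += 1
    return rejections / trials

# H0 is true here, so the rejection rate estimates alpha.
type_i_rate = rejection_rate(true_mu=50.0)
# H0 is false here, so the non-rejection rate estimates beta.
type_ii_rate = 1 - rejection_rate(true_mu=52.0)
print(type_i_rate, type_ii_rate)
```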
Term
What are the four types of sampling methods and how do they differ? |
|
Definition
1. Non-Probability Sampling
2. Convenience Sampling
3. Judgement Sample
4. Probability Sample |
|
|
Term
Non-Probability Sampling |
|
Definition
Items included are chosen without regard to their probability of occurrence. |
|
|
Term
Convenience Sampling |
|
Definition
Items are selected based only on the fact that they are easy, inexpensive, or convenient to sample. |
|
|
Term
Judgement Sample |
|
Definition
You get the opinions of pre-selected experts in the subject matter. |
|
|
Term
Probability Sample |
|
Definition
Items in the sample are chosen on the basis of known probabilities:
1. Simple Random
2. Systematic
3. Stratified
4. Cluster |
|
|
Term
Simple Random Sample |
|
Definition
Every individual or item from the frame has an equal chance of being selected. Selection may be with replacement (the selected individual is returned to the frame for possible reselection) or without replacement (the selected individual isn't returned to the frame). Samples are obtained from a table of random numbers or a computer random number generator.
Simple to use. May not be a good representation of the population's underlying characteristics. |
|
|
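The two selection modes map onto two standard-library calls. This is a minimal sketch; the frame of 100 numbered individuals and the seed are assumptions for illustration.

```python
import random

rng = random.Random(42)          # seeded so the draw is repeatable
frame = list(range(1, 101))      # hypothetical frame: individuals numbered 1..100

# Without replacement: no individual can appear twice.
without_replacement = rng.sample(frame, k=10)

# With replacement: each draw comes from the full frame, so repeats are possible.
with_replacement = rng.choices(frame, k=10)

print(without_replacement)
print(with_replacement)
```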
Term
Systematic Sample |
|
Definition
Decide on sample size: n. Divide the frame of N individuals into groups of k individuals: k = N/n. Randomly select one individual from the 1st group. Select every kth individual thereafter.
Simple to use. May not be a good representation of the population's underlying characteristics. |
|
|
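The steps above can be sketched as follows. The function name `systematic_sample` is hypothetical, and the sketch rounds k down when N is not an exact multiple of n.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first group of k, then take every kth item."""
    rng = random.Random(seed)
    k = len(frame) // n          # group size: k = N / n (rounded down)
    start = rng.randrange(k)     # random individual from the 1st group
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))      # hypothetical frame of N = 100
picked = systematic_sample(frame, n=10, seed=5)
print(picked)                    # ten items, each exactly k = 10 apart
```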
Term
Stratified Sample |
|
Definition
Divide the population into two or more subgroups (called strata) according to some common characteristic. A simple random sample is selected from each subgroup, with sample sizes proportional to strata sizes. Samples from the subgroups are combined into one. This is a common technique when sampling a population of voters, stratifying across racial or socio-economic lines.
Ensures representation of individuals across the entire population. |
|
|
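Proportional allocation can be sketched like this. The strata names and sizes are invented for the example, and `stratified_sample` is a hypothetical helper; because each stratum's share is rounded, the combined sample can differ slightly from n when the proportions do not divide evenly.

```python
import random
from collections import Counter

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum, sized in proportion to it."""
    rng = random.Random(seed)
    total = sum(len(items) for items in strata.values())
    sample = []
    for items in strata.values():
        size = round(n * len(items) / total)   # proportional allocation
        sample.extend(rng.sample(items, size))
    return sample

# Hypothetical population of 1,000 voters in three strata.
strata = {
    "urban": [("urban", i) for i in range(600)],
    "suburban": [("suburban", i) for i in range(300)],
    "rural": [("rural", i) for i in range(100)],
}
sample = stratified_sample(strata, n=50, seed=3)
counts = Counter(name for name, _ in sample)
print(counts)   # 30 urban, 15 suburban, 5 rural
```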
Term
Cluster Sample |
|
Definition
Population is divided into several "clusters", each representative of the population. A simple random sample of clusters is selected. All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique. A common application of cluster sampling involves election exit polls, where certain election districts are selected to be sampled.
More cost effective. Less efficient (need larger sample to acquire the same level of precision). |
|
|
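A minimal sketch of the one-stage version, where every item in each selected cluster is used. The district names and the helper `cluster_sample` are assumptions made up for the example.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly select whole clusters and return all items inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    items = [item for name in chosen for item in clusters[name]]
    return chosen, items

# Hypothetical election districts, each holding 20 voter IDs.
clusters = {
    f"district_{d}": [f"{d}-{i}" for i in range(20)]
    for d in "abcdef"
}
chosen, items = cluster_sample(clusters, num_clusters=2, seed=11)
print(chosen)       # two district names
print(len(items))   # 40: every voter in the two selected districts
```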
Term
What are the four assumptions of regression, and how do charts reveal violations of them? |
|
Definition
L.I.N.E.
L: The Linearity Assumption
I: Independence of Errors
N: Normality of Error
E: Equal Variance |
|
|
Term
The Linearity Assumption |
|
Definition
The relationship between X and Y is linear: it appears as a straight line on the graph. |
|
|
Term
Independence of Errors |
|
Definition
Error values are statistically independent. |
|
|
Term
Normality of Error |
|
Definition
Error values are normally distributed. |
|
|
Term
Equal Variance (Homoskedasticity) |
|
Definition
The probability distribution of the errors has constant variance. |
|
|
Term
Residual |
|
Definition
The residual for the observation is the difference between its observed and predicted value.
- Check the assumptions of regression by examining the residuals. |
|
|
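Residuals are easy to compute once a line has been fitted. The sketch below assumes a simple least-squares fit on made-up data; `fit_line` is a hypothetical helper, not a library function.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares intercept and slope for a simple regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

x = [1, 2, 3, 4, 5]                      # made-up observations
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * xi for xi in x]
# Residual = observed value minus predicted value, one per observation.
residuals = [yi - pi for yi, pi in zip(y, predicted)]
print(residuals)
```

Plotting these residuals against x (or against the predicted values) is the usual way to look for curvature, patterns over time, or a changing spread.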
Term
Explain Multicollinearity and what a Collinearity Matrix is and how it's used. |
|
Definition
When your overall p-value is low but individual p-values are high, this could be explained by having two or more x variables that basically convey the same information. |
|
|
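One way to look for x variables that convey the same information is a table of pairwise correlations between the predictors. The sketch below assumes that this is what the card means by a collinearity matrix; the predictor names and values are invented, and `pearson` and `correlation_matrix` are hypothetical helpers.

```python
from math import sqrt
from statistics import mean

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)

def correlation_matrix(predictors):
    """Nested dict of pairwise correlations between all predictors."""
    names = list(predictors)
    return {r: {c: pearson(predictors[r], predictors[c]) for c in names}
            for r in names}

predictors = {                      # made-up housing data
    "sqft": [1000, 1500, 2000, 2500, 3000],
    "rooms": [3, 4, 6, 7, 9],
    "age": [30, 5, 22, 12, 40],
}
matrix = correlation_matrix(predictors)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Here `sqft` and `rooms` are correlated at roughly 0.99, which suggests they carry nearly the same information; a common response is to drop one of such a pair.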
Term
What are dummy variables and when do you use them? |
|
Definition
A dummy variable is a categorical independent variable with two levels:
- Yes or no, on or off, male or female, etc.
Coded as 0 or 1.
Assumes the slopes associated with numerical independent variables do not change with the value of the categorical variable.
If there are more than two levels, the number of dummy variables needed is equal to: (number of levels - 1) |
|
|
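The (number of levels - 1) rule can be sketched as follows. The helper `dummy_encode`, the column naming, and the region data are assumptions for illustration; the omitted level acts as the baseline, coded as all zeros.

```python
def dummy_encode(values, baseline):
    """Encode a categorical column as 0/1 dummies, omitting the baseline level."""
    levels = [lv for lv in sorted(set(values)) if lv != baseline]
    return [{f"is_{lv}": int(v == lv) for lv in levels} for v in values]

regions = ["north", "south", "west", "south"]   # three levels
rows = dummy_encode(regions, baseline="north")  # so two dummy variables
for region, row in zip(regions, rows):
    print(region, row)
```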
Term
When given a numerical R^2, how do you interpret it? |
|
Definition
r^2 = SSR/SST (Regression Sum of Squares / Total Sum of Squares).
- If r^2 = 1, perfect linear relationship between X and Y.
- If 0 < r^2 < 1, weaker relationship between X and Y.
- If r^2 = 0, no linear relationship
It shows the percent of the variation in the dependent variable which is explained by the model. |
|
|
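The ratio SSR/SST can be computed directly. This is a sketch assuming a simple least-squares line; the function name `r_squared` and the data are illustrative.

```python
from statistics import mean

def r_squared(x, y):
    """r^2 = SSR / SST for a simple least-squares regression of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    predicted = [intercept + slope * xi for xi in x]
    ssr = sum((pi - y_bar) ** 2 for pi in predicted)   # explained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)           # total variation
    return ssr / sst

perfect = r_squared([1, 2, 3, 4], [2, 4, 6, 8])    # points lie exactly on a line
none = r_squared([1, 2, 3, 4], [1, 2, 2, 1])       # no linear trend
partial = r_squared([1, 2, 3, 4, 5], [2, 3, 7, 6, 11])
print(perfect, none, partial)
```

An r^2 of, say, 0.85 would be read as "85% of the variation in Y is explained by the model".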
Term
Explain what a p-value represents and what it's telling us. |
|
Definition
The p-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. One often "rejects the null hypothesis" when the p-value is less than the significance level α, which is commonly set to 0.05. When the null hypothesis is rejected, the result is said to be statistically significant.
If the p-value is below 0.05 = significant
If the p-value is above 0.05 = not significant |
|
|
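As a concrete illustration, a two-sided p-value for a z statistic can be computed from the standard normal distribution. The function name `two_sided_p_value` is hypothetical, and the z-test setting is an assumption; other tests use other distributions.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a z statistic at least as extreme as the one observed."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
for z in (0.6, 1.96, 2.4):
    p = two_sided_p_value(z)
    verdict = "significant" if p < alpha else "not significant"
    print(f"z = {z}: p = {p:.4f} ({verdict})")
```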
Term
Explain how to interpret the coefficient of a dummy variable. |
|
Definition
Dummy variables represent categories. Use dummy variables if you want to find out whether being in a certain category makes a difference, compared with not being in that category.
It's called a dummy variable because its values are all either 0 or 1. You give the dummy variable a value of 1 for each observation that is in some category that you have defined. |
|
|
Term
Explain the Difference Between a Simple and a Multiple Regression |
|
Definition
Simple regression analysis involves a single independent or predictor variable and a single dependent, or outcome variable.
Multiple Regression involves models that have two or more predictor variables and a single dependent variable. |
|
|
Term
What is the difference between r^2 and Adjusted r^2, what are they both telling you? |
|
Definition
The "adjustment" in adjusted r^2 is related to the number of variables and the number of observations.
If you keep adding variables (predictors) to your model, r^2 will increase or at least never decrease (the predictors will appear to explain the variance), but some of that improvement may be due to chance alone. So the Adjusted r^2 tries to correct for this, by taking into account the ratio (N-1)/(N-k-1) where N = number of observations and k = number of variables (predictors).
Adjusted r^2 also penalizes variables that do not improve the model. |
|
|
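The adjustment can be written out using the standard formula, Adjusted r^2 = 1 - (1 - r^2)(N-1)/(N-k-1). The function name and the example numbers below are assumptions for illustration.

```python
def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted r^2 = 1 - (1 - r^2) * (N - 1) / (N - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Same r^2 and sample size, but more predictors gives a lower adjusted value.
few = adjusted_r_squared(0.80, n_obs=30, n_predictors=3)
many = adjusted_r_squared(0.80, n_obs=30, n_predictors=10)
print(round(few, 4), round(many, 4))
```

If adding a predictor raises r^2 only slightly, the adjusted value can fall, which is the penalty described above.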
Term
Give an example of a simple/basic regression model and interpret a coefficient or variable. |
|
Definition
Refer to In-Class Problem Set: Multiple Regression questions A & B. |
|
|
Term
Actual Value |
|
Definition
Actual values are data obtained from real-life samples; they show the true Y value of what you're attempting to predict with the model. |
|
|
Term
Predicted Value |
|
Definition
Based on real-life samples, using a best-fit line that attempts to summarize all of the points of the sample. Since these points are hardly ever perfectly linear in reality, there will be gaps between the predicted and actual values (residuals). |
|
|